Federated Learning

Federated learning (FL) is a distributed machine learning paradigm where multiple clients collaboratively train a shared model without exchanging raw data, instead sharing only model updates to preserve privacy. It operates through iterative rounds of local training on client devices and server-side aggregation of updates. 1)

How It Works

FL follows a client-server architecture through repeated rounds:

  1. Broadcast: The server sends the current global model to participating clients.
  2. Local training: Each client trains a copy of the model on its private data using stochastic gradient descent (SGD) for a fixed number of epochs, computing a local update (gradient or parameter difference). 2)
  3. Upload: Clients send encrypted or masked updates to the central server.
  4. Aggregation: The server aggregates all client updates to form an improved global model.
  5. Repeat: The updated global model is broadcast for the next round.
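The round structure above can be sketched in a few lines of Python. This is a minimal illustration, not a production protocol: `local_train` is a hypothetical callback standing in for client-side SGD, and aggregation here is a plain mean (FedAvg's data-size weighting is shown in the next section).

```python
# Minimal sketch of one federated round. `local_train(weights, data)` is a
# hypothetical stand-in for client-side SGD that returns updated weights.
import random

def federated_round(global_weights, clients, local_train, fraction=0.1):
    """One round: broadcast, local training, upload, aggregation."""
    # 1. Server selects a fraction of clients and broadcasts the global model.
    selected = random.sample(clients, max(1, int(fraction * len(clients))))
    updates = []
    for client_data in selected:
        # 2-3. Each client trains on its private data and uploads new weights.
        updates.append(local_train(list(global_weights), client_data))
    # 4. Server aggregates the updates (simple mean in this sketch).
    n = len(updates)
    return [sum(w[i] for w in updates) / n for i in range(len(global_weights))]
```

Step 5 is simply calling `federated_round` again with the returned weights as the new global model.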

The FedAvg Algorithm

Federated Averaging (FedAvg), introduced by McMahan et al. in the 2017 paper “Communication-Efficient Learning of Deep Networks from Decentralized Data,” is the foundational FL algorithm. It weights client updates by the size of their local datasets:

w(t+1) = Σ_k (n_k / n) · w_k(t+1)

where n_k is client k's data size, n is the total across all clients, and w_k(t+1) is the local model after training from global weights w(t). This reduces communication overhead by performing multiple local SGD steps before aggregation. 3)
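The weighted average above is straightforward to implement. A minimal sketch with NumPy, treating each client's parameters as a flat array:

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """FedAvg aggregation: weight each client's parameters w_k by its
    share n_k / n of the total data, then sum.

    client_weights: list of parameter arrays w_k(t+1), one per client
    client_sizes:   list of local dataset sizes n_k
    """
    n = sum(client_sizes)  # total examples across all clients
    return sum((n_k / n) * np.asarray(w_k)
               for w_k, n_k in zip(client_weights, client_sizes))
```

A client holding three times as much data pulls the global model three times as hard: `fedavg([[1.0, 1.0], [3.0, 3.0]], [1, 3])` yields `[2.5, 2.5]`.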

Google's Gboard: The Origin Story

Google pioneered federated learning in 2016 for next-word prediction on Android's Gboard mobile keyboard, training across millions of devices without centralizing user typing data. This was the first large-scale FL deployment, handling heterogeneous mobile data while improving predictions. 4)

Later expansions included Smart Text Selection models trained with secure aggregation (SecAgg) and distributed differential privacy (DDP), which reduced unintended memorization of user data more than twofold in empirical tests. 5)

Privacy Preservation Mechanisms

FL minimizes raw data exposure but remains vulnerable to inference attacks on model updates. Key privacy mechanisms include:

Differential Privacy (DP): Adds calibrated noise (Gaussian or Laplace) to updates or aggregates, bounding any individual's influence via a privacy parameter epsilon (smaller epsilon means stronger privacy). Noise can be added client-side (local DP) or server-side (central DP). 6)
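The two standard ingredients, clipping to bound each client's influence and Gaussian noise, can be sketched as follows. This is an illustrative fragment, not a full DP-SGD implementation; translating the noise scale into a concrete epsilon requires a privacy accountant, which is omitted here.

```python
import numpy as np

def dp_sanitize(update, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Clip an update to a bounded L2 norm, then add Gaussian noise.

    Clipping caps any individual's influence on the aggregate; the noise
    scale (noise_multiplier * clip_norm) determines the privacy level
    epsilon via the Gaussian mechanism (accounting not computed here).
    """
    rng = rng or np.random.default_rng()
    update = np.asarray(update, dtype=float)
    norm = np.linalg.norm(update)
    # Scale down only if the update exceeds the clipping threshold.
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise
```

Applying `dp_sanitize` on each client before upload corresponds to local DP; applying clipping per client and noise once on the aggregate corresponds to central DP.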

Secure Aggregation (SecAgg): A cryptographic protocol ensuring the server sees only the sum of masked client updates, not individual contributions. When paired with DP, it enforces data minimization and formal privacy guarantees even against honest-but-curious servers. 7)
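The core cancellation trick behind SecAgg can be shown in a toy form: each client pair shares a random mask that one adds and the other subtracts, so individual uploads look random while the masks vanish in the sum. Real SecAgg derives masks from key agreement and handles dropouts; this sketch uses a shared RNG purely for illustration.

```python
import numpy as np

def masked_updates(updates, seed=0):
    """Toy pairwise masking: for each client pair (i, j) with i < j, draw a
    shared random mask; client i adds it, client j subtracts it. Any single
    masked update is obscured, but all masks cancel in the server's sum."""
    rng = np.random.default_rng(seed)  # stand-in for pairwise key agreement
    n = len(updates)
    masked = [np.asarray(u, dtype=float).copy() for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            mask = rng.normal(size=masked[i].shape)
            masked[i] += mask   # client i adds the shared mask
            masked[j] -= mask   # client j subtracts the same mask
    return masked
```

The server computes only `sum(masked_updates(us))`, which equals `sum(us)` exactly, without ever seeing an unmasked individual update.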

Distributed Differential Privacy (DDP): Integrates noise within the secure aggregation protocol for trustless privacy guarantees, deployed at scale in Google's production systems. 8)

Adaptive DP: Adjusts noise budgets dynamically to balance utility and privacy, mitigating excessive noise after model convergence. 9)

Horizontal vs. Vertical FL

Horizontal FL: Clients share the same feature space but have different samples. This is the most common setting, exemplified by Gboard where all users have the same type of data (text input) but different instances. 10)

Vertical FL: Clients share overlapping samples but have different features. For example, one bank has transaction data and another has credit history for the same customers. This requires entity resolution and secure feature alignment.

Cross-Silo vs. Cross-Device

Cross-silo: A small number of trusted clients (hospitals, banks) with powerful servers. Participation is reliable with lower data heterogeneity. Well-suited for healthcare and finance applications. 11)

Cross-device: Massive scale with millions of mobile devices (like Gboard). Characterized by high client churn, weak device capabilities, and non-IID data distributions. Emphasizes communication efficiency and robustness. 12)

Use Cases

Healthcare: Federated tumor detection across hospitals enables multi-institution models while complying with HIPAA and GDPR. Electronic Health Record (EHR) analysis for rare diseases benefits from pooled learning without sharing patient data. Reported healthcare FL deployments grew roughly tenfold after 2020. 13)

Finance: Collaborative fraud detection models across banks improve anomaly detection via diverse data while preserving competitive privacy. Credit scoring benefits from pooled insights without exposing individual transaction data.

Mobile and IoT: Next-word prediction, smart text selection, voice recognition, and personalized recommendations on edge devices.

Challenges

Frameworks

Key Papers

See Also

References
