====== Federated Learning ======

Federated learning (FL) is a distributed machine learning paradigm in which multiple clients collaboratively train a shared model without exchanging raw data, sharing only model updates to preserve privacy. Training proceeds in iterative rounds of local training on client devices and server-side aggregation of the resulting updates. ((source [[https://pmc.ncbi.nlm.nih.gov/articles/PMC8329160/|PMC: Federated Learning]]))

===== How It Works =====

FL follows a client-server architecture and proceeds in repeated rounds:

  - **Broadcast**: The server sends the current global model to participating clients.
  - **Local training**: Each client trains a copy of the model on its private data using stochastic gradient descent (SGD) for a fixed number of epochs, computing a local update (a gradient or a parameter difference). ((source [[https://pmc.ncbi.nlm.nih.gov/articles/PMC8329160/|PMC: Federated Learning]]))
  - **Upload**: Clients send encrypted or masked updates to the central server.
  - **Aggregation**: The server aggregates all client updates into an improved global model.
  - **Repeat**: The updated global model is broadcast for the next round.

===== The FedAvg Algorithm =====

Federated Averaging (FedAvg), introduced by McMahan et al. in the 2017 paper "Communication-Efficient Learning of Deep Networks from Decentralized Data," is the foundational FL algorithm. It weights each client's update by the size of its local dataset:

  w(t+1) = Σ_k (n_k / n) · w_k(t+1)

where n_k is client k's dataset size, n is the total number of examples across all clients, and w_k(t+1) is client k's local model after training from the global weights w(t). Performing multiple local SGD steps before each aggregation reduces communication overhead.
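The server-side weighted-average step can be sketched in a few lines of plain Python. This is a toy illustration with hypothetical client data, not a production implementation; real systems operate on model tensors rather than flat lists.

```python
# Minimal FedAvg aggregation sketch (hypothetical client data).
# Each client k reports locally trained weights w_k and its dataset size n_k;
# the server computes w = sum_k (n_k / n) * w_k.

def fedavg_aggregate(client_updates):
    """client_updates: list of (weights, n_k) pairs; weights is a flat list of floats."""
    total = sum(n_k for _, n_k in client_updates)
    dim = len(client_updates[0][0])
    global_w = [0.0] * dim
    for weights, n_k in client_updates:
        coef = n_k / total  # this client's share of the total data
        for i, w in enumerate(weights):
            global_w[i] += coef * w
    return global_w

# Two hypothetical clients: one with 300 local examples, one with 100.
updates = [([1.0, 2.0], 300), ([5.0, 6.0], 100)]
print(fedavg_aggregate(updates))  # [2.0, 3.0]
```

Because the first client holds three times as much data, its weights pull the average three times as strongly, exactly the n_k / n weighting in the formula above.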
((source [[https://pmc.ncbi.nlm.nih.gov/articles/PMC8329160/|PMC: Federated Learning]]))

===== Google's Gboard: The Origin Story =====

Google pioneered federated learning in 2016 for next-word prediction on Android's **Gboard** mobile keyboard, training across millions of devices without centralizing users' typing data. This was the first large-scale FL deployment, handling heterogeneous mobile data while improving prediction quality. ((source [[https://research.google/blog/distributed-differential-privacy-for-federated-learning/|Google Research: Distributed DP for FL]])) Later expansions included Smart Text Selection models trained with secure aggregation (SecAgg) and distributed differential privacy (DDP), which reduced memorization of user data by more than twofold in empirical tests. ((source [[https://research.google/blog/distributed-differential-privacy-for-federated-learning/|Google Research: Distributed DP for FL]]))

===== Privacy Preservation Mechanisms =====

FL minimizes raw data exposure but remains vulnerable to inference attacks on model updates. Key privacy mechanisms include:

**Differential Privacy (DP)**: Adds calibrated noise (Gaussian or Laplace) to updates or aggregates, bounding any individual's influence via a privacy parameter epsilon (smaller epsilon means stronger privacy). Noise can be added client-side (local DP) or server-side (central DP). ((source [[https://flower.ai/docs/framework/explanation-differential-privacy.html|Flower: Differential Privacy]]))

**Secure Aggregation (SecAgg)**: A cryptographic protocol ensuring the server sees only the sum of masked client updates, never an individual contribution. Paired with DP, it enforces data minimization and provides formal privacy guarantees even against honest-but-curious servers.
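The masking idea behind secure aggregation can be illustrated with a toy sketch: each pair of clients agrees on a random pairwise mask, one adds it and the other subtracts it, so every individual upload looks random while the masks cancel exactly in the server's sum. This is only the core cancellation trick; real SecAgg derives masks via key agreement and handles client dropouts, which are omitted here.

```python
import random

def mask_updates(updates, seed=0):
    """Apply pairwise masks: client i adds mask_ij, client j subtracts it (i < j)."""
    rng = random.Random(seed)  # stands in for pairwise key agreement
    n = len(updates)
    dim = len(updates[0])
    masked = [list(u) for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            mask = [rng.uniform(-100, 100) for _ in range(dim)]
            for d in range(dim):
                masked[i][d] += mask[d]  # client i adds the shared mask
                masked[j][d] -= mask[d]  # client j subtracts the same mask
    return masked

updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
masked = mask_updates(updates)
# Each masked vector is individually meaningless to the server...
server_sum = [sum(col) for col in zip(*masked)]
print(server_sum)  # ...but the sum recovers [9.0, 12.0] up to float rounding
```

The server learns the aggregate needed for FedAvg while each individual contribution stays hidden behind its masks.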
((source [[https://research.google/blog/distributed-differential-privacy-for-federated-learning/|Google Research: Distributed DP for FL]]))

**Distributed Differential Privacy (DDP)**: Integrates noise addition into the secure aggregation protocol itself, providing privacy guarantees that do not require trusting the server; deployed at scale in Google's production systems. ((source [[https://research.google/blog/distributed-differential-privacy-for-federated-learning/|Google Research: Distributed DP for FL]]))

**Adaptive DP**: Adjusts noise budgets dynamically to balance utility and privacy, avoiding excessive noise once the model has converged. ((source [[https://arxiv.org/abs/2510.09691|arXiv: Adaptive DP for FL]]))

===== Horizontal vs. Vertical FL =====

**Horizontal FL**: Clients share the same feature space but hold different samples. This is the most common setting, exemplified by Gboard, where all users have the same type of data (text input) but different instances. ((source [[https://pmc.ncbi.nlm.nih.gov/articles/PMC8329160/|PMC: Federated Learning]]))

**Vertical FL**: Clients share overlapping samples but hold different features; for example, one bank has transaction data while another holds credit histories for the same customers. This requires entity resolution and secure feature alignment.

===== Cross-Silo vs. Cross-Device =====

**Cross-silo**: A small number of trusted clients (hospitals, banks) with powerful servers. Participation is reliable and data heterogeneity is lower. Well suited to healthcare and finance applications. ((source [[https://pmc.ncbi.nlm.nih.gov/articles/PMC8329160/|PMC: Federated Learning]]))

**Cross-device**: Massive scale across millions of mobile devices (as with Gboard). Characterized by high client churn, weak device capabilities, and non-IID data distributions, so communication efficiency and robustness are paramount.
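The DP mechanisms described above share a common core: clip each client update's L2 norm to bound any individual's influence, then add Gaussian noise scaled to that bound. A minimal sketch follows; the constants (clip_norm, noise_multiplier) are illustrative placeholders, not calibrated privacy parameters.

```python
import math
import random

def clip_and_noise(update, clip_norm=1.0, noise_multiplier=0.5, rng=None):
    """Clip an update's L2 norm to clip_norm, then add Gaussian noise."""
    rng = rng or random.Random(42)  # fixed seed for a reproducible sketch
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = [x * scale for x in update]  # bounds this client's influence
    sigma = clip_norm * noise_multiplier   # noise std dev tied to the clip bound
    return [x + rng.gauss(0.0, sigma) for x in clipped]

update = [3.0, 4.0]          # L2 norm 5.0, so it is rescaled to norm 1.0
noisy = clip_and_noise(update)
```

Clipping caps the sensitivity of the aggregate to any one client, which is what lets the added Gaussian noise translate into a formal epsilon guarantee.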
((source [[https://research.google/blog/distributed-differential-privacy-for-federated-learning/|Google Research: Distributed DP for FL]]))

===== Use Cases =====

**Healthcare**: Federated tumor detection across hospitals enables multi-institution models while complying with HIPAA and GDPR. Electronic health record (EHR) analysis for rare diseases benefits from pooled learning without sharing patient data. Healthcare deployments grew roughly 10x after 2020. ((source [[https://pmc.ncbi.nlm.nih.gov/articles/PMC8329160/|PMC: Federated Learning]]))

**Finance**: Collaborative fraud detection models across banks improve anomaly detection through data diversity while preserving competitive privacy. Credit scoring benefits from pooled insights without exposing individual transaction data.

**Mobile and IoT**: Next-word prediction, smart text selection, voice recognition, and personalized recommendations on edge devices.

===== Challenges =====

  * **Non-IID data**: Client data distributions differ significantly (e.g., user-specific typing patterns), causing model drift. FedAvg struggles under extreme heterogeneity; personalization techniques help. ((source [[https://pmc.ncbi.nlm.nih.gov/articles/PMC8329160/|PMC: Federated Learning]]))
  * **Communication costs**: Transmitting large model updates is expensive; mitigated by compression, quantization, or fewer aggregation rounds.
  * **Model poisoning**: Malicious clients can inject bad updates to corrupt the global model; defended against with robust aggregation, DP noise, or client selection strategies. ((source [[https://arxiv.org/abs/2510.09691|arXiv: Adaptive DP for FL]]))
  * **Heterogeneous devices**: Varying compute capabilities and intermittent connectivity in cross-device settings.
  * **Incentive alignment**: Motivating clients to participate and contribute quality updates.

===== Frameworks =====

  * **TensorFlow Federated (TFF)**: Google's open-source framework for FL simulation and deployment, with static verification of SecAgg and DP. ((source [[https://research.google/blog/distributed-differential-privacy-for-federated-learning/|Google Research: Distributed DP for FL]]))
  * **PySyft**: Privacy-focused framework from OpenMined supporting DP and secure multi-party computation.
  * **NVIDIA FLARE**: Enterprise-grade framework for healthcare applications, offering plug-and-play cross-silo FL with imaging and EHR support.
  * **Flower**: Flexible, framework-agnostic FL platform supporting heterogeneous environments. ((source [[https://flower.ai/docs/framework/explanation-differential-privacy.html|Flower: Differential Privacy]]))

===== Key Papers =====

  * McMahan et al. (2017): "Communication-Efficient Learning of Deep Networks from Decentralized Data", which introduced FedAvg. ((source [[https://pmc.ncbi.nlm.nih.gov/articles/PMC8329160/|PMC: Federated Learning]]))
  * Dong et al. (2021): Federated f-differential privacy framework for record-level protection. ((source [[https://pmc.ncbi.nlm.nih.gov/articles/PMC8329160/|PMC: Federated Learning]]))
  * Google Research: Distributed DP + SecAgg production deployments at scale. ((source [[https://research.google/blog/distributed-differential-privacy-for-federated-learning/|Google Research: Distributed DP for FL]]))

===== See Also =====

  * [[vector_embeddings]]
  * [[sovereign_ai]]
  * [[ai_sustainability]]
  * [[human_in_the_loop]]

===== References =====