
Guarding User Privacy with Federated Learning and Differential Privacy
Brendan McMahan (mcmahan@google.com)
DIMACS/Northeast Big Data Hub Workshop on Overcoming Barriers to Data Sharing including Privacy and Fairness
2017.10.24

Our Goal

Imbue mobile devices with state-of-the-art machine learning systems without centralizing data and with privacy by default.

A very personal computer

● 2015: 79% away from their phone ≤2 hours/day¹
● 2015: 63% away from their phone ≤1 hour/day¹
● 2015: 25% can't remember being away at all¹
● 2013: 72% of users within 5 feet of their phone most of the time²

Plethora of sensors

Innumerable digital interactions

¹ 2015 Always Connected Research Report, IDC and Facebook
² 2013 Mobile Consumer Habits Study, Jumio and Harris Interactive


Deep Learning
● non-convex
● millions of parameters
● complex structure (e.g., LSTMs)


A distributed learning problem, horizontally partitioned:
● Nodes: millions to billions
● Dimensions: thousands to millions
● Examples: millions to billions


Federated: data stays decentralized on devices, with the server acting as a facilitator.


Deep Learning, the short short version

[Figure: a handwritten digit passes through a network; for the question "Is it 5?" the model outputs 0.9.]

f(input, parameters) = output

loss(parameters) = (1/n) ∑ᵢ difference(f(inputᵢ, parameters), desiredᵢ)

Adjust the parameters to minimize the loss.

Stochastic Gradient Descent
● Stochastic: choose a random subset of the training data
● Gradient: compute the "down" direction on the loss function
● Descent: take a step in that direction
● (Rinse & repeat)
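To make this concrete, below is a minimal numpy sketch of SGD for a linear model with a squared-error loss, in the spirit of the f and loss above; all names (X, y, theta, lr) are illustrative, not from the talk.

```python
# A minimal SGD sketch: linear model f(x, theta) = x @ theta with
# squared-error loss. Illustrative only.
import numpy as np

def f(X, theta):
    # f(input, parameters) = output
    return X @ theta

def sgd(X, y, theta, lr=0.1, batch_size=32, steps=1000, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        # Stochastic: choose a random subset of the training data.
        idx = rng.choice(len(X), size=batch_size, replace=False)
        Xb, yb = X[idx], y[idx]
        # Gradient: the "down" direction of
        # loss(theta) = (1/n) * sum((f(x_i, theta) - y_i)**2).
        grad = (2.0 / batch_size) * Xb.T @ (f(Xb, theta) - yb)
        # Descent: take a step in that direction. (Rinse & repeat.)
        theta = theta - lr * grad
    return theta
```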

Cloud-centric ML for Mobile

[Diagram: a mobile device sends requests to the cloud and receives predictions; the cloud holds the current model parameters and the training data.]

● The model lives in the cloud.
● We train models in the cloud.
● We make predictions in the cloud.
● We gather training data in the cloud, and make the models better.

On-Device Predictions (Inference)

Instead of making predictions in the cloud, distribute the model and make predictions on device.

User Advantages
● Low latency
● Longer battery life
● Less wireless data transfer
● Better offline experience
● Less data sent to the cloud

Developer Advantages
● Data is already localized
● New product opportunities

World Advantages
● Raise privacy expectations for the industry

On-Device Training

Bringing model training onto mobile devices: the training data stays on device. All the user and world advantages above still apply, plus additional developer advantages:

● Straightforward personalization
● Simple access to rich user context

Federated Learning

Federated learning is the problem of training a shared global model, under the coordination of a central server, from a federation of participating devices which maintain control of their own data.

[Diagram: a cloud service provider holds the current model parameters; mobile devices hold local training data. Many devices will be offline.]

1. Server selects a sample of, e.g., 100 online devices.
2. Selected devices download the current model parameters.
3. Users compute an update using local training data.
4. Server aggregates users' updates into a new model.

Repeat until convergence.

Applications of federated learning

What makes a good application?

● On-device data is more relevant than server-side proxy data

● On-device data is privacy sensitive or large

● Labels can be inferred naturally from user interaction

Example applications

● Language modeling (e.g., next word prediction) for mobile keyboards

● Image classification for predicting which photos people will share

● ...

Challenges of Federated Learning
(… or, why this isn't just "standard" distributed optimization)

● Massively Distributed: training data is stored across a very large number of devices.
● Limited Communication: only a handful of rounds of unreliable communication with each device.
● Unbalanced Data: some devices have few examples, some have orders of magnitude more.
● Highly Non-IID Data: data on each device reflects one individual's usage pattern.
● Unreliable Compute Nodes: devices go offline unexpectedly; expect faults and adversaries.
● Dynamic Data Availability: the subset of data available is non-constant, e.g., time-of-day vs. country.
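To make the non-IID point concrete, FL experiments often simulate a pathological partition by sorting the data by label and dealing each simulated client shards from only a couple of classes (in the spirit of the FedAvg paper's experimental setup). A sketch, assuming numpy and a label array y; all parameters are illustrative:

```python
# Simulating a pathologically non-IID partition: sort example indices
# by label, cut them into shards, and deal a few shards to each
# client, so most clients see only one or two classes.
import numpy as np

def non_iid_partition(y, n_clients=100, shards_per_client=2, seed=0):
    rng = np.random.default_rng(seed)
    order = np.argsort(y)                        # group indices by label
    shards = np.array_split(order, n_clients * shards_per_client)
    perm = rng.permutation(len(shards))          # deal shards randomly
    return [np.concatenate([shards[j] for j in perm[i::n_clients]])
            for i in range(n_clients)]           # index sets per client
```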

The Federated Averaging algorithm

H. B. McMahan, et al. Communication-Efficient Learning of Deep Networks from Decentralized Data. AISTATS 2017.

Server, until converged:
1. Select a random subset (e.g., 100) of the (online) clients.
2. In parallel, send current parameters θt to those clients.
3. θt+1 = θt + data-weighted average of client updates.

Selected client k:
1. Receive θt from server.
2. Run some number of minibatch SGD steps, producing θ'.
3. Return θ' − θt to server.
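A minimal simulation sketch of this loop, assuming numpy and a hypothetical client_update(theta, data) that runs local minibatch SGD and returns the new parameters θ'; weighting by local dataset size stands in for the data-weighted average:

```python
# A simulation sketch of Federated Averaging; client_update is a
# hypothetical local-SGD routine, not the talk's code.
import numpy as np

def federated_averaging(theta, clients, client_update, rounds=100,
                        clients_per_round=100, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(rounds):
        # 1. Select a random subset of the (online) clients.
        sample = rng.choice(len(clients), size=clients_per_round,
                            replace=False)
        updates, weights = [], []
        for k in sample:
            # Client: receive theta, run local SGD, return theta' - theta.
            theta_prime = client_update(theta, clients[k])
            updates.append(theta_prime - theta)
            weights.append(len(clients[k]))  # weight by local data size
        # 3. theta <- theta + data-weighted average of client updates.
        theta = theta + np.average(np.stack(updates), axis=0,
                                   weights=np.array(weights, float))
    return theta
```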

Large-scale LSTM for next-word prediction

Rounds to reach 10.5% accuracy:
FedSGD: 820
FedAvg: 35
→ 23x decrease in communication rounds

Model details: 1.35M parameters; 10K word dictionary; embeddings ∈ ℝ⁹⁶, state ∈ ℝ²⁵⁶; corpus: Reddit posts, partitioned by author.

CIFAR-10 convolutional model (IID and balanced data)

Updates to reach 82% accuracy:
SGD: 31,000
FedSGD: 6,600
FedAvg: 630
→ 49x decrease in communication (updates) vs. SGD

Federated Learning & Privacy

Recall step 4 of the protocol: the server aggregates users' updates into a new model, repeating until convergence.

Might these updates contain privacy-sensitive data? The updates are:

1. Ephemeral
2. Focused
3. Only in aggregate (∑)

Improve privacy & security by minimizing the "attack surface".

Wouldn't it be even better if Google could aggregate users' updates, but could not inspect the individual updates?

A novel, practical protocol makes this possible:

K. Bonawitz, et al. Practical Secure Aggregation for Privacy-Preserving Machine Learning. CCS 2017.
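A toy sketch of the pairwise-masking idea at the heart of that protocol: each pair of users derives a shared random mask that one adds and the other subtracts, so the masks cancel in the server's sum. The real protocol adds key agreement, secret sharing, and dropout recovery, all omitted here; every name below is illustrative.

```python
# Toy sketch of pairwise masking: users i < j share a seed; i adds the
# derived mask, j subtracts it, so all masks cancel in the sum and the
# server learns only the total.
import numpy as np

def masked_update(i, update, n_users, pair_seeds):
    masked = update.astype(float).copy()
    for j in range(n_users):
        if j == i:
            continue
        seed = pair_seeds[frozenset((i, j))]
        # Both users derive the same mask from their shared seed.
        mask = np.random.default_rng(seed).standard_normal(update.shape)
        masked += mask if i < j else -mask
    return masked

n = 3
updates = [np.ones(4) * (i + 1) for i in range(n)]  # users' true updates
pair_seeds = {frozenset((i, j)): 1000 * i + j
              for i in range(n) for j in range(i + 1, n)}
masked = [masked_update(i, updates[i], n, pair_seeds) for i in range(n)]
# Each masked update looks random, but the masks cancel in the sum:
print(np.sum(masked, axis=0))  # == sum of true updates: [6. 6. 6. 6.]
```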

Might the final model memorize a user's data? The updates are:

1. Ephemeral
2. Focused
3. Only in aggregate
4. Differentially private

Differential Privacy (trusted aggregator)
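For reference, the standard guarantee (not stated on the slides): a randomized mechanism M is (ε, δ)-differentially private if for all pairs of adjacent datasets D, D′ (here, datasets differing in one user's data) and all outcome sets S, Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S] + δ.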

Recall Federated Averaging above. Differentially-Private Federated Averaging modifies it in four places: sample users independently, clip each update, bound the sensitivity of the average, and add noise.

H. B. McMahan, et al. Learning Differentially Private Language Models Without Losing Accuracy.

Server, until converged:
1. Select each user independently with probability q, for, say, E[C] = 1000 clients (instead of a fixed-size sample).
2. In parallel, send current parameters θt to those clients.
3. θt+1 = θt + bounded-sensitivity data-weighted average of client updates + Gaussian noise N(0, Iσ²).

Selected client k:
1. Receive θt from server.
2. Run some number of minibatch SGD steps, producing θ'.
3. Return Clip(θ' − θt) to server.
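A sketch of one round with those four changes, assuming numpy and the same hypothetical client_update as before; for simplicity it uses an unweighted, bounded-sensitivity average (dividing by the expected cohort size qN), and σ is left as a parameter to be calibrated by the moments accountant:

```python
# One round of DP Federated Averaging (sketch): Poisson sampling,
# per-user clipping, bounded-sensitivity averaging, Gaussian noise.
# All parameter values are illustrative.
import numpy as np

def clip(update, S):
    # Scale the update so its L2 norm is at most S.
    norm = np.linalg.norm(update)
    return update * min(1.0, S / norm) if norm > 0 else update

def dp_fedavg_round(theta, clients, client_update, q=0.01, S=20.0,
                    sigma=0.1, rng=None):
    rng = rng or np.random.default_rng(0)
    # 1. Select each user independently with probability q.
    selected = [c for c in clients if rng.random() < q]
    # 2.-3. Each selected client trains locally, returns a clipped update.
    updates = [clip(client_update(theta, c) - theta, S) for c in selected]
    # Bounded-sensitivity average: divide by the expected cohort size
    # q*N, so one user changes the average by at most S / (q*N).
    total = np.sum(updates, axis=0) if updates else np.zeros_like(theta)
    avg = total / (q * len(clients))
    # 4. Add Gaussian noise N(0, I*sigma^2); sigma must be calibrated to
    # S, q, and the target (epsilon, delta) via the moments accountant.
    noise = rng.normal(0.0, sigma, size=theta.shape)
    return theta + avg + noise
```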

Privacy Accounting for Noisy SGD: the Moments Accountant

M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang. Deep Learning with Differential Privacy. CCS 2016.

[Plot: the moments accountant yields a much smaller ε than previous composition theorems. ← Better (smaller epsilon = more privacy)]
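To see why the accountant matters, compare the two classical bounds it improves on: basic composition, where ε grows linearly in the number of rounds k, and the strong composition theorem, where ε grows roughly as √k. A sketch with illustrative numbers; the moments accountant itself tracks the full privacy-loss distribution and is not reproduced here:

```python
# Classical composition bounds for k-fold composition of an
# (eps, delta)-DP mechanism; parameters are illustrative only.
import math

def basic_composition(eps, delta, k):
    # (k*eps, k*delta)-DP.
    return k * eps, k * delta

def strong_composition(eps, delta, k, delta_prime=1e-9):
    # Dwork-Rothblum-Vadhan: (eps', k*delta + delta')-DP with
    # eps' = sqrt(2k ln(1/delta')) * eps + k * eps * (e^eps - 1).
    eps_prime = (math.sqrt(2 * k * math.log(1 / delta_prime)) * eps
                 + k * eps * (math.exp(eps) - 1))
    return eps_prime, k * delta + delta_prime

k, eps, delta = 5000, 0.01, 1e-12  # e.g., 5000 rounds of training
print(basic_composition(eps, delta, k))   # eps = 50.0: linear growth
print(strong_composition(eps, delta, k))  # eps ~ 5.1: ~sqrt(k) growth
```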

The same LSTM result, viewed through a privacy lens: the 23x reduction in communication rounds (FedSGD 820 vs. FedAvg 35 to reach 10.5% accuracy) is also a 23x decrease in database queries.

The effect of clipping updates

[Plot: model accuracy across clipping thresholds, from no clipping to aggressive clipping; sampling E[C] = 100 users per round.]

The effect of noising updates

[Plot: model accuracy across noise levels, with clipping at S = 20; sampling E[C] = 100 users per round.]

The resulting guarantees:
(4.634, 1e-9)-DP with 763k users
(1.152, 1e-9)-DP with 1e8 users

Differential Privacy for Language Models

H. B. McMahan, et al. Learning Differentially Private Language Models Without Losing Accuracy.

Non-private baseline training: 100 users per round, 160k tokens per round; 17.5% accuracy in 4120 rounds.

(1.152, 1e-9)-DP training: 5k users per round, 8000k tokens per round; 17.5% estimated accuracy in 5000 rounds.

Private training achieves equal accuracy, but using 60x more computation.

Three deployment models for differential privacy:
● Differential privacy with a trusted aggregator: the server sees individual updates and adds noise centrally.
● Local differential privacy: each device adds noise before sending anything.
● Differential privacy with secure aggregation: noise is added under the aggregation protocol, so the server never sees individual updates.

Differential Privacy is complementary to Federated Learning

● FL algorithms touch data one user (one device) at a time: natural algorithms for user-level privacy.
● Communication constraints mean we want to touch the data as few times as possible: also good for privacy.
● The DP guarantee is complementary to FL's focused collection & ephemeral updates.


Federated Learning in Gboard

Open Questions and Challenges

Showing privacy is possible. Many open research questions remain:
● Further lower the computational and/or utility cost of differential privacy
● More communication-efficient algorithms for FL

Making privacy easy. Possible is not enough; we need research to enable "privacy by default" in machine learning:
● Can federated learning be as easy as centralized learning?
● Differential privacy for deep learning without parameter tuning?
● How do we handle privacy budgets across time and across domains?

Questions