Ulfar Erlingsson - Fujitsu

Transcript of Ulfar Erlingsson - Fujitsu

Page 1

© 2019 FUJITSU

Ulfar Erlingsson

Senior Staff Research Scientist at Google

Heads a team within Google Brain doing research on privacy and security for machine learning.

Previously, he was a researcher at Microsoft Research Silicon Valley and an Associate Professor at Reykjavik University, Iceland.

Page 2

Is privacy an obstacle? Where does it raise the biggest challenge?

Page 3

Metaphor for Privacy (randomized response)

Page 4

Microdata: An Individual’s Report

Page 5

Microdata: An Individual’s Report

Each bit is flipped with probability 25%
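A minimal sketch of the randomized-response step on this slide, assuming an individual's report is a list of 0/1 bits and each bit is flipped independently with probability 0.25. The function names `randomize_report` and `per_bit_epsilon` are hypothetical. With a flip probability of 0.25, any single reported bit is three times (0.75 / 0.25) as likely under one true value as under the other, which corresponds to ε = ln 3 per bit in local differential privacy terms.

```python
import math
import random

def randomize_report(bits, flip_prob=0.25, rng=None):
    """Return a noisy copy of `bits`, flipping each bit independently with probability flip_prob."""
    if rng is None:
        rng = random.Random()
    return [b ^ 1 if rng.random() < flip_prob else b for b in bits]

def per_bit_epsilon(flip_prob=0.25):
    """Local-DP epsilon for one randomized bit: log of keep probability over flip probability."""
    return math.log((1 - flip_prob) / flip_prob)

if __name__ == "__main__":
    rng = random.Random(42)
    true_report = [1, 0, 0, 1, 1, 0, 1, 0]   # hypothetical microdata bits
    noisy_report = randomize_report(true_report, 0.25, rng)
    print(noisy_report)
    print(round(per_bit_epsilon(0.25), 4))    # ln 3, about 1.0986
```

Because any individual bit might have been flipped, a single report gives each person plausible deniability about every answer.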

Page 6

Big Picture Remains!
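Why the big picture survives can be sketched with a little arithmetic, assuming the same independent 25% bit flips as before. If a true fraction p of users have a given bit set, the expected observed fraction is q = p(1 - f) + (1 - p)f, so p can be estimated as (q - f) / (1 - 2f). The helper names below are hypothetical and the population is simulated, not real data.

```python
import random

def randomize_bit(bit, flip_prob, rng):
    # Randomized response: flip the bit with probability flip_prob.
    return bit ^ 1 if rng.random() < flip_prob else bit

def estimate_true_fraction(noisy_bits, flip_prob=0.25):
    """Invert E[q] = p*(1 - f) + (1 - p)*f to get an unbiased estimate of p."""
    q = sum(noisy_bits) / len(noisy_bits)
    return (q - flip_prob) / (1 - 2 * flip_prob)

if __name__ == "__main__":
    rng = random.Random(0)
    n, true_p, f = 100_000, 0.30, 0.25          # simulated population, not real data
    truth = [1 if rng.random() < true_p else 0 for _ in range(n)]
    noisy = [randomize_bit(b, f, rng) for b in truth]
    print("observed fraction:", sum(noisy) / n)
    print("true fraction:    ", sum(truth) / n)
    print("debiased estimate:", estimate_true_fraction(noisy, f))
```

The observed fraction is pulled toward 0.5 by the noise, but after debiasing the population-level estimate lands close to the true fraction, even though no individual report can be trusted.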

Page 7

Two common data sets in Machine Learning

MNIST dataset: 70,000 images, 28⨉28 pixels each

CIFAR-10 dataset: 60,000 color images, 32⨉32 pixels each
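For a rough sense of scale, the per-image input size follows directly from these dimensions, assuming MNIST images are single-channel grayscale and CIFAR-10 images have three (RGB) channels. The `DATASETS` dictionary and `values_per_image` helper are purely illustrative.

```python
# Image counts and sizes from the slide; channel counts assume grayscale MNIST and RGB CIFAR-10.
DATASETS = {
    "MNIST": {"images": 70_000, "height": 28, "width": 28, "channels": 1},
    "CIFAR-10": {"images": 60_000, "height": 32, "width": 32, "channels": 3},
}

def values_per_image(spec):
    """Number of raw pixel values in one image (height x width x channels)."""
    return spec["height"] * spec["width"] * spec["channels"]

if __name__ == "__main__":
    for name, spec in DATASETS.items():
        per_image = values_per_image(spec)
        total = per_image * spec["images"]
        print(f"{name}: {per_image} values per image, {total:,} values in total")
    # MNIST: 784 values per image, 54,880,000 values in total
    # CIFAR-10: 3072 values per image, 184,320,000 values in total
```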

Page 8

What are the utility benefits and costs of ML privacy?

Training ML models with privacy works, ensures strong generalization… and may help with data-retention and data-removal concerns

But... training with privacy means the ML model cannot “see” unique outliers

Model can’t learn about truly weird data

Utility of privacy-preserving ML models may always be worse on real outliers
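The slide does not say which private training method is meant. One widely used option is DP-SGD, which bounds each example's influence by clipping its gradient and then adds Gaussian noise to the sum. The minimal sketch below uses plain Python lists instead of a real ML framework, and the names `clip`, `private_mean_gradient`, `clip_norm` and `noise_multiplier` are illustrative assumptions. It shows why a unique outlier cannot dominate an update: however large its gradient, it contributes at most `clip_norm` to the sum before noise is added.

```python
import math
import random

def l2_norm(vec):
    return math.sqrt(sum(x * x for x in vec))

def clip(grad, clip_norm):
    """Scale grad down so that its L2 norm is at most clip_norm."""
    norm = l2_norm(grad)
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def private_mean_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """DP-SGD-style aggregation: clip each example, sum, add Gaussian noise, then average."""
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        for i, g in enumerate(clip(grad, clip_norm)):
            total[i] += g
    sigma = noise_multiplier * clip_norm      # noise scale is tied to the clipping bound
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [v / len(per_example_grads) for v in noisy]

if __name__ == "__main__":
    rng = random.Random(0)
    typical = [[0.1, -0.2]] * 99              # many similar (hypothetical) examples
    outlier = [[500.0, 500.0]]                # one unique, extreme example
    grads = typical + outlier
    print(private_mean_gradient(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng))
```

The outlier's gradient is scaled from a norm of about 707 down to 1, and that bounded contribution is then masked by noise of the same order, so the model learns little that is specific to it.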