Ulfar Erlingsson - Fujitsu: Two common data sets in Machine Learning. MNIST dataset: 70,000 images....

Post on 17-May-2020

© 2019 FUJITSU

Ulfar Erlingsson

Senior Staff Research Scientist at Google

Heads a team within Google Brain doing research on privacy and security for machine learning.

Previously, he was a researcher at Microsoft Research, Silicon Valley and an Associate Professor at Reykjavik University, Iceland.

Is privacy an obstacle? Where does it raise the biggest challenge?

Metaphor for Privacy (randomized response)

Microdata: An Individual’s Report

Each bit is flipped with probability 25%

Big Picture Remains!
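
The randomized-response idea on these slides can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's implementation: the function names, the 30% base rate, and the population size are assumptions chosen for the example; only the 25% flip probability comes from the slide. Each individual's bit is noisy, yet the aggregate fraction can still be estimated because the flip probability is known.

```python
import random

FLIP_PROBABILITY = 0.25  # from the slide: each bit is flipped with probability 25%


def randomize(bits, p=FLIP_PROBABILITY, rng=random):
    """Flip each bit independently with probability p (hypothetical helper)."""
    return [b ^ 1 if rng.random() < p else b for b in bits]


def estimate_true_fraction(noisy_bits, p=FLIP_PROBABILITY):
    """Estimate the fraction of 1s before flipping.

    E[observed] = true * (1 - p) + (1 - true) * p,
    so true = (observed - p) / (1 - 2p).
    """
    observed = sum(noisy_bits) / len(noisy_bits)
    return (observed - p) / (1 - 2 * p)


rng = random.Random(0)  # fixed seed so the run is repeatable
# Assumed population: 100,000 individuals, each holding one bit, about 30% ones.
true_bits = [1 if rng.random() < 0.3 else 0 for _ in range(100_000)]
noisy_bits = randomize(true_bits, rng=rng)

true_fraction = sum(true_bits) / len(true_bits)
estimated_fraction = estimate_true_fraction(noisy_bits)
print(f"true fraction:      {true_fraction:.3f}")
print(f"estimated fraction: {estimated_fraction:.3f}")
```

No single noisy bit reveals much about its owner, but the two printed fractions should be close: the big picture remains.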

Two common data sets in Machine Learning

MNIST dataset: 70,000 images

28⨉28 pixels each

CIFAR-10 dataset: 60,000 color images

32⨉32 pixels each
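
As a quick sense of scale, the arithmetic behind these two data sets can be worked out directly. The image counts and pixel dimensions are from the slide; the channel counts (1 for grayscale MNIST, 3 for RGB CIFAR-10) are not stated on the slide and are added here as known properties of the data sets.

```python
# (images, height, width, channels); channel counts are not on the slide.
DATASETS = {
    "MNIST": (70_000, 28, 28, 1),
    "CIFAR-10": (60_000, 32, 32, 3),
}

values_per_image = {}
total_values = {}
for name, (images, height, width, channels) in DATASETS.items():
    values_per_image[name] = height * width * channels
    total_values[name] = images * values_per_image[name]
    print(f"{name}: {values_per_image[name]} values per image, "
          f"{total_values[name]:,} values in total")
```

So a single MNIST image is a vector of 784 values, and a single CIFAR-10 image is a vector of 3,072 values.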

What are the utility benefits / costs of ML privacy?

Training ML models with privacy works and ensures strong generalization… and may help with data retention & removal concerns

But... Training with privacy means the ML model cannot “see” unique outliers

Model can’t learn about truly weird data

Utility of privacy-preserving ML models may always be worse on real outliers
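
One way to see why a privately trained model cannot "see" outliers is to look at how a single training step might bound each example's influence. The slides do not name a training algorithm; the sketch below assumes a common approach (per-example gradient clipping plus Gaussian noise, in the style of differentially private SGD), and every name and number in it is hypothetical.

```python
import math
import random


def clip(gradient, max_norm):
    """Scale a per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in gradient))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in gradient]


def private_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each example's gradient, sum, add Gaussian noise, then average."""
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        for i, g in enumerate(clip(grad, max_norm)):
            total[i] += g
    sigma = noise_multiplier * max_norm  # noise scaled to the clipping bound
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [v / len(per_example_grads) for v in noisy]


rng = random.Random(0)
typical = [[0.5, 0.5] for _ in range(99)]  # assumed gradients of ordinary examples
outlier = [100.0, -100.0]                  # assumed gradient of one "weird" example

clipped_outlier = clip(outlier, max_norm=1.0)
outlier_norm = math.sqrt(sum(g * g for g in clipped_outlier))
update = private_mean_gradient(typical + [outlier], 1.0, 1.0, rng)

print("clipped outlier norm:", round(outlier_norm, 6))
print("noisy mean gradient: ", [round(u, 3) for u in update])
```

Without clipping, the single outlier would drag the second coordinate of the average negative; with clipping and noise, its contribution is capped at the same bound as every other example, so the update still follows the typical data. That cap is what protects the individual, and it is also why utility on genuine outliers can suffer.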