Salil - The Privacy Tools Project · THE PRIVACY TOOLS PROJECT December 11, 2017 ... PPRA, ESRA,...



Page 1

THE PRIVACY TOOLS PROJECT

December 11, 2017

Salil Vadhan, Harvard University

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of our funders.

with support from:

Page 2

Motivation: Computational Social Science

The potential: massive new sources of data and ease of sharing will revolutionize social science.

The problem: protecting the privacy of individual subjects

[Figure: tension between privacy and open data]

e.g., NYT 5/21/12, “Troves of Personal Data, Forbidden to Researchers”

[Figure: privacy vs. utility, with traditional approaches (e.g., “stripping PII”) marked]

Page 3

Our Goal

Achieve: privacy & utility

Via: computer science, social science, data science, law & policy

Program on Information Science, MIT Libraries

Page 4

Target: Data Repositories

Dataverse Repositories around the world: 27 installations

Harvard Dataverse Repository: 2,400 dataverses with 75,000 datasets and 2.9 million downloads. Largest social science repository in the world.

Page 5

Datasets are restricted due to privacy concerns

Goal: enable wider sharing while protecting privacy

Page 6

Challenges for Sharing Sensitive Data

Difficulty of Deidentification
• Stripping “PII” usually provides weak protections and/or poor utility

Inefficient Process for Obtaining Restricted Data
• Can involve months of negotiation between institutions and original researchers

Complexity of Law
• Thousands of privacy laws in the US alone, at the federal, state, and local levels, usually context-specific: HIPAA, FERPA, CIPSEA, Privacy Act, PPRA, ESRA, ….

Vision: an array of computational, legal, and policy tools that make privacy-protective data-sharing easier for researchers without expertise in privacy law, CS, or statistics.

Sweeney ’97
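Why can stripping “PII” fail? A minimal sketch of a linkage attack in the spirit of Sweeney’s re-identification work: a release with names removed still carries ZIP code, birth date, and sex, and joining it against a named public list on those fields re-identifies anyone whose combination is unique. Both tables, their field names, and the people in them are invented for illustration.

```python
from collections import defaultdict

# Hypothetical "de-identified" release: names stripped, quasi-identifiers kept.
medical = [
    {"zip": "02138", "birth": "1945-07-31", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "birth": "1980-01-02", "sex": "M", "diagnosis": "asthma"},
]

# Hypothetical public list (e.g., a voter roll) with names and the same fields.
voters = [
    {"name": "Alice Smith", "zip": "02138", "birth": "1945-07-31", "sex": "F"},
    {"name": "Bob Jones", "zip": "02139", "birth": "1980-01-02", "sex": "M"},
    {"name": "Carl Brown", "zip": "02139", "birth": "1980-01-02", "sex": "M"},
]

QUASI_IDS = ("zip", "birth", "sex")


def quasi_key(record):
    """Project a record onto its quasi-identifier values."""
    return tuple(record[field] for field in QUASI_IDS)


def link(released, public):
    """Re-identify released records whose quasi-identifiers match exactly one public entry."""
    names_by_key = defaultdict(list)
    for person in public:
        names_by_key[quasi_key(person)].append(person["name"])

    matches = {}
    for record in released:
        candidates = names_by_key.get(quasi_key(record), [])
        if len(candidates) == 1:  # unique match: the "anonymous" row now has a name
            matches[candidates[0]] = record["diagnosis"]
    return matches


print(link(medical, voters))  # {'Alice Smith': 'hypertension'}
```

Bob Jones and Carl Brown share a quasi-identifier combination, so the second row stays ambiguous. Making every combination shared by several people is the idea behind generalization defenses such as k-anonymity, which can in turn cost utility.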

Page 7

Approach: Integrated Privacy Tools

[Diagram of the integrated workflow, with components: Sensitive Data Set; DataTags Interview; Robot Lawyers; Deposit in repository; Restricted Access Data Set w/ DUA; PSI: Differential Privacy; Public Access Statistics. Legend: “Tools we are working on.”]

Page 8

DataTags


Help generate policies for how to transfer, store, access, and use a sensitive dataset.

Crosas
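One way to picture an interview that generates handling policies: yes/no answers about a dataset select a tag, and each tag bundles transfer, storage, and access requirements. A minimal sketch, assuming hypothetical questions, tag names, and requirements chosen only for illustration; the actual DataTags questionnaire, tag set, and policies are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical handling requirements attached to one tag."""
    tag: str
    encrypted_transfer: bool
    encrypted_storage: bool
    requires_dua: bool
    public_access: bool


# Hypothetical tags, ordered from least to most restrictive.
POLICIES = [
    Policy("public", encrypted_transfer=False, encrypted_storage=False,
           requires_dua=False, public_access=True),
    Policy("controlled", encrypted_transfer=True, encrypted_storage=False,
           requires_dua=True, public_access=False),
    Policy("restricted", encrypted_transfer=True, encrypted_storage=True,
           requires_dua=True, public_access=False),
]


def interview(answers):
    """Map yes/no interview answers to the least restrictive adequate policy."""
    level = 0  # start at the least restrictive tag
    if answers.get("about_individuals", False):
        level = max(level, 1)
    if answers.get("regulated_data", False):  # e.g., possibly covered by HIPAA or FERPA
        level = max(level, 2)
    return POLICIES[level]


policy = interview({"about_individuals": True})
print(policy.tag, policy.requires_dua)  # controlled True
```

Taking the maximum over triggered levels keeps the mapping monotone: answering “yes” to more sensitivity questions can only make the assigned policy stricter.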

Page 9

Robot Lawyers


Automatically generate custom licenses & data-use agreements via logic programming

Altman
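Logic programming derives conclusions from facts and if-then rules, which is what lets agreement clauses follow automatically from a description of the data and the recipient. A minimal sketch of that idea, using a naive forward-chaining loop in Python in place of a real logic-programming language such as Prolog; all facts, rules, and clause names are hypothetical and do not come from the project’s actual agreements.

```python
# Each rule: (set of premises, conclusion). All names are hypothetical.
RULES = [
    ({"contains_health_data", "recipient_is_external"}, "clause:no_reidentification"),
    ({"contains_health_data"}, "clause:secure_storage"),
    ({"clause:secure_storage", "recipient_is_external"}, "clause:breach_notification"),
    ({"recipient_is_external"}, "clause:destroy_after_project"),
]


def forward_chain(facts, rules):
    """Apply rules until no new facts can be derived (a fixed point)."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived


def agreement_clauses(facts):
    """Collect the derived clauses that would go into a custom data-use agreement."""
    return sorted(f for f in forward_chain(facts, RULES) if f.startswith("clause:"))


print(agreement_clauses({"contains_health_data", "recipient_is_external"}))
# ['clause:breach_notification', 'clause:destroy_after_project',
#  'clause:no_reidentification', 'clause:secure_storage']
```

The third rule fires only because an earlier rule produced `clause:secure_storage`; chaining derived facts like this is what distinguishes a rule engine from a flat lookup table.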

Page 10

PSI: Differential Privacy Tool


Statistical summaries and exploratory data analysis with strong privacy guarantees

Honaker
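Differentially private summaries are commonly computed by adding calibrated random noise to the true statistic. A minimal sketch of one standard technique, the Laplace mechanism, applied to a mean of values clipped to a known range, assuming the number of records n is public; the function names and parameters are hypothetical and are not PSI’s actual interface.

```python
import random


def laplace_noise(scale, rng=random):
    """Sample Laplace(0, scale) as the difference of two exponential draws with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_mean(values, lower, upper, epsilon, rng=random):
    """Release an epsilon-differentially private mean of values clipped to [lower, upper].

    Assumes n is public, so changing one record moves the clipped mean by at
    most (upper - lower) / n; that sensitivity sets the noise scale.
    """
    if epsilon <= 0 or lower >= upper or not values:
        raise ValueError("need epsilon > 0, lower < upper, and non-empty values")
    clipped = [min(max(v, lower), upper) for v in values]
    n = len(clipped)
    true_mean = sum(clipped) / n
    sensitivity = (upper - lower) / n
    return true_mean + laplace_noise(sensitivity / epsilon, rng)


ages = [23, 35, 41, 29, 52, 67, 38, 44]  # made-up data
print(dp_mean(ages, lower=18, upper=90, epsilon=1.0))  # random; true mean is 41.125
```

Smaller epsilon means more noise and stronger privacy. Naive floating-point Laplace sampling like this is known to leak information through low-order bits, so production libraries use more careful samplers.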

Page 11

Bridging Definitions of Privacy


Argue that differential privacy satisfies legal requirements

Wood

Page 12

Recoding Privacy Law


Use technology to reimagine the legal solution space

Gasser

Page 13

Broader Impacts: Overarching Goals

• Exposing a multidisciplinary understanding of data privacy to a wide range of audiences (students, policymakers, public)

• Bringing integrated solutions to data privacy problems to practice (focusing on data repositories and computational social science)

Page 14

Broader Impacts

Policy impact: White House Big Data Privacy Study, National Privacy Research Strategy, NIST 800-188 Deidentifying Government Datasets, …

Training in multidisciplinary research: ≈120 students, postdocs, interns from law, computer science, social science, stats

Infrastructure for research in social science and other human-subjects research fields

Page 15

Broader Impacts

Numerous workshops and symposia, including a public symposium with 700+ registrants.

New journal “Technology Science” utilizing DataTags

Open-access pedagogical materials on data privacy for many audiences

Page 16

Other Accomplishments

• Many theoretical results illuminating the limits of differential privacy (lower bounds, algorithms, hardness results, attacks).

• Bridging differential privacy & statistical inference (confidence intervals, hypothesis testing, Bayesian sampling)

• Framework for modern privacy analysis: catalogue privacy controls, identify information uses, threats, and vulnerabilities, and design data programs that align these over the data lifecycle.
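Bridging differential privacy and statistical inference means that a released interval should account for the added privacy noise as well as ordinary sampling error. A minimal sketch of one simple heuristic, not necessarily the methods developed in the project: release a Laplace-noised proportion and widen a normal-approximation interval by the noise variance. It assumes n is public; the function name and data are hypothetical.

```python
import math
import random
from statistics import NormalDist


def dp_proportion_ci(bits, epsilon, confidence=0.95, rng=random):
    """Return a DP estimate of the fraction of 1s in `bits` and an approximate CI.

    Assumes n = len(bits) is public. Changing one 0/1 record moves the
    proportion by at most 1/n, so Laplace noise of scale 1/(n * epsilon)
    gives epsilon-differential privacy. The interval is a rough normal
    approximation that adds the noise variance (2 * scale**2) to the
    sampling variance p(1 - p)/n; it is a heuristic, not an exact interval.
    """
    if epsilon <= 0 or not bits:
        raise ValueError("need epsilon > 0 and non-empty data")
    n = len(bits)
    scale = 1.0 / (n * epsilon)
    # Laplace(0, scale) as the difference of two exponential draws.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    estimate = sum(bits) / n + noise
    p = min(max(estimate, 0.0), 1.0)  # clamp so p(1 - p) is a valid variance
    variance = p * (1 - p) / n + 2 * scale ** 2
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * math.sqrt(variance)
    return estimate, (estimate - half_width, estimate + half_width)


bits = [1] * 300 + [0] * 700  # made-up data: 300 of 1,000 respondents answer "yes"
est, (lo, hi) = dp_proportion_ci(bits, epsilon=0.5, rng=random.Random(42))
print(f"estimate={est:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

Everything after the noisy estimate uses only that estimate and public quantities (n, epsilon, confidence level), so by the post-processing property of differential privacy the interval costs no additional privacy budget.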

Page 17

Lessons Learned: Interdisciplinary Research

Timeline, 2006–2017:
• CRCS postdocs organize seminars on privacy
• CRCS & Berkman Data Privacy Working Group
• IQSS & Berkman try to anonymize Facebook dataset
• Several unsuccessful grant proposals
• Small gift from Google
• Common Rule policy commentary
• Successful NSF Frontier proposal
• First interdisciplinary workshop
• NSF site visit: silos, tools
• NSF site visit: students, outreach
• Bridging Privacy Defs Working Group
• Interdisciplinary pubs, tools, students

• Interdisciplinary centers to seed efforts
• Shared motivating problem
• Funding is hard
• Policy commentary as a collaboration vehicle
• Large & broad grant crucial
• Building a community
• Value of external critiques
• Creating safe environments
• It takes time!

Page 18

Lessons Learned: Theory vs. Practice

(Caricature of) our initial proposal:

1. Solve the biggest open theory problem in the differential privacy literature.
2. Have summer interns implement our solution.
3. Privacy-protective data-sharing solved!

Reality:
• Asymptotic theoretical performance ≠ practical performance
• Even the simplest theory solutions introduce challenges in practice ⇒ more interesting theory problems!
• Can’t rely solely on interns for tool development
• Long path from research prototypes to production software
• Institutional challenges to sharing sensitive data

Page 19

Our Goals for Today

• Share how far we’ve come in the Privacy Tools Project.
• Hear about related efforts & challenges.
• Find collaborators, users.
• Discuss directions forward:
  • “Applying Theoretical Advances in Privacy to Computational Social Science Practice”
  • “Computing over Distributed Sensitive Data”
  • “Formal Privacy Models and Title 13”
  • Production-level tools for long-term, wide use (in planning)

Chong

Nissim