Shawn Murphy, MD, PhD - Complying With Patient Expectations for Data De-identification
Transcript of Shawn Murphy, MD, PhD - Complying With Patient Expectations for Data De-identification
-
8/7/2019 Shawn Murphy, MD, PhD - Complying With Patient Expectations for Data De-identification
1/26
Complying with Patient
Expectations for Data
De-identification
Shawn Murphy, MD, Ph.D.
Massachusetts General Hospital
-
2/26
Principles that drive methodology
Adequate data de-identification
Trustworthy data recipients
Physical data security
-
3/26
Balance
Adequate data de-identification
Trustworthy data recipients
Physical data security
-
4/26
Principles that drive methodology
Adequate data de-identification
Trustworthy data recipients
Physical data security
-
5/26
De-identification vs. Usefulness
Sweeney et al.
Cell suppression technique
Ohno-Machado et al.
Destruction of critical data
Easy reassembly of suppressed cells
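The two objections to cell suppression can be seen in a toy table. This is a minimal sketch, not the method from the cited papers: the threshold, the table values, and the function names are all assumptions made for illustration. It suppresses small counts, then shows that a lone suppressed cell is recovered from a published row total.

```python
# Illustrative only: a tiny 2x2 table of patient counts with published row totals.
THRESHOLD = 5  # hypothetical minimum publishable cell count


def suppress_small_cells(table, threshold=THRESHOLD):
    """Replace every count below the threshold with None (a suppressed cell)."""
    return [[c if c >= threshold else None for c in row] for row in table]


def reassemble(suppressed, row_totals):
    """Recover a suppressed cell when it is the only gap in its row."""
    recovered = [row[:] for row in suppressed]
    for i, row in enumerate(recovered):
        gaps = [j for j, c in enumerate(row) if c is None]
        if len(gaps) == 1:
            known = sum(c for c in row if c is not None)
            row[gaps[0]] = row_totals[i] - known
    return recovered


counts = [[12, 3], [40, 25]]
row_totals = [sum(r) for r in counts]
published = suppress_small_cells(counts)
print(published)                          # [[12, None], [40, 25]]
print(reassemble(published, row_totals))  # [[12, 3], [40, 25]]
```

Suppressing more cells to block this kind of subtraction removes more of the data, which is the usefulness side of the trade-off.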
-
6/26
Principles that drive methodology
Just adequate data de-identification
Trustworthy data recipients
Physical data security
Trade-offs always exist when implementing de-identification
-
7/26
Principles that drive methodology
Just adequate data de-identification
Trustworthy data recipients
Physical data security
-
8/26
Use case for illustration of principle
Marshfield Clinic
Part of eMERGE Project
Unification of de-identified phenotypic EMR data with genotypic research data
No person is allowed access to both data sets
-
9/26
Principles that drive methodology
Just adequate data de-identification
Trustworthy data recipients
Data will not be stolen
Consider human profile of data recipient
-
10/26
Patient Expectations
Just adequate data de-identification
Trustworthy data recipients
Physical data security
-
11/26
Lock down vs. Usability
University of California, San Francisco
Patient data must remain in sandbox environment: high infrastructure cost
Partners HealthCare, Boston
Distribution of patient data from protected directories, audited access, through encrypted files: vulnerable to human error
-
13/26
i2b2 paradigm
Informatics for Integrating Biology and the Bedside: implementation of software platform for finding and studying patient cohorts
Human-derived policy = no magic
Goal is to arrive at policies that can be implemented firmly, exactly, transparently, consistently, and electronically
Security always enforced on the server side
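Server-side enforcement means the policy check runs before any data leaves the server, regardless of what the client asks for. The sketch below is one hedged way to picture that; the role names, the `Request` type, and `handle_query` are hypothetical and are not taken from the i2b2 code base.

```python
from dataclasses import dataclass

# Hypothetical policy table: which roles may receive patient-level rows.
ROLE_MAY_SEE_ROWS = {"aggregate": False, "limited": True}


@dataclass
class Request:
    user_role: str    # assumed to be looked up by the server from the session
    wants_rows: bool  # what the client asked for; never trusted as permission


def handle_query(request, matching_rows):
    """Apply the policy on the server before anything is returned."""
    allowed = ROLE_MAY_SEE_ROWS.get(request.user_role, False)
    if request.wants_rows and allowed:
        return {"count": len(matching_rows), "rows": matching_rows}
    # Unknown or unprivileged roles receive the count only.
    return {"count": len(matching_rows)}


rows = [{"patient": 1}, {"patient": 2}]
print(handle_query(Request("aggregate", True), rows))  # {'count': 2}
```

Because the decision is a table lookup rather than a judgment call, the same request always yields the same answer, which is what implementing a policy exactly and consistently requires.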
-
15/26
i2b2 paradigm
Five levels of data protection
Obfuscated-data user advantages
People unable to resolve data on a single patient
Data managers can allow underlying (HIPAA) limited data sets
Obfuscated-data user disadvantages
Lock-out policies can be too restrictive and intrusive
Group attack possible: need careful user management
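The slide does not say how obfuscation or lock-out is implemented. As a rough sketch, assume counts are returned with random Gaussian noise and a user is locked out after repeating the same query too many times; the class name, `sigma`, and `max_repeats` are illustrative assumptions, not i2b2 settings.

```python
import random
from collections import Counter


class ObfuscatedCounter:
    """Return noisy counts and lock out a user who repeats one query too often."""

    def __init__(self, sigma=1.5, max_repeats=3, seed=None):
        # sigma and max_repeats are illustrative values only.
        self.sigma = sigma
        self.max_repeats = max_repeats
        self.rng = random.Random(seed)
        self.repeats = Counter()
        self.locked = set()

    def count(self, user, query_key, true_count):
        if user in self.locked:
            raise PermissionError(f"{user} is locked out")
        self.repeats[(user, query_key)] += 1
        if self.repeats[(user, query_key)] > self.max_repeats:
            # Averaging many noisy answers would reveal the true count.
            self.locked.add(user)
            raise PermissionError(f"{user} is locked out")
        noisy = true_count + self.rng.gauss(0, self.sigma)
        return max(0, round(noisy))
```

The repeat counter here is kept per user, so several users who each stay under the limit and pool their answers are not detected. That is the group attack the slide warns about, and it is why the limit alone does not replace careful user management.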
-
16/26
i2b2 paradigm
Five levels of data protection
Aggregated-data user: unsubstantiated trustworthiness, (HIPAA) de-identified data, low physical security (client)
Fundamental data set has 18 HIPAA identifiers removed
Unable to view line item patient data
Unable to view narrative text about patient
Unable to view any protected health information
Internet access to client, simple login process
-
17/26
i2b2 paradigm
Five levels of data protection
Aggregated-data user advantages
People unable to reverse engineer de-identified data on a single patient
Aggregated-data user disadvantages
Data managers responsible for creating (HIPAA) de-identified data sets
-
18/26
i2b2 paradigm
Five levels of data protection
Limited-data-set user: moderate trustworthiness, underlying HIPAA LDS data, moderate physical security
Fundamental data set has 16 HIPAA identifiers removed
Unable to view narrative text about patient
Unable to view any protected health information (except dates and zip codes)
Intranet access to data, institutional login process
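The difference between the de-identified set (18 identifiers removed) and the limited data set (16 removed, dates and zip codes kept) can be pictured as two field filters. This is a sketch under stated assumptions: the field names are hypothetical and the sets below are a short subset of identifier types, not the full HIPAA list.

```python
# Hypothetical field names; a short subset of identifier types, not the full HIPAA list.
DIRECT_IDENTIFIERS = {"name", "ssn", "phone", "email", "mrn"}
LDS_ONLY_FIELDS = {"birth_date", "admit_date", "zip"}  # dates and zip codes


def to_limited_data_set(record):
    """Drop direct identifiers but keep dates and zip codes."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


def to_deidentified(record):
    """Drop direct identifiers and also the dates and zip codes."""
    blocked = DIRECT_IDENTIFIERS | LDS_ONLY_FIELDS
    return {k: v for k, v in record.items() if k not in blocked}


record = {
    "name": "Pat Example",
    "mrn": "0000000",
    "zip": "00000",
    "admit_date": "2019-01-01",
    "diagnosis": "asthma",
}
print(to_limited_data_set(record))
print(to_deidentified(record))  # {'diagnosis': 'asthma'}
```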
-
19/26
i2b2 paradigm
Five levels of data protection
Limited-data-set user advantages
People able to have direct access to data (as well as through client)
Limited-data-set user disadvantages
Data managers responsible for physical security and assessing user trustworthiness
-
20/26
i2b2 paradigm
Five levels of data protection
Notes-enabled LDS user: moderate trustworthiness, underlying HIPAA LDS data, moderate physical security
Fundamental data set has 16 HIPAA identifiers removed
Able to view narrative text about patient
Unable to view any protected health information (except dates and zip codes)
Intranet access to data, institutional login process
-
21/26
i2b2 paradigm
Five levels of data protection
Notes-enabled LDS user advantages
People able to have direct access to data including de-identified notes
Notes-enabled LDS user disadvantages
Must be able to prove that scrubbing of notes can perform well
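One hedged way to show that scrubbing performs well is to measure recall against hand-labeled notes: the fraction of known PHI strings that no longer appear after scrubbing. The toy scrubber, the phone pattern, and the sample notes below are invented for illustration and are far simpler than any real scrubber.

```python
import re

PHONE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")  # illustrative pattern only


def scrub(note):
    """Toy scrubber: masks US-style phone numbers and nothing else."""
    return PHONE.sub("[PHONE]", note)


def recall(labeled_notes):
    """Fraction of hand-labeled PHI strings absent from the scrubbed note."""
    total = removed = 0
    for note, phi_strings in labeled_notes:
        cleaned = scrub(note)
        for phi in phi_strings:
            total += 1
            removed += phi not in cleaned
    return removed / total if total else 1.0


labeled = [
    ("Call 617-555-0100 re: follow-up.", ["617-555-0100"]),
    ("Seen by Dr. Example on ward 4.", ["Example"]),
]
print(recall(labeled))  # 0.5
```

The second note shows the kind of miss such a measurement is meant to expose: the name survives because the toy scrubber only knows about phone numbers.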
-
22/26
i2b2 paradigm
Five levels of data protection
PHI-enabled user: high trustworthiness, underlying fully identified data, high physical security
Fundamental data set has all PHI types available
Able to view narrative text about patient in original form
Encrypted data, institutional login process with IRB confirmation of access
-
23/26
i2b2 paradigm
Five levels of data protection
PHI-enabled user advantages
Full power of data can be realized, including recruitment of patients for clinical trials
PHI-enabled user disadvantages
Risk is maximized for breach of privacy
Investigators may be uncomfortable with this level of access
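The five levels described on the preceding slides can be summarized as an ordered set of roles with a few capability checks. This is only a sketch: the enum and function names are hypothetical, and treating the levels as strictly ordered, with each level including the capabilities of the one below, is an assumption drawn from the order of the slides.

```python
from enum import IntEnum


class AccessLevel(IntEnum):
    """The five user levels, in the order the slides present them."""
    OBFUSCATED = 1
    AGGREGATED = 2
    LIMITED_DATA_SET = 3
    NOTES_ENABLED_LDS = 4
    PHI_ENABLED = 5


def can_view_line_items(level):
    # Aggregated-data users cannot view line item patient data.
    return level >= AccessLevel.LIMITED_DATA_SET


def can_view_notes(level):
    # Narrative text first becomes visible at the notes-enabled LDS level.
    return level >= AccessLevel.NOTES_ENABLED_LDS


def can_view_full_phi(level):
    # Only PHI-enabled users see all PHI types and notes in original form.
    return level == AccessLevel.PHI_ENABLED


for level in AccessLevel:
    print(level.name, can_view_line_items(level),
          can_view_notes(level), can_view_full_phi(level))
```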
-
25/26
References
Fischetti M, Salazar J. Model and algorithms for the 2-dimensional cell suppression problem in statistical disclosure control. Mathematical Programming 1999; 84:283-312.
Ohno-Machado L, Dreiseitl S, Vinterbo S. Effects of Data Anonymization by Cell Suppression. J Am Med Inform Assoc. 2002;9(Nov-Dec suppl):S115-S119.
Dreiseitl S, Vinterbo S, Ohno-Machado L. Disambiguation Data: Extracting Information from Anonymized Sources. Proc AMIA Fall Symp 2001; 144-8.
Samarati P, Sweeney L. Protecting Privacy When Disclosing Information: k-Anonymity and Its Enforcement through Generalization and Suppression. Proc. IEEE Symp. Research in Security and Privacy, May 1998.
Murphy SN, Chueh H. A Security Architecture for Query Tools Used to Access Large Biomedical Databases. AMIA Fall Symp. 2002, pages 552-556.
-
26/26
Complying with Patient
Expectations for Data
De-identification
Shawn Murphy, MD, Ph.D.
Massachusetts General Hospital