Shawn Murphy, MD, PhD - Complying With Patient Expectations for Data De-identification

  • 8/7/2019 Shawn Murphy, MD, PhD - Complying With Patient Expectations for Data De-identification

    1/26

    Complying with Patient Expectations for Data De-identification

    Shawn Murphy, MD, Ph.D.

    Massachusetts General Hospital

    2/26

    Principles that drive methodology

    Adequate data de-identification

    Trustworthy data recipients

    Physical data security

    3/26

    Balance

    Adequate data de-identification

    Trustworthy data recipients

    Physical data security

    4/26

    Principles that drive methodology

    Adequate data de-identification

    Trustworthy data recipients

    Physical data security

    5/26

    De-identification vs. Usefulness

    Sweeney et al.: cell suppression technique

    Ohno-Machado et al.: destruction of critical data

    Easy reassembly of suppressed cells
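
The trade-off on this slide can be illustrated with a minimal sketch, assuming a small table of patient counts that is published together with its row totals. The threshold of 5, the table values, and the function names `suppress_small_cells` and `reassemble` are hypothetical and chosen only for illustration; they are not taken from the cited papers.

```python
from typing import List, Optional

Row = List[Optional[int]]

def suppress_small_cells(table: List[List[int]], threshold: int = 5) -> List[Row]:
    """Replace every count below `threshold` with None (primary suppression)."""
    return [[c if c >= threshold else None for c in row] for row in table]

def reassemble(suppressed: List[Row], row_totals: List[int]) -> List[Row]:
    """Recover suppressed cells in rows that have exactly one suppressed cell."""
    recovered = []
    for row, total in zip(suppressed, row_totals):
        missing = [i for i, c in enumerate(row) if c is None]
        new_row = list(row)
        if len(missing) == 1:
            # The hidden value is the published total minus the visible cells.
            known = sum(c for c in row if c is not None)
            new_row[missing[0]] = total - known
        recovered.append(new_row)
    return recovered

# Hypothetical counts: rows are diagnoses, columns are age bands.
counts = [[12, 3, 20], [2, 1, 30], [8, 9, 10]]
totals = [sum(r) for r in counts]  # assumed to be published with the table

masked = suppress_small_cells(counts)
print(masked)
print(reassemble(masked, totals))
```

In this sketch the first row has a single suppressed cell, so it is recovered by subtraction. The second row has two suppressed cells and stays hidden, which hints at why stronger protection requires suppressing additional cells and therefore destroys more data.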

    6/26

    Principles that drive methodology

    Just adequate data de-identification

    Trustworthy data recipients

    Physical data security

    Trade-offs always exist when implementing de-identification

    7/26

    Principles that drive methodology

    Just adequate data de-identification

    Trustworthy data recipients

    Physical data security

    8/26

    Use case for illustration of principle

    Marshfield Clinic

    Part of the eMERGE project

    Unification of de-identified phenotypic EMR data with genotypic research data

    No person is allowed access to both data sets

    9/26

    Principles that drive methodology

    Just adequate data de-identification

    Trustworthy data recipients

    Data will not be stolen

    Consider human profile of data recipient

    10/26

    Patient Expectations

    Just adequate data de-identification

    Trustworthy data recipients

    Physical data security

    11/26

    Lock down vs. Usability

    University of California, San Francisco

    Patient data must remain in a sandbox environment: high infrastructure cost

    Partners HealthCare, Boston

    Distribution of patient data from protected directories, with audited access, through encrypted files: vulnerable to human error

    13/26

    i2b2 paradigm

    Informatics for Integrating Biology and the Bedside: implementation of software platform for finding and studying patient cohorts

    Human-derived policy = no magic

    Goal is to arrive at policies that can be implemented firmly, exactly, transparently, consistently, and electronically

    Security always enforced on the server side

    15/26

    i2b2 paradigm

    Five levels of data protection

    Obfuscated-data user advantages

    People unable to resolve data on a single patient

    Data managers can allow underlying (HIPAA) limited data sets

    Obfuscated-data user disadvantages

    Lock-out policies can be too restrictive and intrusive

    Group attack possible; need careful user management
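
A minimal sketch of how an obfuscated-data service might behave, assuming that counts are blurred with small random noise and that a user who repeats the same query too often is locked out. The noise range (plus or minus 3), the repeat limit, and the names `ObfuscatedCountService` and `LockedOut` are assumptions made for illustration; this is not the i2b2 implementation.

```python
import random
from collections import Counter
from typing import Optional

class LockedOut(Exception):
    """Raised when a user has exceeded the repeat limit for a query."""

class ObfuscatedCountService:
    def __init__(self, max_repeats: int = 3, noise: int = 3,
                 seed: Optional[int] = None):
        self.max_repeats = max_repeats   # hypothetical lock-out threshold
        self.noise = noise               # hypothetical noise half-width
        self._rng = random.Random(seed)
        self._seen = Counter()           # (user, query) -> times asked
        self._locked = set()             # users who are locked out

    def count(self, user: str, query: str, true_count: int) -> int:
        """Return a blurred count, enforcing the lock-out on the server side."""
        if user in self._locked:
            raise LockedOut(user)
        self._seen[(user, query)] += 1
        if self._seen[(user, query)] > self.max_repeats:
            # Repeating a query lets a user average the noise away.
            self._locked.add(user)
            raise LockedOut(user)
        noisy = true_count + self._rng.randint(-self.noise, self.noise)
        return max(noisy, 0)

svc = ObfuscatedCountService(max_repeats=2, seed=0)
print(svc.count("alice", "diabetes AND age > 60", 100))
```

Because the repeat budget in this sketch is tracked per user, several cooperating users could each run the same query and pool their answers. That is one way to read the "group attack" listed above, and it is why the slide calls for careful user management.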

    16/26

    i2b2 paradigm

    Five levels of data protection

    Aggregated-data user: unsubstantiated trustworthiness, (HIPAA) de-identified data, low physical security (client)

    Fundamental data set has 18 HIPAA identifiers removed

    Unable to view line-item patient data

    Unable to view narrative text about patient

    Unable to view any protected health information

    Internet access to client, simple login process

    17/26

    i2b2 paradigm

    Five levels of data protection

    Aggregated-data user advantages

    People unable to reverse engineer de-identified data on a single patient

    Aggregated-data user disadvantages

    Data managers responsible for creating (HIPAA) de-identified data sets

    18/26

    i2b2 paradigm

    Five levels of data protection

    Limited-data-set user: moderate trustworthiness, underlying HIPAA LDS data, moderate physical security

    Fundamental data set has 16 HIPAA identifiers removed

    Unable to view narrative text about patient

    Unable to view any protected health information (except dates and zip codes)

    Intranet access to data, institutional login process
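
The difference between the de-identified data set (18 identifier types removed) and the limited data set (16 removed, with dates and zip codes kept) can be sketched as a field filter. The field names below are hypothetical and cover only an illustrative subset of identifier types, not the full HIPAA list; the sketch also simply drops fields, where a real pipeline might generalize some of them instead.

```python
# Hypothetical field names; an illustrative subset, not the full HIPAA list.
DIRECT_IDENTIFIERS = {"name", "phone", "email", "ssn", "mrn"}
DATE_AND_ZIP_FIELDS = {"admit_date", "zip_code"}

def to_limited_data_set(record: dict) -> dict:
    """Drop direct identifiers but keep dates and zip codes (LDS-style)."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

def to_deidentified(record: dict) -> dict:
    """Drop direct identifiers and also dates and zip codes."""
    blocked = DIRECT_IDENTIFIERS | DATE_AND_ZIP_FIELDS
    return {k: v for k, v in record.items() if k not in blocked}

# A made-up record for illustration only.
record = {
    "name": "Jane Doe",
    "mrn": "000123",
    "zip_code": "02114",
    "admit_date": "2019-08-07",
    "diagnosis": "asthma",
}

print(to_limited_data_set(record))
print(to_deidentified(record))
```

With this record, the limited-data-set view keeps the zip code, admission date, and diagnosis, while the de-identified view keeps only the diagnosis.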

    19/26

    i2b2 paradigm

    Five levels of data protection

    Limited-data-set user advantages

    People able to have direct access to data (as well as through client)

    Limited-data-set user disadvantages

    Data managers responsible for physical security and assessing user trustworthiness

    20/26

    i2b2 paradigm

    Five levels of data protection

    Notes-enabled LDS user: moderate trustworthiness, underlying HIPAA LDS data, moderate physical security

    Fundamental data set has 16 HIPAA identifiers removed

    Able to view narrative text about patient

    Unable to view any protected health information (except dates and zip codes)

    Intranet access to data, institutional login process

    21/26

    i2b2 paradigm

    Five levels of data protection

    Notes-enabled LDS user advantages

    People able to have direct access to data including de-identified notes

    Notes-enabled LDS user disadvantages

    Must be able to prove that scrubbing of notes can perform well

    22/26

    i2b2 paradigm

    Five levels of data protection

    PHI-enabled user: high trustworthiness, underlying fully identified data, high physical security

    Fundamental data set has all PHI types available

    Able to view narrative text about patient in original form

    Encrypted data, institutional login process with IRB confirmation of access

    23/26

    i2b2 paradigm

    Five levels of data protection

    PHI-enabled user advantages

    Full power of data can be realized, including recruitment of patients for clinical trials

    PHI-enabled user disadvantages

    Risk is maximized for breach of privacy

    Investigators may be uncomfortable with this level of access
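
The five levels named in these slides, combined with the earlier principle that security is always enforced on the server side, can be sketched as an ordered enumeration plus a server-side check. This is a minimal sketch under two assumptions that the slides do not state: that each level includes the capabilities of the levels below it, and that the action names in `REQUIRED_LEVEL` (all hypothetical) are a fair summary of what each level may see.

```python
from enum import IntEnum

class AccessLevel(IntEnum):
    OBFUSCATED = 1
    AGGREGATED = 2
    LIMITED_DATA_SET = 3
    NOTES_ENABLED_LDS = 4
    PHI_ENABLED = 5

# Minimum level assumed for each kind of request (hypothetical action names).
REQUIRED_LEVEL = {
    "obfuscated_counts": AccessLevel.OBFUSCATED,
    "aggregate_counts": AccessLevel.AGGREGATED,
    "line_item_data": AccessLevel.LIMITED_DATA_SET,
    "scrubbed_notes": AccessLevel.NOTES_ENABLED_LDS,
    "original_notes": AccessLevel.PHI_ENABLED,
}

def authorize(user_level: AccessLevel, action: str) -> bool:
    """Server-side check; actions not listed above are denied by default."""
    required = REQUIRED_LEVEL.get(action)
    return required is not None and user_level >= required

print(authorize(AccessLevel.AGGREGATED, "line_item_data"))
print(authorize(AccessLevel.NOTES_ENABLED_LDS, "scrubbed_notes"))
```

Denying unknown actions by default keeps the policy explicit: a new kind of request is unavailable to every level until someone decides which level it belongs to.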

    25/26

    References

    Fischetti M, Salazar J. Model and algorithms for the 2-dimensional cell suppression problem in statistical disclosure control. Mathematical Programming 1999; 84:283-312.

    Ohno-Machado L, Dreiseitl S, Vinterbo S. Effects of Data Anonymization by Cell Suppression. J Am Med Inform Assoc. 2002; 9(Nov-Dec suppl):S115-S119.

    Dreiseitl S, Vinterbo S, Ohno-Machado L. Disambiguation Data: Extracting Information from Anonymized Sources. Proc AMIA Fall Symp 2001; 144-8.

    Samarati P, Sweeney L. Protecting Privacy When Disclosing Information: k-Anonymity and Its Enforcement through Generalization and Suppression. Proc. IEEE Symp. Research in Security and Privacy, May 1998.

    Murphy SN, Chueh H. A Security Architecture for Query Tools Used to Access Large Biomedical Databases. AMIA Fall Symp. 2002; 552-556.
