Li Xiong CS573 Data Privacy and Security


Page 1: Li Xiong CS573 Data Privacy and Security

Li Xiong

CS573 Data Privacy and Security

Healthcare privacy and security:

Genomic data privacy

Page 2: Li Xiong CS573 Data Privacy and Security

Genomic data privacy

Genomic data are increasingly collected, stored, and shared in research and clinical environments

Genomic data are person-specific, although no public registry maps genomes to the names of individuals

Genomic data are not specified as an identifying patient attribute under the HIPAA Privacy Rule and may be released for public research purposes

How can person-specific DNA be shared such that it cannot be associated with the explicit identity of its source?

Page 3: Li Xiong CS573 Data Privacy and Security

Data sharing scenario

John Smith is admitted to a local hospital, which stores clinical and DNA information

John visits other hospitals

The hospital forwards certain DNA data to a research group, with the institution and pseudonyms of the patients

The hospital sends identified discharge records to a state-controlled database

Page 4: Li Xiong CS573 Data Privacy and Security

Data at a specific location

Identified table of patient demographics

De-identified DNA sequences

Can we uniquely link identified data to DNA data?

Page 5: Li Xiong CS573 Data Privacy and Security

Data at multiple locations

Each site has an identified table and de-identified DNA sequences

Can we uniquely link identified data to DNA data?

Page 6: Li Xiong CS573 Data Privacy and Security

Trails

The set of locations each patient visited is called a trail

The trails can be tracked and matched to link DNA data to identified data
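A minimal sketch of how trails could be built, assuming each location releases the set of identifiers (names on the identified side, pseudonyms on the DNA side) it has seen. The function name `build_trails`, the hospital labels, and the sample data are hypothetical and chosen only for illustration.

```python
def build_trails(releases, locations):
    """Map each identifier to its trail: a tuple of 0/1 flags, one per location."""
    identifiers = set()
    for loc in locations:
        identifiers |= set(releases.get(loc, ()))
    return {
        ident: tuple(1 if ident in releases.get(loc, ()) else 0 for loc in locations)
        for ident in identifiers
    }


# Hypothetical releases from three hospitals: identified tables on one side,
# de-identified DNA records (pseudonyms) on the other.
locations = ["H1", "H2", "H3"]
identified = {"H1": {"John", "Mary"}, "H2": {"John"}, "H3": {"Mary", "Bob"}}
dna = {"H1": {"dna_a", "dna_b"}, "H2": {"dna_a"}, "H3": {"dna_b", "dna_c"}}

name_trails = build_trails(identified, locations)
dna_trails = build_trails(dna, locations)
```

In this toy data John's trail and the trail of `dna_a` are both `(1, 1, 0)`, which is the kind of coincidence trail matching exploits.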

Page 7: Li Xiong CS573 Data Privacy and Security

REIDIT-Complete

Re-identification of data in trails (REIDIT) for complete publishing

If there is a unique trail match, then a re-identification has occurred
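The unique-match rule can be sketched as follows. This is an illustrative reading of the slide, not the published REIDIT-C algorithm: it assumes trails are tuples of 0/1 flags (one per location), and it treats a match as unique only when exactly one name and exactly one DNA record share a trail. The function name `reidit_complete` and the sample data are hypothetical.

```python
from collections import defaultdict


def reidit_complete(name_trails, dna_trails):
    """Link names to DNA records whose trails match uniquely on both sides."""
    names_by_trail = defaultdict(list)
    for name, trail in name_trails.items():
        names_by_trail[trail].append(name)
    dna_by_trail = defaultdict(list)
    for dna_id, trail in dna_trails.items():
        dna_by_trail[trail].append(dna_id)

    links = {}
    for trail, names in names_by_trail.items():
        dna_ids = dna_by_trail.get(trail, [])
        # Assumption: a re-identification needs one name and one DNA record.
        if len(names) == 1 and len(dna_ids) == 1:
            links[names[0]] = dna_ids[0]
    return links


# Hypothetical trails over three locations.
name_trails = {"John": (1, 1, 0), "Mary": (1, 0, 1), "Bob": (0, 0, 1), "Ann": (0, 0, 1)}
dna_trails = {"dna_a": (1, 1, 0), "dna_b": (1, 0, 1), "dna_c": (0, 0, 1), "dna_d": (0, 0, 1)}

links = reidit_complete(name_trails, dna_trails)
```

Bob and Ann share the trail `(0, 0, 1)`, so neither is linked; only John and Mary have unique trails.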

Page 8: Li Xiong CS573 Data Privacy and Security

Results

Page 9: Li Xiong CS573 Data Privacy and Security

REIDIT-C re-identification

Re-identifiability is related to the average number of people per location

Page 10: Li Xiong CS573 Data Privacy and Security

Reserved publishing

Data releasers can reserve certain information

N is reserved to P vs. P is reserved to N

Page 11: Li Xiong CS573 Data Privacy and Security

REIDIT-Incomplete

REIDIT for reserved publishing

For each trail in the track with incomplete trails, if there is only one supertrail, then a re-identification has occurred

Remove the re-identified supertrail (important because a trail can be a supertrail to many trails)

Repeat the process
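The iterative steps above can be sketched as follows. This is an illustration under stated assumptions rather than the published REIDIT-I algorithm: trails are tuples of 0/1 flags, and a trail counts as a supertrail of another when it has a 1 at every location where the other has a 1. The names `is_supertrail` and `reidit_incomplete` and the sample data are hypothetical.

```python
def is_supertrail(candidate, trail):
    """True if candidate has a 1 at every location where trail has a 1."""
    return all(c >= t for c, t in zip(candidate, trail))


def reidit_incomplete(incomplete, complete):
    """Iteratively link each incomplete trail that has exactly one supertrail."""
    remaining_incomplete = dict(incomplete)
    remaining_complete = dict(complete)
    links = {}
    changed = True
    while changed:
        changed = False
        for ident, trail in list(remaining_incomplete.items()):
            candidates = [
                other for other, other_trail in remaining_complete.items()
                if is_supertrail(other_trail, trail)
            ]
            if len(candidates) == 1:
                links[ident] = candidates[0]
                # Remove the matched supertrail so it cannot match other trails.
                del remaining_complete[candidates[0]]
                del remaining_incomplete[ident]
                changed = True
    return links


# Hypothetical data: the DNA track is incomplete (some visits were reserved).
complete = {"John": (1, 1, 0), "Mary": (1, 0, 1), "Bob": (1, 1, 1)}
incomplete = {"dna_a": (0, 1, 0), "dna_b": (0, 0, 1), "dna_c": (1, 1, 1)}

links = reidit_incomplete(incomplete, complete)
```

On the first pass only `dna_c` has a single supertrail (Bob). Removing Bob leaves `dna_a` and `dna_b` with one supertrail each, which is why the removal step and the repetition matter.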

Page 12: Li Xiong CS573 Data Privacy and Security

REIDIT-Incomplete

Plot legend: 0.0, 0.1, 0.5, 0.9 are the probabilities of reserving information; hospitals are ranked by number of patients

Page 13: Li Xiong CS573 Data Privacy and Security

Can masking location help?

Not necessarily!

Page 14: Li Xiong CS573 Data Privacy and Security

Comments and open issues

Can k-anonymity solve the problem?

Pseudonyms are subject to dictionary attacks; how can linkage of the data be allowed without pseudonyms?

Genomic protection methods that incorporate the utility of the genomic data
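To illustrate the dictionary-attack concern, here is a small sketch. It assumes, purely for illustration, that pseudonyms are an unsalted SHA-256 hash of the patient's name; the slides do not say how pseudonyms are generated, so the `pseudonym` scheme, the function names, and the sample names are all hypothetical.

```python
import hashlib


def pseudonym(identifier):
    """Hypothetical pseudonym scheme: unsalted SHA-256 of the identifier."""
    return hashlib.sha256(identifier.encode("utf-8")).hexdigest()


def dictionary_attack(pseudonyms, candidates):
    """Recover identifiers by hashing every candidate and looking for matches."""
    lookup = {pseudonym(c): c for c in candidates}
    return {p: lookup[p] for p in pseudonyms if p in lookup}


# Released pseudonyms and an attacker's list of guessed names (both made up).
released = [pseudonym("John Smith"), pseudonym("Mary Jones")]
guesses = ["Alice Brown", "John Smith", "Bob White"]

recovered = dictionary_attack(released, guesses)
```

Under this assumed scheme, any identifier that appears in the attacker's candidate list is recovered, because the same input always produces the same pseudonym.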

Page 15: Li Xiong CS573 Data Privacy and Security
Page 16: Li Xiong CS573 Data Privacy and Security

De-identification

e.g. Utah Resource for Genetic and Epidemiologic Research (RGE)