Professor John Bacon-Shone Director, Social Sciences Research Centre & Chair, Human Research Ethics Committee The University of Hong Kong Re-identification and Privacy risk Asian Privacy Scholars Network: July 2013

Professor John Bacon-Shone

Director, Social Sciences Research Centre & Chair, Human Research Ethics Committee

The University of Hong Kong

Re-identification and Privacy risk

Asian Privacy Scholars Network: July 2013

Introduction

Ethics committees in universities generally assume that once personal data has been anonymized, it is no longer personal data, so the privacy risk is permanently addressed

Recent papers suggest that this is not necessarily a wise assumption!

I wish to examine the issue of re-identification and what it means for privacy, confidentiality and research ethics

Anonymity

Seems the most difficult ethical concept for academics to fully grasp.

The dictionary says: "Anonymous: not named or identified". But most people think it just means not named. For example, if I interview you but do not record your name, they think the interview is anonymous, even if I know who you are or you make statements in the interview record that implicitly identify you.

What is much more tricky is that anonymity may not be static: being anonymous today does not necessarily mean being anonymous tomorrow

Personal Identifier (PDPO)

The ordinance states that:

“Personal Identifier” means an identifier that is assigned to an individual by a data user for the purpose of the operations of the user and that uniquely identifies that individual in relation to the data user, but does not include an individual's name used to identify that individual

While the assignment of a “personal identifier” may provide a certain degree of anonymity, its effectiveness relies on the data user taking the necessary action. For example, if a hospital uses the patient’s ID card number to identify the patient, the desired degree of anonymity will not be attained.

Personal Identifier (my version)

Personal Identifier means an identifier, other than name, that uniquely identifies some (but maybe not all) individuals in a specified population

Clearly, the existence of a personal identifier does not mean we have anonymity for all individuals. Some privacy risk therefore exists. The evaluation of such privacy risk requires knowing both the chance of re-identification of individuals and the consequences.

Next, let’s examine the chance of re-identification

Chance of re-identification

This can be separated into 2 elements:

1) Chance of uniqueness

2) Ease of matching

Chance of uniqueness

The chance of uniqueness depends on both the identifier and the population

The more variables in the dataset and the more possible values for each variable, the more likely that the identifier is unique for some individuals. Hence the concern about Big Data and the development of much larger datasets.

A smaller population (e.g. identical twins in Hong Kong) has a much greater chance of uniqueness than a large population.

Note that a DNA profile may not be unique (identical twins), and matching can be indirect, using the similarities of DNA within families.
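The effect of adding variables can be illustrated with a minimal sketch. The records, the column choice and the helper `fraction_unique` below are all hypothetical and are not taken from the presentation; the sketch only counts how many records become unique as more variables are combined into the identifier.

```python
from collections import Counter

# Hypothetical toy records: (sex, birth_year, district, occupation).
records = [
    ("F", 1980, "Central", "nurse"),
    ("F", 1980, "Central", "teacher"),
    ("M", 1975, "Sha Tin", "driver"),
    ("M", 1975, "Sha Tin", "driver"),
    ("F", 1980, "Sha Tin", "nurse"),
    ("M", 1990, "Central", "teacher"),
]


def fraction_unique(rows, columns):
    """Share of rows whose values on the chosen columns occur exactly once."""
    keys = [tuple(row[c] for c in columns) for row in rows]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(rows)


# Combine more and more columns into the identifier.
for columns in [(0,), (0, 1), (0, 1, 2), (0, 1, 2, 3)]:
    print(columns, round(fraction_unique(records, columns), 2))
```

In this toy data no record is unique on sex alone, while four of the six records are unique once all four variables are combined. Real datasets would need the same check against the relevant population, not only against the sample.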

Ease of matching

The ease of matching means how easily we can match the identifier back to a specific person. Let us consider some examples:

ID card number: here the risk of matching is high, because the government has enabled leakage of matching information (e.g. Companies Registry)

DNA profile: the risk of matching should be low, unless you or family members have provided DNA profiles to a registry (see later discussion)

Date and time of admission to a specific hospital: would allow matching with hospital records, if they can be accessed
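The hospital admission example can be sketched as a simple record linkage. Everything below is hypothetical (the two datasets, the field names and the helper `link`); it assumes an attacker holds both a research extract without names and identified hospital records, and joins them on admission time and hospital.

```python
# Hypothetical research extract: names removed, admission time retained.
research = [
    {"admitted": "2013-07-01 14:05", "hospital": "A", "diagnosis": "fracture"},
    {"admitted": "2013-07-01 14:05", "hospital": "B", "diagnosis": "asthma"},
    {"admitted": "2013-07-02 09:30", "hospital": "A", "diagnosis": "influenza"},
]

# Hypothetical identified hospital records that have somehow been accessed.
hospital = [
    {"admitted": "2013-07-01 14:05", "hospital": "A", "name": "Patient X"},
    {"admitted": "2013-07-02 09:30", "hospital": "A", "name": "Patient Y"},
    {"admitted": "2013-07-02 09:30", "hospital": "A", "name": "Patient Z"},
]


def link(research_rows, identified_rows, keys=("admitted", "hospital")):
    """For each research row, list the names that share its key values."""
    index = {}
    for row in identified_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row["name"])
    return [
        (r["diagnosis"], index.get(tuple(r[k] for k in keys), []))
        for r in research_rows
    ]


for diagnosis, names in link(research, hospital):
    print(diagnosis, names)
```

A research row that links to exactly one name is re-identified; a row that links to several names is only narrowed down, and a row with no link stays unmatched unless other records become accessible.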

Ease of matching

Recent publications have discussed the possibility of matching becoming easier with time, for example:

Data leakage:

Individuals make DNA profiles public, making it increasingly possible to use familial matches to match individuals or surnames

Arrested individuals are often required to provide DNA profiles, which are not erased even if the individual is found innocent

ID card numbers are leaked from websites, making it even easier to match to names.

Ease of matching

Linkage of the identifier to individual characteristics:

Hospital admission:

If it is known that you were involved in a traffic accident, your admission soon afterward to a hospital near the accident location becomes likely, increasing the ease of matching to hospital records.

DNA:

Researchers are developing methods to predict personal characteristics from DNA profiles, such as eye, skin and hair colour, so the ease of matching is likely to increase
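The traffic accident example above can be sketched as a filter on auxiliary knowledge. The admission records, the two-hour window and the helper `candidates` are assumptions made purely for illustration; the point is only that knowing roughly when and where an event happened shrinks the set of records that could belong to a person.

```python
from datetime import datetime, timedelta

# Hypothetical admission records with names already removed.
admissions = [
    {"record_id": 1, "hospital": "A", "admitted": datetime(2013, 7, 1, 14, 40)},
    {"record_id": 2, "hospital": "A", "admitted": datetime(2013, 7, 1, 9, 10)},
    {"record_id": 3, "hospital": "B", "admitted": datetime(2013, 7, 1, 14, 50)},
    {"record_id": 4, "hospital": "A", "admitted": datetime(2013, 7, 1, 20, 15)},
]


def candidates(rows, hospitals_nearby, event_time, window):
    """Record ids admitted to a nearby hospital within `window` after the event."""
    return [
        row["record_id"]
        for row in rows
        if row["hospital"] in hospitals_nearby
        and event_time <= row["admitted"] <= event_time + window
    ]


# Assumed auxiliary knowledge: accident at 14:00 near hospital A.
accident_time = datetime(2013, 7, 1, 14, 0)
print(len(admissions), "records before using the accident details")
print(candidates(admissions, {"A"}, accident_time, timedelta(hours=2)))
```

Here four candidate records shrink to one once the accident time and location are used, even though the dataset itself has not changed.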

Value of matching

Need to consider the reason for matching:

Authentication – need to be able to match against an identifier carried by the individual such as ID card

Matching other records – need only to match internally, so no need to use an identifier usable externally, greatly reducing the risk of unintended matching
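One possible way to obtain an identifier that works for internal matching but is not usable externally is a keyed hash of the external identifier. This is a sketch of one option, not a method stated in the presentation: the key, the sample ID value and the function `internal_id` are hypothetical. A keyed hash is used here rather than a plain hash because ID card numbers come from a small space, so an unkeyed hash could be reversed by trying every possible number.

```python
import hashlib
import hmac

# Hypothetical secret held only by the data user. In practice it would be
# generated randomly and stored securely, not written into source code.
SECRET_KEY = b"example-key-kept-by-the-data-user"


def internal_id(external_id: str, key: bytes = SECRET_KEY) -> str:
    """Derive a stable internal identifier from an external one using HMAC-SHA256."""
    digest = hmac.new(key, external_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


# The same person receives the same internal identifier in every internal
# dataset, so records can still be matched without storing the ID card number.
print(internal_id("A123456(7)"))
```

A randomly assigned number kept in a protected lookup table would serve the same purpose; either way, authentication against the ID card itself still requires the external identifier.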

Consequences of re-identification

Can range from the trivial (e.g. customer of a clothing retail outlet) to the serious (e.g. HIV status), but the full consequences cannot always be predicted

While it is possible to change some identifiers (e.g. ID card number, mobile phone number), it is impossible to change other identifiers (e.g. DNA profile), so long-term risk needs to be recognized and addressed

Implications of re-identification

Arguably unethical to promise:

Zero risk – mistakes can always happen

Future risk is the same as current risk – technology and circumstances change; ease of matching continues to increase, especially for biological markers

Need to review use of identifiers – what seemed privacy safe in the past may not be safe in the future, so need to continue to review privacy risk