Managing sensitive data in your repository
Natasha Simons
Sharing Health-y and Sensitive Data: Challenges and Solutions Workshop, Perth, 3 September 2015
What is a data repository?
A research data repository is a managed environment capable of storing and sharing (largely) digital data. The data repository supports the process of curating, preserving, and sharing research data.
What kinds of data repositories are there?
Are repositories for open data only?
Yes and no: it depends on the repository's purpose and scope.
Repositories can support data that is:
1. Open access only
2. Mediated access only
3. Closed/private only
Most data repositories are a combination of 1 and 2.
Are there health data repositories?
Yes, many!
http://www.nlm.nih.gov/NIHbmic/nih_data_sharing_repositories.html
What’s the point of data repositories?
Data repositories assist researchers and the research community to:
1. Support data sharing, data discovery & reuse, data preservation
2. Comply with publisher requirements
3. Comply with funder requirements
4. Comply with institutional or govt policy requirements
5. Support institutional goals
Illustration credit: Ainsley Seago. doi:10.1371/journal.pbio.1001779.g001
Can sensitive data be managed in a repository?
Yes!
Ask:
• Can the raw data be (de-identified and) made completely open? Or will access be restricted? Mediated?
• What licence should be applied to enable data reuse?
• What metadata elements, links (e.g. to publications) and identifiers (e.g. DOIs, ORCIDs) will aid discovery and reuse of the data?
Source: http://www.slideshare.net/WLSA_ORG/wh2014-workshop-health-data-consortium
Also ask:
• Can a citation element be added to support attribution and reuse tracking?
• Who or what will be the point of contact for the data?
• Is the data subject to any other conditions, e.g. release only after an embargo period?
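The questions above (access level, licence, identifiers, citation, contact, embargo) map naturally onto fields of a repository metadata record. The following is a minimal Python sketch of that idea, not a real repository schema: the class name, the field names and every value in the example record (including the DOI and the contact address) are hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Access(Enum):
    """The three access types a repository can support."""
    OPEN = "open"
    MEDIATED = "mediated"
    CLOSED = "closed"


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; field names are illustrative only."""
    title: str
    creator: str
    year: int
    access: Access
    licence: str                           # licence applied to enable reuse
    contact: str                           # point of contact for access requests
    doi: Optional[str] = None              # persistent identifier for the dataset
    creator_orcid: Optional[str] = None    # persistent identifier for the creator
    embargo_until: Optional[date] = None   # no release before this date

    def citation(self) -> str:
        """Build a simple citation string to support attribution."""
        text = f"{self.creator} ({self.year}). {self.title}."
        if self.doi:
            text += f" https://doi.org/{self.doi}"
        return text

    def is_downloadable(self, today: date) -> bool:
        """Only open data whose embargo (if any) has ended is directly downloadable."""
        if self.access is not Access.OPEN:
            return False
        return self.embargo_until is None or today >= self.embargo_until


# Example record with placeholder values.
record = DatasetRecord(
    title="Example cohort study",
    creator="Researcher, A.",
    year=2015,
    access=Access.OPEN,
    licence="CC BY 4.0",
    contact="data-manager@example.org",
    doi="10.0000/example",
    embargo_until=date(2016, 1, 1),
)
print(record.citation())
print(record.is_downloadable(date(2015, 9, 3)))
```

Mediated and closed records are never directly downloadable in this sketch; a request would instead go to the recorded point of contact.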
Examples of sensitive data in repositories?
What’s really challenging?
“Having longitudinal data on individuals is a part of many observational designs, and is needed for research into outcomes, efficacy and many mechanistic studies. Most repositories thus have longitudinal observations. To build such a database you need some way to link observations on the same identified person. Therefore most repositories contain personally identified data, but, because of privacy concerns, they often release only de-identified data. Difficulties in the de-identification process can cause some data to be omitted in a dataset. A lack of direct identifiers in a data collection or federation could prevent linking of data for some patients.”
From: Wade, T. Traits and Types of Health Data Repositories. Health Information Science and Systems 2014, 2:4. doi:10.1186/2047-2501-2-4
http://www.hissjournal.com/content/2/1/4
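One way to picture the tension in the passage above is a release step that drops direct identifiers but keeps a stable pseudonym, so that observations on the same person can still be linked. The Python sketch below illustrates that idea only; it is not the method described by Wade. The record fields (patient_id, name, visit, measure), the secret key and the function names are all hypothetical, and removing direct identifiers alone does not guarantee that the remaining fields cannot re-identify someone.

```python
import hashlib
import hmac

# Fields treated as direct identifiers in this hypothetical record layout.
DIRECT_IDENTIFIERS = {"patient_id", "name"}


def pseudonym(patient_id: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from an identifier using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def deidentify(record: dict, secret_key: bytes) -> dict:
    """Drop direct identifiers and add a link key so visits by one person stay linkable."""
    released = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    released["link_key"] = pseudonym(record["patient_id"], secret_key)
    return released


# Placeholder data: two visits by one person, one visit by another.
key = b"held-by-the-repository-only"
visits = [
    {"patient_id": "P001", "name": "Person One", "visit": 1, "measure": 5.1},
    {"patient_id": "P001", "name": "Person One", "visit": 2, "measure": 5.4},
    {"patient_id": "P002", "name": "Person Two", "visit": 1, "measure": 6.0},
]
released = [deidentify(v, key) for v in visits]
for row in released:
    print(row)
```

The first two released rows share a link_key and the third does not, so the longitudinal link survives even though the name and patient_id fields are gone. If the identifier is missing or inconsistent in the source data, no link key can be derived, which is the linking failure the quote describes.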
Small group exercise
Discovering sensitive health data in repositories
Acknowledgement
The Australian National Data Service is funded by the Commonwealth under the NCRIS Program.
31 August 2015