JCDL doctoral consortium 2008: Proposed Foundations for Evaluating Data Sharing and Reuse in the...
![Page 1: JCDL doctoral consortium 2008: Proposed Foundations for Evaluating Data Sharing and Reuse in the Biomedical Literature](https://reader033.fdocuments.net/reader033/viewer/2022051513/5455ba0faf79590b088b4a75/html5/thumbnails/1.jpg)
Prevalence and Patterns of Biomedical Research Data Sharing and Reuse
Heather Piwowar
Department of Biomedical Informatics, University of Pittsburgh
JCDL Doctoral Consortium, June 2008
except clipart
![Page 11: JCDL doctoral consortium 2008: Proposed Foundations for Evaluating Data Sharing and Reuse in the Biomedical Literature](https://reader033.fdocuments.net/reader033/viewer/2022051513/5455ba0faf79590b088b4a75/html5/thumbnails/11.jpg)
Is it working? Is it worth it?
Are scientists sharing their data?
Are other scientists reusing the data?
Who, if anyone, is benefiting from the policies, tools, and initiatives?
![Page 12: JCDL doctoral consortium 2008: Proposed Foundations for Evaluating Data Sharing and Reuse in the Biomedical Literature](https://reader033.fdocuments.net/reader033/viewer/2022051513/5455ba0faf79590b088b4a75/html5/thumbnails/12.jpg)
We cannot manage what we do not measure
![Page 13: JCDL doctoral consortium 2008: Proposed Foundations for Evaluating Data Sharing and Reuse in the Biomedical Literature](https://reader033.fdocuments.net/reader033/viewer/2022051513/5455ba0faf79590b088b4a75/html5/thumbnails/13.jpg)
Dissertation Objective:
Evaluate the patterns and prevalence of biomedical research data sharing and reuse
![Page 14: JCDL doctoral consortium 2008: Proposed Foundations for Evaluating Data Sharing and Reuse in the Biomedical Literature](https://reader033.fdocuments.net/reader033/viewer/2022051513/5455ba0faf79590b088b4a75/html5/thumbnails/14.jpg)
Prior work in this area
• Surveys
• Manual audits
• Automated classification of citation contexts
See http://www.citeulike.org/user/hpiwowar for bibliography
![Page 15: JCDL doctoral consortium 2008: Proposed Foundations for Evaluating Data Sharing and Reuse in the Biomedical Literature](https://reader033.fdocuments.net/reader033/viewer/2022051513/5455ba0faf79590b088b4a75/html5/thumbnails/15.jpg)
Missing: a study of data sharing and reuse behavior and impact based on a broad spectrum of instances
![Page 16: JCDL doctoral consortium 2008: Proposed Foundations for Evaluating Data Sharing and Reuse in the Biomedical Literature](https://reader033.fdocuments.net/reader033/viewer/2022051513/5455ba0faf79590b088b4a75/html5/thumbnails/16.jpg)
Dissertation Research Questions
1. prevalence of sharing and reuse
2. patterns of sharing and reuse
3. effect of sharing and reuse on impact
4. implications of findings
![Page 17: JCDL doctoral consortium 2008: Proposed Foundations for Evaluating Data Sharing and Reuse in the Biomedical Literature](https://reader033.fdocuments.net/reader033/viewer/2022051513/5455ba0faf79590b088b4a75/html5/thumbnails/17.jpg)
![Page 18: JCDL doctoral consortium 2008: Proposed Foundations for Evaluating Data Sharing and Reuse in the Biomedical Literature](https://reader033.fdocuments.net/reader033/viewer/2022051513/5455ba0faf79590b088b4a75/html5/thumbnails/18.jpg)
Data type
• Gene expression microarrays
http://en.wikipedia.org/wiki/DNA_microarray http://en.wikipedia.org/wiki/Image:Heatmap.png
![Page 19: JCDL doctoral consortium 2008: Proposed Foundations for Evaluating Data Sharing and Reuse in the Biomedical Literature](https://reader033.fdocuments.net/reader033/viewer/2022051513/5455ba0faf79590b088b4a75/html5/thumbnails/19.jpg)
Sharing type
• Openly online, mentioned in publication
– PubMed
– filtered with MeSH terms for gene-expression
– English, machine-readable full text
– 2000-2007
![Page 20: JCDL doctoral consortium 2008: Proposed Foundations for Evaluating Data Sharing and Reuse in the Biomedical Literature](https://reader033.fdocuments.net/reader033/viewer/2022051513/5455ba0faf79590b088b4a75/html5/thumbnails/20.jpg)
Dissertation dataset
| Article ID | Link to full text |
|---|---|
| 234 | http://… |
| 456 | http://… |
| 657 | http://… |
| 897 | http://… |
![Page 21: JCDL doctoral consortium 2008: Proposed Foundations for Evaluating Data Sharing and Reuse in the Biomedical Literature](https://reader033.fdocuments.net/reader033/viewer/2022051513/5455ba0faf79590b088b4a75/html5/thumbnails/21.jpg)
1. What is the prevalence of biomedical research data sharing? of biomedical research data reuse?
![Page 22: JCDL doctoral consortium 2008: Proposed Foundations for Evaluating Data Sharing and Reuse in the Biomedical Literature](https://reader033.fdocuments.net/reader033/viewer/2022051513/5455ba0faf79590b088b4a75/html5/thumbnails/22.jpg)
Endpoints
• Three endpoints:
– Does this study produce raw data?
– If so, does this study share the raw data?
– Does this study reuse others’ raw data?
![Page 23: JCDL doctoral consortium 2008: Proposed Foundations for Evaluating Data Sharing and Reuse in the Biomedical Literature](https://reader033.fdocuments.net/reader033/viewer/2022051513/5455ba0faf79590b088b4a75/html5/thumbnails/23.jpg)
Example text cues
• Sharing
– “our data has been deposited in the GEO database”
– “the microarray expression values from this study are available at the following website”
• Reuse
– “using the data of Smith et al, we… ”
– “we downloaded four datasets from…”
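As a rough illustration of how cues like these could be flagged, here is a minimal keyword baseline. It is not the NLP system proposed in the dissertation; the pattern lists and the `find_cues` helper are hypothetical, loosely modeled on the example phrases above.

```python
import re

# Hypothetical cue patterns, loosely based on the example phrases on the slide.
# A real system would be trained and evaluated on annotated full text.
SHARING_CUES = [
    r"\bdeposited in\b",
    r"\bare available at\b",
]
REUSE_CUES = [
    r"\busing the data of\b",
    r"\bdownloaded\b.*\bdatasets?\b",
]


def find_cues(text):
    """Return which cue families appear anywhere in the given text."""
    lowered = text.lower()
    return {
        "sharing": any(re.search(p, lowered) for p in SHARING_CUES),
        "reuse": any(re.search(p, lowered) for p in REUSE_CUES),
    }


print(find_cues("Our data has been deposited in the GEO database."))
print(find_cues("We downloaded four datasets from a public repository."))
```

A baseline like this mainly helps to show why a trained classifier is needed: fixed patterns miss paraphrases and fire on unrelated sentences.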
![Page 24: JCDL doctoral consortium 2008: Proposed Foundations for Evaluating Data Sharing and Reuse in the Biomedical Literature](https://reader033.fdocuments.net/reader033/viewer/2022051513/5455ba0faf79590b088b4a75/html5/thumbnails/24.jpg)
Identification of endpoints
• Train and evaluate a Natural Language Processing system to recognize endpoint cues within full text
• Performance to be summarized as precision and recall with confidence intervals
Pilot data for NLP identification of sharing: Piwowar and Chapman, submitted to AMIA 2008.
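A minimal sketch of reporting precision and recall with confidence intervals follows. The slides do not say which interval method is planned; the Wilson score interval used here is an assumption, and the counts in the usage line are hypothetical.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    center = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return (max(0.0, center - half), min(1.0, center + half))


def precision_recall(tp, fp, fn):
    """Precision and recall, each paired with its Wilson interval."""
    return {
        "precision": (tp / (tp + fp), wilson_interval(tp, tp + fp)),
        "recall": (tp / (tp + fn), wilson_interval(tp, tp + fn)),
    }


# Hypothetical confusion counts, for illustration only.
print(precision_recall(tp=8, fp=2, fn=2))
```

The Wilson interval tends to behave better than the simple normal approximation when the evaluation set is small or the proportion is near 0 or 1.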
![Page 25: JCDL doctoral consortium 2008: Proposed Foundations for Evaluating Data Sharing and Reuse in the Biomedical Literature](https://reader033.fdocuments.net/reader033/viewer/2022051513/5455ba0faf79590b088b4a75/html5/thumbnails/25.jpg)
| Article ID | Link to full text | Produces data? | Shares data? | Reuses data? |
|---|---|---|---|---|
| 234 | http://… | TRUE | TRUE | FALSE |
| 456 | http://… | TRUE | TRUE | TRUE |
| 657 | http://… | TRUE | FALSE | FALSE |
| 897 | http://… | FALSE | n/a | TRUE |

Endpoints: Produces data?, Shares data?, Reuses data?
Research Question 1: Prevalence
Calculate percentages
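The percentage calculation can be sketched with the four example rows shown on the slide. The choice of denominator for sharing (only studies that produce data) follows the "If so" wording of the endpoints; the function and field names are illustrative assumptions.

```python
# The four illustrative rows from the slide; None stands for "n/a".
rows = [
    {"id": 234, "produces": True, "shares": True, "reuses": False},
    {"id": 456, "produces": True, "shares": True, "reuses": True},
    {"id": 657, "produces": True, "shares": False, "reuses": False},
    {"id": 897, "produces": False, "shares": None, "reuses": True},
]


def pct(count, total):
    """Percentage, or None when the denominator is empty."""
    return 100.0 * count / total if total else None


def prevalence(rows):
    producers = [r for r in rows if r["produces"]]
    return {
        "produces": pct(len(producers), len(rows)),
        # Sharing is only defined for studies that produced raw data.
        "shares_among_producers": pct(sum(r["shares"] for r in producers), len(producers)),
        "reuses": pct(sum(r["reuses"] for r in rows), len(rows)),
    }


print(prevalence(rows))
```

With only the slide's placeholder rows, the numbers carry no meaning beyond showing which denominator each percentage uses.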
![Page 26: JCDL doctoral consortium 2008: Proposed Foundations for Evaluating Data Sharing and Reuse in the Biomedical Literature](https://reader033.fdocuments.net/reader033/viewer/2022051513/5455ba0faf79590b088b4a75/html5/thumbnails/26.jpg)
2. What features are most associated with an investigator’s decision to share or reuse a biomedical research dataset?
![Page 27: JCDL doctoral consortium 2008: Proposed Foundations for Evaluating Data Sharing and Reuse in the Biomedical Literature](https://reader033.fdocuments.net/reader033/viewer/2022051513/5455ba0faf79590b088b4a75/html5/thumbnails/27.jpg)
• Features to include:
– Journal Impact Factor
– Number of Authors
– Are the samples from humans
– Subdiscipline
– Strictness of Journal policy on data sharing
– Institution
– Year of publication
– …
Pilot data for journal policies: Piwowar and Chapman, ELPUB 2008.
![Page 28: JCDL doctoral consortium 2008: Proposed Foundations for Evaluating Data Sharing and Reuse in the Biomedical Literature](https://reader033.fdocuments.net/reader033/viewer/2022051513/5455ba0faf79590b088b4a75/html5/thumbnails/28.jpg)
| Article ID | Link to full text | Produces data? | Shares data? | Reuses data? | Journal Impact Factor | # Authors | Human Data |
|---|---|---|---|---|---|---|---|
| 234 | http://… | TRUE | TRUE | FALSE | 1.5 | 2 | TRUE |
| 456 | http://… | TRUE | TRUE | TRUE | 23.5 | 1 | FALSE |
| 657 | http://… | TRUE | FALSE | FALSE | 2.4 | 6 | FALSE |
| 897 | http://… | FALSE | n/a | TRUE | 0.6 | 2 | TRUE |

Endpoints: Shares data?, Reuses data?
Covariates: Journal Impact Factor, # Authors, Human Data, …
Multivariate logistic regressions
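To make the modeling step concrete, here is a minimal sketch of a multivariate logistic regression fitted by plain gradient descent. Everything in it is an assumption for illustration: the two binary covariates, the synthetic toy rows, and the `fit_logistic` helper. It is not the planned analysis, which would normally use a statistics package that also reports standard errors.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=3000):
    """Fit weights [intercept, w1, ..., wk] by batch gradient descent on the log loss."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


# Synthetic toy rows invented for illustration only (not study data).
# Columns: human_data (0/1), strict_journal_policy (0/1). Outcome: shares data (0/1).
X = [(0, 0), (0, 0), (0, 0), (0, 1), (0, 1), (0, 1),
     (1, 0), (1, 0), (1, 0), (1, 0), (1, 1), (1, 1), (1, 1)]
y = [0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1]

weights = fit_logistic(X, y)
# exp(coefficient) is the odds ratio associated with each covariate.
odds_ratios = [math.exp(w) for w in weights[1:]]
print(weights, odds_ratios)
```

Reading the output as odds ratios is what lets each covariate's association with sharing be interpreted while holding the other covariates fixed.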
![Page 29: JCDL doctoral consortium 2008: Proposed Foundations for Evaluating Data Sharing and Reuse in the Biomedical Literature](https://reader033.fdocuments.net/reader033/viewer/2022051513/5455ba0faf79590b088b4a75/html5/thumbnails/29.jpg)
3. Does sharing or reusing data contribute to the impact of a research article, independently of other factors?
![Page 30: JCDL doctoral consortium 2008: Proposed Foundations for Evaluating Data Sharing and Reuse in the Biomedical Literature](https://reader033.fdocuments.net/reader033/viewer/2022051513/5455ba0faf79590b088b4a75/html5/thumbnails/30.jpg)
Assumption: citation count is a proxy for research impact
![Page 31: JCDL doctoral consortium 2008: Proposed Foundations for Evaluating Data Sharing and Reuse in the Biomedical Literature](https://reader033.fdocuments.net/reader033/viewer/2022051513/5455ba0faf79590b088b4a75/html5/thumbnails/31.jpg)
| Article ID | … | Produces data? | Shares data? | Reuses data? | … | … | … | Number of Citations |
|---|---|---|---|---|---|---|---|---|
| 234 | … | TRUE | TRUE | FALSE | 1.5 | 2 | TRUE | 0 |
| 456 | … | TRUE | TRUE | TRUE | 23.5 | 1 | FALSE | 4 |
| 657 | … | TRUE | FALSE | FALSE | 2.4 | 6 | FALSE | 4 |
| 897 | … | FALSE | n/a | TRUE | 0.6 | 2 | TRUE | 5 |

Column groups: Endpoints, Covariates
Model variables: Shares data?, Reuses data?, Journal Impact Factor, # Authors, Human Data, …, Number of Citations
Multivariate linear regressions
Pilot data on citation impact for sharing: Piwowar, Day and Fridsma, PLoS ONE 2007.
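A multivariate linear regression of this kind can be sketched with ordinary least squares solved through the normal equations. The helpers `solve` and `fit_ols`, the choice of two predictors, and the toy rows are assumptions for illustration; the outcome values are constructed from known coefficients so the fit can be checked, and they are not citation data.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_ols(X, y):
    """Return [intercept, b1, ..., bk] minimizing squared error."""
    rows = [[1.0] + [float(v) for v in xi] for xi in X]
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(XtX, Xty)


# Toy predictors: shares_data (0/1), journal_impact_factor. Illustration only.
X = [(1, 1.5), (1, 23.5), (0, 2.4), (0, 0.6), (1, 3.0), (0, 10.0)]
# Outcome built by construction as 1 + 2*shares + 0.5*impact_factor, not observed data.
y = [1 + 2 * s + 0.5 * j for s, j in X]

coefficients = fit_ols(X, y)
print(coefficients)
```

The coefficient on the sharing indicator is the quantity of interest here: it estimates the difference in the outcome associated with sharing once the other covariates are accounted for.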
![Page 32: JCDL doctoral consortium 2008: Proposed Foundations for Evaluating Data Sharing and Reuse in the Biomedical Literature](https://reader033.fdocuments.net/reader033/viewer/2022051513/5455ba0faf79590b088b4a75/html5/thumbnails/32.jpg)
4. What do the results suggest for developing efficient, effective policies, tools, and initiatives for promoting data sharing and reuse?
![Page 33: JCDL doctoral consortium 2008: Proposed Foundations for Evaluating Data Sharing and Reuse in the Biomedical Literature](https://reader033.fdocuments.net/reader033/viewer/2022051513/5455ba0faf79590b088b4a75/html5/thumbnails/33.jpg)
We might discover, for example:
• Lots of sharing for non-human data
• All reuse within the first 5 years
• Journal and funder requests are ineffective
![Page 34: JCDL doctoral consortium 2008: Proposed Foundations for Evaluating Data Sharing and Reuse in the Biomedical Literature](https://reader033.fdocuments.net/reader033/viewer/2022051513/5455ba0faf79590b088b4a75/html5/thumbnails/34.jpg)
Significance
• Dataset
– social network analysis, simulation, …
• NLP Classifiers
• Best-practice patterns, communities
• Novel research connections
• Inspire further work in this area
– policy evaluation, reusability metrics
– citations for data (Data Reuse Registry)
Piwowar, Chapman. Envisioning a Data Reuse Registry. Poster submitted to AMIA 2008.
![Page 35: JCDL doctoral consortium 2008: Proposed Foundations for Evaluating Data Sharing and Reuse in the Biomedical Literature](https://reader033.fdocuments.net/reader033/viewer/2022051513/5455ba0faf79590b088b4a75/html5/thumbnails/35.jpg)
Limitations
• Causation?
• Other data types?
• Other sharing mechanisms?
![Page 36: JCDL doctoral consortium 2008: Proposed Foundations for Evaluating Data Sharing and Reuse in the Biomedical Literature](https://reader033.fdocuments.net/reader033/viewer/2022051513/5455ba0faf79590b088b4a75/html5/thumbnails/36.jpg)
“Does anyone want your data?
That’s hard to predict […] After all, no one ever knocked on your door asking to buy those figurines collecting dust in your cabinet before you listed them on eBay.
Your data, too, may simply be awaiting an effective matchmaker.”
Got data? Nature Neuroscience 10, 931 (2007)
![Page 37: JCDL doctoral consortium 2008: Proposed Foundations for Evaluating Data Sharing and Reuse in the Biomedical Literature](https://reader033.fdocuments.net/reader033/viewer/2022051513/5455ba0faf79590b088b4a75/html5/thumbnails/37.jpg)
My data is here
www.dbmi.pitt.edu/piwowar
I urge you to share yours, too.
![Page 38: JCDL doctoral consortium 2008: Proposed Foundations for Evaluating Data Sharing and Reuse in the Biomedical Literature](https://reader033.fdocuments.net/reader033/viewer/2022051513/5455ba0faf79590b088b4a75/html5/thumbnails/38.jpg)
Thank you
Funding: NLM informatics training grant
Advisor: Dr. Wendy Chapman
Committee: Dr. Ellen Detlefsen, Dr. Madhavi Ganapathiraju
+ JCDL Funding and Reviewers!
Questions, Comments, or Suggestions?