Thesis Proposal Piwowar Presentation 20091109
![Page 1: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/1.jpg)
Foundational studies for measuring the impact, prevalence, and patterns
of publicly sharing biomedical research data
Heather Piwowar
Department of Biomedical Informatics
University of Pittsburgh
![Page 2: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/2.jpg)
Sharing research data
http://upload.wikimedia.org/wikipedia/commons/7/76/PeptideMSMS.jpg; http://en.wikipedia.org/wiki/Image:Helices.png; http://en.wikipedia.org/wiki/Image:Heatmap.png; http://en.wikipedia.org/wiki/Image:Microarray2.gif; http://zellig.cpmc.columbia.edu/medlee/demo/; http://www.plosone.org/article/fetchArticle.action?articleURI=info:doi/10.1371/journal.pone.0000441
PAST MEDICAL HISTORY:
Past medical history showed she had superficial phlebitis times two in the past, had non-insulin dependent diabetes mellitus for four years. She had been hypothyroid for three years.
HISTORY OF PRESENT ILLNESS:
The patient is a 58-year-old female, …
![Page 8: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/8.jpg)
http://www.flickr.com/photos/75166820@N00/5318468/
![Page 9: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/9.jpg)
Shared data benefits science
• Verify
• Understand
• Extend
• Explore
• Combine
• Synergize
• Train
• Reduce
![Page 10: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/10.jpg)
But... costly for authors
• Find
• Organize
• Document
• Deidentify
• Format
• Decide
• Ask
• Submit
• Answer questions
• Worry about mistakes being found
• Worry about data being misinterpreted
• Worry about being scooped
• Forgo money and IP and prestige???
![Page 11: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/11.jpg)
As a result, policy makers have spent lots of time and money ....
http://www.flickr.com/photos/tonivc/2283676770/
http://www.flickr.com/photos/johnnyvulkan/381941233/
![Page 12: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/12.jpg)
... on initiatives, requests, requirements, and tools
Funder data sharing requirements
Journal requirements and requests
Databases
Data sharing collaboration grids
Standards
Editorials, letters to the editor, discussion....
![Page 13: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/13.jpg)
http://www.flickr.com/photos/mesh/14102209/
![Page 14: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/14.jpg)
lots of data sharing!
http://www.genome.jp/en/db_growth.html
![Page 15: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/15.jpg)
but how much isn’t shared?
what isn’t shared?
who isn’t sharing it?
why not?
what can we do about it?
how much does it matter?
![Page 16: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/16.jpg)
you can not manage what you do not measure
http://www.flickr.com/photos/archeon/2941655917/
![Page 18: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/18.jpg)
Related research
Data usually collected via surveys and/or manual audits
http://www.flickr.com/photos/jima/606588905/
![Page 19: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/19.jpg)
Models of data and knowledge sharing
![Page 20: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/20.jpg)
Andriessen. Conditions for the willingness to share knowledge, 2006.
![Page 21: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/21.jpg)
Harder. SMG WP 6/2008.
![Page 22: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/22.jpg)
![Page 23: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/23.jpg)
Cabrera and Cabrera. Int J of HR Mgmt. 2005.
![Page 24: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/24.jpg)
![Page 25: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/25.jpg)
Kuo. JASIST. 2008.
![Page 26: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/26.jpg)
Limitations of the related research
• manual audits: small sample sizes
• surveys: few variables + self-reporting bias
• not much focus on measuring demonstrated behavior
• not much focus on rewards
• not much focus on policy
• not much focus on biomedical data other than DNA sequences
![Page 27: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/27.jpg)
Needed:
a study of data sharing behavior and impact that includes
• a measurement of demonstrated behavior
• policy variables
• an estimate of rewards
• a broad and deep selection of data creation instances
![Page 28: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/28.jpg)
Aim 1: Does sharing have benefit for those who share?
Aim 2: Can sharing and withholding be systematically measured?
Aim 3: How often is data shared? What predicts sharing? How can we model sharing behavior?
![Page 29: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/29.jpg)
Scope of proposed study
studies: Published studies with English full text available in a centralized portal
variables for examination: extracted from Medline and other sources
![Page 30: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/30.jpg)
http://en.wikipedia.org/wiki/DNA_microarray
http://en.wikipedia.org/wiki/Image:Heatmap.png
http://commons.wikimedia.org/wiki/File:DNA_double_helix_vertikal.PNG
Microarray data
![Page 31: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/31.jpg)
http://farm3.static.flickr.com/2146/2389590651_9bbcc9d07e.jpg
![Page 32: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/32.jpg)
Aim 1
![Page 33: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/33.jpg)
Aim 1: Does sharing have benefit for those who share?
http://www.flickr.com/photos/sunrise/35819369/
![Page 34: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/34.jpg)
Aim 1: Does sharing have benefit for those who share?
Benefit of value: Citations.
![Page 35: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/35.jpg)
Aim 1: Does sharing have benefit for those who share?
dataset: 85 cancer microarray trials published in 1999-2003, as identified by Ntzani and Ioannidis (2003)
citations: ISI Web of Science Citation index, citations from 2004-2005
data sharing locations: Publisher and lab websites, microarray databases, WayBack Internet Archive, Oncomine
statistics: Multivariate linear regression
![Page 36: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/36.jpg)
Aim 1: Does sharing have benefit for those who share?
![Page 37: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/37.jpg)
Aim 1: Does sharing have benefit for those who share?
Note the logarithmic scale
![Page 38: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/38.jpg)
Aim 1: Does sharing have benefit for those who share?
In multivariate regression, we found studies that had made their data publicly available received 69% more citations than similar studies that did not share their data (95% confidence interval: 18% to 143%)
Piwowar, Day and Fridsma (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308
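A percentage such as "69% more citations" is the usual way to read a regression coefficient when the outcome is modeled on a log scale. The sketch below illustrates that conversion only. It assumes the outcome was the natural log of the citation count (the slides note a logarithmic scale but do not show the model), and the function names are hypothetical. The coefficients are backed out from the percentages on the slide rather than taken from the study's output.

```python
import math


def percent_change(beta: float) -> float:
    """Percent change in the outcome implied by a coefficient on a natural-log outcome."""
    return (math.exp(beta) - 1.0) * 100.0


def log_coefficient(percent: float) -> float:
    """Inverse of percent_change: the log-scale coefficient for a given percent change."""
    return math.log(1.0 + percent / 100.0)


# Assumption: back out log-scale values from the percentages reported on the slide.
beta_shared = log_coefficient(69.0)   # point estimate
beta_low = log_coefficient(18.0)      # lower 95% bound
beta_high = log_coefficient(143.0)    # upper 95% bound

for label, beta in [("estimate", beta_shared), ("lower", beta_low), ("upper", beta_high)]:
    print(f"{label}: beta = {beta:.3f}, percent change = {percent_change(beta):.0f}%")
```

The interval is symmetric around the estimate on the log scale only approximately, which is why the percentage interval (18% to 143%) looks lopsided around 69%.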
![Page 39: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/39.jpg)
Aim 1 conclusion: data sharing has a benefit for sharers
![Page 40: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/40.jpg)
Next: What factors predict sharing?
http://www.flickr.com/photos/ryanr/142455033/
Can I use the same methods of Aim 1 to choose studies and determine data sharing status?
No, those methods don’t scale to identify or classify enough data points.
![Page 43: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/43.jpg)
Aim 2
![Page 44: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/44.jpg)
Need automated methods to:
Identify studies that generate datasets that could potentially be shared (Aim 2a)
Determine which of these have in fact been shared (Aim 2b)
![Page 45: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/45.jpg)
Aim 2a: Identify studies that create gene expression microarray data
http://www.flickr.com/photos/lofaesofa/248546821/
![Page 46: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/46.jpg)
Aim 2a: Identify studies that create gene expression microarray data
Easy, via MeSH indexing terms?
gene expression profiling and/or microarray analysis
Unfortunately, this has neither high recall nor high precision.
![Page 47: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/47.jpg)
Aim 2a: Identify studies that create gene expression microarray data
Instead, look for wetlab methods in full text:
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1522022&tool=pmcentrez
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1590031&tool=pmcentrez
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1482311&tool=pmcentrez#id331936
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2082469&tool=pmcentrez
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=126870&tool=pmcentrez#id442745
![Page 48: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/48.jpg)
Aim 2a: Identify studies that create gene expression microarray data
And query the full text through full-text query portals:
![Page 49: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/49.jpg)
Aim 2a: Identify studies that create gene expression microarray data
query development: Use supervised natural language processing techniques on a corpus of Open Access articles
query evaluation: 400 studies that created gene expression microarray data, as identified by Ochsner et al (2008)
goal: >90% precision, and sufficient recall to retrieve >1250 articles
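The precision and recall targets above can be made concrete with a small calculation. This is a minimal sketch, not the evaluation code used in the study. The function name and the PubMed IDs are hypothetical, and it assumes the query results and the reference standard are both available as sets of article identifiers drawn from the same evaluated corpus.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) of a retrieved set against a reference standard."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical identifiers, for illustration only.
reference_standard = {"101", "102", "103", "104"}  # articles known to create microarray data
query_hits = {"101", "102", "103", "999"}          # articles returned by a candidate query

precision, recall = precision_recall(query_hits, reference_standard)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

A candidate full-text query would be tuned until precision on the evaluation set exceeds the stated goal, accepting lower recall as long as enough articles are still retrieved.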
![Page 50: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/50.jpg)
Aim 2b
![Page 51: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/51.jpg)
Aim 2b: Identify studies that share their expression microarray data
http://www.flickr.com/photos/dcassaa/422261773/
![Page 52: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/52.jpg)
Aim 2b: Identify studies that share their expression microarray data
![Page 53: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/53.jpg)
Aim 2b: Identify studies that share their expression microarray data
![Page 54: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/54.jpg)
Aim 2b: Identify studies that share their expression microarray data
pmc_gds[filter]
+ text processing on ArrayExpress website
Enough? Unbiased?
![Page 55: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/55.jpg)
Aim 2b: Identify studies that share their expression microarray data
reference standard: 200 of the 400 studies that created gene expression microarray data have shared their microarray data, as identified by Ochsner et al (2008)
goal: Establish that the filter has >70% recall with an unbiased representation of MeSH terms, dataset size, and dataset species
![Page 56: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/56.jpg)
Aim 3
![Page 57: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/57.jpg)
Aim 3 – How often is data shared? What predicts sharing? How can we model sharing behavior?
http://www.flickr.com/photos/ryanr/142455033/
![Page 58: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/58.jpg)
Aim 3a: Prevalence of data sharing
![Page 59: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/59.jpg)
Aim 3a: Prevalence of data sharing
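| PubMed ID | Portal | Created data? |
|---|---|---|
| 234 | PMC | Yes |
| 345 | HighPr | Yes |
| 456 | Scirus | Yes |
| 567 | PMC | Yes |
| 678 | PMC | Yes |
| 789 | HighPr | No |
| 890 | PMC | No |
| 901 | - | ? |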
![Page 61: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/61.jpg)
Aim 3a: Prevalence of data sharing
| PubMed ID | Portal | Created data? |
|---|---|---|
| 234 | PMC | Yes |
| 345 | HighPr | Yes |
| 456 | Scirus | Yes |
| 567 | PMC | Yes |
| 678 | PMC | Yes |
![Page 62: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/62.jpg)
Aim 3a: Prevalence of data sharing
| PubMed ID | Portal | Created data? | Shared data? |
|---|---|---|---|
| 234 | PMC | Yes | Yes |
| 345 | HighPr | Yes | Yes |
| 456 | Scirus | Yes | Yes |
| 567 | PMC | Yes | NO |
| 678 | PMC | Yes | NO |
Prevalence = (Number with Shared data) / (Number with Created data)
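The prevalence ratio can be sketched directly from the five illustrative rows on the slide. The `Study` record and the `prevalence` function are hypothetical names introduced for this example. The PubMed IDs are the placeholder values from the slide, not real records.

```python
from dataclasses import dataclass


@dataclass
class Study:
    pubmed_id: str
    portal: str
    created_data: bool
    shared_data: bool


def prevalence(studies: list[Study]) -> float:
    """Fraction of data-creating studies that also shared their data."""
    creators = [s for s in studies if s.created_data]
    if not creators:
        raise ValueError("no studies that created data")
    return sum(1 for s in creators if s.shared_data) / len(creators)


# Illustrative rows copied from the slide's toy table.
studies = [
    Study("234", "PMC", True, True),
    Study("345", "HighPr", True, True),
    Study("456", "Scirus", True, True),
    Study("567", "PMC", True, False),
    Study("678", "PMC", True, False),
]

print(f"prevalence = {prevalence(studies):.2f}")
```

Studies that did not create data are excluded from the denominator, which matches the filtering step shown before the "Shared data?" column is added.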
![Page 64: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/64.jpg)
Aim 3b: Correlates with data sharing
![Page 66: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/66.jpg)
Aim 3b: Correlates with data sharing
Features to include:
• Does the journal have a data sharing policy?
• Is the study funded by the NIH?
• Is it subject to the NIH data sharing plan requirement?
• Number of authors
• Journal impact factor
• Are the experimental samples from humans?
• Disease of study
• Year of publication
• …
![Page 67: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/67.jpg)
Aim 3b: Correlates with data sharing
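| PubMed ID | Portal | Created data? | Shared data? | Journal policy | NIH funds? | # authors | ... |
|---|---|---|---|---|---|---|---|
| 234 | PMC | Yes | Yes | strong | yes | 2 | |
| 345 | HighPr | Yes | Yes | weak | yes | 5 | |
| 456 | Scirus | Yes | Yes | weak | no | 6 | |
| 567 | PMC | Yes | NO | strong | yes | 5 | |
| 678 | PMC | Yes | NO | strong | no | 2 | |

Covariates: Journal policy, NIH funds?, # authors, ...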
![Page 68: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/68.jpg)
Aim 3b: Correlates with data sharing
Univariate odds ratios
Multivariate logistic regression
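A univariate odds ratio compares the odds of sharing between studies with and without a given characteristic. The sketch below computes one for the "NIH funds?" column using the five illustrative rows from the covariate table slide. The function and key names are hypothetical, and five toy rows are far too few to support any inference. The proposed analysis would use the full dataset and a statistics package.

```python
def odds_ratio(rows: list[dict], exposure: str, outcome: str) -> float:
    """Odds ratio of `outcome` for rows with `exposure` versus rows without it."""
    a = sum(1 for r in rows if r[exposure] and r[outcome])          # exposed, shared
    b = sum(1 for r in rows if r[exposure] and not r[outcome])      # exposed, not shared
    c = sum(1 for r in rows if not r[exposure] and r[outcome])      # unexposed, shared
    d = sum(1 for r in rows if not r[exposure] and not r[outcome])  # unexposed, not shared
    if b == 0 or c == 0:
        raise ValueError("odds ratio undefined: a zero cell in the denominator")
    return (a * d) / (b * c)


# Illustrative rows copied from the slide's toy table.
rows = [
    {"pubmed_id": "234", "nih_funded": True, "shared": True},
    {"pubmed_id": "345", "nih_funded": True, "shared": True},
    {"pubmed_id": "456", "nih_funded": False, "shared": True},
    {"pubmed_id": "567", "nih_funded": True, "shared": False},
    {"pubmed_id": "678", "nih_funded": False, "shared": False},
]

print(f"odds ratio (NIH funded vs not) = {odds_ratio(rows, 'nih_funded', 'shared'):.1f}")
```

Multivariate logistic regression extends this idea by estimating each covariate's odds ratio while holding the other covariates fixed.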
![Page 69: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/69.jpg)
Aim 3b: Correlates with data sharing
[Diagram: the covariates (Journal policy?, NIH funded?, # authors, ...) as predictors of the outcome Shared data?]
![Page 70: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/70.jpg)
Aim 3c: Model of data sharing
![Page 72: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/72.jpg)
Aim 3c: Model of data sharing
Exploratory factor analysis
![Page 74: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/74.jpg)
Aim 3c: Model of data sharing
[Diagram: factors derived from the covariates (Mandates, Amount of Collaboration, ...) linked to Shared data?; legend: Strong, Weak]
![Page 75: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/75.jpg)
http://www.flickr.com/photos/donjuanna/322798429/
![Page 76: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/76.jpg)
Limitations
• Association does not imply causation
• Important influences will be missed due to focus on measurable variables
• Some derived variables involve many estimates and assumptions
• Only considering public sharing in primary centralized databases
• Only one datatype
• Only research studies made available in full-text portals
![Page 77: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/77.jpg)
Risks and contingency plans
• NLP performance may be inadequate: supplement with manual annotation via Mechanical Turk
• Author ambiguity may introduce extreme outliers: use Author-ity (Smalheiser and Torvik, 2005) for name disambiguation
• Unable to derive a robust exploratory factor model: try other clustering techniques
• Several variables may be unexpectedly difficult to extract and cross-reference: if not essential, defer analysis of that variable
![Page 78: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/78.jpg)
Current status
Aim 1: Does sharing have benefit for those who share?
Aim 2: Can sharing and withholding be systematically measured?
Aim 3: How often is data shared? What predicts sharing? How can we model sharing behavior?
pilot completed.
Now: full dataset collection
![Page 79: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/79.jpg)
Anticipated contributions
• Published assessment of the observed and measured rewards, prevalence, and patterns of gene expression microarray dataset sharing
• Publicly available dataset associating microarray study publications with data sharing status
• Generalizable approach for developing practical, real-world information retrieval using centralized full-text query portals
• Preliminary model of data sharing behaviour based on this large dataset
![Page 80: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/80.jpg)
Future work
• Identify and model data reuse
• Citation analysis of the large cohort
• Supplement with survey responses
http://www.flickr.com/photos/cogdog/123072/
![Page 81: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/81.jpg)
I post my data, code, and statistical scripts at http://www.dbmi.pitt.edu/piwowar
Share yours too!
http://www.flickr.com/photos/myklroventine/892446624/
Data sharing plan
![Page 82: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/82.jpg)
Thanks to:
➡ the NLM for funding training grant 5 T15 LM007059-22
➡ the Dept of Biomedical Informatics at the U of Pittsburgh
➡ my committee
Dr Wendy Chapman, Biomed Informatics
Dr Ellen Detlefsen, iSchool
Dr Madhavi Ganapathiraju, Bioinformatics
Dr Brian Butler, Katz School of Business
Dr Gunther Eysenbach, U of Toronto, Health Policy Mgmt and Evaluation
![Page 83: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/83.jpg)
![Page 84: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/84.jpg)
aim
Funder Journal Investigator Institution Study
Is research data shared after publication?
![Page 85: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/85.jpg)
Prevalence of data withholding via surveys
[Bar chart, horizontal axis from 0% to 40%, with one bar per category:]
• self-reported denying a request in last 3 years
• trainees self-reported denying a request
• been denied access to data, materials, code
• authors “not able to retrieve raw data”
• not willing to release data
Campbell et al. JAMA. 2002.
Kyzas et al. J Natl Cancer Inst. 2005.
Vogeli et al. Acad Med. 2006.
Reidpath et al. Bioethics. 2001.
![Page 86: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/86.jpg)
Self-reported reasons for data withholding
[Bar chart, horizontal axis from 0% to 80%, with one bar per category:]
• sharing is too much effort
• want student or jr faculty to publish more
• they themselves want to publish more
• cost
• industrial sponsor
• confidentiality
• commercial value of results
Campbell et al. JAMA. 2002.
![Page 87: Thesis Proposal Piwowar Presentation 20091109](https://reader034.fdocuments.net/reader034/viewer/2022042813/5455ba83b1af9fc0638b4a64/html5/thumbnails/87.jpg)
Correlates with self-reported data withholding
[Bar chart, horizontal axis from 0 to 3, with one bar per category:]
• industry involvement
• perceived competitiveness of field
• male
• sharing discouraged in training
• human participants
• academic productivity
Blumenthal et al. Acad Med. 2006.