BIOLINK 2008: Linking database submissions to primary citations with PubMed Central
Linking Database Submissions to Primary Citations with PubMed Central
Heather Piwowar and Wendy Chapman
Department of Biomedical Informatics
University of Pittsburgh
BioLINK 2008
These links are important for several reasons
Sometimes the links are easy to discover
But the meaning of hyperlinks is ambiguous:
And often no hyperlinks at all:
One way to identify links:
NLP systems that identify statements of shared data
from within full text.
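As a far simpler stand-in for a full NLP system, the sketch below flags sentences that mention a GEO accession number. It assumes GEO's standard accession prefixes (GSE for series, GSM for samples, GPL for platforms, GDS for curated datasets); the helper name and the naive sentence splitter are hypothetical, and this is not the authors' NLP system.

```python
import re

# GEO accession numbers: a prefix (GSE, GSM, GPL or GDS) followed by digits.
GEO_ACCESSION = re.compile(r"\b(GSE|GSM|GPL|GDS)\d+\b")


def flag_sharing_sentences(full_text: str) -> list[str]:
    """Return the sentences that mention at least one GEO accession number.

    Hypothetical helper: sentences are split naively on '.', '!' or '?'
    followed by whitespace, which is much cruder than a real NLP pipeline.
    """
    sentences = re.split(r"(?<=[.!?])\s+", full_text)
    return [s for s in sentences if GEO_ACCESSION.search(s)]


if __name__ == "__main__":
    text = ("Arrays were hybridized as described. "
            "The data have been deposited in GEO under accession GSE1234. "
            "Results are shown in Figure 2.")
    # prints ['The data have been deposited in GEO under accession GSE1234.']
    print(flag_sharing_sentences(text))
```

Note that a pattern like this cannot tell a submission statement from a reuse statement, which is exactly the ambiguity described above.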
BUT this requires developing and maintaining a full-text archive!
What about using PubMed Central?
Usage?
• scientists looking for datasets for reuse
• curators looking for primary citations
• researchers studying data-sharing behaviour
Goal:
Use the simple, full-text query interface of PubMed Central
to identify articles with shared datasets
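One way to run such a full-text query programmatically, rather than through the web interface, is NCBI's E-utilities ESearch endpoint with `db=pmc`. The sketch below builds the request URL and reads the JSON reply; the function names are hypothetical, and the slides do not say that the authors used E-utilities. NCBI rate-limits these requests, so bulk use should follow its usage guidelines.

```python
import json
import urllib.parse
import urllib.request

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pmc_search_url(query: str, retmax: int = 100) -> str:
    """Build an ESearch URL that runs a boolean full-text query against PMC."""
    params = {"db": "pmc", "term": query, "retmax": retmax, "retmode": "json"}
    return ESEARCH_URL + "?" + urllib.parse.urlencode(params)


def search_pmc(query: str, retmax: int = 100) -> list[str]:
    """Return up to `retmax` PMC IDs matching `query` (needs network access)."""
    with urllib.request.urlopen(build_pmc_search_url(query, retmax)) as resp:
        result = json.load(resp)
    return result["esearchresult"]["idlist"]
```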
Method:
• Gene expression microarray data
• GEO database
Method:
• Open Access articles to train
• Non-Open Access articles to test
• Gene-expression articles selected by MeSH term query
Gold Standard:
• True positives (N=550): articles with primary citation links from GEO, plus screening of the full text
• True negatives (N=165): the rest
Building the query:
• Used the full text of the open-access cohort
• Removed words with fewer than 40 occurrences
• Unigram bag-of-words vectors
• Tree and rule learning algorithms, with a variety of parameters
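The feature steps above can be sketched as follows. This is a minimal illustration under assumptions the slides leave open: "fewer than 40 occurrences" is read as total occurrences across the corpus (it could also mean document frequency), tokens are lowercase alphabetic runs, and vectors record word presence rather than counts. The helper names are hypothetical, and training the tree or rule learner on these vectors is not shown.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(documents: list[str], min_count: int = 40) -> list[str]:
    """Keep words occurring at least `min_count` times across all documents."""
    counts = Counter(tok for doc in documents for tok in tokenize(doc))
    return sorted(word for word, n in counts.items() if n >= min_count)


def to_binary_vector(document: str, vocabulary: list[str]) -> list[int]:
    """Unigram bag-of-words vector: 1 if a vocabulary word is present, else 0."""
    present = set(tokenize(document))
    return [1 if word in present else 0 for word in vocabulary]
```

Binary presence vectors keep each feature directly interpretable as a single query term, which is what lets a learned tree or rule set be read off as a boolean query.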
(geo OR omnibus) AND microarray AND "gene expression" AND accession NOT (databases OR user OR users OR (public AND accessed) OR (downloaded AND published))
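To show how this learned query behaves as a classifier, the sketch below evaluates it locally against one document's text. It is an approximation: case-insensitive whole-word matching is an assumption here, and PubMed Central's own search engine may tokenize, stem, or map terms differently. The function name is hypothetical.

```python
import re


def matches_query(text: str) -> bool:
    """Evaluate the boolean query above against one document's full text."""
    lowered = text.lower()
    words = set(re.findall(r"[a-z]+", lowered))
    # Phrase term: "gene expression" must appear as two adjacent words.
    has_phrase = re.search(r"\bgene\s+expression\b", lowered) is not None
    include = (("geo" in words or "omnibus" in words)
               and "microarray" in words
               and has_phrase
               and "accession" in words)
    # NOT clause: any of these terms or term pairs rules the document out.
    exclude = ("databases" in words or "user" in words or "users" in words
               or ("public" in words and "accessed" in words)
               or ("downloaded" in words and "published" in words))
    return include and not exclude


if __name__ == "__main__":
    submit = ("Gene expression microarray data were deposited in GEO "
              "under accession GSE1234.")
    reuse = ("We downloaded published gene expression microarray data "
             "from GEO (accession GSE1234).")
    print(matches_query(submit))  # prints True
    print(matches_query(reuse))   # prints False
```

The submission statement matches, while the reuse statement is excluded by the `(downloaded AND published)` clause even though it names the same accession.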
Evaluation Results
• 40% recall
• 94% precision (65% for articles not yet linked from GEO)
• worse than full NLP results (~89%, 83%)
• slightly better than a trivial query (34%, 90%)
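Recall and precision above follow the usual definitions: precision is the fraction of retrieved articles that truly share data, and recall is the fraction of data-sharing articles that the query retrieves. A minimal helper; the counts in the example are made up for illustration and are not the study's figures.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from confusion-matrix counts.

    tp: relevant articles retrieved; fp: irrelevant articles retrieved;
    fn: relevant articles missed. Empty denominators yield 0.0.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Illustrative counts only: 40 hits, 10 false alarms, 60 misses.
    print(precision_recall(40, 10, 60))  # prints (0.8, 0.4)
```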
Limitations
• only one datatype
• database-centric
• performance so far is rather mediocre…
Impact?
• Today’s performance:
  – would increase GEO links by 2.6%
  – by 5.5% annually once all NIH-funded articles are in PMC
• Double the recall, to 80%:
  – double the numbers above
☺ GEO curators added the 40 links identified by this study
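The doubling follows from assuming that the number of newly found links scales linearly with recall while precision stays fixed; that proportionality is an assumption of this back-of-the-envelope estimate, not a measured result.

```latex
\text{new links} \propto \text{recall}
\;\Rightarrow\;
\frac{80\%}{40\%} = 2:
\qquad 2 \times 2.6\% = 5.2\%,
\qquad 2 \times 5.5\% = 11\% \text{ annually}
```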
We hope this work
inspires future enhancements, and
highlights the opportunities for simple full-text queries in PubMed Central given the mandated influx of NIH-funded research reports.
Thank you
Advisor: Dr. Wendy Chapman
Funders: NLM and Pitt DBMI
Enablers: Everyone who deposits their publications in PubMed Central!
My shared data: www.dbmi.pitt.edu/piwowar
Share your research data too!