Towards Data Attribution & Citation in the Life Sciences Philip E. Bourne UCSD pbourne@ucsd
Towards Data Attribution & Citation in the Life Sciences
Philip E. Bourne, UCSD
8/22/11 Data Attribution and Citation
Life Science Data Repositories
NLM is the elephant in the room .. However ..
There are thousands of community-maintained efforts – all want an NAR publication
The ability to cite and attribute the data is highly variable:
– DOIs assigned in some cases, but not used
– Attribution is through the metadata in most cases
– Citation is typically by the associated literature reference if it exists, and/or a database identifier
– The use of data repositories such as Dryad is compelling for the long tail problem
– Data journals are on the horizon
Consider the PDB as a Use Case
Oldest data resource in biology?
A resource used by ~ 200,000 individuals per month – increasing number of school kids!
A resource distributing worldwide the equivalent of ¼ of the Library of Congress each month
A bicoastal/worldwide resource
1TB
PDB Typical Growth Curve – But the Complexity!
[Figure: number of released entries by year]
People are doing more with the data
Number of visits and page views is growing faster than number of unique visitors
The Data May Save Lives?
Structure Summary page activity for H1N1 influenza-related structures *
[Figure: page activity over time, Jan. 2008 to Jul. 2010]
1RUZ: 1918 H1 Hemagglutinin
3B7E: Neuraminidase of A/Brevig Mission/1/1918 H1N1 strain in complex with zanamivir
* http://www.cdc.gov/h1n1flu/estimates/April_March_13.htm
PDB Data Attribution and Citation
About 25% of our budget has been spent on data remediation – multiple versions supported – the copy of record (as defined by the publication) is always available
Can't publish unless data are deposited – motivated by the community – very good data-to-publication correspondence
Data objects are discrete and we assign DOIs – but they are not used – database identifiers are preferred
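To make the DOI-versus-database-identifier point concrete, here is a minimal sketch that maps a four-character PDB identifier to a DOI. It assumes the wwPDB entry DOI pattern `10.2210/pdb<id>/pdb`, which is not stated on the slide, and the function names `pdb_doi` and `pdb_doi_url` are hypothetical helpers introduced only for illustration.

```python
import re

# Assumption: classic PDB IDs are four characters, a digit followed by
# three alphanumerics (e.g. 1RUZ, 3B7E, 1TIM).
_PDB_ID = re.compile(r"[0-9][A-Za-z0-9]{3}")


def pdb_doi(pdb_id: str) -> str:
    """Return the DOI for a PDB entry (hypothetical helper).

    Assumes the pattern 10.2210/pdb<id>/pdb; DOIs are case-insensitive,
    so the identifier is lower-cased for consistency.
    """
    pdb_id = pdb_id.strip()
    if not _PDB_ID.fullmatch(pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return f"10.2210/pdb{pdb_id.lower()}/pdb"


def pdb_doi_url(pdb_id: str) -> str:
    """Return a resolvable https://doi.org/ link for the entry."""
    return "https://doi.org/" + pdb_doi(pdb_id)


if __name__ == "__main__":
    for entry in ("1RUZ", "3B7E"):
        print(entry, "->", pdb_doi_url(entry))
```

Because the mapping is mechanical under this assumption, a citation tool could accept the familiar database identifier and emit the DOI, so authors would not need to change their habits.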
Ah yes .. But the CD4 Story…
The Knowledge and Data Cycle
0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB, which is analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB
Literature/Data Integration
1. User clicks on content
2. Metadata and webservices to data provide an interactive view that can be annotated
3. Selecting features provides a data/knowledge mashup
4. Analysis leads to new content I can share
PLoS Comp. Biol. 2005 1(3) e34
www.rcsb.org/pdb/explore/literature.do?structureId=1TIM
Example of Interoperability: The Database View
BMC Bioinformatics 2010 11:220
Example of Interoperability – The Literature View
From Anita de Waard, Elsevier
Acknowledgements
Funding Agencies: NSF, NIGMS, DOE, NLM, NCI, NCRR, NIBIB, NINDS, NIDDK