20110725 ibc xml
![Page 1: 20110725 ibc xml](https://reader033.fdocuments.net/reader033/viewer/2022052618/554e9c61b4c90573338b54a9/html5/thumbnails/1.jpg)
A Schema for Description and Exchange of Taxonomic
Publication's Content
Donat Agosti, Terry Catapano, Lyubomir Penev & Guido Sautter
Plazi, Bern, Switzerland
25. July 2011, IBC, Melbourne
![Page 2: 20110725 ibc xml](https://reader033.fdocuments.net/reader033/viewer/2022052618/554e9c61b4c90573338b54a9/html5/thumbnails/2.jpg)
WHY?
![Page 3: 20110725 ibc xml](https://reader033.fdocuments.net/reader033/viewer/2022052618/554e9c61b4c90573338b54a9/html5/thumbnails/3.jpg)
disseminate
access
knowledge
![Page 4: 20110725 ibc xml](https://reader033.fdocuments.net/reader033/viewer/2022052618/554e9c61b4c90573338b54a9/html5/thumbnails/4.jpg)
New York Times, July 19, 2011
![Page 5: 20110725 ibc xml](https://reader033.fdocuments.net/reader033/viewer/2022052618/554e9c61b4c90573338b54a9/html5/thumbnails/5.jpg)
“JSTOR's the one that should be in prison, man, for locking up
knowledge.”
HuffPost Politics, July 19, 2011
http://www.huffingtonpost.com/2011/07/19/huffpost-hill----gang-vio_n_904027.html
![Page 6: 20110725 ibc xml](https://reader033.fdocuments.net/reader033/viewer/2022052618/554e9c61b4c90573338b54a9/html5/thumbnails/6.jpg)
Open Access
![Page 7: 20110725 ibc xml](https://reader033.fdocuments.net/reader033/viewer/2022052618/554e9c61b4c90573338b54a9/html5/thumbnails/7.jpg)
An example from the Neurocommons text mining pilot:
• PubMed abstracts: > 16,000,000
• CNS classified abstracts: 874,727
• text mining recognized: 368,688
• text mining processed: 94,381
• extracted graph of 30,000+ relationships and 5,500 genes and proteins
“protein-protein interaction networks” John Wilbanks, Neurocommons
![Page 8: 20110725 ibc xml](https://reader033.fdocuments.net/reader033/viewer/2022052618/554e9c61b4c90573338b54a9/html5/thumbnails/8.jpg)
In a semantic Web environment (where machines talk to each other and do most of our work), data need to be able to talk to each other:
27,266 papers
4,563 papers
41,985 papers
10,365 papers
128,437 papers
“protein-protein interaction networks” John Wilbanks, Neurocommons
![Page 9: 20110725 ibc xml](https://reader033.fdocuments.net/reader033/viewer/2022052618/554e9c61b4c90573338b54a9/html5/thumbnails/9.jpg)
It will open up scientific literature for data mining
“protein-protein interaction networks” John Wilbanks, Neurocommons
![Page 10: 20110725 ibc xml](https://reader033.fdocuments.net/reader033/viewer/2022052618/554e9c61b4c90573338b54a9/html5/thumbnails/10.jpg)
HOW?
![Page 11: 20110725 ibc xml](https://reader033.fdocuments.net/reader033/viewer/2022052618/554e9c61b4c90573338b54a9/html5/thumbnails/11.jpg)
access for human
AND machine
![Page 12: 20110725 ibc xml](https://reader033.fdocuments.net/reader033/viewer/2022052618/554e9c61b4c90573338b54a9/html5/thumbnails/12.jpg)
It is about digesting millions of pages:
>>100 M pages of taxonomic literature
25M scientific publications / year
25K journals
>2K with zoological taxonomic descriptions
18K descriptions of new species / year
![Page 13: 20110725 ibc xml](https://reader033.fdocuments.net/reader033/viewer/2022052618/554e9c61b4c90573338b54a9/html5/thumbnails/13.jpg)
PDF is not enough
![Page 14: 20110725 ibc xml](https://reader033.fdocuments.net/reader033/viewer/2022052618/554e9c61b4c90573338b54a9/html5/thumbnails/14.jpg)
data and information in context
![Page 15: 20110725 ibc xml](https://reader033.fdocuments.net/reader033/viewer/2022052618/554e9c61b4c90573338b54a9/html5/thumbnails/15.jpg)
semantic markup
![Page 16: 20110725 ibc xml](https://reader033.fdocuments.net/reader033/viewer/2022052618/554e9c61b4c90573338b54a9/html5/thumbnails/16.jpg)
context of content
![Page 17: 20110725 ibc xml](https://reader033.fdocuments.net/reader033/viewer/2022052618/554e9c61b4c90573338b54a9/html5/thumbnails/17.jpg)
XML
eXtensible Markup Language
![Page 18: 20110725 ibc xml](https://reader033.fdocuments.net/reader033/viewer/2022052618/554e9c61b4c90573338b54a9/html5/thumbnails/18.jpg)
<tax:treatment>
  <tax:nomenclature>
    <tax:name>
      <tax:xid source="HNS" identifier="193329"/>
      <tax:xmldata>
        <dc:Genus>Mystrium</dc:Genus>
        <dc:Species>leonie</dc:Species>
      </tax:xmldata>
      Mystrium leonie Bihn &amp; Verhaagh, new species
    </tax:name>
    <tax:status>n. sp.</tax:status>
    Fig 1 D - F
  </tax:nomenclature>
  <tax:div type="description">
    <tax:p>HOLOTYPE WORKER: TL 3.95, HL 1.02, HW 0.95, CI 93, SL 1.30, SI 137, PW 0.73, ML 0.38. Mandible outer margin strongly curving to a sharp apical tooth, the apex parallel to the anterior clypeal margin. (Holotype with material in mandibles, so mandibles and anterior clypeus described below from paratypes.) Median clypeus....</tax:p>
  </tax:div>
</tax:treatment>
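Once a treatment is marked up like the fragment above, a program can pick out the name, its status and its sections without guessing from layout. The following is a minimal Python sketch of that idea, not part of the Plazi tooling. The namespace URIs are placeholders (the slide does not show the declarations), the sample is an abridged copy of the fragment, and `summarize_treatment` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumption): the slide omits the declarations.
NS = {"tax": "http://example.org/tax", "dc": "http://example.org/dc"}

# Abridged, well-formed copy of the treatment fragment shown on the slide.
SAMPLE = """\
<tax:treatment xmlns:tax="http://example.org/tax" xmlns:dc="http://example.org/dc">
  <tax:nomenclature>
    <tax:name>
      <tax:xid source="HNS" identifier="193329"/>
      <tax:xmldata>
        <dc:Genus>Mystrium</dc:Genus>
        <dc:Species>leonie</dc:Species>
      </tax:xmldata>
      Mystrium leonie Bihn &amp; Verhaagh, new species
    </tax:name>
    <tax:status>n. sp.</tax:status>
  </tax:nomenclature>
  <tax:div type="description">
    <tax:p>HOLOTYPE WORKER: TL 3.95, HL 1.02, HW 0.95, CI 93</tax:p>
  </tax:div>
</tax:treatment>
"""


def summarize_treatment(xml_text):
    """Return the machine-readable parts of one treatment as a dict."""
    root = ET.fromstring(xml_text)
    name = root.find("tax:nomenclature/tax:name", NS)
    xid = name.find("tax:xid", NS)
    return {
        "genus": name.findtext("tax:xmldata/dc:Genus", namespaces=NS),
        "species": name.findtext("tax:xmldata/dc:Species", namespaces=NS),
        "status": root.findtext("tax:nomenclature/tax:status", namespaces=NS),
        # (source, identifier) pair pointing at the external name database
        "xid": (xid.get("source"), xid.get("identifier")),
        # the type attribute of every top-level section of the treatment
        "sections": [div.get("type") for div in root.findall("tax:div", NS)],
    }


if __name__ == "__main__":
    print(summarize_treatment(SAMPLE))
```

The same lookup against a PDF would need layout heuristics; here the tags carry the context.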
![Page 19: 20110725 ibc xml](https://reader033.fdocuments.net/reader033/viewer/2022052618/554e9c61b4c90573338b54a9/html5/thumbnails/19.jpg)
content in a complex e-environment
![Page 20: 20110725 ibc xml](https://reader033.fdocuments.net/reader033/viewer/2022052618/554e9c61b4c90573338b54a9/html5/thumbnails/20.jpg)
linking
![Page 21: 20110725 ibc xml](https://reader033.fdocuments.net/reader033/viewer/2022052618/554e9c61b4c90573338b54a9/html5/thumbnails/21.jpg)
Azteca instabilis
Would then read like
<tax:name>
  <tax:xid source="LSID" identifier="urn:lsid:biosci.ohio-state.edu.osuc_concetps:13452"/> <!-- Link to external database -->
  <tax:xmldata> <!-- Normalization of data -->
    <dc:Genus>Azteca</dc:Genus>
    <dc:Species>instabilis</dc:Species>
  </tax:xmldata>
  Azteca instabilis
</tax:name>
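The step from the plain name to the marked-up name can be sketched in Python as follows. This is an illustration only: the namespace URIs are placeholders, `mark_up_name` is a hypothetical helper, and the LSID is passed in by the caller rather than looked up in a name server.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumption): not taken from the slides.
TAX = "http://example.org/tax"
DC = "http://example.org/dc"
ET.register_namespace("tax", TAX)
ET.register_namespace("dc", DC)


def mark_up_name(verbatim, lsid):
    """Wrap a verbatim binomial in a tax:name element.

    Assumes the first two words of `verbatim` are genus and species;
    a shorter string raises ValueError.
    """
    genus, species = verbatim.split()[:2]
    name = ET.Element(f"{{{TAX}}}name")
    # Link to the external database
    ET.SubElement(name, f"{{{TAX}}}xid", {"source": "LSID", "identifier": lsid})
    # Normalized form of the name
    data = ET.SubElement(name, f"{{{TAX}}}xmldata")
    ET.SubElement(data, f"{{{DC}}}Genus").text = genus
    ET.SubElement(data, f"{{{DC}}}Species").text = species
    # The text as printed stays in place after the normalized data
    data.tail = verbatim
    return name


if __name__ == "__main__":
    element = mark_up_name(
        "Azteca instabilis",
        "urn:lsid:biosci.ohio-state.edu.osuc_concetps:13452",
    )
    print(ET.tostring(element, encoding="unicode"))
```

Keeping the verbatim string next to the normalized fields means the printed text is never lost, while the identifier gives machines a stable target to link to.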
![Page 22: 20110725 ibc xml](https://reader033.fdocuments.net/reader033/viewer/2022052618/554e9c61b4c90573338b54a9/html5/thumbnails/22.jpg)
definition of XML tags
DTD
schema
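A DTD or schema states which tags exist and where each may appear. The Python sketch below imitates that idea with a hand-written table, purely to show what such a definition constrains. The `ALLOWED` table is a hypothetical content model inferred from the fragments in these slides, not the actual TaxonX or TaxPub definition, and the Python standard library parser used here does not itself validate against a DTD or schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model (assumption): parent tag -> permitted child tags.
ALLOWED = {
    "treatment": {"nomenclature", "div"},
    "nomenclature": {"name", "status"},
    "name": {"xid", "xmldata"},
    "xmldata": {"Genus", "Species"},
    "div": {"p"},
}


def local(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def violations(elem):
    """Recursively list children that the content model does not permit."""
    problems = []
    allowed = ALLOWED.get(local(elem.tag), set())
    for child in elem:
        if local(child.tag) not in allowed:
            problems.append(
                f"<{local(child.tag)}> not allowed in <{local(elem.tag)}>"
            )
        problems.extend(violations(child))
    return problems


if __name__ == "__main__":
    doc = ET.fromstring("<treatment><status/></treatment>")
    print(violations(doc))
```

In practice a validating parser would do this check against the published DTD or schema; the table here only stands in for it.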
![Page 23: 20110725 ibc xml](https://reader033.fdocuments.net/reader033/viewer/2022052618/554e9c61b4c90573338b54a9/html5/thumbnails/23.jpg)
transformations from XML
html
pdf
rdf
archiving
database
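One marked-up source can feed several outputs. The Python sketch below derives an HTML rendering and a flat database row from the same treatment. It is an illustration under assumptions: the namespace URIs are placeholders, the sample is an abridged treatment, and `treatment_to_html` and `treatment_to_row` are hypothetical names. XSLT is a common tool for this kind of transformation; plain Python is used here only to keep the example self-contained.

```python
import html
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumption): not taken from the slides.
NS = {"tax": "http://example.org/tax", "dc": "http://example.org/dc"}

SAMPLE = """\
<tax:treatment xmlns:tax="http://example.org/tax" xmlns:dc="http://example.org/dc">
  <tax:nomenclature>
    <tax:name>
      <tax:xmldata>
        <dc:Genus>Mystrium</dc:Genus>
        <dc:Species>leonie</dc:Species>
      </tax:xmldata>
      Mystrium leonie
    </tax:name>
    <tax:status>n. sp.</tax:status>
  </tax:nomenclature>
  <tax:div type="description">
    <tax:p>HOLOTYPE WORKER: TL 3.95, HL 1.02</tax:p>
  </tax:div>
</tax:treatment>
"""


def treatment_to_row(xml_text):
    """Flatten a treatment into a (genus, species, status) database row."""
    root = ET.fromstring(xml_text)
    return (
        root.findtext(".//dc:Genus", namespaces=NS),
        root.findtext(".//dc:Species", namespaces=NS),
        root.findtext(".//tax:status", default="", namespaces=NS),
    )


def treatment_to_html(xml_text):
    """Render the same treatment as a small HTML fragment."""
    genus, species, status = treatment_to_row(xml_text)
    parts = [
        f"<h1><i>{html.escape(genus)} {html.escape(species)}</i> "
        f"{html.escape(status)}</h1>"
    ]
    root = ET.fromstring(xml_text)
    for div in root.findall("tax:div", NS):
        # One heading per section, named after its type attribute
        parts.append(f"<h2>{html.escape(div.get('type', ''))}</h2>")
        for p in div.findall("tax:p", NS):
            text = "".join(p.itertext()).strip()
            parts.append(f"<p>{html.escape(text)}</p>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(treatment_to_row(SAMPLE))
    print(treatment_to_html(SAMPLE))
```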
![Page 24: 20110725 ibc xml](https://reader033.fdocuments.net/reader033/viewer/2022052618/554e9c61b4c90573338b54a9/html5/thumbnails/24.jpg)
legacy: TaxonX
prospective: TaxPub
![Page 25: 20110725 ibc xml](https://reader033.fdocuments.net/reader033/viewer/2022052618/554e9c61b4c90573338b54a9/html5/thumbnails/25.jpg)
how to use XML?
![Page 26: 20110725 ibc xml](https://reader033.fdocuments.net/reader033/viewer/2022052618/554e9c61b4c90573338b54a9/html5/thumbnails/26.jpg)
legacy publications
![Page 27: 20110725 ibc xml](https://reader033.fdocuments.net/reader033/viewer/2022052618/554e9c61b4c90573338b54a9/html5/thumbnails/27.jpg)
Plazi workflow: GoldenGate editor based markup and linking
Legacy publications
- Get LSIDs for names from the Hymenoptera Name Server (HNS); ZooBank?
- Add new names
- Get bibliographic metadata from HNS (MODS)
- Get bibliographic GUIDs from bioguid (or EDIT?)
- Get geographic long/lat from geonames.org
- Get GUIDs for:
  - CBOL
  - NCBI
  - specimen
  - images
  - .....
![Page 28: 20110725 ibc xml](https://reader033.fdocuments.net/reader033/viewer/2022052618/554e9c61b4c90573338b54a9/html5/thumbnails/28.jpg)
linked data
![Page 29: 20110725 ibc xml](https://reader033.fdocuments.net/reader033/viewer/2022052618/554e9c61b4c90573338b54a9/html5/thumbnails/29.jpg)
last resort
![Page 30: 20110725 ibc xml](https://reader033.fdocuments.net/reader033/viewer/2022052618/554e9c61b4c90573338b54a9/html5/thumbnails/30.jpg)
prospective publications
![Page 31: 20110725 ibc xml](https://reader033.fdocuments.net/reader033/viewer/2022052618/554e9c61b4c90573338b54a9/html5/thumbnails/31.jpg)
![Page 32: 20110725 ibc xml](https://reader033.fdocuments.net/reader033/viewer/2022052618/554e9c61b4c90573338b54a9/html5/thumbnails/32.jpg)
the future
![Page 33: 20110725 ibc xml](https://reader033.fdocuments.net/reader033/viewer/2022052618/554e9c61b4c90573338b54a9/html5/thumbnails/33.jpg)
dissemination - access
![Page 34: 20110725 ibc xml](https://reader033.fdocuments.net/reader033/viewer/2022052618/554e9c61b4c90573338b54a9/html5/thumbnails/34.jpg)
Plazi: access to treatments
[Diagram: treatments reach "You" through TAPIR, SPM, etc., for both human and machine users]
![Page 35: 20110725 ibc xml](https://reader033.fdocuments.net/reader033/viewer/2022052618/554e9c61b4c90573338b54a9/html5/thumbnails/35.jpg)
It will open up scientific literature for data mining and extraction
“protein-protein interaction networks” John Wilbanks, Neurocommons
![Page 36: 20110725 ibc xml](https://reader033.fdocuments.net/reader033/viewer/2022052618/554e9c61b4c90573338b54a9/html5/thumbnails/36.jpg)
http://plazi.org
Thank you very much!
Donat Agosti, Terry Catapano, Lyubomir Penev & Guido Sautter
![Page 37: 20110725 ibc xml](https://reader033.fdocuments.net/reader033/viewer/2022052618/554e9c61b4c90573338b54a9/html5/thumbnails/37.jpg)
JSTOR did not permit users: c. to make other than personal use of individually downloaded articles.
Aaron Swartz indictment, July 14, 2011