Challenge of Semantics for the Encyclopedia of Life

Challenges for semantics in EOL
Cynthia Parr, National Museum of Natural History, Smithsonian Institution
Phenotype Ontology RCN, NESCent, 25 February 2011

Description

An introduction to EOL (http://www.eol.org) and some of the challenges and possible applications for structured, semantic information about biological organisms. Presented at the kick-off meeting of the NSF-funded Phenotype Ontology Research Coordination Network.

Transcript of Challenge of Semantics for the Encyclopedia of Life

Page 1

Challenges for semantics in EOL

Cynthia Parr, National Museum of Natural History, Smithsonian Institution

Phenotype Ontology RCN, NESCent, 25 February 2011

Page 2

http://www.eol.org
• All species known to science
• Summary descriptions across biology domains
• Freely accessible
• Available from a single portal in a common format
• Quality
• Always growing

Page 3

Catalogue of Life

IUCN

GBIF

Biodiversity Heritage Library

Content providers
• Databases
• Journals
• LifeDesks
• Public contribution

Curating

Commenting

Tagging

EOL is a Content Curation Community

Page 4

Typical species page

Page 5

• Objects can come from many partners
• Objects are sorted by topic and by taxon
• Each partner gets credit

http://www.eol.org/content_partner

Page 6

Page 7

Curation, Comments, Tags

Page 8

Not

Page 9

Statistics

2.8 million pages – one (or more) per taxon

2 million data objects

500 thousand pages with objects

100+ partner databases

700 curators, thousands of contributors, ~46,000 members
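Taken together, the rounded figures above give a rough sense of coverage at the time. The arithmetic below uses only the numbers on this slide:

```latex
\frac{500{,}000\ \text{pages with objects}}{2{,}800{,}000\ \text{pages}} \approx 0.18,
\qquad
\frac{2{,}000{,}000\ \text{data objects}}{500{,}000\ \text{pages with objects}} = 4
```

So roughly 18% of pages carried any objects, with an average of about four objects per populated page; the slide does not give the distribution behind that average.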

Page 10

Page 11

http://NodeXL.codeplex.com

Page 12

Schema

• Very coarsely structured: 33 subjects (TDWG Species Profile Model)
• No numeric data
• Minimal controlled vocabularies
• API

Page 13

Corvidae

Page 14

We have an infrastructure . . .
• Aggregation mechanisms
• Names resolution
• Curation mechanisms
• Public and machine interfaces

Version 2 (August) vastly improved support for community interaction

Version 3 (???)
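The names-resolution step in the infrastructure list above can be pictured with a minimal sketch: normalize the spacing and capitalization of a raw name string, then map a known synonym to an accepted name. The synonym table, the function names, and the choice to ignore authorship strings are assumptions for illustration, not EOL's actual resolver.

```python
# Minimal names-resolution sketch; SYNONYMS, normalize and resolve are hypothetical.

# Hypothetical synonym table: normalized name -> accepted name (one example pair).
SYNONYMS = {
    "Corvus corone cornix": "Corvus cornix",
}


def normalize(name: str) -> str:
    """Collapse whitespace; capitalize the genus and lower-case the remaining epithets.

    Authorship strings are not handled here; they would be lower-cased too.
    """
    parts = name.split()
    if not parts:
        return ""
    return " ".join([parts[0].capitalize()] + [p.lower() for p in parts[1:]])


def resolve(name: str) -> str:
    """Return the accepted name for a known synonym, else the normalized string."""
    canonical = normalize(name)
    return SYNONYMS.get(canonical, canonical)


print(resolve("  corvus   CORONE cornix "))  # Corvus cornix
```

A real resolver would also need to parse authorities and ranks and cope with homonyms across kingdoms; the sketch only shows the normalize-then-look-up shape.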

Page 15

Rich page calculations

Page 16

Possible path to semantics

Taxon: Key 1 = Value; Key 2 = Value; Key 3 = Value

Key 1: Unit, Label, URI
Key 2: Unit, Label, URI
Key 3: Unit, Label, URI
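One way to read this path to semantics is as two tables: per-taxon key/value pairs, and a shared vocabulary that gives each key a unit, a label and a URI. The sketch below assumes hypothetical keys, example.org URIs and placeholder values (not real measurements). It expands records into rows that carry the key's URI and unit, and rejects any key that has no definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """Describes one key: its unit, a human-readable label and a URI."""
    unit: str
    label: str
    uri: str


# Hypothetical vocabulary: each key is defined once and shared by all taxa.
TERMS = {
    "body_mass": Term("g", "body mass", "http://example.org/terms/bodyMass"),
    "clutch_size": Term("eggs", "clutch size", "http://example.org/terms/clutchSize"),
}

# Hypothetical per-taxon records (placeholder values, not real measurements).
RECORDS = {
    "Taxon A": {"body_mass": 1200.0, "clutch_size": 5},
}


def to_rows(records, terms):
    """Expand taxon -> {key: value} records into (taxon, term URI, value, unit) rows.

    A key with no entry in `terms` raises KeyError, so every stored value
    is guaranteed to carry a URI and a unit.
    """
    rows = []
    for taxon, pairs in records.items():
        for key, value in pairs.items():
            if key not in terms:
                raise KeyError(f"undefined key: {key}")
            term = terms[key]
            rows.append((taxon, term.uri, value, term.unit))
    return rows


for row in to_rows(RECORDS, TERMS):
    print(row)
```

Keeping units and URIs on the key rather than on each value means a curator fixes a definition once, and every taxon using that key picks up the change.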

Page 17

What could we do?

Page 18

Organize info on EOL pages

• Index by taxon
• Sort into one of the 33 SPM subjects
• Improve discoverability
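Indexing by taxon and sorting into SPM subjects amounts to a two-level grouping. A minimal sketch, assuming each object arrives as a dict with hypothetical `id`, `taxon` and `subject` fields, and using only a handful of SPM-style subject names rather than the model's full 33:

```python
from collections import defaultdict

# Illustrative subset of SPM-style subject names; the model defines 33.
SPM_SUBJECTS = {"GeneralDescription", "Morphology", "Habitat", "LifeCycle", "Behaviour"}


def build_index(objects):
    """Group object ids as index[taxon][subject].

    Objects with an unrecognized subject go under "Unclassified" so a curator
    can review them instead of losing them.
    """
    index = defaultdict(lambda: defaultdict(list))
    for obj in objects:
        subject = obj["subject"] if obj["subject"] in SPM_SUBJECTS else "Unclassified"
        index[obj["taxon"]][subject].append(obj["id"])
    return index


idx = build_index([
    {"id": 1, "taxon": "Corvidae", "subject": "Habitat"},
    {"id": 2, "taxon": "Corvidae", "subject": "Diet notes"},
])
print(dict(idx["Corvidae"]))  # {'Habitat': [1], 'Unclassified': [2]}
```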

Page 19

Serve data by API or query interface

“Give me all the information you have about the elbow joint and life histories in rodents”
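A query like the rodent example combines a taxonomic constraint (everything within Rodentia) with topical ones. The sketch below assumes a toy child-to-parent classification, objects tagged with free-text anatomy terms, and reads "elbow joint and life histories" as a union, so either topic matches. The data, field names and functions are hypothetical.

```python
# Hypothetical child -> parent links; a real system would use a names-resolved classification.
PARENT = {
    "Muridae": "Rodentia",
    "Sciuridae": "Rodentia",
    "Mus musculus": "Muridae",
    "Corvidae": "Passeriformes",
}


def is_within(taxon, ancestor):
    """True if taxon equals ancestor or has it somewhere up its parent chain."""
    while taxon is not None:
        if taxon == ancestor:
            return True
        taxon = PARENT.get(taxon)
    return False


def query(objects, clade, terms, subjects):
    """Ids of objects inside the clade that carry any of the terms or any of the subjects."""
    return [
        o["id"] for o in objects
        if is_within(o["taxon"], clade)
        and (terms & set(o.get("terms", ())) or o.get("subject") in subjects)
    ]


objects = [
    {"id": "a", "taxon": "Mus musculus", "terms": ["elbow joint"]},
    {"id": "b", "taxon": "Sciuridae", "subject": "LifeCycle"},
    {"id": "c", "taxon": "Corvidae", "subject": "LifeCycle"},
]
print(query(objects, "Rodentia", {"elbow joint"}, {"LifeCycle"}))  # ['a', 'b']
```

Free-text tags like "elbow joint" are exactly where an anatomy ontology would help: with term URIs, a query could also match objects tagged with narrower or synonymous terms.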

Page 20

Make the whole page semantically browsable (LOD: linked open data)
• Taxon
• Text blobs
• Character data
• Metadata
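As a sketch of what a semantically browsable page might emit, the function below writes one page as N-Triples covering the four parts listed on the slide: taxon, text blobs, character data and metadata. The example.org namespace and the `scientificName`/`hasText` predicates are hypothetical; only the Dublin Core `description` and `source` terms are existing vocabulary. Character values are written as plain string literals for simplicity.

```python
BASE = "http://example.org/eol/"  # hypothetical namespace, not an EOL URI scheme
DCTERMS = "http://purl.org/dc/terms/"


def literal(text):
    """Quote a string as an N-Triples literal, escaping backslash, quote and line breaks."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def page_to_ntriples(page_id, taxon_name, text_blobs, characters, source):
    """Emit triples for the taxon, its text blobs (with source metadata) and character data."""
    page = f"<{BASE}pages/{page_id}>"
    lines = [f"{page} <{BASE}terms/scientificName> {literal(taxon_name)} ."]
    for i, blob in enumerate(text_blobs):
        obj = f"<{BASE}pages/{page_id}/text/{i}>"
        lines.append(f"{page} <{BASE}terms/hasText> {obj} .")
        lines.append(f"{obj} <{DCTERMS}description> {literal(blob)} .")
        lines.append(f"{obj} <{DCTERMS}source> {literal(source)} .")
    for term_uri, value in characters.items():
        lines.append(f"{page} <{term_uri}> {literal(str(value))} .")
    return "\n".join(lines)


print(page_to_ntriples(1, "Corvus corax", ["Example description text."],
                       {"http://example.org/terms/bodyMass": 1200}, "Partner X"))
```

Giving each text blob its own URI is what lets metadata such as the contributing partner attach to that blob rather than to the page as a whole.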

Page 21

Consistency checks

• Curators
• Crowd-sourcing
• Reasoning…

… inferring summaries… mining for patterns? … hypothesis testing?
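Two of these automated steps could look like this in miniature: a consistency check that flags values recorded in an unexpected unit for curator review, and a summary inferred for a group of taxa from only the values that pass the check. The expected-unit table, field names and functions are assumptions for illustration.

```python
# Hypothetical table of the unit each key is expected to use.
EXPECTED_UNITS = {"body_mass": "g"}


def check_units(measurements):
    """Return measurements whose unit differs from the key's expected unit (for curator review)."""
    return [m for m in measurements
            if m["key"] in EXPECTED_UNITS and m["unit"] != EXPECTED_UNITS[m["key"]]]


def summarize(measurements, key, taxa):
    """Infer a (min, max) range for a key across the given taxa, skipping flagged values.

    Returns None when no usable values remain.
    """
    flagged = {id(m) for m in check_units(measurements)}
    values = [m["value"] for m in measurements
              if m["key"] == key and m["taxon"] in taxa and id(m) not in flagged]
    return (min(values), max(values)) if values else None


data = [
    {"taxon": "A", "key": "body_mass", "value": 100.0, "unit": "g"},
    {"taxon": "B", "key": "body_mass", "value": 250.0, "unit": "g"},
    {"taxon": "C", "key": "body_mass", "value": 0.3, "unit": "kg"},
]
print(check_units(data))                              # flags taxon C
print(summarize(data, "body_mass", {"A", "B", "C"}))  # (100.0, 250.0)
```

Excluding flagged values rather than converting them keeps the sketch conservative: a human or a unit-aware reasoner decides whether 0.3 kg was a correct value in the wrong field.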

Page 22

ievobio.org

Page 23

Image credits

Michal Koupý, Lorraine Phelan, David J Patterson, Dmitry Mozzherin