Semantics as a service at EMBL-EBI
Simon Jupp
Samples, Phenotypes and Ontologies
EMBL-EBI
Semantic services for data interoperability
Elixir all hands meeting
Interoperability workshop
March 2017
Ontology services as building blocks for FAIR
• You need standards (ontologies and controlled vocabularies) to make data interpretable
• Interpretable data is more readily interoperable
• We can use interoperable data to build integrated systems that make the data more findable by users
• The data become reusable when we use common standards
• But:
  • There are a lot of standards
  • Doing this at scale for different domains is hard
Improving Findability by greater Interoperability
Smarter searching
Data analysis
Data integration
Data visualisation
BioSamples case study
• description of material of biological interest
• may be linked to assay data
  • sequencing, microarray, proteomics
  • also imaging, etc
• We’ve been making this data FAIR for many years
The challenge - thousands of data attributes…
• BioSamples is an example of real world experimental metadata
• We see all the variability, warts and all
• Good playground for building tooling to clean up and add value to this data
• If we can build tooling that works for BioSamples, it’ll work anywhere!
What are the disease attributes?
diseaseState, hostDisease, clinicallyAffectedStatus, diagnosis, Infection, diseaseStatus
healthState, disease, clinicalInformation, hostHealthState, affectedBy, causeOfDeath
NOT:
• diseaseStage: info about the stage of a disease, e.g. "48 hai", "stage", "terminal"
• diseasestage
• tumorStatus: "non-tumor", 120, "Tumor", 100, "CSL +/+ Xenograft Tumor 1st"
• healthStatus: "normal", "Allergic", "stressed", "NA(Not immunized)"
Makes finding the right data hard
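As a rough illustration of why this variability hurts findability, the sketch below folds a few of the attribute names listed above into a single key, while leaving the look-alikes from the NOT list untouched. The canonical key "disease" and the function names are assumptions made for this example, not part of BioSamples.

```python
# A subset of the disease attribute names from the slide.
DISEASE_ATTRIBUTES = {
    "diseaseState", "hostDisease", "diseaseStatus",
    "disease", "hostHealthState", "causeOfDeath",
}

# Look-alike attributes that the slide lists under NOT.
NOT_DISEASE = {"diseaseStage", "diseasestage", "tumorStatus", "healthStatus"}


def canonical_attribute(name: str) -> str:
    """Map known disease attribute names to "disease" (an assumed canonical key)."""
    if name in NOT_DISEASE:
        return name
    return "disease" if name in DISEASE_ATTRIBUTES else name


def normalise_sample(sample: dict) -> dict:
    """Group a sample's values under canonical attribute names."""
    out: dict = {}
    for key, value in sample.items():
        out.setdefault(canonical_attribute(key), []).append(value)
    return out
```

A hand-maintained list like this does not scale to thousands of attributes, which is the motivation for the ontology-based annotation that follows.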
Normalising sample descriptions through annotation with ontologies
CL:CL_0000071 (blood vessel endothelial cell)
obo:CHEBI_39867 (valproic acid)
NCBITaxon:NCBITaxon_9606 (Homo sapiens)
Curation
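The identifiers above can be expanded to full term IRIs. The sketch below assumes the standard OBO PURL pattern (http://purl.obolibrary.org/obo/ followed by PREFIX_LOCALID); the function name `to_obo_iri` is made up for this example.

```python
OBO_BASE = "http://purl.obolibrary.org/obo/"


def to_obo_iri(identifier: str) -> str:
    """Expand identifiers such as "CL:CL_0000071" or "UBERON:0000948" to an OBO PURL."""
    prefix, _, local = identifier.partition(":")
    if not local:
        raise ValueError(f"not a prefixed identifier: {identifier!r}")
    # "CL:CL_0000071" already carries the PREFIX_ID form after the colon;
    # a plain CURIE such as "UBERON:0000948" needs the prefix re-attached.
    if "_" not in local:
        local = f"{prefix}_{local}"
    return OBO_BASE + local
```

For example, `to_obo_iri("obo:CHEBI_39867")` yields the IRI for the valproic acid term shown above.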
Ontology challenges
• How do I access ontologies?
• How do I map data to ontologies?
• Which ontologies should I use?
• What about data that doesn’t map?
• How can I translate from one ontology to another?
• How can I extend an ontology?
• How do I build “ontology aware” search applications?
• How do I publish this data?
SPOT team - Adding value with ontologies
Data Exploration and Cleanup
Data structuring
Ontology Annotation
Data cleaning and mapping
Ontology building
FAIRified data
Data Enrichment Services
• Building an interoperability toolkit for Europe (Elixir)
• Integrated (linked) APIs
• Plumbing for data curation systems and workflows
• Lowering the barrier of entry to ontologies for data stewards
New ontology lookup service!
The Ontology Toolkit
Search/Visualise ontologies
Annotate data
Ontology cross mapping
Create new ontology content
Webulous
Ontology Lookup Service
• Ontology search engine
• Ontology term history tracking
• Ontology visualisation
• Powerful RESTful API
Repository of over 160 pre-selected biomedical ontologies (4.5 million terms)
http://www.ebi.ac.uk/ols
• Provides unified mechanism to access multiple ontologies
• Large community of users, tens of millions of hits per month
• Open source and dockerised
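A minimal sketch of how a client might address the RESTful API. Only the base URL comes from the slide; the `/api/search` path and the `q`, `rows` and `ontology` parameters are assumptions here and should be checked against the OLS API documentation before use.

```python
from typing import Optional
from urllib.parse import urlencode

OLS_BASE = "http://www.ebi.ac.uk/ols"  # base URL from the slide


def ols_search_url(query: str, ontology: Optional[str] = None, rows: int = 10) -> str:
    """Build a term-search URL (path and parameter names are assumed)."""
    params = {"q": query, "rows": rows}
    if ontology:
        params["ontology"] = ontology  # restrict the search to one ontology
    return f"{OLS_BASE}/api/search?{urlencode(params)}"
```

The resulting URL could then be fetched with any HTTP client and the JSON response inspected for matching terms.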
Zooma
• Optimal mappings based on data we have seen previously
• Favours precision over recall
• Captures annotations + context (context is very important)
• Currently contains over 92,000 annotations from 7 resources
  • ClinVar, Cellular Phenotype Database, ExpressionAtlas, UniProt, GWAS, EBiSC, OpenTargets
  • Used to improve and share their mappings across resources
Repository of curated ontology mappings
http://www.ebi.ac.uk/spot/zooma
“Heart”
UBERON:0000948
A Zooma Mapping
+ Context (where, when, why?)
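To make the idea of "mapping + context" concrete, here is a small sketch of a curated mapping store that favours precision over recall: it returns a term only when both the property type and the value match something seen before, and nothing otherwise. The class, field names, property type and source label are hypothetical; only the "Heart" to UBERON:0000948 pairing comes from the slide, and this is not Zooma's actual data model.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Mapping:
    property_type: str   # context: the kind of attribute the value came from
    property_value: str  # the free-text value that was curated
    term: str            # the ontology term it was mapped to
    source: str          # context: who asserted the mapping (hypothetical label)


KNOWN = (
    Mapping("organism part", "heart", "UBERON:0000948", "example-resource"),
)


def lookup(property_type: str, property_value: str,
           known: Sequence[Mapping] = KNOWN) -> Optional[str]:
    """Return a term only for an unambiguous, previously seen type + value pair."""
    terms = {
        m.term for m in known
        if m.property_type.lower() == property_type.strip().lower()
        and m.property_value.lower() == property_value.strip().lower()
    }
    # Precision over recall: no guess when the match is missing or ambiguous.
    return terms.pop() if len(terms) == 1 else None
```

With this store, `lookup("organism part", "Heart")` finds the term, while the same value under a different property type returns `None`.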
New for 2017 – Ontology Cross Mapping
• Cross-references are a powerful tool for integrating data
• A lot of curator effort goes into building ontology cross-references
• It is currently hard to find and explore the ontology mapping space
Datasource 1 Datasource 2
Human Phenotype Ontology
SNOMED-CT
Mappings
Ontology Mapping Service (OxO)
• UI and API to expose known mappings from OBO, UMLS and manually curated mapping sets (e.g. GWAS, OpenTargets)
• Normalised CURIE prefixes using identifiers.org
  • SNOMED-CT: / SNOMEDCT: / SNOMED: / SNOMEDCT_
• Provides a “silver standard” to support predictive mapping algorithms
* Going live March 2017 *
http://www.ebi.ac.uk/spot/oxo
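The prefix normalisation mentioned above can be sketched as follows. The four variant spellings are taken from the slide; the choice of "SNOMEDCT" as the canonical prefix and the function name are assumptions for illustration, and the local id in the usage note is a placeholder.

```python
import re

# Variant prefix spellings listed on the slide, matched at the start of the id.
_SNOMED_PREFIX = re.compile(
    r"^(SNOMED-CT:|SNOMEDCT:|SNOMEDCT_|SNOMED:)", re.IGNORECASE
)


def normalise_curie(identifier: str, canonical_prefix: str = "SNOMEDCT") -> str:
    """Rewrite any known SNOMED prefix variant to one canonical CURIE prefix."""
    return _SNOMED_PREFIX.sub(f"{canonical_prefix}:", identifier.strip(), count=1)
```

For instance, `normalise_curie("SNOMED-CT:12345")` and `normalise_curie("SNOMEDCT_12345")` both give `"SNOMEDCT:12345"`, so mappings from different sources line up on the same identifier.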
Common questions
• How do I access ontologies?
• How do I map data to ontologies?
• Which ontologies should I use?
• What about data that doesn’t map?
• How can I translate from one ontology to another?
• How can I extend an ontology?
• How do I build “ontology aware” search applications?
• How do I publish this data?
Data annotation workflow
[Flowchart, starting from Data, with Yes/No branches]
• Decision points: Is the data annotated to ontologies? / Is there unmapped data? / Can you find terms in OLS? / Is it the ontology you want?
• Steps: Search Zooma / Search OLS / Search OxO / Create a new term (Webulous, OBO Foundry) / Add mappings back to Zooma
• Outcomes: Get the application ontology from OLS / Building a search index with BioSolr / Publishing structured data as RDF
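One possible reading of this workflow, written as code. The exact branching of the original flowchart is not recoverable from the transcript, so the order below (previously curated mappings first, then ontology search, then a new-term request) is an assumption, and the lookup functions are stand-ins supplied by the caller rather than real service clients.

```python
from typing import Callable, Optional

Lookup = Callable[[str], Optional[str]]


def annotate(value: str, search_zooma: Lookup, search_ols: Lookup) -> dict:
    """Try curated mappings first, then ontology search, else flag for a new term."""
    term = search_zooma(value)
    if term is not None:
        return {"value": value, "term": term, "via": "zooma"}

    term = search_ols(value)
    if term is not None:
        # A newly curated mapping is fed back so it can be reused next time.
        return {"value": value, "term": term, "via": "ols", "add_to_zooma": True}

    # Nothing suitable found: candidate for a new ontology term.
    return {"value": value, "term": None, "via": "new-term-request"}
```

In use, `search_zooma` and `search_ols` could be as simple as `dict.get` over local lookup tables while prototyping.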
Summary
• Part of FAIR process will be alignment with standards
• Already many standards and ontologies in use
• We build tools and services that help get you there
• You will have to do some curation
  • But our tooling can capture that so we can share the burden
• How FAIR is FAIR enough?
  • We’ll never FAIRify all of BioSamples
  • Decide what your application is and optimise for that
Ontology team
Helen Parkinson
Tony Burdett
Sira Sarntivijai
Olga Vrousgou
Thomas Liener
Funding
• EMBL
• CORBEL: This project receives funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 654248.
• EXCELERATE: ELIXIR-EXCELERATE is funded by the European Commission within the Research Infrastructures programme of Horizon 2020, grant agreement number 676559.