Semantics as a service at EMBL-EBI

Simon Jupp, Samples, Phenotypes and Ontologies, EMBL-EBI. Semantic services for data interoperability. ELIXIR All Hands Meeting, Interoperability workshop, March 2017.


Page 1: Semantics as a service at EMBL-EBI

Simon Jupp, Samples, Phenotypes and Ontologies, EMBL-EBI

Semantic services for data interoperability

ELIXIR All Hands Meeting, Interoperability workshop, March 2017

Page 2

Ontology services as building blocks for FAIR
• You need standards (ontologies and controlled vocabularies) to make data interpretable
• Interpretable data is more readily interoperable
• We can use interoperable data to build integrated systems that make the data more findable by users
• The data become reusable when we use common standards
• But:
  • There are a lot of standards
  • Doing this at scale for different domains is hard

Page 3

Improving findability through greater interoperability
• Smarter searching
• Data analysis
• Data integration
• Data visualisation
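"Smarter searching" typically means ontology-aware query expansion: a query for a term also matches samples annotated with any term below it in the ontology. The sketch below assumes a hypothetical toy is_a hierarchy and hypothetical sample annotations held in plain dictionaries; in a real system the hierarchy would come from an ontology service rather than being hard-coded, and `expand_query`/`search` are illustrative names only.

```python
from collections import defaultdict


def build_children(parents_of):
    """Invert child -> parents (is_a) edges into parent -> children."""
    children = defaultdict(set)
    for child, parents in parents_of.items():
        for parent in parents:
            children[parent].add(child)
    return children


def expand_query(parents_of, term):
    """Return the query term plus every term below it in the hierarchy."""
    children = build_children(parents_of)
    expanded, stack = {term}, [term]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in expanded:
                expanded.add(child)
                stack.append(child)
    return expanded


def search(samples, parents_of, term):
    """Return IDs of samples whose annotation falls under `term`."""
    wanted = expand_query(parents_of, term)
    return sorted(sid for sid, annotation in samples.items() if annotation in wanted)


# Hypothetical toy hierarchy and sample annotations, for illustration only.
PARENTS = {
    "heart disease": {"cardiovascular disease"},
    "cardiomyopathy": {"heart disease"},
}
SAMPLES = {"s1": "cardiomyopathy", "s2": "asthma", "s3": "heart disease"}

print(search(SAMPLES, PARENTS, "cardiovascular disease"))  # ['s1', 's3']
```

A plain string search for "cardiovascular disease" would find neither s1 nor s3; the expansion is what makes the annotated data findable.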

Page 4

BioSamples case study
• Description of material of biological interest
• May be linked to assay data
  • sequencing, microarray, proteomics
  • also imaging, etc.
• We've been making this data FAIR for many years

Page 5

The challenge: thousands of data attributes…

• BioSamples is an example of real-world experimental metadata
• We see all the variability, warts and all
• A good playground for building tooling to clean up and add value to this data
• If we can build tooling that works for BioSamples, it will work anywhere!

Page 6

What are the disease attributes?

diseaseState, hostDisease, clinicallyAffectedStatus, diagnosis, Infection, diseaseStatus, healthState, disease, clinicalInformation, hostHealthState, affectedBy, causeOfDeath

NOT:
• diseaseStage: info about the stage of a disease, e.g. "48 hai", "stage", "terminal"
• diseasestage
• tumorStatus: "non-tumor", 120, "Tumor", 100, "CSL +/+ Xenograft Tumor 1st"
• healthStatus: "normal", "Allergic", "stressed", "NA(Not immunized)"
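One way to tame this variability is to normalise attribute keys and check them against a curated list. The sketch below is a hypothetical illustration (not BioSamples code): the curated set is the list of disease attributes above, and `normalise_key`/`disease_fields` are made-up names. Note that string normalisation alone cannot tell healthState (a disease attribute) from healthStatus (not one); only the curated list makes that call.

```python
import re

# Hypothetical curated list: the attribute names from the slide that all
# describe disease, stored in normalised form (lower-case, letters/digits only).
DISEASE_ATTRIBUTES = {
    "diseasestate", "hostdisease", "clinicallyaffectedstatus", "diagnosis",
    "infection", "diseasestatus", "healthstate", "disease",
    "clinicalinformation", "hosthealthstate", "affectedby", "causeofdeath",
}


def normalise_key(key):
    """Lower-case and drop everything except letters and digits, so that
    'diseaseState', 'disease_state' and 'Disease State' compare equal."""
    return re.sub(r"[^a-z0-9]", "", key.lower())


def disease_fields(sample):
    """Return the subset of a sample's attributes that describe disease."""
    return {k: v for k, v in sample.items() if normalise_key(k) in DISEASE_ATTRIBUTES}


sample = {  # hypothetical sample record
    "disease state": "asthma",
    "diseaseStage": "terminal",
    "healthStatus": "normal",
    "hostHealthState": "healthy",
}
print(disease_fields(sample))
# {'disease state': 'asthma', 'hostHealthState': 'healthy'}
```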

Page 7

Makes finding the right data hard

Page 8

Normalising sample descriptions through annotation with ontologies

CL:CL_0000071 (blood vessel endothelial cell)

obo:CHEBI_39867 (valproic acid)

NCBITaxon:NCBITaxon_9606 (Homo sapiens)

Curation
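The identifiers on this slide mix styles ("CL:CL_0000071", "obo:CHEBI_39867"). A small helper can reduce them to one form. This sketch assumes the OBO Library PURL convention (http://purl.obolibrary.org/obo/ followed by the short form such as CL_0000071); ontologies outside that scheme would need their own base IRI, and the helper names are hypothetical.

```python
OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"


def short_form(identifier):
    """Reduce the identifier styles seen on the slide to an OBO short form.

    'CL:CL_0000071', 'obo:CHEBI_39867', 'CL:0000071', 'CL_0000071' and
    'http://purl.obolibrary.org/obo/CL_0000071' all become 'CL_0000071'.
    """
    if identifier.startswith(OBO_PURL_BASE):
        return identifier[len(OBO_PURL_BASE):]
    prefix, sep, local = identifier.partition(":")
    if not sep:
        return identifier          # already a short form
    if "_" in local:
        return local               # 'CL:CL_0000071' or 'obo:CHEBI_39867'
    return f"{prefix}_{local}"     # 'CL:0000071'


def to_iri(identifier):
    """Full OBO PURL for an identifier in any of the styles above."""
    return OBO_PURL_BASE + short_form(identifier)


def to_curie(identifier):
    """Conventional 'PREFIX:local' CURIE, e.g. 'CL:0000071'."""
    prefix, _, local = short_form(identifier).partition("_")
    return f"{prefix}:{local}"


for ident in ["CL:CL_0000071", "obo:CHEBI_39867", "NCBITaxon:NCBITaxon_9606"]:
    print(to_curie(ident), to_iri(ident))
# CL:0000071 http://purl.obolibrary.org/obo/CL_0000071
# CHEBI:39867 http://purl.obolibrary.org/obo/CHEBI_39867
# NCBITaxon:9606 http://purl.obolibrary.org/obo/NCBITaxon_9606
```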

Page 9

Ontology challenges
• How do I access ontologies?
• How do I map data to ontologies?
• Which ontologies should I use?
• What about data that doesn't map?
• How can I translate from one ontology to another?
• How can I extend an ontology?
• How do I build “ontology aware” search applications?
• How do I publish this data?

Page 10

SPOT team: adding value with ontologies

• Data exploration and cleanup
• Data structuring
• Ontology annotation
• Data cleaning and mapping
• Ontology building

FAIRified data

Page 11

Data Enrichment Services
• Building an interoperability toolkit for Europe (ELIXIR)
• Integrated (linked) APIs
• Plumbing for data curation systems and workflows
• Lowering the barrier to entry to ontologies for data stewards

New ontology lookup service!

Page 12

The Ontology Toolkit
• Search/visualise ontologies
• Annotate data
• Ontology cross mapping
• Create new ontology content (Webulous)

Page 13

Ontology Lookup Service

• Ontology search engine
• Ontology term history tracking
• Ontology visualisation
• Powerful RESTful API

Repository of over 160 pre-selected biomedical ontologies (4.5 million terms)

http://www.ebi.ac.uk/ols

• Provides a unified mechanism to access multiple ontologies
• Large community of users: tens of millions of hits per month

• Open source and dockerised
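A client of the OLS RESTful API mostly builds query URLs and unpacks JSON. The sketch below assumes the search endpoint https://www.ebi.ac.uk/ols/api/search with `q`, `ontology`, `exact` and `rows` parameters, and a Solr-style response with hits under `response.docs` carrying `obo_id`, `label` and `ontology_name`. That matches the OLS API of this period as far as we know, but OLS has since been upgraded, so treat the URL and field names as assumptions to check against the current documentation. The example response is hand-written, not real OLS output.

```python
import json
from urllib.parse import urlencode

# Assumption: the OLS search endpoint as deployed around 2017. OLS has since
# been upgraded, so check the current API documentation before use.
OLS_SEARCH = "https://www.ebi.ac.uk/ols/api/search"


def ols_search_url(text, ontology=None, exact=False, rows=10):
    """Build a search URL; `ontology` restricts hits to one ontology."""
    params = {"q": text, "rows": rows}
    if ontology:
        params["ontology"] = ontology
    if exact:
        params["exact"] = "true"
    return f"{OLS_SEARCH}?{urlencode(params)}"


def parse_hits(payload):
    """Extract (CURIE, label, ontology) tuples from a JSON search response.

    Assumes hits live under response.docs with obo_id/label/ontology_name.
    """
    docs = json.loads(payload)["response"]["docs"]
    return [(d.get("obo_id"), d.get("label"), d.get("ontology_name")) for d in docs]


print(ols_search_url("heart", ontology="uberon"))
# https://www.ebi.ac.uk/ols/api/search?q=heart&rows=10&ontology=uberon

# Hand-written example response (not real OLS output) to exercise the parser.
example = ('{"response": {"docs": [{"obo_id": "UBERON:0000948", '
           '"label": "heart", "ontology_name": "uberon"}]}}')
print(parse_hits(example))  # [('UBERON:0000948', 'heart', 'uberon')]
```

To query a live service, the URL can be fetched with `urllib.request.urlopen(url).read()` and the bytes passed to `parse_hits`; keeping URL building and parsing separate makes both testable offline.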

Page 14

Zooma

• Optimal mappings based on data we have seen previously
• Favours precision over recall
• Captures annotations plus context; context is very important
• Currently contains over 92,000 annotations from 7 resources
  • ClinVar, Cellular Phenotype Database, ExpressionAtlas, UniProt, GWAS, EBiSC, OpenTargets
• Used by these resources to improve and share their mappings

Repository of curated ontology mappings

http://www.ebi.ac.uk/spot/zooma

A Zooma mapping: “Heart” to UBERON:0000948, plus context (where, when, why?)
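The idea of a curated mapping that carries its context, and of favouring precision over recall, can be sketched with a small in-memory store. This is a hypothetical illustration, not Zooma's implementation or API: `Annotation`, `CuratedMappings` and the property type "organism part" are made-up names. A lookup only succeeds on an exact (type, value) match, so an unseen value or a value in a different context returns nothing rather than a guess.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    property_type: str   # context: what kind of value this is, e.g. "organism part"
    property_value: str  # the free text seen in the data, e.g. "Heart"
    semantic_tag: str    # the ontology term it was curated to
    source: str          # provenance: which resource asserted the mapping


class CuratedMappings:
    """Toy store of curated text-to-term mappings with context.

    Lookups are exact on (type, value) after case/whitespace folding, so an
    unseen or differently-typed value returns [] instead of a guess.
    """

    def __init__(self):
        self._index = {}

    @staticmethod
    def _key(property_type, property_value):
        return (property_type.strip().lower(), property_value.strip().lower())

    def add(self, annotation):
        key = self._key(annotation.property_type, annotation.property_value)
        self._index.setdefault(key, []).append(annotation)

    def lookup(self, property_value, property_type):
        return list(self._index.get(self._key(property_type, property_value), []))


store = CuratedMappings()
store.add(Annotation("organism part", "Heart", "UBERON:0000948", "example-curation"))
print([a.semantic_tag for a in store.lookup("heart ", "organism part")])  # ['UBERON:0000948']
print(store.lookup("heart", "disease"))  # [] (same text, different context)
```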

Page 15

New for 2017: Ontology Cross Mapping

• Cross-references are a powerful tool for integrating data
• A lot of curator effort goes into building ontology cross-references
• The ontology mapping space is currently hard to find and explore

Figure: mappings between Datasource 1 (Human Phenotype Ontology) and Datasource 2 (SNOMED-CT)
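Cross-references become more useful when they are chained: if term A maps to B and B maps to C, then C is a candidate translation for A, with less confidence for each extra hop. The sketch below treats mappings as an undirected graph and uses breadth-first search to list everything reachable within a hop limit. The identifiers are hypothetical placeholders, not real mappings, and `build_graph`/`mappings_within` are illustrative names.

```python
from collections import defaultdict, deque


def build_graph(pairs):
    """Index cross-references in both directions (treated as symmetric)."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def mappings_within(graph, start, max_distance=2):
    """Breadth-first search: every term reachable from `start` in at most
    `max_distance` cross-reference hops, with its hop count."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        term = queue.popleft()
        if distance[term] == max_distance:
            continue
        for neighbour in graph.get(term, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[term] + 1
                queue.append(neighbour)
    del distance[start]
    return distance


# Hypothetical placeholder identifiers, not real mappings.
pairs = [("HP:A", "UMLS:B"), ("UMLS:B", "SNOMEDCT:C"), ("SNOMEDCT:C", "MESH:D")]
graph = build_graph(pairs)
print(mappings_within(graph, "HP:A", max_distance=2))  # {'UMLS:B': 1, 'SNOMEDCT:C': 2}
```

Treating every cross-reference as symmetric and equivalent is a simplification; real mappings can be broader, narrower or merely related, which is one reason to keep the hop limit small.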

Page 16

Ontology Mapping Service (OxO)
• UI and API to expose known mappings from OBO, UMLS and manually curated mapping sets (e.g. GWAS, OpenTargets)
• Normalised CURIE prefixes using identifiers.org
  • SNOMED-CT: / SNOMEDCT: / SNOMED: / SNOMEDCT_
• Provides a “silver standard” to support predictive mapping algorithms

* Going live March 2017: http://www.ebi.ac.uk/spot/oxo *
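Prefix normalisation, as in the SNOMED CT example, amounts to a synonym table plus a tolerant parser. In this sketch the synonym table is built from the variants listed on the slide, and choosing "SNOMEDCT" as the canonical form is an assumption; the prefix actually registered at identifiers.org should be looked up before relying on it. The local identifier 12345 is a placeholder.

```python
# Hypothetical synonym table built from the variants on the slide. "SNOMEDCT"
# as the canonical form is an assumption; look up the prefix actually
# registered at identifiers.org before relying on it.
PREFIX_SYNONYMS = {
    "SNOMED-CT": "SNOMEDCT",
    "SNOMEDCT": "SNOMEDCT",
    "SNOMED": "SNOMEDCT",
}


def normalise_curie(curie):
    """Rewrite a prefixed identifier to 'CANONICAL:local'.

    Accepts ':' as separator, or '_' when no ':' is present (e.g. 'SNOMEDCT_123').
    Unknown prefixes are passed through unchanged.
    """
    sep = ":" if ":" in curie else "_"
    prefix, found, local = curie.partition(sep)
    if not found or not local:
        raise ValueError(f"not a prefixed identifier: {curie!r}")
    canonical = PREFIX_SYNONYMS.get(prefix.upper(), prefix)
    return f"{canonical}:{local}"


for variant in ["SNOMED-CT:12345", "SNOMEDCT:12345", "SNOMED:12345", "SNOMEDCT_12345"]:
    print(normalise_curie(variant))  # SNOMEDCT:12345 each time
```

Normalising before comparison means mappings asserted by different sources under different prefix spellings collapse onto the same node instead of appearing as unrelated terms.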

Page 17

Common questions
• How do I access ontologies?
• How do I map data to ontologies?
• Which ontologies should I use?
• What about data that doesn't map?
• How can I translate from one ontology to another?
• How can I extend an ontology?
• How do I build “ontology aware” search applications?
• How do I publish this data?

Page 18

Figure: Data annotation workflow (flowchart, starting from the data). Decision points: Is the data annotated to ontologies? Is there unmapped data? Can you find terms in OLS? Is it the ontology you want? Steps: search Zooma, OLS and OxO; get the application ontology from OLS; create a new term (Webulous, OBO Foundry); add mappings back to Zooma; build a search index with BioSolr; publish structured data as RDF.
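The core loop of the annotation workflow can be sketched as follows. The edges of the original flowchart are not recoverable, so this is one plausible reading rather than the team's actual pipeline: try previously curated mappings first (the Zooma step), fall back to an ontology search (the OLS step), feed successful hits back into the curated store, and collect anything still unmapped as candidates for new terms. `annotate_values` and the dictionary stand-ins for Zooma and OLS are hypothetical; the OxO translation step is left out.

```python
def annotate_values(values, curated, search_ontology):
    """Map free-text values to ontology terms (one plausible reading of the workflow).

    curated:          dict of previously curated mappings (the Zooma step)
    search_ontology:  callable returning a term ID or None (the OLS step)
    Returns (mapped, unmapped); unmapped values are candidates for new terms.
    """
    mapped, unmapped = {}, []
    for value in values:
        term = curated.get(value)
        if term is None:
            term = search_ontology(value)
            if term is not None:
                # Feed the new mapping back so it is reused next time
                # (in practice a curator would confirm it first).
                curated[value] = term
        if term is None:
            unmapped.append(value)
        else:
            mapped[value] = term
    return mapped, unmapped


# Hypothetical stand-ins for the Zooma and OLS lookups.
curated = {"heart": "UBERON:0000948"}
ontology_index = {"valproic acid": "CHEBI:39867"}
mapped, unmapped = annotate_values(
    ["heart", "valproic acid", "my unusual label"], curated, ontology_index.get
)
print(mapped)    # {'heart': 'UBERON:0000948', 'valproic acid': 'CHEBI:39867'}
print(unmapped)  # ['my unusual label']
```

Passing the lookups in as a dictionary and a callable keeps the control flow testable offline; a real deployment would swap in HTTP clients for the curated-mapping and ontology-search services.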

Page 19

Summary
• Part of the FAIR process will be alignment with standards
• There are already many standards and ontologies in use
• We build tools and services that help get you there
• You will have to do some curation
  • But our tooling can capture that, so we can share the burden
• How FAIR is FAIR enough?
  • We'll never FAIRify all of BioSamples
  • Decide what your application is and optimise for that

Page 20

Ontology team

Helen Parkinson, Tony Burdett, Sira Sarntivijai, Olga Vrousgou, Thomas Liener

Funding
• EMBL
• CORBEL: This project receives funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 654248.
• EXCELERATE: ELIXIR-EXCELERATE is funded by the European Commission within the Research Infrastructures programme of Horizon 2020, grant agreement number 676559.