Linking Linked Data CSHALS2013

Transcript of Linking Linked Data CSHALS2013

Page 1: Linking Linked Data CSHALS2013

Expert Bioinformatics from Bioinformatics Experts

Linking Linked Data

Linked Data to Integrated Data

Page 2: Linking Linked Data CSHALS2013

Put your data on the web; make a pretty web site later.

Page 3: Linking Linked Data CSHALS2013

Page 4: Linking Linked Data CSHALS2013

Now we can ask questions like this...

What members of a target pathway are already targeted in other diseases?

[Diagram: Compound, Protein, Target, Pathway and Disease entities, with data from Chembl, Uniprot, Reactome and OMIM]

Page 5: Linking Linked Data CSHALS2013

Because we have lots of data exposed as RDF

Mim:Phenotype

Uniprot:Protein

BioPAX:Protein

Page 6: Linking Linked Data CSHALS2013

What do you do when you have to add data...

Page 7: Linking Linked Data CSHALS2013

Or connect SPARQL endpoints?

RDF != Linked Data

Page 8: Linking Linked Data CSHALS2013

Is your data 5-star?

Linked data is essential to actually connect the semantic web. It is quite easy to do with a little thought, and becomes second nature. Various common sense considerations determine when to make a link and when not to.

Page 9: Linking Linked Data CSHALS2013

Example: openflydata to BioCyc

What genes are differentially expressed in the hindgut, and are there any pathways associated with those genes?

● Use FlyAtlas at openflydata.org for tissue-specific expression profiles.

● Use FlyCyc from BioCyc.

● Then SPARQL.

Page 10: Linking Linked Data CSHALS2013

Problem: Node URIs

<http://openflydata.org/id/flyatlas/affyid/1616608_a_at> <http://purl.org/NET/flyatlas/schema#gene> <http://openflydata.org/id/flybase/feature/FBgn0001128> .

<http://biocyc.org/biopax/biopax-level3#UnificationXref202209> <http://www.biopax.org/release/biopax-level3.owl#xref> <http://biocyc.org/biopax/biopax-level3#Protein202210> .

<http://biocyc.org/biopax/biopax-level3#UnificationXref202209> <http://www.biopax.org/release/biopax-level3.owl#db> "FlyCyc" .

<http://biocyc.org/biopax/biopax-level3#UnificationXref202209> <http://www.biopax.org/release/biopax-level3.owl#id> "FBGN0001128" .

Page 11: Linking Linked Data CSHALS2013

Integration Level 1: Use Identifiers.org

CONSTRUCT {
  ?x RDFS:seeAlso `bif:sprintf_iri ("http://identifiers.org/flybase/%s", ?id)`
}
WHERE {
  ?x BP:unificationxref ?xref .
  ?xref BP:id ?id .
  ?xref BP:db "FlyCyc"^^xsd:string
}
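The same idea can be sketched outside the triple store. The Python below is a minimal illustration only: the helper name `flybase_iri` is hypothetical, and normalising the prefix case (`FBGN` to `FBgn`) is an assumption added here so that the FlyAtlas and BioCyc spellings from the "Problem: Node URIs" slide end up as one shared node IRI.

```python
import re

IDENTIFIERS_FLYBASE = "http://identifiers.org/flybase/"
OPENFLYDATA_FEATURE = "http://openflydata.org/id/flybase/feature/"


def flybase_iri(value):
    """Return an identifiers.org IRI for a FlyBase gene id.

    Hypothetical helper: accepts a bare id such as "FBGN0001128" or an
    openflydata feature IRI, and normalises the prefix case to "FBgn"
    (an assumption, not something stated on the slides).
    """
    local = value.rsplit("/", 1)[-1]  # drop any IRI prefix
    match = re.fullmatch(r"(?i)fbgn(\d{7})", local)
    if match is None:
        raise ValueError(f"not a FlyBase gene id: {value!r}")
    return IDENTIFIERS_FLYBASE + "FBgn" + match.group(1)


# The two spellings of the same gene used by FlyAtlas and BioCyc.
flyatlas_gene = OPENFLYDATA_FEATURE + "FBgn0001128"
biocyc_xref_id = "FBGN0001128"

shared = flybase_iri(flyatlas_gene)
assert shared == flybase_iri(biocyc_xref_id)
print(shared)
```

Once both sources point at the same IRI, a plain join on that node is enough to connect them.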

Page 12: Linking Linked Data CSHALS2013

Integration Level 2: adding property characteristics

BP = <http://www.biopax.org/release/biopax-level3.owl#>

BP:Protein BP:controls BP:Catalysis

BP:Catalysis BP:controls BP:BiochemicalReaction

BP:Protein BP:controls BP:BiochemicalReaction

CONSTRUCT { ?y GB:controlledBy ?x }
WHERE { ?x BP:controls ?catalysis . ?catalysis BP:controls ?y }
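As a rough sketch of what this CONSTRUCT materialises, the Python below collapses a two-step `BP:controls` chain into one direct link. The triples are held as plain tuples, and the `ex:` node names are hypothetical. The direction (reaction `GB:controlledBy` protein) is an assumption taken from how the property is used in the "Connect BiochemicalReactions to Expression Values" query.

```python
CONTROLS = "BP:controls"
CONTROLLED_BY = "GB:controlledBy"

# Hypothetical nodes; only the shape of the data follows the slide.
triples = {
    ("ex:protein1", CONTROLS, "ex:catalysis1"),
    ("ex:catalysis1", CONTROLS, "ex:reaction1"),
    ("ex:protein2", CONTROLS, "ex:catalysis2"),  # catalysis2 controls nothing
}


def materialise_controlled_by(graph):
    """Add a direct reaction -> protein link for every two-step controls chain."""
    controls = [(s, o) for s, p, o in graph if p == CONTROLS]
    shortcuts = {
        (reaction, CONTROLLED_BY, protein)
        for protein, middle in controls
        for start, reaction in controls
        if middle == start
    }
    return graph | shortcuts


linked = materialise_controlled_by(triples)
print(sorted(t for t in linked if t[1] == CONTROLLED_BY))
```

Like the CONSTRUCT, the sketch does not check the classes of the nodes; it relies on the chain shape alone.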

Page 13: Linking Linked Data CSHALS2013

Integration Level 3: class subsumption

flyatlas = <http://purl.org/NET/flyatlas/schema#>

flywebflyatlas:1616608_a_at a flyatlas:ProbeData

BP = <http://www.biopax.org/release/biopax-level3.owl#>

flyatlas:ProbeData rdfs:subClassOf BP:DNARegion

CONSTRUCT { ?x a BP:DNARegion }
WHERE { ?x a flyatlas:ProbeData }
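The subsumption step can be sketched in a few lines of Python. This is an illustration under assumptions: triples are plain tuples, the instance name `ex:probe_1616608_a_at` is hypothetical, and the loop repeats until nothing new is derived, which goes slightly beyond the single CONSTRUCT on the slide by also following longer subclass chains.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

triples = {
    ("flyatlas:ProbeData", SUBCLASS_OF, "BP:DNARegion"),
    ("ex:probe_1616608_a_at", RDF_TYPE, "flyatlas:ProbeData"),  # hypothetical instance name
}


def materialise_types(graph):
    """Derive rdf:type triples from rdfs:subClassOf until nothing new appears."""
    result = set(graph)
    while True:
        parents = [(s, o) for s, p, o in result if p == SUBCLASS_OF]
        derived = {
            (instance, RDF_TYPE, parent)
            for instance, p, cls in result if p == RDF_TYPE
            for child, parent in parents if child == cls
        }
        if derived <= result:  # fixed point reached
            return result
        result |= derived


typed = materialise_types(triples)
print(("ex:probe_1616608_a_at", RDF_TYPE, "BP:DNARegion") in typed)
```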

Page 14: Linking Linked Data CSHALS2013

Connect BiochemicalReactions to Expression Values

SELECT ?name ?id ?mean
WHERE {
  ?reaction a BP:BiochemicalReaction .
  ?reaction BP:standardName ?name .
  ?reaction GB:controlledBy ?protein .
  ?protein a BP:Protein .
  ?protein BP:xref ?id .
  ?probe a BP:DNARegion .
  ?probe BP:xref ?id .
  ?probe flyatlas:l_fatbody ?blank .
  ?blank flyatlas:mean ?mean
}
LIMIT 5

No Reasoner – just a few SPARQL CONSTRUCTs
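To make the join explicit, here is a small Python sketch that walks the same pattern over an in-memory list of triples. All sample data is hypothetical (the `ex:` nodes, the reaction name and the mean value are made up); only the predicates and classes come from the query on this slide.

```python
# Hypothetical sample data, shaped like the output of the three integration levels.
triples = [
    ("ex:reaction1", "rdf:type", "BP:BiochemicalReaction"),
    ("ex:reaction1", "BP:standardName", "example reaction"),
    ("ex:reaction1", "GB:controlledBy", "ex:protein1"),
    ("ex:protein1", "rdf:type", "BP:Protein"),
    ("ex:protein1", "BP:xref", "http://identifiers.org/flybase/FBgn0001128"),
    ("ex:probe1", "rdf:type", "BP:DNARegion"),
    ("ex:probe1", "BP:xref", "http://identifiers.org/flybase/FBgn0001128"),
    ("ex:probe1", "flyatlas:l_fatbody", "_:b1"),
    ("_:b1", "flyatlas:mean", 12.5),
]


def objects(graph, subject, predicate):
    """All objects of triples matching (subject, predicate, ?)."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def subjects(graph, predicate, obj):
    """All subjects of triples matching (?, predicate, obj)."""
    return [s for s, p, o in graph if p == predicate and o == obj]


def reaction_expression(graph):
    """Return (name, id, mean) rows, joining proteins and probes on a shared xref."""
    rows = []
    for reaction in subjects(graph, "rdf:type", "BP:BiochemicalReaction"):
        for name in objects(graph, reaction, "BP:standardName"):
            for protein in objects(graph, reaction, "GB:controlledBy"):
                if "BP:Protein" not in objects(graph, protein, "rdf:type"):
                    continue
                for xref in objects(graph, protein, "BP:xref"):
                    for probe in subjects(graph, "BP:xref", xref):
                        if "BP:DNARegion" not in objects(graph, probe, "rdf:type"):
                            continue
                        for node in objects(graph, probe, "flyatlas:l_fatbody"):
                            for mean in objects(graph, node, "flyatlas:mean"):
                                rows.append((name, xref, mean))
    return rows


print(reaction_expression(triples))
```

The join only succeeds because the protein and the probe share one node for the gene, which is what the earlier integration levels provide.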

Page 15: Linking Linked Data CSHALS2013

Page 16: Linking Linked Data CSHALS2013

Client Architecture

Page 17: Linking Linked Data CSHALS2013

Vocabularies in Linked Data

What does the linked data cloud know about Drugs....

SELECT DISTINCT ?class
WHERE {
  ?s a ?class .
  ?s ?p ?o
}

>100

chembl:Activity, chembl:Assay, chembl:AssayCategory, chembl:AssayTargetLink, chembl:ChemicalCompound, chembl:DrugTarget, chembl:LiteratureCitation

dailymed:drugs

drugbank:Drug, drugbank:DrugInteraction, drugbank:EnzymeLink, drugbank:ExternalIdentifier, drugbank:ExternalLink, drugbank:LiteratureCitation, drugbank:Molecule, drugbank:OrganismSpecies, drugbank:Patent, drugbank:ProteinSequence, drugbank:TargetLink

entrez:EnsemblReference, entrez:Gene

pdb:Molecule, pdb:Structure

pubmed:Chemical, pubmed:Citation, pubmed:DatabankReference

Page 18: Linking Linked Data CSHALS2013

Create a tighter, more unified “view” under one schema

Page 19: Linking Linked Data CSHALS2013

Unified Vocabulary

What does the linked data cloud know about Drugs....

Page 20: Linking Linked Data CSHALS2013

Map Classes and Properties into a single instantiated view

Page 21: Linking Linked Data CSHALS2013

Before Query

SELECT *
WHERE {
  ?s drugb:calculatedInChIKey ?inchiD .
  ?s a drugb:Drug .
  ?c a chembl:ChemicalCompound .
  ?c chembl:standardInChIKey ?inchiC .
  FILTER regex(?inchiD, ?inchiC)
}

Page 22: Linking Linked Data CSHALS2013

After Query

SELECT *
WHERE {
  ?s a GB:Drug .
  ?s GB:inchiKey ?inchi .
}
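One way to picture the step between the "before" and "after" queries is a small mapping table that rewrites source classes and properties into the unified schema. The Python below is a sketch under assumptions: the mapping of both `drugb:Drug` and `chembl:ChemicalCompound` onto `GB:Drug` is a guess at what the view does, and the `ex:` nodes and the key value are hypothetical.

```python
# Assumed mappings from source vocabularies to the unified GB: schema.
CLASS_MAP = {
    "drugb:Drug": "GB:Drug",
    "chembl:ChemicalCompound": "GB:Drug",
}
PROPERTY_MAP = {
    "drugb:calculatedInChIKey": "GB:inchiKey",
    "chembl:standardInChIKey": "GB:inchiKey",
}


def build_view(graph):
    """Instantiate a view holding only the mapped classes and properties."""
    view = set()
    for s, p, o in graph:
        if p == "rdf:type" and o in CLASS_MAP:
            view.add((s, "rdf:type", CLASS_MAP[o]))
        elif p in PROPERTY_MAP:
            view.add((s, PROPERTY_MAP[p], o))
    return view


# Hypothetical source data from two endpoints.
source = {
    ("ex:drugbank_entry", "rdf:type", "drugb:Drug"),
    ("ex:drugbank_entry", "drugb:calculatedInChIKey", "HYPOTHETICAL-KEY-1"),
    ("ex:chembl_entry", "rdf:type", "chembl:ChemicalCompound"),
    ("ex:chembl_entry", "chembl:standardInChIKey", "HYPOTHETICAL-KEY-1"),
}

view = build_view(source)
drugs = sorted(s for s, p, o in view if p == "rdf:type" and o == "GB:Drug")
print(drugs)
```

A query against the view needs only one class and one property, whichever source the record came from.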

Page 23: Linking Linked Data CSHALS2013

Linked Data Architecture

Page 24: Linking Linked Data CSHALS2013

Creating fixed “views” of Linked Data

• When the use of integrated data is fixed (e.g. an API or application), Linked Data can be expensive:

– Changes to data require significant recoding

– Multiple schemas make queries long and inefficient

• With a view, or middle layer of data, used by the API, changes to data are managed by the view and the API is minimally disturbed:

– Views are easier to query

– Views are faster to query

• The client gets the best of both worlds: a tight view of data for API queries, while still having all the advantages of a linked data strategy.

Page 25: Linking Linked Data CSHALS2013

Summary

● Exposing data as RDF does not equal Linked Data

● Making data linked is not hard

– Node IRIs

– Unifying Classes

– Transitive closure of Properties

● A little semantics goes a long way (no reasoner required)

● Creating “Views” from one schema to another is not hard.

– But should be easier

Page 26: Linking Linked Data CSHALS2013

www.generalbioinformatics.com/science.html