Post on 13-Apr-2017
Resource Description Framework
Subject → Predicate → Object
a
query with a
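The subject, predicate, object shape can be sketched in a few lines of Python. This is only an illustration of the data model, not an RDF library: the `Triple` type, the `match` helper and the `ex:` identifiers are all made up for this example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: an edge (predicate) from subject to object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical example data; the identifiers are illustrative only.
TRIPLES = [
    Triple("ex:node1", "rdf:type", "ex:Node"),
    Triple("ex:node1", "rdf:value", "ACTG"),
    Triple("ex:node2", "rdf:value", "AC"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# "Which statements use rdf:value?" is a query with a pattern, not a table scan.
values = match(TRIPLES, predicate="rdf:value")
```

A query is then just a triple with some positions left open, which is the idea that SPARQL graph patterns build on.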
Resource Description Framework
RDF can be serialised as:
• Turtle
• RDFa inside HTML
• N-Triples
• RDF/THRIFT
• JSON-LD
• RDF/XML
Resource Description Framework
Turtle = RDFa inside HTML = N-Triples = RDF/THRIFT = JSON-LD = RDF/XML
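To make the equivalence concrete, here is a minimal sketch that writes one statement as an N-Triples line and as expanded JSON-LD, then reads both back to the same triple. The subject IRI is hypothetical, the helper names are invented for this example, and the tiny parsers assume a plain string literal without quotes or backslashes; a real project would use an RDF library instead.

```python
import json
import re

SUBJECT = "http://example.org/node/1"  # hypothetical IRI, for illustration only
PREDICATE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#value"
VALUE = "ACTG"

# Matches only the simple shape produced by to_ntriples() below.
NT_LINE = re.compile(r'^<([^>]*)> <([^>]*)> "([^"\\]*)" \.$')


def to_ntriples(s, p, value):
    """Serialise one triple with a plain string literal as an N-Triples line."""
    return f'<{s}> <{p}> "{value}" .'


def from_ntriples(line):
    match = NT_LINE.match(line)
    if match is None:
        raise ValueError(f"unsupported N-Triples line: {line!r}")
    return match.groups()


def to_jsonld(s, p, value):
    """Serialise the same triple as expanded JSON-LD."""
    return json.dumps([{"@id": s, p: [{"@value": value}]}])


def from_jsonld(text):
    node = json.loads(text)[0]
    predicates = [key for key in node if key != "@id"]
    p = predicates[0]
    return (node["@id"], p, node[p][0]["@value"])


as_nt = to_ntriples(SUBJECT, PREDICATE, VALUE)
as_jsonld = to_jsonld(SUBJECT, PREDICATE, VALUE)
```

The two strings look nothing alike, yet both decode to the same (subject, predicate, object) tuple.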
Resource Description Framework
Nodes and Edges are Resources
• Resource → Identified by a URI
– http://purl.uniprot.org/core/
– urn:guid:21EC2020-3AEA-4069-A2DD-08002B30309D
– mailto:help@uniprot.org
– urn:isbn:978-3-16-148410-0
• Public URIs are nice to have, but not a requirement
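A small sketch of what "identified by a URI" means in practice: every identifier has a scheme, but only some schemes can be fetched over the web. The `scheme_of` and `is_web_resolvable` helpers are hypothetical names for this example, and treating only http/https as resolvable is a simplifying assumption.

```python
from urllib.parse import urlparse

# Example identifiers in the styles listed above.
RESOURCE_IDS = [
    "http://purl.uniprot.org/core/",
    "urn:guid:21EC2020-3AEA-4069-A2DD-08002B30309D",
    "mailto:help@uniprot.org",
    "urn:isbn:978-3-16-148410-0",
]


def scheme_of(uri):
    """Return the URI scheme, e.g. 'http', 'urn' or 'mailto'."""
    return urlparse(uri).scheme


def is_web_resolvable(uri):
    """Assumption for this sketch: only http(s) URIs can be looked up directly."""
    return scheme_of(uri) in {"http", "https"}


schemes = [scheme_of(uri) for uri in RESOURCE_IDS]
public = [uri for uri in RESOURCE_IDS if is_web_resolvable(uri)]
```

All four are valid identifiers for RDF purposes; only the first one is also a public web address.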
Resource Description Framework
Terminal edges are literals
• String (xsd:string)
– "P53"
• Date (xsd:date & xsd:dateTime)
– "1987-08-13"^^xsd:date
• Numbers (xsd:int & xsd:decimal & …)
– 1 or "1"^^xsd:integer or -1.1 or "-1.1"^^xsd:decimal
• Language string
– "Switzerland"@en, "Suisse"@fr, "Schweiz"@de, "Svizzera"@it
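As a rough illustration, typed literals map naturally onto native values in a programming language. The sketch below covers only the four datatypes shown above; the `literal_to_python` and `best_label` helpers, and the fallback to English, are assumptions made for this example.

```python
from datetime import date
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"

# Lexical form -> Python value, keyed by datatype IRI (a deliberately small subset).
CONVERTERS = {
    XSD + "string": str,
    XSD + "date": date.fromisoformat,
    XSD + "integer": int,
    XSD + "decimal": Decimal,
}


def literal_to_python(lexical, datatype=XSD + "string"):
    """Convert a literal's lexical form using its datatype; plain strings by default."""
    try:
        convert = CONVERTERS[datatype]
    except KeyError:
        raise ValueError(f"datatype not covered by this sketch: {datatype}")
    return convert(lexical)


# Language-tagged strings: one value per language tag.
LABELS = {"en": "Switzerland", "fr": "Suisse", "de": "Schweiz", "it": "Svizzera"}


def best_label(labels, lang, fallback="en"):
    """Pick the label for a language, falling back to another tag if it is missing."""
    return labels.get(lang, labels[fallback])


birthday = literal_to_python("1987-08-13", XSD + "date")
```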
Resource Description Framework
one party evolves data format
everyone evolves data format
Protocol Buffers (Google's data interchange format)
GFF
Variation Graph as RDF
[Figure: variation graph with 4 nodes: 1 (ACTG), 2 (AC), 3 (T), 4 (GA)]
base <uri of vg schema>
prefix node: <uri of vg graph>
node:1 a <Node> ; rdf:value "ACTG" .
node:2 a <Node> ; rdf:value "AC" .
node:3 a <Node> ; rdf:value "T" .
node:4 a <Node> ; rdf:value "GA" .
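The node statements follow such a regular pattern that they can be generated. A minimal sketch, assuming the graph is held as a plain dictionary of node id to sequence; the function names are invented here and this is not how the vg tool itself exports RDF.

```python
# The four nodes of the example graph: id -> sequence.
NODES = {1: "ACTG", 2: "AC", 3: "T", 4: "GA"}


def node_to_turtle(node_id, sequence):
    """Emit the Turtle statements for one node, in the style used above."""
    return f'node:{node_id} a <Node> ; rdf:value "{sequence}" .'


def nodes_to_turtle(nodes):
    """Emit one line per node, ordered by node id."""
    return "\n".join(node_to_turtle(i, seq) for i, seq in sorted(nodes.items()))


turtle = nodes_to_turtle(NODES)
```

The output still needs the `base` and `prefix` lines in front of it before a Turtle parser would accept it.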
Variation Graph as RDF
[Figure: variation graph with 4 nodes: 1 (ACTG), 2 (AC), 3 (T), 4 (GA)]
base <uri of vg schema>
prefix node: <uri of vg graph>
node:1 <linksForwardToForward> node:2 , node:3 .
node:2 <linksForwardToForward> node:4 .
node:3 <linksForwardToForward> node:4 .
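With the links in place, the graph can be walked. The sketch below holds the same edges as an adjacency mapping and enumerates every walk from node 1 to node 4, spelling out the sequence of each. It assumes an acyclic graph and forward-to-forward links only; the helper names are made up for this example.

```python
SEQUENCES = {1: "ACTG", 2: "AC", 3: "T", 4: "GA"}

# linksForwardToForward edges: node -> nodes it links to.
EDGES = {1: [2, 3], 2: [4], 3: [4], 4: []}


def walks(start, end, edges):
    """List every walk from start to end (assumes the graph has no cycles)."""
    if start == end:
        return [[end]]
    return [
        [start] + rest
        for nxt in edges[start]
        for rest in walks(nxt, end, edges)
    ]


def spell(walk, sequences):
    """Concatenate the node sequences along a walk."""
    return "".join(sequences[node] for node in walk)


all_walks = walks(1, 4, EDGES)
haplotypes = [spell(walk, SEQUENCES) for walk in all_walks]
```

The two walks differ only in the middle node, which is exactly the variation the graph encodes: AC versus T.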
Variation Graph as RDF
[Figure: variation graph with 4 nodes → 1 Path]
base <uri of vg schema>
prefix n: <uri of vg graph>
path:1 a <Path> ;
  rdfs:label "Genome of patient a" ;
  rdfs:comment "Paths through VG make linear sequences, e.g. a reference genome or a patient assembly" .
Variation Graph as RDF
[Figure: variation graph with 4 nodes → 1 Path → 3 Steps]
base <uri of vg schema>
prefix n: <uri of vg graph>
step:1 a <Step> ; <node> node:1 ; <rank> 1 ; <path> path:1 .
step:2 a <Step> ; <node> node:2 ; <rank> 2 ; <path> path:1 .
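Steps are what tie a path to the nodes it visits, in order. A minimal sketch of deriving them from an ordered node list follows. Note the assumption: the statements above show only steps 1 and 2, so the third step on node 4 is inferred here from the graph's edges, and the helper names are invented for illustration.

```python
def steps_for_path(path_id, node_ids):
    """Create one step per visited node; rank is the 1-based position in the path."""
    return [
        {"step": rank, "node": node_id, "rank": rank, "path": path_id}
        for rank, node_id in enumerate(node_ids, start=1)
    ]


def step_to_turtle(step):
    """Emit the Turtle statements for one step, in the style used above."""
    return (
        f'step:{step["step"]} a <Step> ; <node> node:{step["node"]} ; '
        f'<rank> {step["rank"]} ; <path> path:{step["path"]} .'
    )


# Assumption: path 1 visits nodes 1, 2 and 4 (only the first two steps are shown above).
STEPS = steps_for_path(1, [1, 2, 4])
turtle_lines = [step_to_turtle(step) for step in STEPS]
```

Keeping the rank on the step, rather than on the node, lets the same node appear in many paths at different positions.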
Build a “FASTA” from a VG
PREFIX vg: <http://example.org/vg/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?path (group_concat(?sequence; separator="") as ?pathSeq)
WHERE {
  [] vg:path ?path ;
     vg:node ?node ;
     vg:rank ?rank .
  ?node rdf:value ?sequence
}
GROUP BY ?path
ORDER BY ?rank
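What the query computes can be mirrored in plain Python: group the steps by path, order each group by rank, and concatenate the node sequences. This is a sketch of the logic only, not a SPARQL engine. The input rows are illustrative (the third row assumes the path ends on node 4), and the sort by rank is done explicitly here rather than left to the engine.

```python
from collections import defaultdict

# (path, rank, sequence) rows, deliberately out of order.
ROWS = [
    ("path:1", 2, "AC"),
    ("path:1", 1, "ACTG"),
    ("path:1", 3, "GA"),  # assumed third step, on node 4
]


def path_sequences(rows):
    """Group rows by path and join the sequences in rank order."""
    by_path = defaultdict(list)
    for path, rank, sequence in rows:
        by_path[path].append((rank, sequence))
    return {
        path: "".join(sequence for _, sequence in sorted(items))
        for path, items in by_path.items()
    }


def to_fasta(sequences, width=60):
    """Format {name: sequence} as FASTA text with fixed-width sequence lines."""
    lines = []
    for name in sorted(sequences):
        sequence = sequences[name]
        lines.append(f">{name}")
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


fasta = to_fasta(path_sequences(ROWS))
```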
Querying a Variation Graph
SPARQL: a standard query language
See the VG wiki for more examples
VG of 1000 Genomes → 50 GB on disk in DB
VG of 100,000 Genomes → ±2 TB on disk in DB
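A quick back-of-the-envelope check of the two sizes, as a sketch. It assumes decimal units (1 TB = 1000 GB) and takes the approximate 2 TB figure at face value, so the results are rough.

```python
GB_PER_TB = 1000  # assumption: decimal units

small_genomes, small_gb = 1_000, 50
large_genomes, large_gb = 100_000, 2 * GB_PER_TB  # "about 2 TB"

# How much the inputs and the storage grow between the two data sets.
genome_ratio = large_genomes / small_genomes
storage_ratio = large_gb / small_gb

# Average storage per genome in each data set, in GB.
gb_per_genome_small = small_gb / small_genomes
gb_per_genome_large = large_gb / large_genomes
```

On these numbers, 100 times more genomes need roughly 40 times more disk, so the per-genome cost falls as the graph grows.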
Summary
• RDF
– simple data model
– consistent identifiers
– anyone can say anything about anything
• SPARQL
– graph query language
– wide-scale commercial deployment
– HTTP/REST in the box
– in clinical use
– federated queries on user demand
– can be used for variation graphs
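"HTTP/REST in the box" means a query is just a web request. The sketch below prepares a SPARQL protocol GET request with the standard library and stops short of sending it. The endpoint URL is hypothetical, and the query reuses the `vg:` prefix from the FASTA example.

```python
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import Request

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint, for illustration only

QUERY = """PREFIX vg: <http://example.org/vg/>
SELECT ?path WHERE { [] vg:path ?path } LIMIT 10"""


def build_request(endpoint, query):
    """Prepare a SPARQL protocol GET request: the query goes in the 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, QUERY)
# To run it against a real endpoint: urllib.request.urlopen(request)
```

Because the interface is plain HTTP, the same request works from a browser, a script or another SPARQL endpoint running a federated query.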