Splendid: SPARQL Endpoint Federation Exploiting VOID Descriptions

Institute for Web Science and Technologies University of Koblenz ▪ Landau, Germany SPLENDID: SPARQL Endpoint Federation Exploiting VOID Descriptions Olaf Görlitz, Steffen Staab

Page 1: Splendid: SPARQL Endpoint Federation Exploiting VOID Descriptions

Institute for Web Science and Technologies

University of Koblenz ▪ Landau, Germany

SPLENDID: SPARQL Endpoint Federation

Exploiting VOID Descriptions

Olaf Görlitz, Steffen Staab

Page 2: Splendid: SPARQL Endpoint Federation Exploiting VOID Descriptions

Slide 2, WeST Institute, People and Knowledge Networks

Olaf Görlitz, COLD 2011, Bonn, Germany

Motivation

How to access a large number of linked data sources?

Page 3: Splendid: SPARQL Endpoint Federation Exploiting VOID Descriptions

Data Integration Approaches

Data Warehouse
• Efficient query execution
• Complete results
• Data copies
• Inflexible

Link Traversal
• Live data access
• Flexible / on demand
• Incomplete results
• Biased by starting point

Page 4: Splendid: SPARQL Endpoint Federation Exploiting VOID Descriptions

Our Approach

Data Federation
• Live data access
• Flexible source integration
• Effective query planning
• Complete results

Hypothesis: Efficient query federation is possible using core Semantic Web technology (i.e., SPARQL endpoints and VoiD descriptions).

Page 5: Splendid: SPARQL Endpoint Federation Exploiting VOID Descriptions

VoiD: "Vocabulary of Interlinked Datasets"

A VoiD description contains:
• General information
• Basic statistics: triples = 732744
• Type statistics: chebi:Compound = 50477
• Predicate statistics: bio:formula = 39555
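The three kinds of statistics listed above can be held in a small record per dataset. The following is a minimal sketch, not the SPLENDID implementation: the class name `VoidStats`, the helper `predicate_selectivity`, and the assumption that the figures describe a ChEBI dataset are all illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class VoidStats:
    """Statistics a VoiD description can carry for one dataset (illustrative)."""
    triples: int                                           # total number of triples
    type_counts: dict = field(default_factory=dict)        # class -> number of instances
    predicate_counts: dict = field(default_factory=dict)   # predicate -> number of triples


# Figures taken from the slide; attributing them to "chebi" is an assumption.
chebi = VoidStats(
    triples=732744,
    type_counts={"chebi:Compound": 50477},
    predicate_counts={"bio:formula": 39555},
)


def predicate_selectivity(stats: VoidStats, predicate: str) -> float:
    """Fraction of the dataset's triples that use the given predicate."""
    return stats.predicate_counts.get(predicate, 0) / stats.triples


print(predicate_selectivity(chebi, "bio:formula"))
```

Ratios of this kind are one plausible input for the cardinality estimates that a cost-based optimizer needs.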

Page 6: Splendid: SPARQL Endpoint Federation Exploiting VOID Descriptions

Distributed Query Processing

Contribution: Apply best practices of RDBMS for RDF federation

http://code.google.com/p/rdffederator/

Page 7: Splendid: SPARQL Endpoint Federation Exploiting VOID Descriptions

Query Example

SELECT ?drug ?title WHERE {
  ?drug drugbank:drugCategory category:micronutrient .
  ?drug drugbank:casRegistryNumber ?id .
  ?keggDrug rdf:type kegg:Drug .
  ?keggDrug bio2rdf:xRef ?id .
  ?keggDrug purl:title ?title .
}

Which drugs are categorized as micronutrients?

Page 8: Splendid: SPARQL Endpoint Federation Exploiting VOID Descriptions

Query Processing

Source Selection → Join Optimization → Query Execution

Page 9: Splendid: SPARQL Endpoint Federation Exploiting VOID Descriptions

Query Processing

Step 1: Index-based source mapping

Predicate index: drugbank:drugCategory → drugbank
Type index: kegg:Drug → kegg

SELECT ?drug ?title WHERE {
  ?drug drugbank:drugCategory category:micronutrient .    → drugbank
  ?drug drugbank:casRegistryNumber ?id .                  → drugbank
  ?keggDrug rdf:type kegg:Drug .                          → kegg
  ?keggDrug bio2rdf:xRef ?id .                            → kegg
  ?keggDrug purl:title ?title .                           → kegg, dbpedia, Chebi
}
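The index lookup of this step can be sketched as follows. This is an illustration under assumptions, not the authors' code: only the entries for drugbank:drugCategory and kegg:Drug appear on the slide as index content; the remaining index entries, the source list, and the function names are hypothetical and chosen to match the source mapping shown for the example query.

```python
# Hypothetical indexes derived from VoiD descriptions: term -> set of sources.
PREDICATE_INDEX = {
    "drugbank:drugCategory": {"drugbank"},
    "drugbank:casRegistryNumber": {"drugbank"},
    "bio2rdf:xRef": {"kegg"},
    "purl:title": {"kegg", "dbpedia", "chebi"},
}
TYPE_INDEX = {"kegg:Drug": {"kegg"}}
ALL_SOURCES = {"drugbank", "kegg", "dbpedia", "chebi"}


def is_var(term):
    """SPARQL variables start with a question mark."""
    return term.startswith("?")


def select_sources(pattern):
    """Map one triple pattern (subject, predicate, object) to candidate sources."""
    _, p, o = pattern
    if is_var(p):
        return set(ALL_SOURCES)               # unbound predicate: any source may match
    if p == "rdf:type" and not is_var(o):
        return set(TYPE_INDEX.get(o, set()))  # bound class: use the type index
    return set(PREDICATE_INDEX.get(p, set()))


query = [
    ("?drug", "drugbank:drugCategory", "category:micronutrient"),
    ("?drug", "drugbank:casRegistryNumber", "?id"),
    ("?keggDrug", "rdf:type", "kegg:Drug"),
    ("?keggDrug", "bio2rdf:xRef", "?id"),
    ("?keggDrug", "purl:title", "?title"),
]
for pattern in query:
    print(pattern, "->", sorted(select_sources(pattern)))
```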

Page 10: Splendid: SPARQL Endpoint Federation Exploiting VOID Descriptions

Query Processing

Step 2: Refinement with ASK queries

The indexes hold no subject or object values, so for triple patterns with a bound subject or object each candidate source is checked with a SPARQL ASK query.
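A minimal sketch of this refinement step is shown below. The function names are hypothetical, prefix declarations are omitted from the generated query, and the endpoint call is replaced by a caller-supplied `ask` function so that the example runs without network access.

```python
def is_var(term):
    """SPARQL variables start with a question mark."""
    return term.startswith("?")


def needs_ask(pattern):
    """True if the subject or object is bound, which the indexes cannot check."""
    s, _, o = pattern
    return not is_var(s) or not is_var(o)


def ask_query(pattern):
    """Render a triple pattern as a SPARQL ASK query (prefixes omitted)."""
    return "ASK { %s %s %s }" % pattern


def refine(pattern, candidates, ask):
    """Keep only the candidate sources that answer the ASK query with true.

    `ask(source, query) -> bool` is supplied by the caller; a real system
    would send the query to the SPARQL endpoint of `source`.
    """
    if not needs_ask(pattern):
        return set(candidates)
    query = ask_query(pattern)
    return {source for source in candidates if ask(source, query)}


def fake_ask(source, query):
    """Stand-in for an endpoint call: pretend only drugbank has matching data."""
    return source == "drugbank"


pattern = ("?drug", "drugbank:drugCategory", "category:micronutrient")
print(ask_query(pattern))
print(refine(pattern, {"drugbank", "dbpedia"}, fake_ask))
```

Passing the endpoint call in as a function keeps the selection logic testable; patterns with only variables in subject and object position skip the round trip entirely.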

Page 11: Splendid: SPARQL Endpoint Federation Exploiting VOID Descriptions

Query Processing

Step 3: Grouping triple patterns

SELECT ?drug ?title WHERE {
  ?drug drugbank:drugCategory category:micronutrient .    }
  ?drug drugbank:casRegistryNumber ?id .                  } drugbank
  ?keggDrug rdf:type kegg:Drug .                          }
  ?keggDrug bio2rdf:xRef ?id .                            } kegg
  ?keggDrug purl:title ?title .                           } kegg, dbpedia, Chebi
}

+ grouping of sameAs patterns
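The grouping by source can be sketched as follows: patterns whose only candidate source is the same endpoint are bundled into one sub-query, while patterns with several candidate sources stay separate. The function name and data layout are assumptions for illustration, and the additional grouping of sameAs patterns is not covered.

```python
from collections import defaultdict


def group_patterns(sources_by_pattern):
    """Bundle triple patterns that share a single, identical source.

    Returns (exclusive, remaining): `exclusive` maps a source to the patterns
    that only this source can answer; `remaining` lists (pattern, sources)
    pairs for patterns with more than one candidate source.
    """
    exclusive = defaultdict(list)
    remaining = []
    for pattern, sources in sources_by_pattern.items():
        if len(sources) == 1:
            (source,) = sources
            exclusive[source].append(pattern)
        else:
            remaining.append((pattern, sources))
    return dict(exclusive), remaining


# Source mapping for the example query, as shown on the slides.
mapping = {
    ("?drug", "drugbank:drugCategory", "category:micronutrient"): {"drugbank"},
    ("?drug", "drugbank:casRegistryNumber", "?id"): {"drugbank"},
    ("?keggDrug", "rdf:type", "kegg:Drug"): {"kegg"},
    ("?keggDrug", "bio2rdf:xRef", "?id"): {"kegg"},
    ("?keggDrug", "purl:title", "?title"): {"kegg", "dbpedia", "chebi"},
}
exclusive, remaining = group_patterns(mapping)
print(exclusive)
print(remaining)
```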

Page 12: Splendid: SPARQL Endpoint Federation Exploiting VOID Descriptions

Join Order Optimization

Bind join / hash join

Dynamic programming with statistics-based cost estimation
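Dynamic programming over join orders can be sketched as below. This is a generic textbook-style illustration, not SPLENDID's optimizer: the cost model (sum of estimated intermediate result sizes), the constant join selectivity, and the cardinalities in the example are all assumptions, and the choice between bind join and hash join is left out. The sketch assumes a connected query, since cross products are skipped.

```python
from itertools import combinations


def variables(pattern):
    """Variables of a triple pattern (terms starting with '?')."""
    return {term for term in pattern if term.startswith("?")}


def best_plan(patterns, cardinality, selectivity=0.01):
    """Find the cheapest join tree by dynamic programming over pattern subsets.

    cardinality: pattern -> estimated result size (e.g. from VoiD statistics).
    selectivity: assumed constant join selectivity, a placeholder estimator.
    Returns (cost, estimated_size, plan); plan is a nested tuple of patterns.
    """
    best = {}  # frozenset of patterns -> (cost, size, plan, variables)
    for p in patterns:
        best[frozenset([p])] = (0.0, float(cardinality[p]), p, variables(p))
    for size in range(2, len(patterns) + 1):
        for subset in combinations(patterns, size):
            key = frozenset(subset)
            for k in range(1, size // 2 + 1):
                for left in combinations(subset, k):
                    l_key = frozenset(left)
                    r_key = key - l_key
                    if l_key not in best or r_key not in best:
                        continue  # one side has no plan without a cross product
                    l_cost, l_size, l_plan, l_vars = best[l_key]
                    r_cost, r_size, r_plan, r_vars = best[r_key]
                    if not (l_vars & r_vars):
                        continue  # no shared variable: skip cross products
                    out = l_size * r_size * selectivity
                    cost = l_cost + r_cost + out
                    if key not in best or cost < best[key][0]:
                        best[key] = (cost, out, (l_plan, r_plan), l_vars | r_vars)
    cost, size, plan, _ = best[frozenset(patterns)]
    return cost, size, plan


# Hypothetical three-pattern chain with made-up cardinalities.
a = ("?x", "ex:p", "?y")
b = ("?y", "ex:q", "?z")
c = ("?z", "ex:r", "?w")
cost, size, plan = best_plan([a, b, c], {a: 1000, b: 10, c: 10}, selectivity=0.1)
print(cost, plan)
```

In the example the two small patterns are joined first and the large one last, which is the behaviour a statistics-based cost estimate is meant to produce.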

Page 13: Splendid: SPARQL Endpoint Federation Exploiting VOID Descriptions

Evaluation

Orthogonal state-of-the-art approaches:

                     DARQ                     AliBaba       FedX                          SPLENDID
Statistics           ServiceDesc              –             –                             VoiD
Source Selection     Statistics (predicates)  All sources   ASK queries                   Statistics + ASK queries
Query Optimization   DynProg                  Heuristics    Heuristics                    DynProg
Query Execution      Bind join                Bind join     Bound join + parallelization  Bind join + hash join

FedBench Evaluation Suite
• Life Science + Cross Domain data
• Different query characteristics

Measuring
• Number of data sources selected
• Query execution time

Page 14: Splendid: SPARQL Endpoint Federation Exploiting VOID Descriptions

Evaluation: Source Selection

[Figure: source selection results; chart labels: rdf:type, owl:sameAs]

Page 15: Splendid: SPARQL Endpoint Federation Exploiting VOID Descriptions

Evaluation: Query Optimization

Page 16: Splendid: SPARQL Endpoint Federation Exploiting VOID Descriptions

Conclusion

VoiD-based query federation is efficient

Publish more VoiD descriptions!

What next?
• Combination with FedX
• Improving estimation and cost model
• Integrating SPARQL 1.1 features