Splendid: SPARQL Endpoint Federation Exploiting VOID Descriptions
Institute for Web Science and Technologies
University of Koblenz ▪ Landau, Germany
SPLENDID: SPARQL Endpoint Federation
Exploiting VOID Descriptions
Olaf Görlitz, Steffen Staab
WeST Institute: People and Knowledge Networks
Olaf Görlitz, COLD 2011, Bonn, Germany
Motivation
How to access a large number of linked data sources?
Data Integration Approaches
Data Warehouse
• Efficient query execution
• Complete results
• Data copies
• Inflexible
Link Traversal
• Live data access
• Flexible / on demand
• Incomplete results
• Biased by starting point
Our Approach
• Live data access
• Flexible source integration
• Effective query planning
• Complete results
Data Federation
Hypothesis: Efficient query federation is possible using core Semantic Web technology (i.e. SPARQL endpoints and VoiD descriptions)
VoiD: „Vocabulary of Interlinked Datasets“
• General information
• Basic statistics: triples = 732744
• Type statistics: chebi:Compound = 50477
• Predicate statistics: bio:formula = 39555
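The statistics listed above are the kind of numbers a federator can read from a VoiD description (in the VoID vocabulary they correspond to `void:triples` and to the counts attached to class and property partitions). The following is a minimal Python sketch of how such counts might be held in memory and turned into a rough result-size estimate for a triple pattern. The `VoidStats` class and `estimate_cardinality` function are hypothetical names chosen for illustration, not part of SPLENDID, and the estimation rule is an assumed simplification.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

RDF_TYPE = "rdf:type"


@dataclass
class VoidStats:
    """Counts for one dataset, as a VoiD description might report them."""
    triples: int                                                     # total number of triples
    class_counts: Dict[str, int] = field(default_factory=dict)      # instances per class
    predicate_counts: Dict[str, int] = field(default_factory=dict)  # triples per predicate


def estimate_cardinality(stats: VoidStats,
                         predicate: Optional[str],
                         obj: Optional[str] = None) -> int:
    """Rough estimate of how many triples match (?s predicate obj).

    Assumed rule: use the type statistics for rdf:type patterns with a
    bound class, the predicate statistics for other bound predicates,
    and the total triple count when the predicate is unbound.
    """
    if predicate is None:
        return stats.triples
    if predicate == RDF_TYPE and obj is not None:
        return stats.class_counts.get(obj, 0)
    return stats.predicate_counts.get(predicate, 0)


# Numbers taken from the slide.
dataset = VoidStats(
    triples=732744,
    class_counts={"chebi:Compound": 50477},
    predicate_counts={"bio:formula": 39555},
)
print(estimate_cardinality(dataset, "bio:formula"))
print(estimate_cardinality(dataset, RDF_TYPE, "chebi:Compound"))
```

Estimates of this kind are what a statistics-based cost model would consume later, during join order optimization.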
Distributed Query Processing
Contribution: Apply Best Practices of RDBMS for RDF Federation
http://code.google.com/p/rdffederator/
Query Example
SELECT ?drug ?title WHERE {
  ?drug drugbank:drugCategory category:micronutrient .
  ?drug drugbank:casRegistryNumber ?id .
  ?keggDrug rdf:type kegg:Drug .
  ?keggDrug bio2rdf:xRef ?id .
  ?keggDrug purl:title ?title .
}
Which drugs are categorized as micronutrients?
Query Processing
Source Selection → Join Optimization → Query Execution
1. Step: Index-based source mapping
predicate-index: drugbank:drugCategory → drugbank
type-index: kegg:Drug → kegg
Sources annotated on the triple patterns:
→ drugbank
→ kegg
→ kegg, dbpedia, Chebi
→ drugbank
→ kegg
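The index-based mapping can be sketched in a few lines of Python. This is an illustration under stated assumptions, not SPLENDID's implementation: only the entries `drugbank:drugCategory → drugbank` and `kegg:Drug → kegg` come from the slide, while the source assignments for the remaining predicates, the fallback to all sources for unbound predicates, and the names `PREDICATE_INDEX`, `TYPE_INDEX` and `select_sources` are made up for the example.

```python
from typing import Dict, Set, Tuple

# (subject, predicate, object); terms starting with "?" are variables.
TriplePattern = Tuple[str, str, str]

ALL_SOURCES: Set[str] = {"drugbank", "kegg", "dbpedia", "Chebi"}

# drugbank:drugCategory -> drugbank is taken from the slide;
# the other entries are assumptions made for this illustration.
PREDICATE_INDEX: Dict[str, Set[str]] = {
    "drugbank:drugCategory": {"drugbank"},
    "drugbank:casRegistryNumber": {"drugbank"},
    "bio2rdf:xRef": {"kegg"},
    "purl:title": {"kegg", "dbpedia", "Chebi"},
}

# kegg:Drug -> kegg is taken from the slide.
TYPE_INDEX: Dict[str, Set[str]] = {"kegg:Drug": {"kegg"}}


def is_var(term: str) -> bool:
    return term.startswith("?")


def select_sources(pattern: TriplePattern) -> Set[str]:
    """Map one triple pattern to candidate sources using the two indexes."""
    _, predicate, obj = pattern
    if is_var(predicate):
        return set(ALL_SOURCES)               # no index applies
    if predicate == "rdf:type":
        if is_var(obj):
            return set(ALL_SOURCES)           # class unknown, cannot prune
        return set(TYPE_INDEX.get(obj, set()))
    return set(PREDICATE_INDEX.get(predicate, set()))


QUERY = [
    ("?drug", "drugbank:drugCategory", "category:micronutrient"),
    ("?drug", "drugbank:casRegistryNumber", "?id"),
    ("?keggDrug", "rdf:type", "kegg:Drug"),
    ("?keggDrug", "bio2rdf:xRef", "?id"),
    ("?keggDrug", "purl:title", "?title"),
]

mapping = {pattern: select_sources(pattern) for pattern in QUERY}
for pattern, sources in mapping.items():
    print(pattern[1], "->", sorted(sources))
```

Because the lookup uses only the predicate (or the class of an `rdf:type` pattern), it needs no communication with the endpoints, but it can return more sources than actually contribute results.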
2. Step: Refinement with ASK Queries
No index for subject / object values
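Since the indexes say nothing about subject or object values, a pattern with a bound subject or object can be checked against each candidate endpoint with a SPARQL ASK query. A minimal Python sketch of that refinement follows. The `ask` callback is a hypothetical stand-in for an HTTP request to a SPARQL endpoint, the function names are illustrative, and PREFIX declarations are omitted from the generated query for brevity.

```python
from typing import Callable, Iterable, Set, Tuple

TriplePattern = Tuple[str, str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def ask_query(pattern: TriplePattern) -> str:
    """Build an ASK query for one triple pattern (prefixes omitted)."""
    return "ASK { %s %s %s }" % pattern


def refine_sources(pattern: TriplePattern,
                   candidates: Iterable[str],
                   ask: Callable[[str, str], bool]) -> Set[str]:
    """Drop candidate sources that answer 'false' to the ASK query.

    Endpoints are only probed when the subject or object is bound;
    otherwise the index-based candidates are returned unchanged.
    """
    subject, _, obj = pattern
    if is_var(subject) and is_var(obj):
        return set(candidates)
    query = ask_query(pattern)
    return {source for source in candidates if ask(source, query)}


# Usage with a fake endpoint that only knows the pattern in drugbank.
def fake_ask(source: str, query: str) -> bool:
    return source == "drugbank"


pattern = ("?drug", "drugbank:drugCategory", "category:micronutrient")
print(refine_sources(pattern, {"drugbank", "kegg"}, fake_ask))
```

Each ASK query costs one round trip per candidate source, so the sketch skips patterns where it cannot prune anything beyond what the index already decided.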
3. Step: Grouping Triple Patterns
Groups by source:
• drugbank
• kegg
• kegg, dbpedia, Chebi
+ grouping sameAs patterns
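The grouping step can be illustrated as follows: triple patterns that were mapped to exactly one and the same source are combined into a single subquery for that source, while patterns with several candidate sources stay separate. This Python sketch assumes a simple rule and does not cover the additional grouping of sameAs patterns mentioned on the slide. The function name `group_patterns` and the source assignments in the usage example are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

TriplePattern = Tuple[str, str, str]


def group_patterns(mapping: List[Tuple[TriplePattern, Set[str]]]):
    """Split patterns into per-source groups and multi-source leftovers.

    Returns (groups, remaining):
      groups    -- source name -> patterns answered only by that source
      remaining -- (pattern, sources) pairs with more than one source
    """
    groups: Dict[str, List[TriplePattern]] = defaultdict(list)
    remaining: List[Tuple[TriplePattern, Set[str]]] = []
    for pattern, sources in mapping:
        if len(sources) == 1:
            groups[next(iter(sources))].append(pattern)
        else:
            remaining.append((pattern, sources))
    return dict(groups), remaining


# Source assignments below are assumed for the example query.
mapping = [
    (("?drug", "drugbank:drugCategory", "category:micronutrient"), {"drugbank"}),
    (("?drug", "drugbank:casRegistryNumber", "?id"), {"drugbank"}),
    (("?keggDrug", "rdf:type", "kegg:Drug"), {"kegg"}),
    (("?keggDrug", "bio2rdf:xRef", "?id"), {"kegg"}),
    (("?keggDrug", "purl:title", "?title"), {"kegg", "dbpedia", "Chebi"}),
]

groups, remaining = group_patterns(mapping)
for source, patterns in groups.items():
    print(source, "gets", len(patterns), "patterns in one subquery")
print(len(remaining), "pattern(s) must be sent to several sources")
```

Sending a whole group in one subquery lets the remote endpoint evaluate the join locally, which reduces both the number of requests and the size of intermediate results.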
Join Order Optimization
bind join / hash join
Dynamic Programming with statistics-based cost estimation
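Dynamic programming for join ordering, as known from relational databases, can be sketched as below. This is a generic textbook-style illustration, not SPLENDID's optimizer: the cost model (sum of estimated intermediate result sizes), the cardinalities, the selectivities and the subquery names are all assumptions, and the choice between bind join and hash join is left out.

```python
from itertools import combinations
from typing import Dict, FrozenSet


def best_join_order(card: Dict[str, float],
                    selectivity: Dict[FrozenSet[str], float]):
    """Return (cost, plan) of the cheapest join tree found by DP over subsets.

    card        -- estimated result size per subquery
    selectivity -- join selectivity per pair of subqueries;
                   pairs without an entry count as 1.0 (cross product)
    The cost of a plan is the sum of its estimated intermediate result sizes.
    """
    names = sorted(card)
    # best[subset] = (cost, estimated size, plan)
    best = {frozenset([n]): (0.0, float(card[n]), n) for n in names}
    for size in range(2, len(names) + 1):
        for combo in combinations(names, size):
            subset = frozenset(combo)
            for k in range(1, size):
                for left_combo in combinations(combo, k):
                    left = frozenset(left_combo)
                    right = subset - left
                    cost_l, size_l, plan_l = best[left]
                    cost_r, size_r, plan_r = best[right]
                    sel = 1.0
                    for a in left:
                        for b in right:
                            sel *= selectivity.get(frozenset((a, b)), 1.0)
                    out_size = size_l * size_r * sel
                    cost = cost_l + cost_r + out_size
                    if subset not in best or cost < best[subset][0]:
                        best[subset] = (cost, out_size, (plan_l, plan_r))
    cost, _, plan = best[frozenset(names)]
    return cost, plan


# Hypothetical estimates for three subqueries of the example query.
card = {"drugbank_group": 10, "kegg_group": 1000, "title_pattern": 100}
selectivity = {
    frozenset(("drugbank_group", "kegg_group")): 0.001,
    frozenset(("kegg_group", "title_pattern")): 0.01,
}
cost, plan = best_join_order(card, selectivity)
print(cost, plan)
```

With these made-up numbers the search joins the two selective groups first and adds the title pattern last, because every alternative produces a much larger intermediate result.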
Evaluation
Orthogonal state-of-the-art approaches:

                    DARQ                     AliBaba      FedX                          SPLENDID
Statistics          ServiceDesc              –            –                             VoiD
Source Selection    Statistics (predicates)  All sources  ASK queries                   Statistics + ASK queries
Query Optimization  DynProg                  Heuristics   Heuristics                    DynProg
Query Execution     Bind join                Bind join    Bound Join + parallelization  Bind Join + Hash Join
FedBench Evaluation Suite
• Life Science + Cross Domain Data
• different query characteristics
Measuring
• #data sources selected
• query execution time
Evaluation: Source Selection
Figure labels: rdf:type, owl:sameAs
Evaluation: Query Optimization
Conclusion
VoiD-based query federation is efficient
Publish more VoiD descriptions!
What next?
• Combination with FedX
• Improving estimation and cost model
• Integrating SPARQL 1.1 features