Validating statistical Index Data represented in RDF using SPARQL Queries: Computex
![Page 1: Validating statistical Index Data represented in RDF using SPARQL Queries: Computex](https://reader035.fdocuments.net/reader035/viewer/2022062513/555067cbb4c905c0448b555f/html5/thumbnails/1.jpg)
Validating statistical index data represented in RDF using SPARQL queries
WESO Research Group, University of Oviedo, Spain
Jose Emilio Labra Gayo, Jose María Álvarez Rodríguez
![Page 2: Validating statistical Index Data represented in RDF using SPARQL Queries: Computex](https://reader035.fdocuments.net/reader035/viewer/2022062513/555067cbb4c905c0448b555f/html5/thumbnails/2.jpg)
Motivation
The WebIndex Project
- Measures the impact of the Web in different countries
- First publication: 2012
- 2 web sites:
  - Visualization: http://thewebindex.org
  - Data portal: http://data.webfoundation.org
![Page 3: Validating statistical Index Data represented in RDF using SPARQL Queries: Computex](https://reader035.fdocuments.net/reader035/viewer/2022062513/555067cbb4c905c0448b555f/html5/thumbnails/3.jpg)
WebIndex Workflow
Workflow diagram (components):
- Raw data (Excel)
- Computed data (Excel)
- External sources
- All data (RDF datastore)
- Visualizations
![Page 4: Validating statistical Index Data represented in RDF using SPARQL Queries: Computex](https://reader035.fdocuments.net/reader035/viewer/2022062513/555067cbb4c905c0448b555f/html5/thumbnails/4.jpg)
Technical details
Index made from:
- 61 countries (100 planned for 2013)
- 85 indicators: 51 primary (questionnaires), 34 secondary (external sources)
WebIndex RDF data:
- Modeled on top of RDF Data Cube
- More than 1 million triples
- Linked data: DBpedia, Organizations, etc.
- Records data computation
![Page 5: Validating statistical Index Data represented in RDF using SPARQL Queries: Computex](https://reader035.fdocuments.net/reader035/viewer/2022062513/555067cbb4c905c0448b555f/html5/thumbnails/5.jpg)
WebIndex computation process (1)
Simplified with one indicator, 3 years and 4 countries
Raw Data (- marks a missing value)
Country 2009 2010 2011
Spain 4 5 3
Finland 4 - 6
Armenia 1 - -
Chile 6 8 -
Imputed Data (steps: mean, average growth)
Country 2009 2010 2011
Spain 4 5 3
Finland 4 5 6
Armenia 1 1 1
Chile 6 8 10.6
Filtered Data (Indicator A), then Normalized Data (step: z-score)
Country 2009 2010 2011
Spain -0.57 -0.57 -0.92
Finland -0.57 -0.57 -0.14
Chile 1.15 1.15 1.06
More details can be found here: http://thewebindex.org/about/methodology/computation/
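The imputation and normalization steps above can be sketched in Python. This is only an illustration, not the WebIndex implementation: the function names are hypothetical, and it assumes that a gap between two known values is filled with their mean, that a trailing gap is extrapolated with the mean growth ratio, and that z-scores use the sample standard deviation. Under those assumptions the results agree with the slide's figures up to rounding or truncation.

```python
from statistics import mean, stdev

def impute_mean(before, after):
    """Fill a gap between two known values with their mean."""
    return mean([before, after])

def impute_average_growth(series):
    """Extrapolate the next value from the mean year-on-year growth ratio."""
    ratios = [later / earlier for earlier, later in zip(series, series[1:])]
    return series[-1] * mean(ratios)

def z_scores(values):
    """Normalize values using the sample standard deviation (assumption)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

finland_2010 = impute_mean(4, 6)            # Finland: 2009 = 4, 2011 = 6
chile_2011 = impute_average_growth([6, 8])  # Chile: 2009 = 6, 2010 = 8

# 2009 column after filtering: Spain, Finland, Chile
normalized_2009 = z_scores([4, 4, 6])
print(finland_2010, round(chile_2011, 2), [round(z, 2) for z in normalized_2009])
```

The slide shows 10.6 and -0.57 where this sketch gives roughly 10.67 and -0.58, which suggests the slide values are truncated rather than rounded.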
![Page 6: Validating statistical Index Data represented in RDF using SPARQL Queries: Computex](https://reader035.fdocuments.net/reader035/viewer/2022062513/555067cbb4c905c0448b555f/html5/thumbnails/6.jpg)
WebIndex computation Process (2)
Simplified with one indicator, 3 years and 4 countries
Normalized Data (z-scores)
Country 2009 2010 2011
Spain -0.57 -0.57 -0.92
Finland -0.57 -0.57 -0.14
Chile 1.15 1.15 1.06
Adjusted data
Country A B C D ...
Spain 8 7 9.1 7.1 ...
Finland 7 8 7.1 8 ...
Chile 8 9 7.6 6 ...
Group indicators
Country Readiness Impact Web Composite
Spain 5.7 3.5 5.1 4.5
Finland 5.5 3.9 7.1 4.9
Chile 6.7 4.5 7.6 5.1
Rankings
Country Readiness Impact Web Composite
Spain 2 3 3 3
Finland 3 2 2 2
Chile 1 1 1 1
$x_i = x_i + \delta$
![Page 7: Validating statistical Index Data represented in RDF using SPARQL Queries: Computex](https://reader035.fdocuments.net/reader035/viewer/2022062513/555067cbb4c905c0448b555f/html5/thumbnails/7.jpg)
Example of data representation in RDF
Example:

    obs:obsM23 a qb:Observation ;
        cex:computation [
            a cex:Z-Score ;
            cex:observation obs:obsA23 ;
            cex:slice slice:sliceA09 ;
        ] ;
        cex:value -0.57 ;
        cex:md5-checksum "2917835203..." ;
        cex:indicator indicator:A ;
        cex:concept country:ESP ;
        qb:dataSet dataset:A-Normalized ;
        # ... other declarations omitted for brevity

It declares that the value of this observation was obtained as the z-score of obs:obsA23 over slice:sliceA09.
Each observation follows the RDF Data Cube vocabulary, extended with metadata about how it was obtained.
![Page 8: Validating statistical Index Data represented in RDF using SPARQL Queries: Computex](https://reader035.fdocuments.net/reader035/viewer/2022062513/555067cbb4c905c0448b555f/html5/thumbnails/8.jpg)
Computex
- Vocabulary for statistical computations
- Extends RDF Data Cube
- Ontology at: http://purl.org/weso/computex
- Some terms: cex:Concept, cex:Indicator, cex:Computation, cex:WeightSchema, qb:Observation, ...
![Page 9: Validating statistical Index Data represented in RDF using SPARQL Queries: Computex](https://reader035.fdocuments.net/reader035/viewer/2022062513/555067cbb4c905c0448b555f/html5/thumbnails/9.jpg)
Validation process
Last year (2012):
- Template-based validation
- MD5 checksum of each observation and some fields
This year (2013):
- SPARQL-based validation
- 3 levels of validation: RDF Data Cube, shapes of data, computation process
Ultimate goal: automatically compute the index
![Page 10: Validating statistical Index Data represented in RDF using SPARQL Queries: Computex](https://reader035.fdocuments.net/reader035/viewer/2022062513/555067cbb4c905c0448b555f/html5/thumbnails/10.jpg)
Validation approach
SPARQL CONSTRUCT queries instead of ASK:
- If there is no error, the result is an empty model
- Otherwise, the result is an RDF graph with error information

    CONSTRUCT {
      [ a cex:Error ;
        cex:errorParam
          [ cex:name "obs" ; cex:value ?obs ] ,
          [ cex:name "value1" ; cex:value ?value1 ] ,
          [ cex:name "value2" ; cex:value ?value2 ] ;
        cex:msg "Observation has two different values" ] .
    } WHERE {
      ?obs a qb:Observation .
      ?obs cex:value ?value1 .
      ?obs cex:value ?value2 .
      FILTER ( ?value1 != ?value2 )
    }

CONSTRUCT queries enable debugging
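The difference between a yes/no answer and an error report can be illustrated with a small Python sketch. It is not part of Computex: the triples are held as plain tuples, the identifiers are made up, and the function name is hypothetical. Unlike the SPARQL query, which yields one solution per ordered pair of differing values, this sketch reports each offending observation once.

```python
from collections import defaultdict

# Triples as (subject, predicate, object) tuples; identifiers are illustrative.
TRIPLES = [
    ("obs:o1", "rdf:type", "qb:Observation"),
    ("obs:o1", "cex:value", 4.0),
    ("obs:o2", "rdf:type", "qb:Observation"),
    ("obs:o2", "cex:value", 5.0),
    ("obs:o2", "cex:value", 5.5),  # conflicting second value
]

def two_values_errors(triples):
    """Return one error record per observation with more than one distinct value."""
    observations = {s for s, p, o in triples
                    if p == "rdf:type" and o == "qb:Observation"}
    values = defaultdict(set)
    for s, p, o in triples:
        if p == "cex:value" and s in observations:
            values[s].add(o)
    return [
        {"msg": "Observation has two different values",
         "obs": obs, "values": sorted(vals)}
        for obs, vals in sorted(values.items()) if len(vals) > 1
    ]

errors = two_values_errors(TRIPLES)
is_valid = not errors   # the yes/no answer an ASK-style check would give
for error in errors:    # the extra detail a CONSTRUCT-style check provides
    print(error)
```

An empty list plays the role of the empty model; a non-empty list names the observation and the conflicting values, which is what makes debugging possible.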
![Page 11: Validating statistical Index Data represented in RDF using SPARQL Queries: Computex](https://reader035.fdocuments.net/reader035/viewer/2022062513/555067cbb4c905c0448b555f/html5/thumbnails/11.jpg)
SPARQL queries: RDF Data Cube
We converted the RDF Data Cube integrity constraints from ASK to CONSTRUCT queries. For example, this ASK query:

    ASK WHERE {
      ?dim a qb:DimensionProperty .
      FILTER NOT EXISTS { ?dim rdfs:range [] }
    }

becomes:

    CONSTRUCT {
      [ a cex:Error ;
        cex:errorParam [ cex:name "dim" ; cex:value ?dim ] ;
        cex:msg "Every Dimension must have a declared range" ] .
    } WHERE {
      ?dim a qb:DimensionProperty .
      FILTER NOT EXISTS { ?dim rdfs:range [] }
    }
![Page 12: Validating statistical Index Data represented in RDF using SPARQL Queries: Computex](https://reader035.fdocuments.net/reader035/viewer/2022062513/555067cbb4c905c0448b555f/html5/thumbnails/12.jpg)
SPARQL expressivity
SPARQL can express complex validation patterns. Example: Mean

    CONSTRUCT {
      [ a cex:Error ;
        cex:errorParam # ...omitted
        cex:msg "Mean value does not match" ] .
    } WHERE {
      ?obs a qb:Observation ;
           cex:computation ?comp ;
           cex:value ?val .
      ?comp a cex:Mean .
      { SELECT (AVG(?value) as ?mean) ?comp WHERE {
          ?comp cex:observation ?obs1 .
          ?obs1 cex:value ?value .
        } GROUP BY ?comp
      }
      FILTER( abs(?mean - ?val) > 0.0001 )
    }
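The logic of the mean check can be restated outside SPARQL. The Python sketch below is an illustration under assumptions: observations are reduced to a dictionary of values, the `mean_of` mapping stands in for the `cex:Mean` computation metadata, and all identifiers are hypothetical. The tolerance is the same 0.0001 used in the FILTER.

```python
from statistics import mean

TOLERANCE = 0.0001  # same threshold as the FILTER in the query

# Hypothetical data: declared value of every observation.
values = {"obs:a1": 4.0, "obs:a2": 6.0, "obs:m1": 5.0, "obs:m2": 7.0}
# Observations declared as the mean of other observations.
mean_of = {"obs:m1": ["obs:a1", "obs:a2"], "obs:m2": ["obs:a1", "obs:a2"]}

def mean_errors(values, mean_of, tolerance=TOLERANCE):
    """Report observations whose declared value differs from the recomputed mean."""
    errors = []
    for obs, inputs in mean_of.items():
        expected = mean(values[i] for i in inputs)
        if abs(expected - values[obs]) > tolerance:
            errors.append({"msg": "Mean value does not match",
                           "obs": obs,
                           "expected": expected,
                           "found": values[obs]})
    return errors

for error in mean_errors(values, mean_of):
    print(error)
```

Here `obs:m1` passes and `obs:m2` is reported, since the recomputed mean of its inputs is 5.0 and not 7.0.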
![Page 13: Validating statistical Index Data represented in RDF using SPARQL Queries: Computex](https://reader035.fdocuments.net/reader035/viewer/2022062513/555067cbb4c905c0448b555f/html5/thumbnails/13.jpg)
Limitations of SPARQL expressivity
Some built-in functions are not standardized
- Example: z-score employs standard deviation
- It requires a built-in function: sqrt
- Available in some SPARQL implementations
Ranking of values was not obvious (*)
- 2 solutions: using GROUP_CONCAT, or checking a value against all the other values
- Neither solution is efficient
(*) http://stackoverflow.com/questions/17313730/how-to-rank-values-in-sparql
![Page 14: Validating statistical Index Data represented in RDF using SPARQL Queries: Computex](https://reader035.fdocuments.net/reader035/viewer/2022062513/555067cbb4c905c0448b555f/html5/thumbnails/14.jpg)
SPARQL expressivity
Handling series with RDF Collections
Average growth:

    CONSTRUCT {
      # ... omitted for brevity
    } WHERE {
      ?obs cex:computation [ a cex:AverageGrowth ; cex:observations ?ls ] ;
           cex:value ?val .
      ?ls rdf:first [ cex:value ?v1 ] .
      { SELECT ( SUM(?v_n / ?v_n1)/COUNT(*) as ?meanGrowth ) WHERE {
          ?ls rdf:rest* [ rdf:first [ cex:value ?v_n ] ;
                          rdf:rest [ rdf:first [ cex:value ?v_n1 ] ] ] .
        }
      }
      FILTER ( abs(?meanGrowth * ?v1 - ?val) > 0.001 )
    }

* This solution was provided by Joshua Taylor
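The arithmetic behind the average-growth check can be sketched in Python. This is an illustration, not the tool's code: the function names are hypothetical, and it assumes, as the query's pattern suggests, that the list is ordered most recent value first, so each ratio divides a value by the one that follows it in the list.

```python
def average_growth_estimate(recent_first):
    """Mean of consecutive ratios (newer / older) times the most recent value."""
    pairs = list(zip(recent_first, recent_first[1:]))
    mean_growth = sum(newer / older for newer, older in pairs) / len(pairs)
    return mean_growth * recent_first[0]

def growth_mismatch(declared, recent_first, tolerance=0.001):
    """True when the declared value differs from the estimate beyond the tolerance."""
    return abs(average_growth_estimate(recent_first) - declared) > tolerance

# Chile in the simplified example: 2010 = 8, 2009 = 6 (most recent first).
print(round(average_growth_estimate([8, 6]), 4))
print(growth_mismatch(10.6667, [8, 6]), growth_mismatch(12.0, [8, 6]))
```

With a tolerance of 0.001, a stored value truncated to 10.6 would be flagged as a mismatch, so the stored observation needs more precision than the slide displays.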
![Page 15: Validating statistical Index Data represented in RDF using SPARQL Queries: Computex](https://reader035.fdocuments.net/reader035/viewer/2022062513/555067cbb4c905c0448b555f/html5/thumbnails/15.jpg)
RDF Profiles
- Descriptions (with constraints) of dataset families
- Can be used to validate (and compute) RDF graphs
- Examples:
  - Cube (statistical data): normalization update queries + integrity constraints
  - Computex (statistical computations): adds more queries
  - Other user-defined profiles
Diagram (profiles shown): Cube, Computex, WebIndex, CloudIndex
![Page 16: Validating statistical Index Data represented in RDF using SPARQL Queries: Computex](https://reader035.fdocuments.net/reader035/viewer/2022062513/555067cbb4c905c0448b555f/html5/thumbnails/16.jpg)
Cube Profile
W3C Candidate Recommendation (http://www.w3.org/TR/vocab-data-cube/)
- 1 base ontology + RDF Schema inference
- 2 update queries (normalization algorithm)
- 21 integrity constraint queries

    cex:cube a cex:ValidationProfile ;
        cex:name "RDF Data Cube" ;
        cex:ontologyBase cube:cube.ttl ;
        cex:expandSteps (
            [ cex:name "Closure" ; cex:uri cube:closure.ru ]
            [ cex:name "Flatten" ; cex:uri cube:flatten.ru ]
        ) ;
        cex:integrityQuery [ cex:name "IC-1. Unique DataSet" ; cex:uri cube:q1.sparql ] ;
        # ...
        .
![Page 17: Validating statistical Index Data represented in RDF using SPARQL Queries: Computex](https://reader035.fdocuments.net/reader035/viewer/2022062513/555067cbb4c905c0448b555f/html5/thumbnails/17.jpg)
Computex Profile
- 1 base ontology: http://purl.org/weso/computex
- Imports RDF Data Cube

    cex:computex a cex:ValidationProfile ;
        cex:name "Computex" ;
        cex:ontologyBase cex:computex.ttl ;
        cex:import profile:cube.ttl ;
        cex:expandSteps (
            [ cex:name "Copy Raw" ; cex:uri cex:copyRaw.sparql ]
            # Other UPDATE queries
        ) ;
        cex:integrityQuery
            [ cex:name "Adjusted" ; cex:uri cex:adjusted.sparql ] ,
            # Other integrity queries
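How a profile might drive validation can be sketched as follows. This is a hypothetical model in Python, not the Scala implementation: the `Profile` class, the `validate` function and the processing order (imported profiles first, then expand steps, then integrity checks) are all assumptions made for illustration, and graphs are plain sets of tuples.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Graph = set  # a graph as a set of (s, p, o) tuples, for illustration only

@dataclass
class Profile:
    name: str
    expand_steps: List[Callable[[Graph], Graph]] = field(default_factory=list)
    integrity_checks: List[Callable[[Graph], List[str]]] = field(default_factory=list)
    imports: List["Profile"] = field(default_factory=list)

def validate(profile, graph):
    """Run imported profiles, then expand the graph and collect error messages."""
    errors = []
    for imported in profile.imports:
        graph, imported_errors = validate(imported, graph)
        errors.extend(imported_errors)
    for step in profile.expand_steps:
        graph = step(graph)
    for check in profile.integrity_checks:
        errors.extend(check(graph))
    return graph, errors

def dimensions_have_range(graph):
    """Python counterpart of the 'declared range' integrity query."""
    dims = {s for s, p, o in graph
            if p == "rdf:type" and o == "qb:DimensionProperty"}
    ranged = {s for s, p, o in graph if p == "rdfs:range"}
    return [f"Every Dimension must have a declared range: {d}"
            for d in sorted(dims - ranged)]

cube = Profile("RDF Data Cube", integrity_checks=[dimensions_have_range])
computex = Profile("Computex", imports=[cube])

graph = {("ex:year", "rdf:type", "qb:DimensionProperty")}
_, errors = validate(computex, graph)
print(errors)
```

Validating against the Computex profile also runs the Cube checks because of the import, which mirrors the `cex:import` declaration in the profile.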
![Page 18: Validating statistical Index Data represented in RDF using SPARQL Queries: Computex](https://reader035.fdocuments.net/reader035/viewer/2022062513/555067cbb4c905c0448b555f/html5/thumbnails/18.jpg)
Computex Validation tool
Available at: http://computex.herokuapp.com
Features:
- Informative error messages
- Option to generate EARL report
- Based on profiles
- 2 validation profiles (Cube and Computex)
Source code in Scala available at: http://github.com/weso/computex
![Page 19: Validating statistical Index Data represented in RDF using SPARQL Queries: Computex](https://reader035.fdocuments.net/reader035/viewer/2022062513/555067cbb4c905c0448b555f/html5/thumbnails/19.jpg)
Computex Validation Tool: http://computex.herokuapp.com
![Page 20: Validating statistical Index Data represented in RDF using SPARQL Queries: Computex](https://reader035.fdocuments.net/reader035/viewer/2022062513/555067cbb4c905c0448b555f/html5/thumbnails/20.jpg)
Conclusions
- WebIndex = use case for RDF validation
- SPARQL queries as an implementation approach
- Challenges: expressivity limits of SPARQL, complexity of some queries
- Concept of RDF Profiles: declarative, Turtle syntax; intermediate level between OWL and SPARQL
![Page 21: Validating statistical Index Data represented in RDF using SPARQL Queries: Computex](https://reader035.fdocuments.net/reader035/viewer/2022062513/555067cbb4c905c0448b555f/html5/thumbnails/21.jpg)
END OF PRESENTATION
![Page 22: Validating statistical Index Data represented in RDF using SPARQL Queries: Computex](https://reader035.fdocuments.net/reader035/viewer/2022062513/555067cbb4c905c0448b555f/html5/thumbnails/22.jpg)
COMPLEMENTARY SLIDES
![Page 23: Validating statistical Index Data represented in RDF using SPARQL Queries: Computex](https://reader035.fdocuments.net/reader035/viewer/2022062513/555067cbb4c905c0448b555f/html5/thumbnails/23.jpg)
Limits of SPARQL expressivity
Ranking of values using GROUP_CONCAT
    SELECT ?x ?v ?ranking {
      ?x :value ?v .
      { SELECT (GROUP_CONCAT(?x; separator="") as ?ordered) {
          { SELECT ?x {
              ?x :value ?v .
            } ORDER BY DESC(?v)
          }
        }
      }
      BIND (str(?x) as ?xName)
      BIND (strbefore(?ordered, ?xName) as ?before)
      BIND ((strlen(?before) / strlen(?xName)) + 1 as ?ranking)
    } ORDER BY ?ranking
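The string trick in this query can be mimicked in Python to show what it relies on. This sketch is an illustration with made-up identifiers and a hypothetical function name. Like the query, it divides the length of the text before a name by the length of that name, so it only gives correct ranks when all identifiers have the same length.

```python
def rank_by_concat(scores):
    """Rank names by their position in a concatenation sorted by descending score."""
    ordered_names = sorted(scores, key=scores.get, reverse=True)
    ordered = "".join(ordered_names)         # GROUP_CONCAT with empty separator
    ranking = {}
    for name in scores:
        before = ordered.partition(name)[0]  # strbefore(?ordered, ?xName)
        ranking[name] = len(before) // len(name) + 1
    return ranking

scores = {"ex:a": 3, "ex:b": 9, "ex:c": 5}  # hypothetical, equal-length identifiers
print(rank_by_concat(scores))
```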
![Page 24: Validating statistical Index Data represented in RDF using SPARQL Queries: Computex](https://reader035.fdocuments.net/reader035/viewer/2022062513/555067cbb4c905c0448b555f/html5/thumbnails/24.jpg)
Limits of SPARQL expressivity
Ranking of values comparing all values
    SELECT ?x ?v (COUNT(*) as ?ranking) WHERE {
      ?x :value ?v .
      [] :value ?u .
      FILTER( ?v <= ?u )
    }
    GROUP BY ?x ?v
    ORDER BY ?ranking
* This solution was provided by Joshua Taylor
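The counting approach translates directly into Python. The sketch below is illustrative only (the function name is hypothetical); it uses the Composite scores from the group indicators table and ranks each country by counting how many values are greater than or equal to its own, which takes a quadratic number of comparisons.

```python
def rank_by_counting(scores):
    """Rank each name by the number of values that are >= its own value."""
    return {
        name: sum(1 for other in scores.values() if value <= other)
        for name, value in scores.items()
    }

composite = {"Spain": 4.5, "Finland": 4.9, "Chile": 5.1}
print(rank_by_counting(composite))
```

With this definition, tied values share the larger rank number: two countries tied for first place would both get rank 2.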