SSN-TC workshop talk at ISWC 2015 on Emrooz
![Page 1: SSN-TC workshop talk at ISWC 2015 on Emrooz](https://reader034.fdocuments.net/reader034/viewer/2022052705/58f08b091a28ab74458b45ff/html5/thumbnails/1.jpg)
First Joint International Workshop on Semantic Sensor Networks and Terra Cognita
October 11, 2015, Bethlehem, PA, USA

Emrooz: A Scalable Database for SSN Observations

Markus Stocker, Narasinha Shurpali, Kerry Taylor, George Burba, Mauno Rönkkö, Mikko Kolehmainen

[email protected]
@markusstocker and @envinf
![Page 2: SSN-TC workshop talk at ISWC 2015 on Emrooz](https://reader034.fdocuments.net/reader034/viewer/2022052705/58f08b091a28ab74458b45ff/html5/thumbnails/2.jpg)
Introduction

- Expressive ontologies for sensor (meta-) data (SSN)
- Flexible graph data model (RDF)
- Triple stores are the obvious choice
- Unfortunately hardly viable at scale
- Triple stores index for graph pattern queries
- Not designed for time series interval queries
![Page 3: SSN-TC workshop talk at ISWC 2015 on Emrooz](https://reader034.fdocuments.net/reader034/viewer/2022052705/58f08b091a28ab74458b45ff/html5/thumbnails/3.jpg)
Aim

- Build a database that ...
  - Consumes SSN observations in RDF
  - Evaluates SSN observation SPARQL queries
  - Scales to billions of observations
  - Has better query performance than triple stores
![Page 4: SSN-TC workshop talk at ISWC 2015 on Emrooz](https://reader034.fdocuments.net/reader034/viewer/2022052705/58f08b091a28ab74458b45ff/html5/thumbnails/4.jpg)
Architecture
![Page 5: SSN-TC workshop talk at ISWC 2015 on Emrooz](https://reader034.fdocuments.net/reader034/viewer/2022052705/58f08b091a28ab74458b45ff/html5/thumbnails/5.jpg)
Cassandra data model

- Schema consisting of
  - Partition key (row key) of type ascii
  - Clustering key (column name) of type timeuuid
  - Column value of type blob
- The partition key consists of two (dash-concatenated) parts
  - SHA-256 hex string digest of sensor-property-feature URIs
  - Date time string of pattern yyyyMMddHHmm
    - Computed from observation result time
    - Floor-rounded to year, month, day, hour, or minute
    - Rounding depends on sensor sampling frequency
    - Goal is to limit the number of columns per row
- Clustering key determined by observation result time
- Column value is set of triples for observation (binary)
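
The partition key layout described on this slide can be sketched in Python. This is a minimal illustration, not the Emrooz implementation: the function name `partition_key`, the `unit` parameter, and the way the three URIs are joined before hashing are assumptions. The slide only states that the digest is SHA-256 over the sensor-property-feature URIs and that the time part follows the pattern yyyyMMddHHmm.

```python
import hashlib
from datetime import datetime

# For each rounding unit, the fields that are reset to their minimum so the
# result is floor-rounded but still printed with the full yyyyMMddHHmm pattern.
_RESET = {
    "year": dict(month=1, day=1, hour=0, minute=0),
    "month": dict(day=1, hour=0, minute=0),
    "day": dict(hour=0, minute=0),
    "hour": dict(minute=0),
    "minute": dict(),
}


def partition_key(sensor, prop, feature, result_time, unit="hour"):
    """Return '<sha256 hex digest>-<yyyyMMddHHmm>' for one observation."""
    # Assumption: the three URIs are joined with "-" before hashing.
    joined = "-".join((sensor, prop, feature))
    digest = hashlib.sha256(joined.encode("utf-8")).hexdigest()
    # Floor-round the observation result time to the chosen unit.
    floored = result_time.replace(second=0, microsecond=0, **_RESET[unit])
    return digest + "-" + floored.strftime("%Y%m%d%H%M")


key = partition_key(
    "http://example.org#thermometer",
    "http://example.org#temperature",
    "http://example.org#air",
    datetime(2015, 4, 21, 1, 37, 12),
    unit="hour",
)
print(key[-12:])  # time part of the key, floor-rounded to the hour
```

A coarser `unit` puts more observations under one partition key, which is why the slide ties the rounding to the sensor sampling frequency.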
![Page 6: SSN-TC workshop talk at ISWC 2015 on Emrooz](https://reader034.fdocuments.net/reader034/viewer/2022052705/58f08b091a28ab74458b45ff/html5/thumbnails/6.jpg)
Experiments

- LI-7500A Open Path CO2/H2O Gas Analyzer
- LI-7700 Open Path CH4 Analyzer
- Property of mole fraction
- Three features for the monitored gases
![Page 7: SSN-TC workshop talk at ISWC 2015 on Emrooz](https://reader034.fdocuments.net/reader034/viewer/2022052705/58f08b091a28ab74458b45ff/html5/thumbnails/7.jpg)
![Page 8: SSN-TC workshop talk at ISWC 2015 on Emrooz](https://reader034.fdocuments.net/reader034/viewer/2022052705/58f08b091a28ab74458b45ff/html5/thumbnails/8.jpg)
Experiments

- January 7 to May 26, 2015, 6045 GHG archive files
- Estimated # of sensor observations is 326 430 000
- Estimated # of triples is 4.9 billion (15 triples / observation)
- Load and query performance on 10 subsets
- SPARQL query with 10 min interval
- Compared to Stardog and Blazegraph
- Test performance with varying time interval
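
The two estimates can be reproduced with a few lines of arithmetic. The sketch below assumes that each archive file contributes 54 000 observations, which is the size reported for the 30 m subset in the results table; the slides do not state the per-file count explicitly.

```python
files = 6045
observations_per_file = 54_000  # assumption: one file matches the 30 m subset
triples_per_observation = 15    # stated on the slide

observations = files * observations_per_file
triples = observations * triples_per_observation

print(observations)  # 326430000
print(triples)       # 4896450000, i.e. about 4.9 billion
```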
![Page 9: SSN-TC workshop talk at ISWC 2015 on Emrooz](https://reader034.fdocuments.net/reader034/viewer/2022052705/58f08b091a28ab74458b45ff/html5/thumbnails/9.jpg)
The query

    select ?time ?value
    where {
      [
        ssn:observedBy licor:LERS-75H-2035 ;
        ssn:observedProperty sweet-propFraction:MoleFraction ;
        ssn:featureOfInterest sweet-matrCompound:CO2 ;
        ssn:observationResultTime [ time:inXSDDateTime ?time ] ;
        ssn:observationResult [ ssn:hasValue [ dul:hasRegionDataValue ?value ] ]
      ]
      filter (?time >= "2015-04-15T00:00:00.000+06:00"^^xsd:dateTime
           && ?time < "2015-04-15T00:10:00.000+06:00"^^xsd:dateTime)
    }
    order by asc(?time)
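
One plausible way to relate such a time filter to the Cassandra data model is to list the date-time buckets that overlap the requested interval, since each bucket corresponds to one partition key suffix. The talk does not describe how Emrooz evaluates the filter, so the following is only a sketch under that assumption; `floor_time` and `bucket_strings` are hypothetical helpers, time zones are ignored, and only minute, hour, and day rounding are covered.

```python
from datetime import datetime, timedelta

STEP = {
    "minute": timedelta(minutes=1),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
}


def floor_time(t, unit):
    """Floor-round a datetime to the given unit."""
    if unit == "minute":
        return t.replace(second=0, microsecond=0)
    if unit == "hour":
        return t.replace(minute=0, second=0, microsecond=0)
    if unit == "day":
        return t.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported unit: {unit}")


def bucket_strings(start, end, unit):
    """yyyyMMddHHmm strings of all buckets overlapping [start, end), start < end."""
    current = floor_time(start, unit)
    buckets = []
    while current < end:
        buckets.append(current.strftime("%Y%m%d%H%M"))
        current += STEP[unit]
    return buckets


# The 10 minute interval of the query, with hourly rounding, touches one bucket.
print(bucket_strings(datetime(2015, 4, 15, 0, 0), datetime(2015, 4, 15, 0, 10), "hour"))
```

Under this reading, a short interval touches few rows, and the clustering key (the observation result time) narrows the scan within each row.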
![Page 10: SSN-TC workshop talk at ISWC 2015 on Emrooz](https://reader034.fdocuments.net/reader034/viewer/2022052705/58f08b091a28ab74458b45ff/html5/thumbnails/10.jpg)
Results: Some figures

| Subset | Observations | Triples | Distinct |
|---|---|---|---|
| 30 m | 54 000 | 810 000 | 648 007 |
| 1 h | 108 000 | 1 620 000 | 1 296 007 |
| 3 h | 324 000 | 4 860 000 | 3 888 007 |
| 6 h | 647 997 | 9 719 955 | 7 775 971 |
| 12 h | 1 295 997 | 19 439 955 | 15 551 971 |
| 1 d | 2 591 994 | 38 879 910 | 31 103 935 |
| 7 d | 18 140 271 | 272 104 065 | 217 683 259 |
| 1 M | 72 526 464 | 1 087 896 960 | 870 317 575 |
| 3 M | 194 188 107 | 2 912 821 605 | * |
| J-M | 328 715 445 | 4 930 731 675 | * |
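
The figures can be checked for internal consistency. The sketch below verifies that every triple count equals 15 times the observation count, as stated in the experiment setup. It also checks a pattern that the slides do not state but that the numbers happen to follow: each reported value in the last column equals 12 times the observation count plus 7. Reading that column as the number of distinct triples is an assumption.

```python
# (observations, triples, distinct); None where the slide shows "*".
rows = {
    "30 m": (54_000, 810_000, 648_007),
    "1 h": (108_000, 1_620_000, 1_296_007),
    "3 h": (324_000, 4_860_000, 3_888_007),
    "6 h": (647_997, 9_719_955, 7_775_971),
    "12 h": (1_295_997, 19_439_955, 15_551_971),
    "1 d": (2_591_994, 38_879_910, 31_103_935),
    "7 d": (18_140_271, 272_104_065, 217_683_259),
    "1 M": (72_526_464, 1_087_896_960, 870_317_575),
    "3 M": (194_188_107, 2_912_821_605, None),
    "J-M": (328_715_445, 4_930_731_675, None),
}


def inconsistent_subsets(table):
    """Return the subsets whose figures break either relation."""
    bad = []
    for subset, (observations, triples, distinct) in table.items():
        if triples != 15 * observations:
            bad.append(subset)
        elif distinct is not None and distinct != 12 * observations + 7:
            bad.append(subset)
    return bad


print(inconsistent_subsets(rows))  # [] means every row is consistent
```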
![Page 11: SSN-TC workshop talk at ISWC 2015 on Emrooz](https://reader034.fdocuments.net/reader034/viewer/2022052705/58f08b091a28ab74458b45ff/html5/thumbnails/11.jpg)
Results: Load performance

[Figure: load time in seconds (log scale, 10 to 1 000 000) per subset, from 30 m to J-M, for Emrooz, Blazegraph, and Stardog.]
![Page 12: SSN-TC workshop talk at ISWC 2015 on Emrooz](https://reader034.fdocuments.net/reader034/viewer/2022052705/58f08b091a28ab74458b45ff/html5/thumbnails/12.jpg)
Results: Query performance

[Figure: query time in seconds (log scale, 10 to 1000) per subset, from 30 m to J-M, for Emrooz, Blazegraph, and Stardog.]
![Page 13: SSN-TC workshop talk at ISWC 2015 on Emrooz](https://reader034.fdocuments.net/reader034/viewer/2022052705/58f08b091a28ab74458b45ff/html5/thumbnails/13.jpg)
Results: Query size performance

[Figure: Emrooz query time in seconds (axis 0 to 10) against query time interval, from 1 s to 60 m.]
![Page 14: SSN-TC workshop talk at ISWC 2015 on Emrooz](https://reader034.fdocuments.net/reader034/viewer/2022052705/58f08b091a28ab74458b45ff/html5/thumbnails/14.jpg)
REST

    curl http://localhost:8080/sensors/list
    curl http://localhost:8080/properties/list
    curl http://localhost:8080/features/list

    curl -H "Accept: application/json" \
      http://localhost:8080/sensors/list

    curl -H "Accept: text/csv" -G \
      --data-urlencode sensor=http://example.org#thermometer \
      --data-urlencode property=http://example.org#temperature \
      --data-urlencode feature=http://example.org#air \
      --data-urlencode from=2015-04-21T01:00:00.000+03:00 \
      --data-urlencode to=2015-04-21T02:00:00.000+03:00 \
      http://localhost:8080/observations/sensor/list
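
The last request can also be assembled with the Python standard library. This is a sketch that mirrors the endpoint, parameter names, and `Accept` header shown in the curl call; the helper name `observations_request` is hypothetical, and the request is only constructed here, not sent, since it needs a running server on localhost.

```python
from urllib.parse import urlencode
from urllib.request import Request


def observations_request(host, sensor, prop, feature, start, end):
    """Build a GET request for sensor observations in CSV format."""
    # urlencode percent-encodes '#', ':', '/' and '+' in the values,
    # which is what --data-urlencode does for the curl call.
    query = urlencode({
        "sensor": sensor,
        "property": prop,
        "feature": feature,
        "from": start,
        "to": end,
    })
    url = f"{host}/observations/sensor/list?{query}"
    return Request(url, headers={"Accept": "text/csv"})


req = observations_request(
    "http://localhost:8080",
    "http://example.org#thermometer",
    "http://example.org#temperature",
    "http://example.org#air",
    "2015-04-21T01:00:00.000+03:00",
    "2015-04-21T02:00:00.000+03:00",
)
print(req.full_url)
# With a server running, urllib.request.urlopen(req) would send the request.
```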
![Page 15: SSN-TC workshop talk at ISWC 2015 on Emrooz](https://reader034.fdocuments.net/reader034/viewer/2022052705/58f08b091a28ab74458b45ff/html5/thumbnails/15.jpg)
R

    library(RCurl)  # provides getURL

    host <- "http://localhost:8080"

    # List the sensors known to the server as a one-column data frame
    df.sensors <- read.csv(text=getURL(paste0(host, "/sensors/list")),
                           header=FALSE, col.names=c("sensor"))

    df.sensors
                             sensor
    1 http://licor.com#LERS-75H-CH4
    2 http://licor.com#LERS-75H-CO2
![Page 16: SSN-TC workshop talk at ISWC 2015 on Emrooz](https://reader034.fdocuments.net/reader034/viewer/2022052705/58f08b091a28ab74458b45ff/html5/thumbnails/16.jpg)
R

    library(RCurl)    # provides getURL and curlEscape
    library(ggplot2)  # provides ggplot

    host <- "http://localhost:8080"
    sensor <- "http://licor.com#LERS-75H-CO2"
    property <- "http://sweet.jpl.nasa.gov/2.3/propMass.owl#Density"
    feature <- "http://sweet.jpl.nasa.gov/2.3/matrCompound.owl#CarbonDioxide"
    from <- "2015-01-07T00:00:00.000+06:00"
    to <- "2015-01-07T00:01:00.000+06:00"

    # Build the request URL with percent-encoded query parameters
    url <- paste0(host, "/observations/sensor/list?",
                  "sensor=", curlEscape(sensor),
                  "&property=", curlEscape(property),
                  "&feature=", curlEscape(feature),
                  "&from=", curlEscape(from),
                  "&to=", curlEscape(to))

    # Fetch the observations as CSV and read them into a data frame
    df.observations <- read.csv(text=getURL(url, httpheader=c(Accept="text/csv")),
                                header=TRUE, sep=",")

    ggplot(data=df.observations, aes(time, value)) +
      geom_line() + xlab("Time") + ylab("CO2 [mmol m-3]")
![Page 17: SSN-TC workshop talk at ISWC 2015 on Emrooz](https://reader034.fdocuments.net/reader034/viewer/2022052705/58f08b091a28ab74458b45ff/html5/thumbnails/17.jpg)
R

[Figure: CO2 [mmol m−3] against time; y axis from 18.00 to 18.15, x axis ticks at 00, 15, 30, 45, 00.]
![Page 18: SSN-TC workshop talk at ISWC 2015 on Emrooz](https://reader034.fdocuments.net/reader034/viewer/2022052705/58f08b091a28ab74458b45ff/html5/thumbnails/18.jpg)
Related and future work

- Other authors have pointed out the problem
- “Semantification of measurement data not promising”
- RDF databases on NoSQL systems (e.g. Cumulus RDF)
- Support for QB observations (done)
- REST API (preliminary)
- Integration with R/Matlab (preliminary)
- Performance comparison with other systems
![Page 19: SSN-TC workshop talk at ISWC 2015 on Emrooz](https://reader034.fdocuments.net/reader034/viewer/2022052705/58f08b091a28ab74458b45ff/html5/thumbnails/19.jpg)
Conclusion

- SSN and RDF nice for sensor (meta-) data
- Triple stores inadequate for observation data
- Alternative approaches required
- What are the advantages and disadvantages?
- Reasoning on all data by some sensor?
- Query for observation values exceeding threshold?