Efficient RDF Interchange (ERI) Format for RDF Data Streams

68
Efficient RDF Interchange (ERI) Format for RDF Data Streams Javier D. Fernández, Alejandro Llaves, Oscar Corcho Ontology Engineering Group (OEG) Universidad Politécnica de Madrid, Spain

description

Presentation done* at the 13th International Semantic Web Conference (ISWC) in which we approach a compressed format to represent RDF Data Streams. See the original article at: http://dataweb.infor.uva.es/wp-content/uploads/2014/07/iswc14.pdf * Presented by Alejandro Llaves (http://www.slideshare.net/allaves)

Transcript of Efficient RDF Interchange (ERI) Format for RDF Data Streams

Page 1: Efficient RDF Interchange (ERI) Format for RDF Data Streams

Efficient RDF Interchange (ERI) Format for RDF Data Streams

Javier D. Fernández, Alejandro Llaves, Oscar Corcho

Ontology Engineering Group (OEG)Universidad Politécnica de Madrid, Spain

Page 2: Efficient RDF Interchange (ERI) Format for RDF Data Streams

Outline

Index

1. Introduction & Motivation2. Background3. Efficient RDF Interchange (ERI) Format

i. Basic Conceptsii. ERI Streamsiii. Practical Deployment

4. Evaluation5. Conclusions and Next steps

2

Page 3: Efficient RDF Interchange (ERI) Format for RDF Data Streams

3

INTRODUCTION - Static data versus RDF data streams

Page 4: Efficient RDF Interchange (ERI) Format for RDF Data Streams

3

INTRODUCTION - Static data versus RDF data streams

Transform LoadExtract

Files

DBMS

Spatial Information

Web APIs

Linked Data discovery

Page 5: Efficient RDF Interchange (ERI) Format for RDF Data Streams

3

INTRODUCTION - Static data versus RDF data streams

My datasetAugust 2014

My d

Febru

Transform LoadExtract

Files

DBMS

Spatial Information

Web APIs

Linked Data discovery

Page 6: Efficient RDF Interchange (ERI) Format for RDF Data Streams

3

INTRODUCTION - Static data versus RDF data streams

My datasetAugust 2014

My d

Febru

Transform LoadExtract

Files

DBMS

Spatial Information

Web APIs

Linked Data discovery

“Most semantic tools are focused on this static view”

Page 7: Efficient RDF Interchange (ERI) Format for RDF Data Streams

3

INTRODUCTION - Static data versus RDF data streams

RDF Data Streams are gaining momentum, generated from any type of data stream, and combining real-time and historical data.

©Wilgengebroed on Flickr, Mr3641, ProtoplasmaKid and ISA Internationales Stadtbauatelier in commons wikimedia

My datasetAugust 2014

My d

Febru

Page 8: Efficient RDF Interchange (ERI) Format for RDF Data Streams

3

INTRODUCTION - Static data versus RDF data streams

Page 9: Efficient RDF Interchange (ERI) Format for RDF Data Streams

3

INTRODUCTION - Static data versus RDF data streams

Page 10: Efficient RDF Interchange (ERI) Format for RDF Data Streams

3

INTRODUCTION - Static data versus RDF data streams

Page 11: Efficient RDF Interchange (ERI) Format for RDF Data Streams

3

INTRODUCTION - Static data versus RDF data streams

Page 12: Efficient RDF Interchange (ERI) Format for RDF Data Streams

3

INTRODUCTION - Static data versus RDF data streams

RDF streams: potentially unbounded sequences of timestamped RDF statements or graphs.

Page 13: Efficient RDF Interchange (ERI) Format for RDF Data Streams

3

INTRODUCTION - Static data versus RDF data streams

RDF streams: potentially unbounded sequences of timestamped RDF statements or graphs.

user1_observation [t1]weather1_observation [t1] user2_observation [t3]…

Page 14: Efficient RDF Interchange (ERI) Format for RDF Data Streams

3

INTRODUCTION - Static data versus RDF data streams

RDF streams: potentially unbounded sequences of timestamped RDF statements or graphs.

t

u1 u2 u3 u4

w1 w2 w3

Stream

user1_observation [t1]weather1_observation [t1] user2_observation [t3]…

Page 15: Efficient RDF Interchange (ERI) Format for RDF Data Streams

3

INTRODUCTION - Motivation

Achieve efficient transmission of RDF streams, a necessary step to ensure higher throughput for RDF Stream processors

Stream source

Stream source

Stream source

Stream source

Stream Processor

Engine

Historic Information

C-SPARQL, SPARQLStr

eammorph-streamsCQELS Cloud

Ztreamy…

Stream source

queries

Continuous results

Page 16: Efficient RDF Interchange (ERI) Format for RDF Data Streams

INTRODUCTION – Motivation - Requirements

16

Efficient transmission of RDF streams:• Streamable

• Scalable

• Easy (fast) to process (create and parse)

• Compact

• Parametrizable (several tradeoffs compression/time)

Page 17: Efficient RDF Interchange (ERI) Format for RDF Data Streams

BACKGROUND

17

Plain:Turtle/Trig/JSON-LD

Plain+Compression (e.g. gzip) HDT

Streaming HDT RDSZ

RDF/XML + EXI ERI

Streamable Yes Yes No Yes Yes Yes Yes

Scalable Limited Yes Yes No Yes Yes YesEasy (fast) to create and parse Yes Limited Limited Yes Limited Limited Yes

Compact No Yes Yes Limited Yes Yes YesParametrizable: compression/time No Limited Yes No Limited Limited Yes

Page 18: Efficient RDF Interchange (ERI) Format for RDF Data Streams

Outline

Index

1. Introduction & Motivation2. Background3. Efficient RDF Interchange (ERI) Format

i. Basic Conceptsii. ERI Streamsiii. Practical Deployment

4. Evaluation5. Conclusions and Next steps

18

Page 19: Efficient RDF Interchange (ERI) Format for RDF Data Streams

EFFICIENT RDF INTERCHANGE (ERI) FORMAT – Basic Concepts

19

• (Assumption) Most RDF streams are well structured• the is well-known by the data provider• the number of in the structure are limited

     

structurevariations

Page 20: Efficient RDF Interchange (ERI) Format for RDF Data Streams

EFFICIENT RDF INTERCHANGE (ERI) FORMAT – Basic Concepts

20

• (Assumption) Most RDF streams are well structured• the is well-known by the data provider• the number of in the structure are limited

• Efficient RDF Interchange (ERI) Format encodes the information at two levels:

   

structurevariations

Page 21: Efficient RDF Interchange (ERI) Format for RDF Data Streams

EFFICIENT RDF INTERCHANGE (ERI) FORMAT – Basic Concepts

21

• (Assumption) Most RDF streams are well structured• the is well-known by the data provider• the number of in the structure are limited

• Efficient RDF Interchange (ERI) Format encodes the information at two levels:

• A sliding dictionary of structures: Structural Dictionary 

structurevariations

Page 22: Efficient RDF Interchange (ERI) Format for RDF Data Streams

EFFICIENT RDF INTERCHANGE (ERI) FORMAT – Basic Concepts

22

• (Assumption) Most RDF streams are well structured• the is well-known by the data provider• the number of in the structure are limited

• Efficient RDF Interchange (ERI) Format encodes the information at two levels:

• A sliding dictionary of structures: Structural Dictionary• The concrete value for each predicate

structurevariations

Page 23: Efficient RDF Interchange (ERI) Format for RDF Data Streams

EFFICIENT RDF INTERCHANGE (ERI) FORMAT – Basic Concepts

23

• (Assumption) Most RDF streams are well structured• the is well-known by the data provider• the number of in the structure are limited

• Efficient RDF Interchange (ERI) Format encodes the information at two levels:

• A sliding dictionary of structures: Structural Dictionary• The concrete value for each predicate

structurevariations

Page 24: Efficient RDF Interchange (ERI) Format for RDF Data Streams

EFFICIENT RDF INTERCHANGE (ERI) FORMAT – Basic Concepts

24

• (Assumption) Most RDF streams are well structured• the is well-known by the data provider• the number of in the structure are limited

• Efficient RDF Interchange (ERI) Format encodes the information at two levels:

• A sliding dictionary of structures: Structural Dictionary• The concrete value for each predicate

structurevariations

Page 25: Efficient RDF Interchange (ERI) Format for RDF Data Streams

EFFICIENT RDF INTERCHANGE (ERI) FORMAT – Basic Concepts

25

t

u1 u2 u3 u4

w1 w2 w3

Stream

“7.7”^^xsd:float “9.4”^^xsd:float

Structural

Dictionary

temperature

Casual user

Anual pass

wind

ID-30

ID-31 ID-32

ID-33

weather: TemperatureObservation

rdf:type

weather:AirTemperature

ssn:observedProperty

???

ex:CelsiusValue

… …

Page 26: Efficient RDF Interchange (ERI) Format for RDF Data Streams

EFFICIENT RDF INTERCHANGE (ERI) FORMAT – Basic Concepts

26

t

u1 u2 u3 u4

w1 w2 w3

Stream

“7.7”^^xsd:float “9.4”^^xsd:float

Structural

Dictionary

temperature

Casual user

Anual pass

wind

ID-30

ID-31 ID-32

ID-33

weather: TemperatureObservation

rdf:type

weather:AirTemperature

ssn:observedProperty

???

ex:CelsiusValue

… …

molecule

Page 27: Efficient RDF Interchange (ERI) Format for RDF Data Streams

EFFICIENT RDF INTERCHANGE (ERI) FORMAT – Basic Concepts

27

t

u1 u2 u3 u4

w1 w2 w3

Stream

“7.7”^^xsd:float “9.4”^^xsd:float

Structural

Dictionary

temperature

Casual user

Anual pass

wind

ID-30

ID-31 ID-32

ID-33

weather: TemperatureObservation

rdf:type

weather:AirTemperature

ssn:observedProperty

???

ex:CelsiusValue

… …

molecule

Page 28: Efficient RDF Interchange (ERI) Format for RDF Data Streams

EFFICIENT RDF INTERCHANGE (ERI) FORMAT – Basic Concepts

28

t

u1 u2 u3 u4

w1 w2 w3

Stream

“7.7”^^xsd:float “9.4”^^xsd:float

Structural

Dictionary

temperature

Casual user

Anual pass

wind

ID-30

ID-31 ID-32

ID-33

weather: TemperatureObservation

rdf:type

weather:AirTemperature

ssn:observedProperty

???

ex:CelsiusValue

… …

molecule

Page 29: Efficient RDF Interchange (ERI) Format for RDF Data Streams

EFFICIENT RDF INTERCHANGE (ERI) FORMAT – Basic Concepts

29

t

u1 u2 u3 u4

w1 w2 w3

Stream

“7.7”^^xsd:float “9.4”^^xsd:float

Structural

Dictionary

temperature

Casual user

Anual pass

wind

ID-30

ID-31 ID-32

ID-33

weather: TemperatureObservation

rdf:type

weather:AirTemperature

ssn:observedProperty

???

ex:CelsiusValue

… …

molecule

Page 30: Efficient RDF Interchange (ERI) Format for RDF Data Streams

EFFICIENT RDF INTERCHANGE (ERI) FORMAT – Basic Concepts

30

t

u1 u2 u3 u4

w1 w2 w3

Stream

“7.7”^^xsd:float “9.4”^^xsd:float

Structural

Dictionary

temperature

Casual user

Anual pass

wind

ID-30

ID-31 ID-32

ID-33

weather: TemperatureObservation

rdf:type

weather:AirTemperature

ssn:observedProperty

???

ex:CelsiusValue

… …

molecule

Page 31: Efficient RDF Interchange (ERI) Format for RDF Data Streams

EFFICIENT RDF INTERCHANGE (ERI) FORMAT – Basic Concepts

31

t

u1 u2 u3 u4

w1 w2 w3

Stream

“7.7”^^xsd:float “9.4”^^xsd:float

Structural

Dictionary

temperature

Casual user

Anual pass

wind

ID-30

ID-31 ID-32

ID-33

weather: TemperatureObservation

rdf:type

weather:AirTemperature

ssn:observedProperty

???

ex:CelsiusValue

… …

molecule

Page 32: Efficient RDF Interchange (ERI) Format for RDF Data Streams

EFFICIENT RDF INTERCHANGE (ERI) FORMAT – Basic Concepts

32

t

u1 u2 u3 u4

w1 w2 w3

Stream

“7.7”^^xsd:float “9.4”^^xsd:float

Structural

Dictionary

temperature

Casual user

Anual pass

wind

ID-30

ID-31 ID-32

ID-33

weather: TemperatureObservation

rdf:type

weather:AirTemperature

ssn:observedProperty

???

ex:CelsiusValue

… …

molecule

Page 33: Efficient RDF Interchange (ERI) Format for RDF Data Streams

EFFICIENT RDF INTERCHANGE (ERI) FORMAT – Basic Concepts

33

• ERI processing model

• Minimal Information Unit is a molecule:

• We initially restrict to subject molecules

Page 34: Efficient RDF Interchange (ERI) Format for RDF Data Streams

sens-obs:Observation_AirTemperature_4UT01_2003_3_31_6_55_00 a weather:TemperatureObservation ; rdfs: label “Air temperature at 6:55:00”, “Verified” ; om-owl:observedProperty weather:_AirTemperature ; om-owl:procedure sens-obs:System_4UT01 ; om-owl:result sens-obs:MeasureData_AirTemperature_4UT01_2003_3_31_6_55_00 ; om-owl:samplingTime sens-obs:Instant_2003_3_31_6_55_00. ex:CelsiusValue “7.7”^^xsd:float

sens-obs:Observation_AirTemperature_4UT01_2003_3_31_7_45_00 a weather:TemperatureObservation ; rdfs: label “Air temperature at 7:45:00”, “Not Verified” ; om-owl:observedProperty weather:_AirTemperature ; om-owl:procedure sens-obs:System_4UT01 ; om-owl:result sens-obs:MeasureData_AirTemperature_4UT01_2003_3_31_7_45_00 ; om-owl:samplingTime sens-obs:Instant_2003_3_31_7_45_00 . ex:CelsiusValue “9.4”^^xsd:float

EFFICIENT RDF INTERCHANGE (ERI) FORMAT – Basic Concepts

34

Sub

ject

M

ole

cule

Suu

bje

ctM

ole

cule

Page 35: Efficient RDF Interchange (ERI) Format for RDF Data Streams

sens-obs:Observation_AirTemperature_4UT01_2003_3_31_6_55_00 a weather:TemperatureObservation ; rdfs: label “Air temperature at 6:55:00”, “Verified” ; om-owl:observedProperty weather:_AirTemperature ; om-owl:procedure sens-obs:System_4UT01 ; om-owl:result sens-obs:MeasureData_AirTemperature_4UT01_2003_3_31_6_55_00 ; om-owl:samplingTime sens-obs:Instant_2003_3_31_6_55_00. ex:CelsiusValue “7.7”^^xsd:float

sens-obs:Observation_AirTemperature_4UT01_2003_3_31_7_45_00 a weather:TemperatureObservation ; rdfs: label “Air temperature at 7:45:00”, “Not Verified” ; om-owl:observedProperty weather:_AirTemperature ; om-owl:procedure sens-obs:System_4UT01 ; om-owl:result sens-obs:MeasureData_AirTemperature_4UT01_2003_3_31_7_45_00 ; om-owl:samplingTime sens-obs:Instant_2003_3_31_7_45_00 . ex:CelsiusValue “9.4”^^xsd:float

EFFICIENT RDF INTERCHANGE (ERI) FORMAT – Basic Concepts

35

Sub

ject

M

ole

cule

…..Structure ID30= a (1, weather:TemperatureObservation) rdfs:label (2) om-wl:observedProperty (1, weather:_AirTemperature ) om-owl:procedure (1,sens-obs:System_4UT01) om-owl:result (1) om-owl:samplingTime (1) ex:CelsiusValue (1) …..

Structural Dictionary

Suu

bje

ctM

ole

cule

Page 36: Efficient RDF Interchange (ERI) Format for RDF Data Streams

sens-obs:Observation_AirTemperature_4UT01_2003_3_31_6_55_00 a weather:TemperatureObservation ; rdfs: label “Air temperature at 6:55:00”, “Verified” ; om-owl:observedProperty weather:_AirTemperature ; om-owl:procedure sens-obs:System_4UT01 ; om-owl:result sens-obs:MeasureData_AirTemperature_4UT01_2003_3_31_6_55_00 ; om-owl:samplingTime sens-obs:Instant_2003_3_31_6_55_00. ex:CelsiusValue “7.7”^^xsd:float

sens-obs:Observation_AirTemperature_4UT01_2003_3_31_7_45_00 a weather:TemperatureObservation ; rdfs: label “Air temperature at 7:45:00”, “Not Verified” ; om-owl:observedProperty weather:_AirTemperature ; om-owl:procedure sens-obs:System_4UT01 ; om-owl:result sens-obs:MeasureData_AirTemperature_4UT01_2003_3_31_7_45_00 ; om-owl:samplingTime sens-obs:Instant_2003_3_31_7_45_00 . ex:CelsiusValue “9.4”^^xsd:float

EFFICIENT RDF INTERCHANGE (ERI) FORMAT – Basic Concepts

36

Sub

ject

M

ole

cule

…..Structure ID30= a (1, weather:TemperatureObservation) rdfs:label (2) om-wl:observedProperty (1, weather:_AirTemperature ) om-owl:procedure (1,sens-obs:System_4UT01) om-owl:result (1) om-owl:samplingTime (1) ex:CelsiusValue (1) …..

Structural Dictionary

Suu

bje

ctM

ole

cule

Air Temperature Observations of the

Sensor “System_4UT01”

Page 37: Efficient RDF Interchange (ERI) Format for RDF Data Streams

sens-obs:Observation_AirTemperature_4UT01_2003_3_31_6_55_00 a weather:TemperatureObservation ; rdfs: label “Air temperature at 6:55:00”, “Verified” ; om-owl:observedProperty weather:_AirTemperature ; om-owl:procedure sens-obs:System_4UT01 ; om-owl:result sens-obs:MeasureData_AirTemperature_4UT01_2003_3_31_6_55_00 ; om-owl:samplingTime sens-obs:Instant_2003_3_31_6_55_00. ex:CelsiusValue “7.7”^^xsd:float

sens-obs:Observation_AirTemperature_4UT01_2003_3_31_7_45_00 a weather:TemperatureObservation ; rdfs: label “Air temperature at 7:45:00”, “Not Verified” ; om-owl:observedProperty weather:_AirTemperature ; om-owl:procedure sens-obs:System_4UT01 ; om-owl:result sens-obs:MeasureData_AirTemperature_4UT01_2003_3_31_7_45_00 ; om-owl:samplingTime sens-obs:Instant_2003_3_31_7_45_00 . ex:CelsiusValue “9.4”^^xsd:float

sens-obs:Observation_AirTemperature_4UT01_2003_3_31_6_55_00 a weather:TemperatureObservation ; rdfs: label “Air temperature at 6:55:00”, “Verified” ; om-owl:observedProperty weather:_AirTemperature ; om-owl:procedure sens-obs:System_4UT01 ; om-owl:result sens-obs:MeasureData_AirTemperature_4UT01_2003_3_31_6_55_00 ; om-owl:samplingTime sens-obs:Instant_2003_3_31_6_55_00. ex:CelsiusValue “7.7”^^xsd:float

sens-obs:Observation_AirTemperature_4UT01_2003_3_31_7_45_00 a weather:TemperatureObservation ; rdfs: label “Air temperature at 7:45:00”, “Not Verified” ; om-owl:observedProperty weather:_AirTemperature ; om-owl:procedure sens-obs:System_4UT01 ; om-owl:result sens-obs:MeasureData_AirTemperature_4UT01_2003_3_31_7_45_00 ; om-owl:samplingTime sens-obs:Instant_2003_3_31_7_45_00 . ex:CelsiusValue “9.4”^^xsd:float

EFFICIENT RDF INTERCHANGE (ERI) FORMAT – Basic Concepts

37

Sub

ject

M

ole

cule

…..Structure ID30= a (1, weather:TemperatureObservation) rdfs:label (2) om-wl:observedProperty (1, weather:_AirTemperature ) om-owl:procedure (1,sens-obs:System_4UT01) om-owl:result (1) om-owl:samplingTime (1) ex:CelsiusValue (1) …..

Structural Dictionary

Suu

bje

ctM

ole

cule

Air Temperature Observations of the

Sensor “System_4UT01”

Page 38: Efficient RDF Interchange (ERI) Format for RDF Data Streams

EFFICIENT RDF INTERCHANGE (ERI) FORMAT – ERI Streams

38Based on: Efficient XML Interchange (EXI) format

Block

Molecule

Molecule

Molecule

Block

Molecule

Molecule

Molecule

Block

Molecule

Molecule

Molecule

……

Multiplex / Demultiplex

Compression/Decompression (per channel)

StreamHeader

Stream Body

METADATA

COMPCHAN.

COMPCHAN.

COMPCHAN.

COMPCHAN.

COMPCHAN.

COMPCHAN.

METADATA

COMPCHAN.

COMPCHAN.

COMPCHAN.

COMPCHAN.

METADATA

COMPCHAN.

COMPCHAN.

COMPCHAN.

COMPCHAN.

COMPCHAN.

COMPCHAN.

ChannelsStructural Channels

Value Channels…

ERI stream

Page 39: Efficient RDF Interchange (ERI) Format for RDF Data Streams

EFFICIENT RDF INTERCHANGE (ERI) FORMAT – ERI Streams

39

ERI follows an encoding procedure similar to that of the Efficient XML Interchange (EXI) format.

Structural channels: They encode the subjects in each block and, for each one, the structural properties of the related triples, using the dynamic dictionary of structures.• Main Terms of molecules: subject of the grouping.• ID-Structures: ID of the structure of each molecule in the block. The ID

points to the entry in the Structural Dictionary.• New Structures: New entries in the Structural Dictionary.

– Value channels: They encode the concrete data values held by each predicate in the block in a compact fashion.• One channel per different predicate in the block.

• Lists explicit values or use IDs pointing to a sliding object dictionary

stru

ctur

eva

riatio

ns

Page 40: Efficient RDF Interchange (ERI) Format for RDF Data Streams

EFFICIENT RDF INTERCHANGE (ERI) FORMAT – Practical Deployment

40

…sens-obs:MeasureData_Air…55_00sens-obs:Instant_2003…55_00sens-obs:MeasureData_Air…45_00sens-obs:Instant_2003…55_00…

ID-Structures …3030…

ID-pred1 weather: TemperatureObservationID-pred2 ID-pred3 weather:_AirTemperatureID-pred4 sensobs: System_4UT01 ID-pred5ID-pred6ID-pred7

[IDs ofStructures]

[Encoded Structures] [Strings]

… om-owl:samplingTime

ex:CelsiusValue…

Structural Channels

ID-pred2

[Object Values][Meta: strings]

…Air temperature at 6:55:00VerifiedAir temperature at 7:45:00Not Verified…

ID-pred5

[Term IDs][Meta: IDs]

New Terms

[Strings]

…101245…

ID-pred6

[Term IDs][Meta: IDs]

12…

Pote

ntial

Com

pres

sion

Differential…

Prefix compressionZlib

Snappy…

Main Terms of Molecules

[Strings]

….sens-obs:Observation_AirTemperature...55_00sens-obs:Observation_AirTemperature...45_00….

Prefix compressionZlib

Snappy…

Prefix compressionZlib

Snappy…

ZlibSnappy

Differential…

Differential…

…10…

[Bits]

New Structure Marker

New Structures New Predicates

ZlibSnappy

[Bits]

New Object MarkerID-pred5

…01…

New Object MarkerID-pred6

[Bits]

11…

1211111

ID-pred7

[Object Values][Meta: xsd:float]

Differential…

…7.79.4….

ValueChannels

Pote

ntial

Com

pres

sion

Page 41: Efficient RDF Interchange (ERI) Format for RDF Data Streams

EFFICIENT RDF INTERCHANGE (ERI) FORMAT – Practical Deployment

41

…sens-obs:MeasureData_Air…55_00sens-obs:Instant_2003…55_00sens-obs:MeasureData_Air…45_00sens-obs:Instant_2003…55_00…

ID-Structures …3030…

ID-pred1 weather: TemperatureObservationID-pred2 ID-pred3 weather:_AirTemperatureID-pred4 sensobs: System_4UT01 ID-pred5ID-pred6ID-pred7

[IDs ofStructures]

[Encoded Structures] [Strings]

… om-owl:samplingTime

ex:CelsiusValue…

Structural Channels

ID-pred2

[Object Values][Meta: strings]

…Air temperature at 6:55:00VerifiedAir temperature at 7:45:00Not Verified…

ID-pred5

[Term IDs][Meta: IDs]

New Terms

[Strings]

…101245…

ID-pred6

[Term IDs][Meta: IDs]

12…

Pote

ntial

Com

pres

sion

Differential…

Prefix compressionZlib

Snappy…

Main Terms of Molecules

[Strings]

….sens-obs:Observation_AirTemperature...55_00sens-obs:Observation_AirTemperature...45_00….

Prefix compressionZlib

Snappy…

Prefix compressionZlib

Snappy…

ZlibSnappy

Differential…

Differential…

…10…

[Bits]

New Structure Marker

New Structures New Predicates

ZlibSnappy

[Bits]

New Object MarkerID-pred5

…01…

New Object MarkerID-pred6

[Bits]

11…

1211111

ID-pred7

[Object Values][Meta: xsd:float]

Differential…

…7.79.4….

ValueChannels

Pote

ntial

Com

pres

sion

Explicit list of values

Page 42: Efficient RDF Interchange (ERI) Format for RDF Data Streams

EFFICIENT RDF INTERCHANGE (ERI) FORMAT – Practical Deployment

42

…sens-obs:MeasureData_Air…55_00sens-obs:Instant_2003…55_00sens-obs:MeasureData_Air…45_00sens-obs:Instant_2003…55_00…

ID-Structures …3030…

ID-pred1 weather: TemperatureObservationID-pred2 ID-pred3 weather:_AirTemperatureID-pred4 sensobs: System_4UT01 ID-pred5ID-pred6ID-pred7

[IDs ofStructures]

[Encoded Structures] [Strings]

… om-owl:samplingTime

ex:CelsiusValue…

Structural Channels

ID-pred2

[Object Values][Meta: strings]

…Air temperature at 6:55:00VerifiedAir temperature at 7:45:00Not Verified…

ID-pred5

[Term IDs][Meta: IDs]

New Terms

[Strings]

…101245…

ID-pred6

[Term IDs][Meta: IDs]

12…

Pote

ntial

Com

pres

sion

Differential…

Prefix compressionZlib

Snappy…

Main Terms of Molecules

[Strings]

….sens-obs:Observation_AirTemperature...55_00sens-obs:Observation_AirTemperature...45_00….

Prefix compressionZlib

Snappy…

Prefix compressionZlib

Snappy…

ZlibSnappy

Differential…

Differential…

…10…

[Bits]

New Structure Marker

New Structures New Predicates

ZlibSnappy

[Bits]

New Object MarkerID-pred5

…01…

New Object MarkerID-pred6

[Bits]

11…

1211111

ID-pred7

[Object Values][Meta: xsd:float]

Differential…

…7.79.4….

ValueChannels

Pote

ntial

Com

pres

sion

Explicit list of values

Page 43: Efficient RDF Interchange (ERI) Format for RDF Data Streams

EFFICIENT RDF INTERCHANGE (ERI) FORMAT – Practical Deployment

43

…sens-obs:MeasureData_Air…55_00sens-obs:Instant_2003…55_00sens-obs:MeasureData_Air…45_00sens-obs:Instant_2003…55_00…

ID-Structures …3030…

ID-pred1 weather: TemperatureObservationID-pred2 ID-pred3 weather:_AirTemperatureID-pred4 sensobs: System_4UT01 ID-pred5ID-pred6ID-pred7

[IDs ofStructures]

[Encoded Structures] [Strings]

… om-owl:samplingTime

ex:CelsiusValue…

Structural Channels

ID-pred2

[Object Values][Meta: strings]

…Air temperature at 6:55:00VerifiedAir temperature at 7:45:00Not Verified…

ID-pred5

[Term IDs][Meta: IDs]

New Terms

[Strings]

…101245…

ID-pred6

[Term IDs][Meta: IDs]

12…

Pote

ntial

Com

pres

sion

Differential…

Prefix compressionZlib

Snappy…

Main Terms of Molecules

[Strings]

….sens-obs:Observation_AirTemperature...55_00sens-obs:Observation_AirTemperature...45_00….

Prefix compressionZlib

Snappy…

Prefix compressionZlib

Snappy…

ZlibSnappy

Differential…

Differential…

…10…

[Bits]

New Structure Marker

New Structures New Predicates

ZlibSnappy

[Bits]

New Object MarkerID-pred5

…01…

New Object MarkerID-pred6

[Bits]

11…

1211111

ID-pred7

[Object Values][Meta: xsd:float]

Differential…

…7.79.4….

ValueChannels

Pote

ntial

Com

pres

sion

Explicit list of values

Page 44: Efficient RDF Interchange (ERI) Format for RDF Data Streams

EFFICIENT RDF INTERCHANGE (ERI) FORMAT – Practical Deployment

44

…sens-obs:MeasureData_Air…55_00sens-obs:Instant_2003…55_00sens-obs:MeasureData_Air…45_00sens-obs:Instant_2003…55_00…

ID-Structures …3030…

ID-pred1 weather: TemperatureObservationID-pred2 ID-pred3 weather:_AirTemperatureID-pred4 sensobs: System_4UT01 ID-pred5ID-pred6ID-pred7

[IDs ofStructures]

[Encoded Structures] [Strings]

… om-owl:samplingTime

ex:CelsiusValue…

Structural Channels

ID-pred2

[Object Values][Meta: strings]

…Air temperature at 6:55:00VerifiedAir temperature at 7:45:00Not Verified…

ID-pred5

[Term IDs][Meta: IDs]

New Terms

[Strings]

…101245…

ID-pred6

[Term IDs][Meta: IDs]

12…

Pote

ntial

Com

pres

sion

Differential…

Prefix compressionZlib

Snappy…

Main Terms of Molecules

[Strings]

….sens-obs:Observation_AirTemperature...55_00sens-obs:Observation_AirTemperature...45_00….

Prefix compressionZlib

Snappy…

Prefix compressionZlib

Snappy…

ZlibSnappy

Differential…

Differential…

…10…

[Bits]

New Structure Marker

New Structures New Predicates

ZlibSnappy

[Bits]

New Object MarkerID-pred5

…01…

New Object MarkerID-pred6

[Bits]

11…

1211111

ID-pred7

[Object Values][Meta: xsd:float]

Differential…

…7.79.4….

ValueChannels

Pote

ntial

Com

pres

sion

Explicit list of values

Page 45: Efficient RDF Interchange (ERI) Format for RDF Data Streams

EFFICIENT RDF INTERCHANGE (ERI) FORMAT – Practical Deployment

45

…sens-obs:MeasureData_Air…55_00sens-obs:Instant_2003…55_00sens-obs:MeasureData_Air…45_00sens-obs:Instant_2003…55_00…

ID-Structures …3030…

ID-pred1 weather: TemperatureObservationID-pred2 ID-pred3 weather:_AirTemperatureID-pred4 sensobs: System_4UT01 ID-pred5ID-pred6ID-pred7

[IDs ofStructures]

[Encoded Structures] [Strings]

… om-owl:samplingTime

ex:CelsiusValue…

Structural Channels

ID-pred2

[Object Values][Meta: strings]

…Air temperature at 6:55:00VerifiedAir temperature at 7:45:00Not Verified…

ID-pred5

[Term IDs][Meta: IDs]

New Terms

[Strings]

…101245…

ID-pred6

[Term IDs][Meta: IDs]

12…

Pote

ntial

Com

pres

sion

Differential…

Prefix compressionZlib

Snappy…

Main Terms of Molecules

[Strings]

….sens-obs:Observation_AirTemperature...55_00sens-obs:Observation_AirTemperature...45_00….

Prefix compressionZlib

Snappy…

Prefix compressionZlib

Snappy…

ZlibSnappy

Differential…

Differential…

…10…

[Bits]

New Structure Marker

New Structures New Predicates

ZlibSnappy

[Bits]

New Object MarkerID-pred5

…01…

New Object MarkerID-pred6

[Bits]

11…

1211111

ID-pred7

[Object Values][Meta: xsd:float]

Differential…

…7.79.4….

ValueChannels

Pote

ntial

Com

pres

sion

Explicit list of values

IDs pointing to a sliding object

dictionary

Page 46: Efficient RDF Interchange (ERI) Format for RDF Data Streams

EFFICIENT RDF INTERCHANGE (ERI) FORMAT – Practical Deployment

46

…sens-obs:MeasureData_Air…55_00sens-obs:Instant_2003…55_00sens-obs:MeasureData_Air…45_00sens-obs:Instant_2003…55_00…

ID-Structures …3030…

ID-pred1 weather: TemperatureObservationID-pred2 ID-pred3 weather:_AirTemperatureID-pred4 sensobs: System_4UT01 ID-pred5ID-pred6ID-pred7

[IDs ofStructures]

[Encoded Structures] [Strings]

… om-owl:samplingTime

ex:CelsiusValue…

Structural Channels

ID-pred2

[Object Values][Meta: strings]

…Air temperature at 6:55:00VerifiedAir temperature at 7:45:00Not Verified…

ID-pred5

[Term IDs][Meta: IDs]

New Terms

[Strings]

…101245…

ID-pred6

[Term IDs][Meta: IDs]

12…

Pote

ntial

Com

pres

sion

Differential…

Prefix compressionZlib

Snappy…

Main Terms of Molecules

[Strings]

….sens-obs:Observation_AirTemperature...55_00sens-obs:Observation_AirTemperature...45_00….

Prefix compressionZlib

Snappy…

Prefix compressionZlib

Snappy…

ZlibSnappy

Differential…

Differential…

…10…

[Bits]

New Structure Marker

New Structures New Predicates

ZlibSnappy

[Bits]

New Object MarkerID-pred5

…01…

New Object MarkerID-pred6

[Bits]

11…

1211111

ID-pred7

[Object Values][Meta: xsd:float]

Differential…

…7.79.4….

ValueChannels

Pote

ntial

Com

pres

sion

Explicit list of values

IDs pointing to a sliding object

dictionary

Page 47: Efficient RDF Interchange (ERI) Format for RDF Data Streams

EFFICIENT RDF INTERCHANGE (ERI) FORMAT – Practical Deployment

47

…sens-obs:MeasureData_Air…55_00sens-obs:Instant_2003…55_00sens-obs:MeasureData_Air…45_00sens-obs:Instant_2003…55_00…

ID-Structures …3030…

ID-pred1 weather: TemperatureObservationID-pred2 ID-pred3 weather:_AirTemperatureID-pred4 sensobs: System_4UT01 ID-pred5ID-pred6ID-pred7

[IDs ofStructures]

[Encoded Structures] [Strings]

… om-owl:samplingTime

ex:CelsiusValue…

Structural Channels

ID-pred2

[Object Values][Meta: strings]

…Air temperature at 6:55:00VerifiedAir temperature at 7:45:00Not Verified…

ID-pred5

[Term IDs][Meta: IDs]

New Terms

[Strings]

…101245…

ID-pred6

[Term IDs][Meta: IDs]

12…

Pote

ntial

Com

pres

sion

Differential…

Prefix compressionZlib

Snappy…

Main Terms of Molecules

[Strings]

….sens-obs:Observation_AirTemperature...55_00sens-obs:Observation_AirTemperature...45_00….

Prefix compressionZlib

Snappy…

Prefix compressionZlib

Snappy…

ZlibSnappy

Differential…

Differential…

…10…

[Bits]

New Structure Marker

New Structures New Predicates

ZlibSnappy

[Bits]

New Object MarkerID-pred5

…01…

New Object MarkerID-pred6

[Bits]

11…

1211111

ID-pred7

[Object Values][Meta: xsd:float]

Differential…

…7.79.4….

ValueChannels

Pote

ntial

Com

pres

sion

Explicit list of values

IDs pointing to a sliding object

dictionary

Extraction of types

Page 48: Efficient RDF Interchange (ERI) Format for RDF Data Streams

EFFICIENT RDF INTERCHANGE (ERI) FORMAT – Practical Deployment

48

…sens-obs:MeasureData_Air…55_00sens-obs:Instant_2003…55_00sens-obs:MeasureData_Air…45_00sens-obs:Instant_2003…55_00…

ID-Structures …3030…

ID-pred1 weather: TemperatureObservationID-pred2 ID-pred3 weather:_AirTemperatureID-pred4 sensobs: System_4UT01 ID-pred5ID-pred6ID-pred7

[IDs ofStructures]

[Encoded Structures] [Strings]

… om-owl:samplingTime

ex:CelsiusValue…

Structural Channels

ID-pred2

[Object Values][Meta: strings]

…Air temperature at 6:55:00VerifiedAir temperature at 7:45:00Not Verified…

ID-pred5

[Term IDs][Meta: IDs]

New Terms

[Strings]

…101245…

ID-pred6

[Term IDs][Meta: IDs]

12…

Pote

ntial

Com

pres

sion

Differential…

Prefix compressionZlib

Snappy…

Main Terms of Molecules

[Strings]

….sens-obs:Observation_AirTemperature...55_00sens-obs:Observation_AirTemperature...45_00….

Prefix compressionZlib

Snappy…

Prefix compressionZlib

Snappy…

ZlibSnappy

Differential…

Differential…

…10…

[Bits]

New Structure Marker

New Structures New Predicates

ZlibSnappy

[Bits]

New Object MarkerID-pred5

…01…

New Object MarkerID-pred6

[Bits]

11…

1211111

ID-pred7

[Object Values][Meta: xsd:float]

Differential…

…7.79.4….

ValueChannels

Pote

ntial

Com

pres

sion

Explicit list of values

IDs pointing to a sliding object

dictionary

Extraction of types

Page 49: Efficient RDF Interchange (ERI) Format for RDF Data Streams

EFFICIENT RDF INTERCHANGE (ERI) FORMAT – Practical Deployment

49

…sens-obs:MeasureData_Air…55_00sens-obs:Instant_2003…55_00sens-obs:MeasureData_Air…45_00sens-obs:Instant_2003…55_00…

ID-Structures …3030…

ID-pred1 weather: TemperatureObservationID-pred2 ID-pred3 weather:_AirTemperatureID-pred4 sensobs: System_4UT01 ID-pred5ID-pred6ID-pred7

[IDs ofStructures]

[Encoded Structures] [Strings]

… om-owl:samplingTime

ex:CelsiusValue…

Structural Channels

ID-pred2

[Object Values][Meta: strings]

…Air temperature at 6:55:00VerifiedAir temperature at 7:45:00Not Verified…

ID-pred5

[Term IDs][Meta: IDs]

New Terms

[Strings]

…101245…

ID-pred6

[Term IDs][Meta: IDs]

12…

Pote

ntial

Com

pres

sion

Differential…

Prefix compressionZlib

Snappy…

Main Terms of Molecules

[Strings]

….sens-obs:Observation_AirTemperature...55_00sens-obs:Observation_AirTemperature...45_00….

Prefix compressionZlib

Snappy…

Prefix compressionZlib

Snappy…

ZlibSnappy

Differential…

Differential…

…10…

[Bits]

New Structure Marker

New Structures New Predicates

ZlibSnappy

[Bits]

New Object MarkerID-pred5

…01…

New Object MarkerID-pred6

[Bits]

11…

1211111

ID-pred7

[Object Values][Meta: xsd:float]

Differential…

…7.79.4….

ValueChannels

Pote

ntial

Com

pres

sion

Explicit list of values

IDs pointing to a sliding object

dictionary

Extraction of types

Page 50: Efficient RDF Interchange (ERI) Format for RDF Data Streams

Outline

Index

1. Introduction & Motivation2. Background3. Efficient RDF Interchange (ERI) Format

i. Basic Conceptsii. ERI Streamsiii. Practical Deployment

4. Evaluation5. Conclusions and Next steps

50

Page 51: Efficient RDF Interchange (ERI) Format for RDF Data Streams

EVALUATION - COMPRESSION

51

Page 52: Efficient RDF Interchange (ERI) Format for RDF Data Streams

EVALUATION - COMPRESSION

52

ERI excels in space for streaming and statistical dataset

Page 53: Efficient RDF Interchange (ERI) Format for RDF Data Streams

EVALUATION - COMPRESSION

53

ERI excels in space for streaming and statistical dataset

RDSZ remains comparable to our approach

Page 54: Efficient RDF Interchange (ERI) Format for RDF Data Streams

EVALUATION - COMPRESSION

54

ERI excels in space for streaming and statistical dataset

The object dictionary can overload the representation, although it always obtains comparable compression ratios.

RDSZ remains comparable to our approach

Page 55: Efficient RDF Interchange (ERI) Format for RDF Data Streams

EVALUATION - COMPRESSION

55

Page 56: Efficient RDF Interchange (ERI) Format for RDF Data Streams

EVALUATION - COMPRESSION

56

A smaller buffer in ERI-1k slightly affects the efficiency

Page 57: Efficient RDF Interchange (ERI) Format for RDF Data Streams

EVALUATION - PARSING

57

Page 58: Efficient RDF Interchange (ERI) Format for RDF Data Streams

EVALUATION - PARSING

58

ERI always outperforms the RDSZ compression time (3 and 3.8 times on average for ERI-4k and ERI-4k-Nodict, respectively)

Page 59: Efficient RDF Interchange (ERI) Format for RDF Data Streams

EVALUATION - PARSING

59

ERI always outperforms the RDSZ compression time (3 and 3.8 times on average for ERI-4k and ERI-4k-Nodict, respectively)

ERI decompression is commonly slower (1.4 times on average in both ERI configurations), typically due to decompressing several channels.

Page 60: Efficient RDF Interchange (ERI) Format for RDF Data Streams

EVALUATION - PARSING

60

ERI always outperforms the RDSZ compression time (3 and 3.8 times on average for ERI-4k and ERI-4k-Nodict, respectively)

ERI decompression is commonly slower (1.4 times on average in both ERI configurations), typically due to decompressing several channels.

Channels could be grouped (as in EXI)

Page 61: Efficient RDF Interchange (ERI) Format for RDF Data Streams

EVALUATION – CONSUMING SCENARIO

61

In parsing: transmission + decompression

Page 62: Efficient RDF Interchange (ERI) Format for RDF Data Streams

EVALUATION – CONSUMING SCENARIO

62

ERI-4k and ERI-4k-Nodict outperform the baseline in transmission + decompressionexcept for those datasets with less regularities in the structure or the data values,

In parsing: transmission + decompression

Page 63: Efficient RDF Interchange (ERI) Format for RDF Data Streams

EVALUATION – CONSUMING SCENARIO

63

In a scenario in which we include the compression time

Page 64: Efficient RDF Interchange (ERI) Format for RDF Data Streams

EVALUATION – CONSUMING SCENARIO

64

ERI-4k suffers an expected overhead as we are always including the timeto process the information

In a scenario in which we include the compression time

Page 65: Efficient RDF Interchange (ERI) Format for RDF Data Streams

EVALUATION – CONSUMING SCENARIO

65

ERI-4k suffers an expected overhead as we are always including the timeto process the information

In a scenario in which we include the compression time

The time in which the client receives all data in ERI is comparable to the baseline

Page 66: Efficient RDF Interchange (ERI) Format for RDF Data Streams

Results

66

• Compressed, efficient RDF interchange (ERI) format• exploit the RDF data stream regularity of their structure and

data values• Flexible and extensible ERI configurations• Minimize transmission costs in RDF stream processing

• State-of-the-art compression • Remains efficient in performance

• Time overheads are relatively low and can be assumed in many scenarios.

Page 67: Efficient RDF Interchange (ERI) Format for RDF Data Streams

67

Next steps

• Integration within RDF streaming Engines • e.g. morph-streams, CQELS Cloud• 3 purposes:

• scaling to higher input data rates• minimizing the data exchange among processing nodes• serving a small set of operators on the compressed data

• Parallel compression/decompression• preliminary proposal on Storm

• Align the proposal with the results of W3C RSP group regarding streaming modeling and serialization

Page 68: Efficient RDF Interchange (ERI) Format for RDF Data Streams

Efficient RDF Interchange (ERI) Format for RDF Data Streams

Javier D. Fernández, Alejandro Llaves, Oscar CorchoOntology Engineering Group (OEG), Universidad Politécnica de Madrid, Spain

purl.org/net/ro-eri-ISWC14

Electronic edition:

Research object: