How much Semantic Data on Small Devices?

How much semantic data on small devices? Mathieu d’Aquin, Andriy Nikolov and Enrico Motta, Knowledge Media Institute, The Open University, UK. [email protected] @mdaquin

Description

Short paper presentation at the EKAW 2010 conference on benchmarking RDF triple stores on small devices.

Transcript of How much Semantic Data on Small Devices?

Page 1: How much Semantic Data on Small Devices?

How much semantic data on small devices?

Mathieu d’Aquin, Andriy Nikolov and Enrico Motta
Knowledge Media Institute, The Open University, UK

[email protected]

@mdaquin

Page 2: How much Semantic Data on Small Devices?

Semantic Data on Small Devices?

Page 3: How much Semantic Data on Small Devices?

Benchmarking Semantic Data Tools

Large Scale Benchmarks

LUBM(1,0): 103,397 triples

Page 4: How much Semantic Data on Small Devices?

Extracting sets of small-scale ontologies

Clusters of ontologies sharing similar characteristics, except for size

Page 5: How much Semantic Data on Small Devices?

Extracting sets of small-scale ontologies

• Characteristics of ontologies

– Size (triples): varies from very small scale to medium scale

– Ratio class/prop: allowing 50% variance

– Ratio class/inst.: allowing 50% variance

– DL expressivity: complexity of the language

• 99 automatically created clusters

• Manual selection of 10
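The grouping described above (ontologies with similar ratios and DL expressivity, regardless of size) can be sketched as follows. This is a minimal illustration, not the authors' code: the 50% variance test, the greedy grouping strategy, and the sample data are all assumptions.

```python
def within_variance(a, b, tolerance=0.5):
    """True if b lies within +/-50% of a (sketch of the paper's
    '50% variance' rule; the authors' exact test may differ)."""
    return abs(a - b) <= tolerance * a

def cluster_ontologies(ontologies, tolerance=0.5):
    """Greedily group ontologies whose prop/class and inst/class
    ratios are within the allowed variance of the cluster's first
    member and whose DL expressivity matches; size is ignored."""
    clusters = []
    for onto in ontologies:
        for cluster in clusters:
            rep = cluster[0]  # representative: the cluster's first member
            if (within_variance(rep["prop_ratio"], onto["prop_ratio"], tolerance)
                    and within_variance(rep["inst_ratio"], onto["inst_ratio"], tolerance)
                    and rep["dl"] == onto["dl"]):
                cluster.append(onto)
                break
        else:
            clusters.append([onto])
    return clusters

# Illustrative data only (not the paper's dataset)
ontos = [
    {"name": "a", "prop_ratio": 0.8, "inst_ratio": 1.5, "dl": "ALO"},
    {"name": "b", "prop_ratio": 1.0, "inst_ratio": 2.0, "dl": "ALO"},
    {"name": "c", "prop_ratio": 0.3, "inst_ratio": 0.1, "dl": "ALH"},
]
print(len(cluster_ontologies(ontos)))  # a and b group together; c stands alone
```

In this sketch, "a" and "b" fall within 50% variance of each other on both ratios and share the same expressivity, so they form one cluster; "c" does not, giving two clusters in total.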

Page 6: How much Semantic Data on Small Devices?

Results

Size (triples)   Prop/class   Ind/class    DL
9-2742           0.65-1.0     1.0-2.0      ALO
27-3688          0.21-0.48    0.07-0.14    ALH
2-8502           N/A          N/A          -
17-3696          0.66-2.0     4.5-20.5     -
3208-658808      N/A          N/A          EL
1514-153298      N/A          N/A          ELR+
8-3657           N/A          N/A          -
7-4959           1.41-4.0     N/A          AL
1-2759           N/A          N/A          -
43-5132          1.0-2.0      13.0-22.09   -

Page 7: How much Semantic Data on Small Devices?

Queries

• Using real-life ontologies requires domain-independent queries

• A set of 8 generic queries of varying complexity, whose results might depend on inference

Select all labels

Select all comments

Select all labels and comments

Select all RDFS classes

Select all classes (RDFS/OWL/DAML)

Select all instances of all classes

Select all properties applied to instances of all classes

Select all properties by their domain

Page 8: How much Semantic Data on Small Devices?

Running the benchmarks – Triple Stores

• Jena with TDB persistent storage

• Jena TDB + RDFS reasoning

• Sesame with persistent storage

• Sesame + RDFS reasoning

• Mulgara with default configuration

Page 9: How much Semantic Data on Small Devices?

Running the benchmarks – Device

Asus Eee PC 700 (2G)

Page 10: How much Semantic Data on Small Devices?

Running the benchmarks - Measures

• Loading time: for each ontology, in an empty, re-initialized store.

• Disk space: of the persistent store right after loading.

• Memory consumption: of the triple store process right after loading the ontology.

• Query time: for each ontology, averaged over the 8 queries.
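The loading-time and query-time measures above could be collected with a harness along these lines. This is a stdlib-only sketch: `store` and its `load`/`query` methods are hypothetical placeholders (Jena, Sesame and Mulgara are Java tools, so the real harness would differ), and disk space and memory would be read from the OS after loading rather than measured here.

```python
import time

def benchmark(store, ontology, queries):
    """Collect loading time and mean query time for one ontology.
    `store` is a hypothetical object exposing load() and query()."""
    # Loading time: load the ontology into an empty store and time it
    t0 = time.perf_counter()
    store.load(ontology)
    loading_time = time.perf_counter() - t0

    # Query time: run each generic query and average the durations
    query_times = []
    for q in queries:  # the paper averages over its 8 generic queries
        t0 = time.perf_counter()
        store.query(q)
        query_times.append(time.perf_counter() - t0)

    return {
        "loading_time": loading_time,
        "mean_query_time": sum(query_times) / len(query_times),
    }

# Minimal stand-in store, just to exercise the harness
class DummyStore:
    def load(self, ontology):
        self.data = ontology
    def query(self, q):
        return []

result = benchmark(DummyStore(), "onto.rdf", ["q1", "q2"])
print(sorted(result))  # ['loading_time', 'mean_query_time']
```

Re-initializing the store before each ontology (as the slide specifies) would happen outside this function, by constructing a fresh store per run.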

Page 11: How much Semantic Data on Small Devices?

Results – Loading time

Page 12: How much Semantic Data on Small Devices?

Results – Loading time


Page 13: How much Semantic Data on Small Devices?

Results – Disk Space

Page 14: How much Semantic Data on Small Devices?

Results – Disk Space


Page 15: How much Semantic Data on Small Devices?

Results – Memory consumption

Page 16: How much Semantic Data on Small Devices?

Results – Memory consumption


Page 17: How much Semantic Data on Small Devices?

Results – Query time

Page 18: How much Semantic Data on Small Devices?

Results – Query time

Page 19: How much Semantic Data on Small Devices?

Conclusion – on tests

• Sesame performs best in almost all aspects, even when including reasoning

• Reasoning has a big impact on Jena TDB at query time

• Mulgara is clearly not adequate in a small-scale scenario

Page 20: How much Semantic Data on Small Devices?

Conclusion – on small-scale benchmarking

• Validates our assumption that small-scale benchmarks give different results than large-scale benchmarks

• Points out the need for more work to tackle the small-scale scenarios

• Results are not always clear-cut in every aspect: benchmarks serve as support for deciding which tool to use, depending on the application constraints