OWL Datatypes: Design and Implementation Boris Motik and Ian Horrocks University of Oxford.


Contents

• Introduction

• The Datatype System of OWL 2

• The Datatypes of OWL 2

• A Modular Datatype Checker

• Conclusion

Problems with Datatypes in OWL 1

• Datatypes of OWL 1 are based on XML Schema (XSD)
• Problems with OWL 1 datatypes:
  - too few normative ones
  - no user-defined datatypes (e.g., intervals)
  - reasoning with some XSD datatypes is difficult
  - some XSD datatypes have inappropriate semantics
  - there are datatype-less constants
  - certain semantic aspects are unclear
  - reasoning algorithms are unclear

Motivation

• OWL 2, a new version of OWL, considerably improves the datatype system of OWL
• Our results ensure that:
  - the datatype system of OWL 2 is extensible
  - certain language extensions are correctly defined
  - OWL 2 supports datatypes that are practically feasible
  - we know how to implement the datatypes of OWL 2
• Goals: make datatypes in OWL 2 better; provide guidance for implementors

Datatype Map

• Each datatype $d$ is described by:
  - a URI, which gives the name of the datatype
  - a set of constants $N_C(d)$
  - a set of facet pairs $N_F(d)$
  - a value space $(d)^D$
  - a data value $(c)^D \in (d)^D$ for each constant $c$
  - a facet value $(f)^D \subseteq (d)^D$ for each facet $f$
• Example: real, with facets $<_x$, $>_x$, $\leq_x$, $\geq_x$, int
• Example: str, with facets $\langle \text{minLength}\ n \rangle$, $\langle \text{maxLength}\ n \rangle$, $\langle \text{length}\ n \rangle$, $\langle \text{pattern}\ \text{"regExp"} \rangle$

Data Ranges

• Facet expression: Boolean formula over facets, e.g., $\geq_5 \wedge \leq_{10}$
• Datatype restriction: $d[\varphi]$, where $d$ is a datatype and $\varphi$ is a facet expression for $d$
  - e.g., $\mathit{real}[\mathit{int} \wedge \geq_5 \wedge \leq_{10}]$
  - OWL 2 syntax: DatatypeRestriction( xsd:integer xsd:minInclusive "5"^^xsd:integer xsd:maxInclusive "10"^^xsd:integer )
• Data range: $\top_D$, $d[\varphi]$, $\{ v_1, \ldots, v_n \}$, $\neg dr$
  - will be extended in OWL 2 to all Boolean connectives
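A facet expression can be read as a predicate on data values. The following is a minimal sketch of that reading, not the authors' implementation: facet expressions are encoded as nested tuples (a hypothetical encoding), exact rationals from `fractions.Fraction` stand in for real numbers, and the function name `holds` is an assumption made for illustration.

```python
import operator
from fractions import Fraction

# Comparison facets, keyed by a hypothetical operator name.
COMPARISONS = {
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}


def holds(expr, value: Fraction) -> bool:
    """Return True iff `value` satisfies the facet expression `expr`.

    `expr` is a nested tuple such as ("and", ("int",), (">=", 5), ("<=", 10)).
    """
    op, *args = expr
    if op == "and":
        return all(holds(arg, value) for arg in args)
    if op == "or":
        return any(holds(arg, value) for arg in args)
    if op == "not":
        return not holds(args[0], value)
    if op == "int":
        # The int facet selects the integers among the (rational) values.
        return value.denominator == 1
    if op in COMPARISONS:
        return COMPARISONS[op](value, Fraction(args[0]))
    raise ValueError(f"unknown facet or connective: {op!r}")


# The restriction real[int AND >=5 AND <=10] from the slide.
five_to_ten = ("and", ("int",), (">=", 5), ("<=", 10))
print(holds(five_to_ten, Fraction(7)))      # an integer inside the bounds
print(holds(five_to_ten, Fraction(15, 2)))  # 7.5 is not an integer
```

Evaluating membership this way is enough for checking constants against a data range; deciding satisfiability additionally needs cardinality information.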

Using Data Ranges in Restrictions

• New datatype constructs:
  - qualified number restrictions
  - disjoint data properties

• Semantics is defined w.r.t. a datatype domain MD

Openness of the Datatype Domain

• MD is usually fixed in DL reasoning
  - datatype groups: MD is exactly the union of all value spaces
• Problem: adding new datatypes can change the meaning of certain axioms
• Example: $\top \sqsubseteq \forall U.{<_5} \sqcup \exists U.\mathit{real}$
  - if real is the only datatype, then this axiom is a tautology
  - if we have both real and str, it is not a tautology
• We do not fix MD in OWL 2
  - an ontology is satisfiable iff there exists an MD that contains at least the value spaces of all datatypes and for which all axioms are satisfied
• Proposition: consequences of OWL 2 ontologies are independent of the supported set of datatypes

Naming Data Ranges

• $\mathit{Teens} \equiv \mathit{real}[\mathit{int} \wedge >_{12} \wedge <_{20}]$
  - semantics: $(\mathit{Teens})^D = (\mathit{real}[\mathit{int} \wedge >_{12} \wedge <_{20}])^D$
  - use Teens as a shortcut, e.g., $\mathit{Teenager} \equiv \exists \mathit{hasAge}.\mathit{Teens}$
• Problem: we can write axioms about datatypes
  - $A \equiv \mathit{real}$ and $A \equiv \top_D$
  - fixes MD to $(\mathit{real})^D$
  - prevents us from extending the set of datatypes
• Make such axioms acyclic
  - each data range name can be defined only once and its definition cannot refer to itself
  - allows for simple unfolding of data range names
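Unfolding of acyclic data range names can be sketched as a recursive substitution. The code below is an illustration under stated assumptions, not the authors' code: expressions are nested tuples whose first element is an operator (a hypothetical encoding), definitions live in a dictionary so that each name has at most one definition, and the function name `unfold` is made up for this example.

```python
from typing import Dict, FrozenSet, Tuple, Union

# An expression is a name (str), a constant (int), or a tuple
# (operator, operand, ...). This encoding is hypothetical.
Expr = Union[str, int, Tuple]


def unfold(expr: Expr,
           definitions: Dict[str, Expr],
           active: FrozenSet[str] = frozenset()) -> Expr:
    """Replace every defined data range name in `expr` by its definition.

    `active` holds the names currently being unfolded; meeting one of them
    again means that a definition refers to itself, which is rejected.
    """
    if isinstance(expr, tuple):
        # Keep the operator, unfold the operands.
        return expr[:1] + tuple(unfold(arg, definitions, active) for arg in expr[1:])
    if isinstance(expr, str) and expr in definitions:
        if expr in active:
            raise ValueError(f"cyclic definition of data range name {expr!r}")
        return unfold(definitions[expr], definitions, active | {expr})
    # Built-in datatypes, facets, property names, and constants stay as they are.
    return expr


definitions = {
    "Teens": ("restriction", "real", ("and", "int", (">", 12), ("<", 20))),
}
# Teenager is defined as: exists hasAge.Teens
print(unfold(("some", "hasAge", "Teens"), definitions))
```

Because the definitions are acyclic, the recursion terminates and the result mentions only built-in datatypes, so a reasoner never has to treat a data range name as an unknown set.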

Datatype Reasoning

• Datatype checker decides satisfiability of conjunctions over assertions $dr(t)$ and $t_1 \not\approx t_2$
  - each $t_{(i)}$ is a variable or a constant
  - example: $\{ 5 \}(x_1) \wedge \mathit{int}[>_4 \wedge <_6](x_2) \wedge x_1 \not\approx x_2$
• Datatype checker can be integrated with a (hyper)tableau algorithm as usual
• Proposition: datatype checking is NP-hard
  - uses data property disjointness
  - seems like an innocuous feature!
  - even small additions to the language add complexity

Numeric Datatypes

• The following ontology is unsatisfiable:
    $\top \sqsubseteq \forall \mathit{hasWeight}.\text{xsd:double}$
    hasWeight(Paul, "76"^^xsd:integer)
  - in XSD, the integer 76 is not contained in xsd:double
  - no notion of typecasts in OWL
• XML Schema does not have real numbers
• OWL 2 redefines XSD numeric datatypes
  - owl:realPlus = owl:real $\cup$ { -0, +inf, -inf, NaN }
  - owl:real is the set of all real numbers
  - all XSD numeric datatypes are subsets of owl:real
  - facets: minExclusive, maxExclusive, minInclusive, maxInclusive

String Datatypes

• Plain RDF literals with a language tag do not belong to any XSD datatype
  - "datatype"@en vs. "Datentyp"@de
• OWL 2 uses a new rdf:text datatype
  - value space contains pairs $\langle \text{string}, \text{languageTag} \rangle$
  - will be used in RIF as well
• xsd:string was retrofitted to rdf:text
  - value space contains pairs $\langle \text{string}, \text{""} \rangle$
• The set of characters is assumed to be infinite
  - e.g., $\geq n\, U.(\mathit{str}[\text{length}\ 1])(a)$ is satisfiable iff $n \leq m$, where $m$ is the number of characters
  - $m$ will change in future, which could change the meaning of this axiom

Other Datatypes

• Date/time:
  - many XSD date/time datatypes are difficult to reason with, e.g., xsd:gMonthDay represents a recurring point in time, but recurrences are irregular due to leap seconds and years
  - XSD supports dates without time zones
  - OWL 2 supports only xsd:dateTime with required time zone
  - facets: minExclusive, maxExclusive, minInclusive, maxInclusive
• xsd:boolean
• xsd:hexBinary and xsd:base64Binary
• xsd:anyURI
  - disjoint with xsd:string

Modular Datatype Checking

• We assume that all datatypes are disjoint
  - xsd:integer is understood as a facet of owl:real
  - provides us with a natural modularization boundary
• Each datatype $d$ needs a datatype handler:
  - $\mathsf{minc}_d(d[\varphi], n)$: true iff $(d[\varphi])^D$ contains at least $n$ elements
  - $\mathsf{enu}_d(d[\varphi])$: defined only if $(d[\varphi])^D$ is finite; enumerates the extension of $d[\varphi]$
  - $\mathsf{in}_d(c, d[\varphi])$: true iff $c^D \in (d[\varphi])^D$
  - $\mathsf{eq}_d(c_1, c_2)$: true iff $c_1^D = c_2^D$
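The four handler operations map naturally onto a small interface. The sketch below shows one possible shape for an integer datatype, assuming that a data range has already been normalized to a single closed interval; the class names `IntRange` and `IntHandler` and the method name `contains` (for the "in" operation, since `in` is a Python keyword) are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(frozen=True)
class IntRange:
    """Integers x with low <= x <= high; None means unbounded on that side."""
    low: Optional[int] = None
    high: Optional[int] = None

    def size(self) -> Optional[int]:
        """Number of elements, or None if the range is infinite."""
        if self.low is None or self.high is None:
            return None
        return max(0, self.high - self.low + 1)


class IntHandler:
    """Datatype handler for integers; constants are given as lexical forms."""

    def minc(self, dr: IntRange, n: int) -> bool:
        """True iff the data range contains at least n elements."""
        size = dr.size()
        return size is None or size >= n

    def enu(self, dr: IntRange) -> Iterator[int]:
        """Enumerate the data range; defined only if it is finite."""
        if dr.size() is None:
            raise ValueError("enu is defined only for finite data ranges")
        return iter(range(dr.low, dr.high + 1))

    def contains(self, c: str, dr: IntRange) -> bool:
        """True iff the value of constant c lies in the data range."""
        value = int(c)
        above = dr.low is None or value >= dr.low
        below = dr.high is None or value <= dr.high
        return above and below

    def eq(self, c1: str, c2: str) -> bool:
        """True iff both constants denote the same value."""
        return int(c1) == int(c2)


handler = IntHandler()
print(list(handler.enu(IntRange(5, 7))))
print(handler.minc(IntRange(5, None), 1000))
```

Note that `minc` never enumerates anything, which is what lets a checker deal with very large or infinite data ranges cheaply.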

The Algorithm

• Input: a conjunction of assertions
• Output: true iff the conjunction is satisfiable
1. Normalize the conjunction such that each variable $x$ in it occurs in exactly one assertion $d[\varphi](x)$
2. Simplify: delete from the conjunction the assertions containing certain variables; in all remaining assertions of the form $d[\varphi](x)$, the data range $d[\varphi]$ is finite
3. Replace $d[\varphi](x)$ with $D(x)$ for $D = \mathsf{enu}_d(d[\varphi])$
4. Guess values for all variables
5. Check whether the guess satisfies the conjunction
• Can be reduced to SAT

The Simplification Step

• If the conjunction contains a variable $x$ such that
  - $x$ occurs in exactly one assertion $d[\varphi](x)$,
  - $x$ occurs in $m$ assertions of the form $x \not\approx x'$,
  - $x$ occurs in $n$ assertions of the form $x \not\approx c$, and
  - $\mathsf{minc}_d(d[\varphi], m+n+1) = \text{true}$,
  then delete all assertions containing $x$
• If $|(d[\varphi])^D| \geq m+n+1$, then we can satisfy $x$ for any choice of values for $x'$
  - the constraints on $x$ are irrelevant for the satisfiability of the conjunction
• Key to practical reasoning: data ranges in practice are likely to be large (even infinite)
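The simplification step and the final guess-and-check can be put together in a few lines. The sketch below makes several assumptions that are not in the slides: there is a single integer datatype, the input is already normalized, a finite data range is a Python `range` and an infinite one is `None`, and the binary assertions between terms are read as inequalities (the reading under which the m+n+1 counting argument goes through). Exhaustive search stands in for the nondeterministic guess or a SAT reduction, and the function name `satisfiable` is hypothetical.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple, Union

Term = Union[str, int]          # a variable name or an integer constant
Inequality = Tuple[Term, Term]  # (t1, t2) asserts that t1 and t2 differ


def satisfiable(ranges: Dict[str, Optional[range]],
                inequalities: List[Inequality]) -> bool:
    """Decide a normalized conjunction over one integer datatype.

    `ranges` maps each variable to its data range (None = infinite).
    """
    ranges = dict(ranges)
    # t != t can never hold, for a variable or for a constant.
    if any(t1 == t2 for t1, t2 in inequalities):
        return False
    # Inequalities between two distinct constants hold trivially; drop them.
    ineqs = [(a, b) for a, b in inequalities
             if isinstance(a, str) or isinstance(b, str)]

    # Simplification: drop x if its range has at least m+n+1 elements,
    # where m+n is the number of inequalities mentioning x.
    changed = True
    while changed:
        changed = False
        for x in list(ranges):
            k = sum(1 for a, b in ineqs if x in (a, b))
            dr = ranges[x]
            if dr is None or len(dr) >= k + 1:
                del ranges[x]
                ineqs = [(a, b) for a, b in ineqs if x not in (a, b)]
                changed = True

    # All remaining ranges are finite: enumerate, then search exhaustively.
    names = list(ranges)
    for values in product(*(ranges[x] for x in names)):
        env = dict(zip(names, values))
        if all(env.get(a, a) != env.get(b, b) for a, b in ineqs):
            return True
    return False


# Slide example: {5}(x1), int[>4 and <6](x2), x1 differs from x2.
print(satisfiable({"x1": range(5, 6), "x2": range(5, 6)}, [("x1", "x2")]))
```

In the slide example both ranges contain only the value 5, so neither variable can be simplified away and the search finds no assignment. Widening the range of `x2` by one element lets simplification remove both variables without enumerating anything.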

Handling Numbers and Strings

• Numbers: represent facets as intervals of the form dt(low, high)
  - facet expressions can be normalized using a suitable interval algebra
• Strings: represent facets as regular languages
  - facet expressions can be normalized using standard results for Boolean operations on regular languages
  - caveat: the underlying alphabet is infinite, so the Boolean operations on regular languages need to be adapted
• In both cases, datatype handlers are easily implemented for normalized expressions
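For the numeric case, the simplest fragment of such an interval algebra is intersection. The sketch below normalizes a conjunction of the four bound facets into one closed interval and reads off its cardinality. It is an illustration only: it assumes the values are integers (so an exclusive bound shifts by one), it does not cover negation or disjunction (which would need unions of intervals), and the function names are hypothetical.

```python
import math
from typing import Iterable, Tuple, Union

Bound = Union[int, float]       # an integer, or +/- math.inf for "unbounded"
Interval = Tuple[Bound, Bound]  # closed interval [low, high] over the integers


def facet_to_interval(name: str, value: int) -> Interval:
    """Translate one bound facet into a closed integer interval."""
    if name == "minInclusive":
        return (value, math.inf)
    if name == "minExclusive":
        return (value + 1, math.inf)
    if name == "maxInclusive":
        return (-math.inf, value)
    if name == "maxExclusive":
        return (-math.inf, value - 1)
    raise ValueError(f"unsupported facet: {name!r}")


def normalize(facets: Iterable[Tuple[str, int]]) -> Interval:
    """Intersect the intervals of all facets in a conjunction."""
    low, high = -math.inf, math.inf
    for name, value in facets:
        facet_low, facet_high = facet_to_interval(name, value)
        low, high = max(low, facet_low), min(high, facet_high)
    return (low, high)


def cardinality(interval: Interval) -> Bound:
    """Number of integers in the interval (math.inf if unbounded)."""
    low, high = interval
    if low > high:
        return 0
    return high - low + 1


teens = normalize([("minExclusive", 12), ("maxExclusive", 20)])
print(teens, cardinality(teens))
```

Once a facet expression is in this form, the handler operations reduce to comparisons on the two bounds: counting is a subtraction and membership is a pair of inequalities.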

Conclusion

• The algorithm has been implemented in the HermiT reasoner
  - a new OWL 2 reasoner based on hypertableau
  - http://www.hermit-reasoner.com/
• No formal evaluation yet, but…
• Supporting datatypes did not noticeably change classification times
  - data ranges used in practice are often "large enough"