
An Algebraic Approach for Specifying Compound Terms in Faceted Taxonomies

Yannis Tzitzikas 1

Anastasia Analyti 2

Nicolas Spyratos 3

Panos Constantopoulos 2,4

1 Istituto di Scienza e Tecnologie dell’Informazione, CNR-ISTI, Italy

2 Institute of Computer Science, ICS-FORTH, Greece

3 Laboratoire de Recherche en Informatique, Université de Paris-Sud, France

4 Department of Computer Science, University of Crete, Greece

June 2003 Yannis Tzitzikas et al., EJC'2003 2

Outline of the presentation

• Introduction - Motivation

• Faceted Classification and Faceted Taxonomies

– Advantages and Problems

• Compound Terms and Compound Taxonomies

• The Algebra

– Operations

– Examples

– Algorithms

– Deriving Navigational Trees

– Prototype implementation

• Concluding Remarks


Introduction

• Existing ways to locate information on the Web:

– searching (using search engines like Google)

– browsing (using catalogues like Yahoo!, ODP)

• Currently, catalogues are also exploited by search engines:

– for improving relevance measurement

– for giving the user a set of pages related to each page of the answer

– for limiting the scope of the search

• Web catalogues (or indices using controlled, structured vocabularies):

[-]: they index only a subset of the pages indexed by search engines

[+]: ensure indexing consistency

[+]: enable intelligent reasoning

[+]: enable browsing


Drawbacks of the taxonomies that are used by Web Catalogues

• Hard to understand

• Laborious browsing

• Laborious object indexing

• Hard to update/revise

• Large storage requirements

(1) Big size (e.g. the Open Directory currently has 460,000 terms)

(2) Inconsistent and incomplete terminology and structuring

These drawbacks affect both the USER and the DESIGNER.


Faceted Classification and Faceted Taxonomies

Faceted classification was developed before the advent of computers by S. R. Ranganathan (1892-1972), an Indian mathematician working as a librarian.

Key point: Faceted taxonomies do not require an a priori division of concepts into subconcepts (only relationships between elemental concepts are stored)

* A faceted taxonomy consists of a set of facets

* Each facet is a group of elemental concepts

* Each object is indexed by synthesizing elemental concepts

Advantages of faceted taxonomies:

• they are easier to build and understand

• they require less storage space

• they are more scalable


Faceted Taxonomies

Figure: a faceted taxonomy with two facets, F1 = Sports (SeaSports, WinterSports) and F2 = Location (Islands, with Crete; Mainland, with Pilio and Olympus).

A taxonomy is a pair $(T, \le)$, where $T$ is a terminology, i.e. a set of names or terms, and $\le$ is a subsumption relation over $T$ (a reflexive and transitive relation).

A faceted taxonomy is defined by a set $\mathcal{F} = \{F_1, \ldots, F_k\}$ of taxonomies, in which each $F_i = (T_i, \le_i)$ is called a facet.
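The definition above can be sketched in Python as follows. This is a minimal illustration under our own assumptions: only the direct narrower-to-broader edges are stored, $\le$ is obtained as their reflexive and transitive closure, and the class name `Taxonomy` and the facet hierarchies (read from the figure) are ours, not the authors'.

```python
from collections import defaultdict


class Taxonomy:
    """A taxonomy (T, <=): a terminology T and a subsumption relation <=.

    Only direct (narrower, broader) edges are stored; <= is their
    reflexive and transitive closure, computed on demand.
    """

    def __init__(self, edges):
        self.broader = defaultdict(set)  # term -> directly broader terms
        self.terms = set()
        for narrow, broad in edges:
            self.broader[narrow].add(broad)
            self.terms.update((narrow, broad))

    def up(self, t):
        """All terms t2 with t <= t2, including t itself (reflexivity)."""
        seen, stack = {t}, [t]
        while stack:
            for b in self.broader.get(stack.pop(), ()):
                if b not in seen:  # transitivity: keep climbing
                    seen.add(b)
                    stack.append(b)
        return seen

    def leq(self, t1, t2):
        """t1 <= t2, i.e. t2 subsumes t1."""
        return t2 in self.up(t1)


# The two facets of the example (hierarchy as read from the figure).
sports = Taxonomy([("SeaSports", "Sports"), ("WinterSports", "Sports")])
location = Taxonomy([("Islands", "Location"), ("Mainland", "Location"),
                     ("Crete", "Islands"), ("Pilio", "Mainland"),
                     ("Olympus", "Mainland")])
F = [sports, location]  # the faceted taxonomy {F1, F2}

print(location.leq("Crete", "Location"), location.leq("Crete", "Mainland"))
# True False
```

Storing only direct edges keeps the stored relation small; the closure is recomputed when needed, which is adequate for facets of a few hundred terms.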


Example of using one taxonomy

1 billion pages

blocks of 10 pages

100 million indexing terms

Complete and balanced decimal tree

Total: 111,111,111 terms
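As a worked check of the slide's arithmetic: $10^9$ pages in blocks of 10 give $10^8$ leaves, so the complete decimal tree has depth 8 and one level for each power of ten from $10^0$ to $10^8$:

```latex
\sum_{j=0}^{8} 10^{j} \;=\; \frac{10^{9}-1}{10-1} \;=\; \frac{999\,999\,999}{9} \;=\; 111\,111\,111
```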


Example of using a faceted taxonomy consisting of 4 facets

1 billion pages

blocks of 10 pages

100 million indexing terms

100 terms × 100 terms × 100 terms × 100 terms (400 leaf terms)

Total: 444 terms


Example of using a faceted taxonomy consisting of 8 facets

1 billion pages

blocks of 10 pages

100 million indexing terms

10 terms × 10 terms × … × 10 terms (8 facets, 80 leaf terms)

Total: 88 terms!
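The three totals (111,111,111, 444 and 88) follow one formula: with k facets, each facet needs $(10^8)^{1/k}$ leaf terms, and a complete decimal tree with $10^d$ leaves has $1 + 10 + \cdots + 10^d$ nodes. The Python sketch below checks this; it assumes, as the slides' numbers suggest, that every facet is itself a complete, balanced decimal tree, and the function name `total_terms` is ours.

```python
def total_terms(k, leaves=10**8, branching=10):
    """Terms needed to obtain `leaves` indexing terms as the cartesian
    product of k facets, each a complete `branching`-ary tree."""
    depth = 0  # depth of each facet tree, so that branching**(k*depth) == leaves
    while branching ** (k * depth) < leaves:
        depth += 1
    if branching ** (k * depth) != leaves:
        raise ValueError("leaves must split evenly across the k facets")
    per_facet = sum(branching ** j for j in range(depth + 1))  # all tree levels
    return k * per_facet


for k in (1, 4, 8):
    print(f"{k} facet(s): {total_terms(k):,} terms")
# 1 facet(s): 111,111,111 terms
# 4 facet(s): 444 terms
# 8 facet(s): 88 terms
```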


The Problem of Faceted Taxonomies

Figure: the faceted taxonomy with facets F1 = Sports and F2 = Location.

Invalid compound terms may appear during object indexing or browsing/retrieval.

A compound term is invalid if it cannot be applied to any object of the domain.

Consequences:

• laborious/erroneous object indexing

• difficulties in browsing


Valid and Invalid Compound Terms

Example: in the faceted taxonomy F of the figure (facets Sports and Location), each compound term below combines one term of each facet.

Valid compound terms: Sports.Location, Sports.Islands, Sports.Crete, Sports.Mainland, Sports.Pilio, Sports.Olympus, SeaSports.Location, SeaSports.Islands, SeaSports.Crete, SeaSports.Mainland, SeaSports.Pilio, WinterSports.Location, WinterSports.Mainland, WinterSports.Pilio, WinterSports.Olympus

Invalid compound terms: SeaSports.Olympus, WinterSports.Islands, WinterSports.Crete
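Even in this tiny example, a designer who enumerates by hand must inspect every combination of one Sports term with one Location term. The sketch below simply enumerates them; the set of invalid pairs is the example's domain knowledge copied from above, and the variable names are ours.

```python
from itertools import product

SPORTS = ["Sports", "SeaSports", "WinterSports"]
LOCATION = ["Location", "Islands", "Mainland", "Crete", "Pilio", "Olympus"]

# Domain knowledge from the example: combinations that no object can have.
INVALID = {("SeaSports", "Olympus"), ("WinterSports", "Islands"),
           ("WinterSports", "Crete")}

pairs = list(product(SPORTS, LOCATION))  # one term from each facet
valid = [f"{s}.{loc}" for s, loc in pairs if (s, loc) not in INVALID]
invalid = [f"{s}.{loc}" for s, loc in pairs if (s, loc) in INVALID]

print(len(pairs), len(valid), len(invalid))  # 18 15 3
print(", ".join(invalid))  # SeaSports.Olympus, WinterSports.Islands, WinterSports.Crete
```

With facets of sizes n1, ..., nk the number of such combinations is n1 × ... × nk, so listing the valid ones explicitly quickly becomes impractical.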


The Idea

Define an algebra whose operators allow specifying the set of valid compound terms without having to enumerate them.

Operations:

• product, $\oplus$ (n-ary): combines terms from different facets

• plus-product, $\oplus_P$ (n-ary): combines terms from different facets, plus positive modifiers

• minus-product, $\ominus_N$ (n-ary): combines terms from different facets, plus negative modifiers

• self-product, $\oplus^*$ (unary): combines terms from one facet

• self-plus-product, $\oplus^*_P$ (unary): combines terms from one facet, plus positive modifiers

• self-minus-product, $\ominus^*_N$ (unary): combines terms from one facet, plus negative modifiers

Initial operands: the facet terminologies $\{\hat T_1, \ldots, \hat T_k\}$, where $\hat T_i = \{\{t\} \mid t \in T_i\} \cup \{\emptyset\}$.


Compound Terms and Compound Taxonomies

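Let $\mathcal{F} = \{F_1, \ldots, F_k\}$, where $F_i = (T_i, \le_i)$, and let $T = T_1 \cup \cdots \cup T_k$ and $\le \;=\; \le_1 \cup \cdots \cup \le_k$.

• Compound term: any subset $s$ of $T$

• Compound terminology $S$: a set of compound terms

• Compound taxonomy: a pair $(S, \preceq)$, where

– $S$ is a compound terminology, and

– $s \preceq s'$ iff $\forall t' \in s'\ \exists t \in s$ such that $t \le t'$.

Broader and narrower compound terms: $Br(s) = \{s' \mid s \preceq s'\}$ and $Nr(s) = \{s' \mid s' \preceq s\}$; for a set of compound terms $P$, $Br(P) = \bigcup_{s \in P} Br(s)$, and likewise for $Nr$.

Example (taxonomy with Sports, and Crete under Greece): $\{Sports, Crete\} \preceq \{Sports\}$ and $\{Sports, Crete\} \preceq \{Sports, Greece\}$.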
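The ordering of compound terms and the sets Br and Nr can be sketched directly from these definitions. In this illustration, which is ours, the taxonomy is assumed to be a tree stored as a hypothetical child-to-parent map `PARENT`, compound terms are Python frozensets, and Br and Nr are computed only within a given compound terminology S rather than over all subsets of T.

```python
PARENT = {"Crete": "Greece"}  # hypothetical: direct broader term of each term


def up(t):
    """All terms t2 with t <= t2 (reflexive and transitive)."""
    result = {t}
    while t in PARENT:
        t = PARENT[t]
        result.add(t)
    return result


def leq(s, s2):
    """s ⪯ s2 iff every t2 in s2 subsumes some t in s."""
    return all(any(t2 in up(t) for t in s) for t2 in s2)


def br(s, S):
    """Br(s) = {s2 | s ⪯ s2}, restricted to the compound terminology S."""
    return {x for x in S if leq(s, x)}


def nr(s, S):
    """Nr(s) = {s2 | s2 ⪯ s}, restricted to the compound terminology S."""
    return {x for x in S if leq(x, s)}


S = {frozenset(x) for x in ({"Sports"}, {"Greece"}, {"Crete"},
                            {"Sports", "Greece"}, {"Sports", "Crete"})}
print(leq({"Sports", "Crete"}, {"Sports", "Greece"}))  # True
print(sorted(sorted(x) for x in nr(frozenset({"Sports"}), S)))
# [['Crete', 'Sports'], ['Greece', 'Sports'], ['Sports']]
```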


The Product Operation

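$S_1 \oplus \cdots \oplus S_n = \{s_1 \cup \cdots \cup s_n \mid s_i \in S_i\}$; for two operands, $S \oplus S' = \{s \cup s' \mid s \in S,\ s' \in S'\}$.

Example (figure): the operands are S, with {Greece} and {Islands}, and S', with {Sports} and {SeaSports}. The compound taxonomy $S \oplus S'$ contains {Greece}, {Islands}, {Sports}, {SeaSports}, {Greece, Sports}, {Islands, Sports}, {Greece, SeaSports} and {Islands, SeaSports}; the single-term compound terms appear because the operands, like $\hat T_i$, also contain the empty compound term.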
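The product is a set comprehension over pairs of compound terms, folded over n operands. In this Python sketch (ours, not the prototype's code) compound terms are frozensets and, following the definition of $\hat T_i$, each operand also contains the empty compound term; the helper names `product2`, `product` and `terms` are hypothetical.

```python
from functools import reduce


def product2(S1, S2):
    """S1 ⊕ S2 = {s1 ∪ s2 | s1 in S1, s2 in S2}."""
    return {s1 | s2 for s1 in S1 for s2 in S2}


def product(*Ss):
    """S1 ⊕ ... ⊕ Sn, computed pairwise (set union is associative)."""
    return reduce(product2, Ss)


def terms(*sets):
    """Build a compound terminology from plain Python sets."""
    return {frozenset(x) for x in sets}


S = terms(set(), {"Greece"}, {"Islands"})
S_prime = terms(set(), {"Sports"}, {"SeaSports"})
result = product(S, S_prime) - {frozenset()}  # drop the empty term for display
for s in sorted(result, key=lambda s: (len(s), sorted(s))):
    print(sorted(s))
```

Without the empty compound term in the operands, the result would contain only the four two-term compound terms.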


The Plus-Product Operation

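$\oplus_P(S_1, \ldots, S_n) = S_1 \cup \cdots \cup S_n \cup Br(P)$, where $P \subseteq G_{S_1, \ldots, S_n}$ and $G_{S_1, \ldots, S_n} = (S_1 \oplus \cdots \oplus S_n) - (S_1 \cup \cdots \cup S_n)$ is the set of genuine compound terms.

Example (figure): S contains {Greece} and {Islands}; S' contains {Sports}, {SeaSports}, {WinterSports} and {SnowSki}; P = {{Islands, SeaSports}, {Greece, SnowSki}}. The resulting compound taxonomy $\oplus_P(S, S')$ contains {Greece}, {Islands}, {Sports}, {SeaSports}, {WinterSports}, {SnowSki}, {Greece, Sports}, {Islands, Sports}, {Greece, SeaSports}, {Islands, SeaSports}, {Greece, WinterSports} and {Greece, SnowSki}.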
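A Python sketch of the plus-product on this example. Since $p \preceq s$ holds exactly when every term of s lies in the upward closure of p, Br(p) is computed as subsets of that closure. As an assumption of ours, only compound terms in which no term subsumes another are kept; this reproduces the twelve terms of the figure (it excludes, e.g., {Greece, Islands}). The hypothetical `PARENT` map encodes the hierarchy as we read it from the figure.

```python
from itertools import combinations

# Example hierarchy (child -> parent), as read from the figure.
PARENT = {"Islands": "Greece", "SeaSports": "Sports",
          "WinterSports": "Sports", "SnowSki": "WinterSports"}


def up(t):
    """All terms t2 with t <= t2 (reflexive and transitive)."""
    result = {t}
    while t in PARENT:
        t = PARENT[t]
        result.add(t)
    return result


def reduced(s):
    """True if no term of s subsumes another term of s."""
    return not any(b in up(a) for a in s for b in s if a != b)


def br(p):
    """Reduced compound terms s with p ⪯ s: subsets of p's upward closure."""
    closure = sorted(set().union(*(up(t) for t in p)))
    return {frozenset(c) for r in range(len(closure) + 1)
            for c in combinations(closure, r) if reduced(c)}


def plus_product(P, *Ss):
    """⊕_P(S1, ..., Sn) = S1 ∪ ... ∪ Sn ∪ Br(P)."""
    out = set().union(*Ss)
    for p in P:
        out |= br(p)
    return out


S = {frozenset(), frozenset({"Greece"}), frozenset({"Islands"})}
S_prime = {frozenset()} | {frozenset({t}) for t in
                           ("Sports", "SeaSports", "WinterSports", "SnowSki")}
P = [{"Islands", "SeaSports"}, {"Greece", "SnowSki"}]
result = plus_product(P, S, S_prime) - {frozenset()}
print(len(result))  # 12, as in the figure
```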


The Minus-Product Operation

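$\ominus_N(S_1, \ldots, S_n) = S_1 \oplus \cdots \oplus S_n - Nr(N)$, where $N \subseteq G_{S_1, \ldots, S_n}$

Example (figure): with operands S ({Greece}, {Islands}) and S' ({Sports}, {SeaSports}, {WinterSports}, {SnowSki}) and N = {{Islands, WinterSports}}, the resulting compound taxonomy $\ominus_N(S, S')$ contains {Greece}, {Islands}, {Sports}, {SeaSports}, {WinterSports}, {SnowSki}, {Greece, Sports}, {Islands, Sports}, {Greece, SeaSports}, {Islands, SeaSports}, {Greece, WinterSports} and {Greece, SnowSki}.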
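The minus-product starts from the full product and removes every compound term that is narrower than or equal to some member of N. A minimal Python sketch on this example, assuming frozensets as compound terms, the empty compound term in each operand, and a hypothetical `PARENT` map encoding the hierarchy as we read it from the figure:

```python
from functools import reduce

# Example hierarchy (child -> parent), as read from the figure.
PARENT = {"Islands": "Greece", "SeaSports": "Sports",
          "WinterSports": "Sports", "SnowSki": "WinterSports"}


def up(t):
    """All terms t2 with t <= t2 (reflexive and transitive)."""
    result = {t}
    while t in PARENT:
        t = PARENT[t]
        result.add(t)
    return result


def leq(s, s2):
    """s ⪯ s2 iff every t2 in s2 subsumes some t in s."""
    return all(any(t2 in up(t) for t in s) for t2 in s2)


def product(*Ss):
    """S1 ⊕ ... ⊕ Sn = {s1 ∪ ... ∪ sn | si in Si}."""
    return reduce(lambda A, B: {a | b for a in A for b in B}, Ss)


def minus_product(N, *Ss):
    """⊖_N(S1, ..., Sn): the product minus Nr(N) = {s | s ⪯ n for some n in N}."""
    return {s for s in product(*Ss) if not any(leq(s, n) for n in N)}


S = {frozenset(), frozenset({"Greece"}), frozenset({"Islands"})}
S_prime = {frozenset()} | {frozenset({t}) for t in
                           ("Sports", "SeaSports", "WinterSports", "SnowSki")}
N = [{"Islands", "WinterSports"}]
result = minus_product(N, S, S_prime) - {frozenset()}
print(len(result))  # 12: {Islands, WinterSports} and {Islands, SnowSki} are removed
```

Here one negative modifier yields the same twelve terms as the two positive modifiers of the plus-product example; which operator is more economical depends on whether valid or invalid combinations are the minority.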


The Self-[Plus/Minus]-Product Operations

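Genuine compound terms of a facet: $G_{\hat T_i} = \oplus^*(\hat T_i) - \hat T_i$, with parameters $P \subseteq G_{\hat T_i}$ and $N \subseteq G_{\hat T_i}$.

Self-Product: $\oplus^*(\hat T_i) = \mathcal{P}(T_i)$ (the power set of $T_i$)

Self-Plus-Product: $\oplus^*_P(\hat T_i) = \hat T_i \cup Br(P)$

Self-Minus-Product: $\ominus^*_N(\hat T_i) = \oplus^*(\hat T_i) - Nr(N)$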


The Self-Plus-Product: Example

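Example (figure): S is a facet in which Sports has the children SeaSports and WinterSports, SeaSki and Windsurfing are under SeaSports, and SnowSki and SnowBoard are under WinterSports; P = {{SeaSki, Windsurfing}, {SnowSki, SnowBoard}}. The resulting $\oplus^*_P(S)$ contains the single-term compound terms {Sports}, {SeaSports}, {WinterSports}, {SeaSki}, {Windsurfing}, {SnowSki} and {SnowBoard}, plus {SeaSki, Windsurfing} and {SnowSki, SnowBoard}.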
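The three self-operations can be sketched in Python on this facet. The hierarchy is as we read it from the figure; keeping only compound terms in which no term subsumes another when computing Br(P) is our assumption (it reproduces the nine terms of the example), and the negative modifier in the usage is hypothetical.

```python
from itertools import combinations

# The Sports facet of the example (child -> parent), as read from the figure.
PARENT = {"SeaSports": "Sports", "WinterSports": "Sports",
          "SeaSki": "SeaSports", "Windsurfing": "SeaSports",
          "SnowSki": "WinterSports", "SnowBoard": "WinterSports"}
T = {"Sports", *PARENT}  # the facet terminology T_i


def up(t):
    """All terms t2 with t <= t2 (reflexive and transitive)."""
    result = {t}
    while t in PARENT:
        t = PARENT[t]
        result.add(t)
    return result


def leq(s, s2):
    """s ⪯ s2 iff every t2 in s2 subsumes some t in s."""
    return all(any(t2 in up(t) for t in s) for t2 in s2)


def subsets(terms):
    terms = sorted(terms)
    return {frozenset(c) for r in range(len(terms) + 1)
            for c in combinations(terms, r)}


def reduced(s):
    """True if no term of s subsumes another term of s."""
    return not any(b in up(a) for a in s for b in s if a != b)


def self_product(T):
    """⊕*(T̂_i) = P(T_i): every subset of the facet's terms."""
    return subsets(T)


def self_plus_product(T, P):
    """⊕*_P(T̂_i) = T̂_i ∪ Br(P), Br(P) limited to reduced terms (assumption)."""
    out = {frozenset()} | {frozenset({t}) for t in T}
    for p in P:
        closure = set().union(*(up(t) for t in p))
        out |= {s for s in subsets(closure) if reduced(s)}
    return out


def self_minus_product(T, N):
    """⊖*_N(T̂_i) = ⊕*(T̂_i) − Nr(N)."""
    return {s for s in self_product(T) if not any(leq(s, n) for n in N)}


P = [{"SeaSki", "Windsurfing"}, {"SnowSki", "SnowBoard"}]
print(len(self_plus_product(T, P) - {frozenset()}))  # 9, as in the figure
minus = self_minus_product(T, [{"SeaSki", "SnowSki"}])  # hypothetical N
```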


Expressions and Well-formed Expressions

An expression e is well-formed if:

(a) each basic compound terminology appears at most once in e,

(b) the parameters P/N are subsets of the corresponding genuine compound terms

In this way:

• no conflicts arise

• the behavior is monotonic

The set of expressions over a facet set {F1,…, Fk} is defined according to the grammar:

$e ::= \oplus_P(e, \ldots, e) \mid \ominus_N(e, \ldots, e) \mid \oplus^*_P \hat T_i \mid \ominus^*_N \hat T_i \mid \hat T_i$
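The grammar maps naturally onto an algebraic data type. The Python sketch below (class and function names are ours) represents expressions as frozen dataclasses and checks condition (a); checking condition (b) would additionally require computing the genuine compound terms of each operator's operands, which is omitted. The example has the shape of an expression such as $\oplus_P(\ominus_N(\text{Location}, \text{Accommodation}), \text{Facilities})$, with its parameters left empty.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Facet:            # a basic compound terminology T̂_i
    name: str


@dataclass(frozen=True)
class PlusProduct:      # ⊕_P(e1, ..., en)
    P: tuple
    args: tuple


@dataclass(frozen=True)
class MinusProduct:     # ⊖_N(e1, ..., en)
    N: tuple
    args: tuple


@dataclass(frozen=True)
class SelfPlus:         # ⊕*_P T̂_i
    P: tuple
    facet: Facet


@dataclass(frozen=True)
class SelfMinus:        # ⊖*_N T̂_i
    N: tuple
    facet: Facet


def facets_of(e):
    """Names of the basic compound terminologies in e, with repetitions."""
    if isinstance(e, Facet):
        return [e.name]
    if isinstance(e, (SelfPlus, SelfMinus)):
        return [e.facet.name]
    return [name for arg in e.args for name in facets_of(arg)]


def satisfies_condition_a(e):
    """Condition (a): each basic compound terminology occurs at most once."""
    names = facets_of(e)
    return len(names) == len(set(names))


loc, acc, fac = Facet("Location"), Facet("Accommodation"), Facet("Facilities")
e = PlusProduct(P=(), args=(MinusProduct(N=(), args=(loc, acc)), fac))
print(satisfies_condition_a(e))                                   # True
print(satisfies_condition_a(PlusProduct(P=(), args=(loc, loc))))  # False
```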


Example: Building the catalog of a tourist portal

Figure: three facets: Location (Iraklion, Ammoudara, Hersonissos), Accommodation (Furn.Apartments, Rooms, Bungalows) and Facilities (Jacuzzi, SwimmingPool, Indoor, Outdoor).

3 facets, 13 terms, 890 compound terms, of which only 96 are valid

Expression 1: $\oplus_P(\text{Location}, \text{Accommodation}, \text{Facilities})$, with

P = {{Iraklion, Furn.Apartments}, {Iraklion, Rooms}, {Ammoudara, Furn.Apartments}, {Ammoudara, Rooms}, {Hersonissos, Furn.Apartments}, {Ammoudara, Bungalows, Jacuzzi}, {Hersonissos, Rooms, Indoor}, {Hersonissos, Bungalows, Outdoor}}, |P| = 8

Expression 2: $\oplus_P(\ominus_N(\text{Location}, \text{Accommodation}), \text{Facilities})$, with

N = {{Iraklion, Bungalows}},

P = {{Hersonissos, Rooms, Indoor}, {Hersonissos, Bungalows, Outdoor}, {Ammoudara, Bungalows, Jacuzzi}}, |P| + |N| = 4


Checking the Validity of a Compound Term

Let $S_e$ be the compound terminology defined by an algebraic expression $e$.

We provide an algorithm for checking whether $s \in S_e$ without having to compute (and store) the entire $S_e$.

The time complexity of this algorithm is $O(|T|^3 \cdot |P \cup N|)$.

=> Only $\mathcal{F}$ and $e$ have to be stored
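The slides do not spell out the algorithm, so the following is only a minimal Python sketch of the idea, not the authors' algorithm and not tuned to the stated complexity bound: membership is decided recursively on the structure of e, so $S_e$ is never materialised. It covers facets, plus-products and minus-products, assumes well-formedness condition (a) so that a compound term can be split by facet, represents expressions as tuples, and, as an assumption, restricts Br(P) to the product of the operands. `PARENT` and `FACETS` are hypothetical encodings of the Greece/Sports example.

```python
# Example faceted taxonomy (hypothetical encoding of the slides' example).
PARENT = {"Islands": "Greece", "SeaSports": "Sports",
          "WinterSports": "Sports", "SnowSki": "WinterSports"}
FACETS = {"Location": {"Greece", "Islands"},
          "Sports": {"Sports", "SeaSports", "WinterSports", "SnowSki"}}


def up(t):
    """All terms t2 with t <= t2 (reflexive and transitive)."""
    result = {t}
    while t in PARENT:
        t = PARENT[t]
        result.add(t)
    return result


def leq(s, s2):
    """s ⪯ s2 iff every t2 in s2 subsumes some t in s."""
    return all(any(t2 in up(t) for t in s) for t2 in s2)


def facet_terms(e):
    """Union of the terminologies T_i that occur in expression e."""
    if e[0] == "facet":
        return FACETS[e[1]]
    return set().union(*(facet_terms(a) for a in e[2]))


def in_product(s, args):
    """s ∈ S_e1 ⊕ ... ⊕ S_en; the operands use disjoint facets (condition (a)),
    so s is split by facet and each part is checked in its own operand."""
    if not s <= set().union(*(facet_terms(a) for a in args)):
        return False
    return all(is_valid(s & facet_terms(a), a) for a in args)


def is_valid(s, e):
    """Decide s ∈ S_e without computing S_e."""
    s = frozenset(s)
    if e[0] == "facet":               # T̂_i = {{t} | t ∈ T_i} ∪ {∅}
        return len(s) <= 1 and s <= FACETS[e[1]]
    op, params, args = e
    if op == "minus":                 # product minus Nr(N)
        return in_product(s, args) and not any(leq(s, n) for n in params)
    if op == "plus":                  # S_1 ∪ ... ∪ S_n ∪ Br(P)
        if any(is_valid(s, a) for a in args):
            return True
        return in_product(s, args) and any(leq(p, s) for p in params)
    raise ValueError(f"unknown operator {op!r}")


operands = [("facet", "Location"), ("facet", "Sports")]
plus = ("plus", [{"Islands", "SeaSports"}, {"Greece", "SnowSki"}], operands)
minus = ("minus", [{"Islands", "WinterSports"}], operands)
print(is_valid({"Greece", "WinterSports"}, plus))  # True
print(is_valid({"Islands", "SnowSki"}, minus))     # False
```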


Generating Navigation Trees

Objective: given an expression e, dynamically generate a navigation tree whose nodes correspond to valid compound terms only, for use during object indexing and browsing.

The navigation tree also contains nodes for facet crossing.

Figure: a navigation tree for the Sports and Location facets. Facet-crossing nodes (byLocation, bySports) lead only to valid combinations; for example, under SeaSports the byLocation node offers Islands (with Crete) and Mainland (with Pilio) but not Olympus, while under WinterSports it offers only Mainland (with Pilio and Olympus).


Application in Web Catalogues

Taxonomies of existing catalogs: big, incomplete, with scalability problems

Faceted Taxonomies + Algebra: small, clear, scalable; navigation trees are generated dynamically from the P/N parameters


Prototype Implementation using an RDBMS

Three tables are used for storing the faceted taxonomy and the expression e:

• TERMS (name, id)

• SUBSUMPTION (term1, term2)

• PARAMETERS (F1, F2, ..., Fk)

Architecture (figure): an Expression Builder, a Storage Manager, a Validity Checker and a Navigation Tree Generator on top of an RDBMS, used by the Designer and by the Indexer/User.
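A minimal SQLite sketch of the three tables, using Python's standard `sqlite3` module. The table and column names follow the slide, but the column types, the direction of SUBSUMPTION (term1 directly narrower than term2), the two-facet PARAMETERS layout and the recursive query that derives the broader terms of a term are our assumptions; the prototype's actual schema and queries are not given.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TERMS (name TEXT NOT NULL UNIQUE, id INTEGER PRIMARY KEY);
-- Assumed direction: term1 is directly narrower than term2.
CREATE TABLE SUBSUMPTION (term1 INTEGER REFERENCES TERMS(id),
                          term2 INTEGER REFERENCES TERMS(id));
-- Assumed layout for k = 2 facets: one row per compound term of P or N,
-- one column per facet (NULL when that facet is not used).
CREATE TABLE PARAMETERS (F1 INTEGER REFERENCES TERMS(id),
                         F2 INTEGER REFERENCES TERMS(id));
""")

names = ["Location", "Islands", "Mainland", "Crete", "Pilio", "Olympus"]
conn.executemany("INSERT INTO TERMS (name, id) VALUES (?, ?)",
                 [(n, i) for i, n in enumerate(names, start=1)])
edges = [("Islands", "Location"), ("Mainland", "Location"),
         ("Crete", "Islands"), ("Pilio", "Mainland"), ("Olympus", "Mainland")]
conn.executemany("""INSERT INTO SUBSUMPTION (term1, term2)
                    SELECT a.id, b.id FROM TERMS a, TERMS b
                    WHERE a.name = ? AND b.name = ?""", edges)


def broader(name):
    """All terms subsuming `name` (itself included), via a recursive CTE."""
    rows = conn.execute("""
        WITH RECURSIVE up(id) AS (
            SELECT id FROM TERMS WHERE name = ?
            UNION
            SELECT s.term2 FROM SUBSUMPTION s JOIN up ON s.term1 = up.id
        )
        SELECT name FROM TERMS JOIN up USING (id) ORDER BY name""", (name,))
    return [r[0] for r in rows]


print(broader("Crete"))  # ['Crete', 'Islands', 'Location']
```

Only direct subsumption edges are stored; the reflexive and transitive closure is derived at query time, in line with the small storage footprint claimed for faceted taxonomies.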


Concluding Remarks

Faceted taxonomies:

[+] conceptual clarity (they are easier to understand)

[+] compactness (they take less space)

[+] scalability (update operations are easier to formulate and can be performed more efficiently)

[-] invalid compound terms may appear

The proposed algebra:

[+] provides a solution to the problem of invalid compound terms

[+] aids indexing and browsing (and prevents errors)