An Algebraic Approach for Specifying Compound Terms in Faceted Taxonomies
Yannis Tzitzikas 1
Anastasia Analyti 2
Nicolas Spyratos 3
Panos Constantopoulos 2,4
1 Istituto di Scienza e Tecnologie dell’Informazione, CNR-ISTI, Italy
2 Institute of Computer Science, ICS-FORTH, Greece
3 Laboratoire de Recherche en Informatique, Université de Paris-Sud, France
4 Department of Computer Science, University of Crete, Greece
June 2003, Yannis Tzitzikas et al., EJC'2003
Outline of the presentation
• Introduction - Motivation
• Faceted Classification and Faceted Taxonomies
– Advantages and Problems
• Compound Terms and Compound Taxonomies
• The Algebra
– Operations
– Examples
– Algorithms
– Deriving Navigational Trees
– Prototype implementation
• Concluding Remarks
Introduction
• Existing ways to locate information on the Web:
– searching (using search engines like Google)
– browsing (using catalogues like Yahoo!, ODP)
• Currently, the catalogues are also exploited by the search engines:
– for improving the measurement of relevance
– for giving the user a set of pages related to each page of the answer
– for limiting the scope of the search
• Web catalogues (or indices using controlled structured vocabularies):
[-]: index only a subset of the pages that are indexed by search engines
[+]: ensure indexing consistency
[+]: enable intelligent reasoning
[+]: enable browsing
Drawbacks of the taxonomies that are used by Web Catalogues
• Hard to understand
• Laborious browsing
• Laborious object indexing
• Hard to update/revise
• Large storage requirements
(1) Big size (e.g. Open Directory currently has 460,000 terms)
(2) Inconsistent and incomplete terminology and structuring
[Diagram labels: USER, DESIGNER]
Faceted Classification and Faceted Taxonomies
Faceted classification was developed, prior to the existence of computers, by S. R. Ranganathan (1892-1972), a Hindu mathematician working as a librarian.
Key point: Faceted taxonomies do not require an a priori division of concepts into subconcepts (only relationships between elemental concepts are stored)
* A faceted taxonomy consists of a set of facets
* Each facet is a group of elemental concepts
* Each object is indexed by synthesizing elemental concepts
Advantages of faceted taxonomies:
• they are easier to build and understand
• they require less storage space
• they are more scalable
Faceted Taxonomies
[Figure: two facets. $F_1$: Sports, with subterms SeaSports and WinterSports. $F_2$: Location, with subterms Islands (containing Crete) and Mainland (containing Pilio and Olympus).]
A taxonomy is a pair $(T, \leq)$ where
– $T$: a terminology, i.e. a set of names or terms
– $\leq$: a subsumption relation over $T$ (reflexive and transitive)
A faceted taxonomy $F$ is defined by a set of taxonomies $\{F_1, \ldots, F_k\}$ in which each $F_i = (T_i, \leq_i)$ is called a facet.
Example of using one taxonomy
1 billion pages
blocks of 10 pages
100 million indexing terms
Complete and balanced decimal tree
Total: 111,111,111 terms ($10^8 + 10^7 + \ldots + 10 + 1$)
Example of using a faceted taxonomy consisting of 4 facets
1 billion pages
blocks of 10 pages
100 million indexing terms
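Four facets with 100 leaf terms each: 100 × 100 × 100 × 100 = 100 million combinations from 400 leaf terms
Total: 444 terms (each facet is a decimal tree of 100 + 10 + 1 = 111 terms)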
Example of using a faceted taxonomy consisting of 8 facets
1 billion pages
blocks of 10 pages
100 million indexing terms
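Eight facets with 10 leaf terms each: 10 × … × 10 = 100 million combinations from 80 leaf terms
Total: 88 terms! (each facet is a root plus 10 leaf terms, i.e. 11 terms)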
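The three totals can be checked with a few lines of arithmetic. The sketch below assumes that every hierarchy, including each facet, is a complete and balanced decimal tree; `decimal_tree_terms` is a helper name introduced here purely for illustration.

```python
def decimal_tree_terms(leaves: int, fanout: int = 10) -> int:
    """Count all nodes of a complete, balanced tree with `leaves` leaves.

    Assumes `leaves` is a power of `fanout`.
    """
    total, level = 0, leaves
    while True:
        total += level          # nodes on the current level
        if level == 1:          # reached the root
            return total
        level //= fanout        # one level up


PAGES = 10**9
BLOCK = 10
index_terms = PAGES // BLOCK    # 100 million blocks to distinguish

single = decimal_tree_terms(index_terms)   # one big taxonomy
four = 4 * decimal_tree_terms(100)         # 4 facets, 100 leaves each
eight = 8 * decimal_tree_terms(10)         # 8 facets, 10 leaves each

# Both faceted designs can address the same number of blocks.
assert 100**4 == index_terms and 10**8 == index_terms

print(single, four, eight)  # 111111111 444 88
```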
The Problem of Faceted Taxonomies
[Figure: the Sports ($F_1$) and Location ($F_2$) facets of the earlier example.]
Invalid compound terms may appear during object indexing or browsing/retrieval.
A compound term is invalid if it cannot be applied to any object of the domain.
Consequences:
• laborious/erroneous object indexing
• difficulties in browsing
Valid and Invalid Compound Terms
[Figure: the Sports and Location facets, with the set of compound terms split into valid and invalid ones.]
Example:
Valid compound terms:
Sports.Location, Sports.Islands, Sports.Crete, Sports.Mainland, Sports.Pilio, Sports.Olympus,
SeaSports.Location, SeaSports.Islands, SeaSports.Crete, SeaSports.Mainland, SeaSports.Pilio,
WinterSports.Location, WinterSports.Mainland, WinterSports.Pilio, WinterSports.Olympus
Invalid compound terms:
SeaSports.Olympus, WinterSports.Islands, WinterSports.Crete
The Idea
Define an algebra with operators that allow specifying the set of valid compound terms without having to enumerate all of them.
Operations:
• product, $\oplus$ (n-ary): combines terms from different facets
• plus-product, $\oplus_P$ (n-ary): combines terms from different facets plus positive modifiers
• minus-product, $\ominus_N$ (n-ary): combines terms from different facets plus negative modifiers
• self-product, $\oplus^*$ (unary): combines terms from one facet
• self-plus-product, $\oplus^*_P$ (unary): combines terms from one facet plus positive modifiers
• self-minus-product, $\ominus^*_N$ (unary): combines terms from one facet plus negative modifiers
Initial operands: the facet terminologies $\{\hat{T}_1, \ldots, \hat{T}_k\}$, where $\hat{T}_i = \{\{t\} \mid t \in T_i\} \cup \{\emptyset\}$
Compound Terms and Compound Taxonomies
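Let $F = \{F_1, \ldots, F_k\}$ where $F_i = (T_i, \leq_i)$, and let $T = \bigcup_{i=1}^{k} T_i$ and $\leq \; = \bigcup_{i=1}^{k} \leq_i$.
• Compound term: any subset $s$ of $T$
• Compound terminology $S$: a set of compound terms
• Compound taxonomy: a pair $(S, \preceq)$ where
– $S$ is a compound terminology and
– $s \preceq s'$ iff $\forall t' \in s' \; \exists t \in s$ such that $t \leq t'$.
Example (with Crete $\leq$ Greece):
$\{Sports, Crete\} \preceq \{Sports\}$, $\{Sports, Crete\} \preceq \{Sports, Greece\}$
Broader and narrower compound terms:
$Br(s) = \{s' \mid s \preceq s'\}$
$Nr(s) = \{s' \mid s' \preceq s\}$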
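The compound ordering and the Br/Nr sets translate almost directly into code. The following is a minimal sketch, not the authors' implementation: it assumes a tree-shaped taxonomy stored as a hypothetical child-to-parent map `PARENT`, and it restricts Br and Nr to an explicit, finite set of candidate compound terms so that the results stay small.

```python
# Hypothetical mini-taxonomy from the slide's example: Crete <= Greece.
# Roots (Sports, Greece) have no entry in the map.
PARENT = {"Crete": "Greece"}


def leq(t, u, parent=PARENT):
    """Term subsumption t <= u: u is t itself or one of its ancestors."""
    while t is not None:
        if t == u:
            return True
        t = parent.get(t)
    return False


def compound_leq(s, s2, parent=PARENT):
    """Compound ordering: every term of s2 has a narrower-or-equal term in s."""
    return all(any(leq(t, t2, parent) for t in s) for t2 in s2)


def broader(s, candidates, parent=PARENT):
    """Br(s), restricted to the given candidate compound terms."""
    return {c for c in candidates if compound_leq(s, c, parent)}


def narrower(s, candidates, parent=PARENT):
    """Nr(s), restricted to the given candidate compound terms."""
    return {c for c in candidates if compound_leq(c, s, parent)}


s = frozenset({"Sports", "Crete"})
print(compound_leq(s, {"Sports"}))            # True
print(compound_leq(s, {"Sports", "Greece"}))  # True
print(compound_leq({"Sports", "Greece"}, s))  # False: no term narrower than Crete
```

Compound terms are kept as `frozenset` values so that they can be members of other sets.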
The Product Operation
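$S_1 \oplus \ldots \oplus S_n = \{ s_1 \cup \ldots \cup s_n \mid s_i \in S_i \}$
For two operands: $S \oplus S' = \{ s \cup s' \mid s \in S, \; s' \in S' \}$
Example:
S = {{Greece}, {Islands}}
S' = {{Sports}, {SeaSports}}
$S \oplus S'$ = {{Greece}, {Islands}, {Sports}, {SeaSports}, {Greece,Sports}, {Islands,Sports}, {Greece,SeaSports}, {Islands,SeaSports}}
(The single-facet terms appear in the result because each operand also contains the empty compound term.)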
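The product is a Cartesian product followed by a union of each combination. A minimal sketch, assuming compound terms are represented as `frozenset` values; `basic` and `compound_product` are names chosen here for illustration.

```python
from itertools import product as cartesian


def basic(terms):
    """Basic compound terminology: one singleton per term, plus the empty term."""
    return {frozenset([t]) for t in terms} | {frozenset()}


def compound_product(*terminologies):
    """S1 (+) ... (+) Sn: every union s1 | ... | sn with si taken from Si."""
    return {frozenset().union(*combo) for combo in cartesian(*terminologies)}


S = basic(["Greece", "Islands"])
S2 = basic(["Sports", "SeaSports"])
result = compound_product(S, S2)

# Print the compound terms, smallest first.
for s in sorted(result, key=lambda s: (len(s), sorted(s))):
    print(sorted(s))
```

With the empty compound term in each operand, the result has 3 × 3 = 9 members: the empty term, the four singletons, and the four two-facet combinations.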
The Plus-Product Operation
$\oplus_P(S_1, \ldots, S_n) = S_1 \cup \ldots \cup S_n \cup Br(P)$, where $P \subseteq G_{S_1, \ldots, S_n}$ (the genuine compound terms over $S_1, \ldots, S_n$)
Example:
S = {{Greece}, {Islands}}
S' = {{Sports}, {SeaSports}, {WinterSports}, {SnowSki}}
P = {{Islands,SeaSports}, {Greece,SnowSki}}
$\oplus_P(S, S')$ = {{Greece}, {Islands}, {Sports}, {SeaSports}, {WinterSports}, {SnowSki}, {Greece,Sports}, {Islands,Sports}, {Greece,SeaSports}, {Islands,SeaSports}, {Greece,WinterSports}, {Greece,SnowSki}}
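The plus-product example can be reproduced with a short sketch. Two things here are assumptions rather than part of the slides: the child-to-parent map `PARENT` is read off the example figure, and Br(P) is restricted to members of the plain product of the operands so that the result stays finite and matches the figure.

```python
from itertools import product as cartesian

# Subsumption read off the slide's example (child -> parent); an assumption.
PARENT = {"Islands": "Greece", "SeaSports": "Sports",
          "WinterSports": "Sports", "SnowSki": "WinterSports"}


def leq(t, u):
    """Term subsumption t <= u: u is t itself or one of its ancestors."""
    while t is not None:
        if t == u:
            return True
        t = PARENT.get(t)
    return False


def compound_leq(s, s2):
    """s is narrower than s2: each term of s2 has a narrower-or-equal term in s."""
    return all(any(leq(t, t2) for t in s) for t2 in s2)


def basic(terms):
    """Singletons of the given terms plus the empty compound term."""
    return {frozenset([t]) for t in terms} | {frozenset()}


def compound_product(*terminologies):
    """All unions of one compound term per operand."""
    return {frozenset().union(*combo) for combo in cartesian(*terminologies)}


def plus_product(P, *terminologies):
    """S1 | ... | Sn | Br(P), with Br limited to members of the product."""
    candidates = compound_product(*terminologies)
    br_p = {c for c in candidates if any(compound_leq(p, c) for p in P)}
    return set().union(*terminologies) | br_p


S = basic(["Greece", "Islands"])
S2 = basic(["Sports", "SeaSports", "WinterSports", "SnowSki"])
P = [frozenset({"Islands", "SeaSports"}), frozenset({"Greece", "SnowSki"})]
result = plus_product(P, S, S2)

for s in sorted(result - {frozenset()}, key=lambda s: (len(s), sorted(s))):
    print(sorted(s))
```

Only the two members of P are declared; everything broader than them, such as {Greece,Sports}, is derived.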
The Minus-Product Operation
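$\ominus_N(S_1, \ldots, S_n) = S_1 \oplus \ldots \oplus S_n - Nr(N)$, where $N \subseteq G_{S_1, \ldots, S_n}$ (the genuine compound terms over $S_1, \ldots, S_n$)
Example:
S = {{Greece}, {Islands}}
S' = {{Sports}, {SeaSports}, {WinterSports}, {SnowSki}}
N = {{Islands,WinterSports}}
$\ominus_N(S, S')$ = {{Greece}, {Islands}, {Sports}, {SeaSports}, {WinterSports}, {SnowSki}, {Greece,Sports}, {Islands,Sports}, {Greece,SeaSports}, {Islands,SeaSports}, {Greece,WinterSports}, {Greece,SnowSki}}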
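The minus-product works the other way round: start from the full product and discard everything narrower than a member of N. A minimal sketch under the same assumptions as the example figure (the `PARENT` map is read off the figure, and all helper names are illustrative):

```python
from itertools import product as cartesian

# Subsumption read off the slide's example (child -> parent); an assumption.
PARENT = {"Islands": "Greece", "SeaSports": "Sports",
          "WinterSports": "Sports", "SnowSki": "WinterSports"}


def leq(t, u):
    """Term subsumption t <= u: u is t itself or one of its ancestors."""
    while t is not None:
        if t == u:
            return True
        t = PARENT.get(t)
    return False


def compound_leq(s, s2):
    """s is narrower than s2: each term of s2 has a narrower-or-equal term in s."""
    return all(any(leq(t, t2) for t in s) for t2 in s2)


def basic(terms):
    """Singletons of the given terms plus the empty compound term."""
    return {frozenset([t]) for t in terms} | {frozenset()}


def compound_product(*terminologies):
    """All unions of one compound term per operand."""
    return {frozenset().union(*combo) for combo in cartesian(*terminologies)}


def minus_product(N, *terminologies):
    """S1 (+) ... (+) Sn minus Nr(N): drop whatever is narrower than some n in N."""
    candidates = compound_product(*terminologies)
    return {c for c in candidates if not any(compound_leq(c, n) for n in N)}


S = basic(["Greece", "Islands"])
S2 = basic(["Sports", "SeaSports", "WinterSports", "SnowSki"])
N = [frozenset({"Islands", "WinterSports"})]
result = minus_product(N, S, S2)

removed = compound_product(S, S2) - result
print(sorted(sorted(s) for s in removed))
# [['Islands', 'SnowSki'], ['Islands', 'WinterSports']]
```

Declaring the single invalid term {Islands,WinterSports} also removes {Islands,SnowSki}, because SnowSki is narrower than WinterSports.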
The Self-[Plus/Minus]-Product Operations
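Genuine compound terms over a single facet: $G_{T_i} = \oplus^*(T_i) - T_i$, with $P \subseteq G_{T_i}$ and $N \subseteq G_{T_i}$
Self-Product: $\oplus^*(T_i) = \mathcal{P}(T_i)$
Self-Plus-Product: $\oplus^*_P(T_i) = T_i \cup Br(P)$
Self-Minus-Product: $\ominus^*_N(T_i) = \oplus^*(T_i) - Nr(N)$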
The Self-Plus-Product: Example
S: the Sports facet, i.e. {Sports}, {SeaSports}, {WinterSports}, {SeaSki}, {Windsurfing}, {SnowSki}, {SnowBoard}
P = {{SeaSki,Windsurfing}, {SnowSki,SnowBoard}}
$\oplus^*_P(S)$: the terms of S plus the compound terms {SeaSki,Windsurfing} and {SnowSki,SnowBoard}
Expressions and Well-formed Expressions
The set of expressions over a facet set {F1,…, Fk} is defined according to the grammar:
$e ::= \oplus_P(e, \ldots, e) \mid \ominus_N(e, \ldots, e) \mid \oplus^*_P(T_i) \mid \ominus^*_N(T_i) \mid T_i$
An expression e is well-formed if:
(a) each basic compound terminology appears at most once in e,
(b) the parameters P/N are subsets of the corresponding genuine compound terms
In this way:
• no conflicts arise
• monotonic behavior
Example: Building the catalog of a tourist portal
[Figure: three facets. Location: Iraklion, Ammoudara, Hersonissos. Accommodation: Furn.Apartments, Rooms, Bungalows. Facilities: Jacuzzi, SwimmingPool, Indoor, Outdoor.]
3 facets, 13 terms, 890 compound terms, of which only 96 are valid
Using a single plus-product: $\oplus_P(Location, Accommodation, Facilities)$
P = {{Iraklion, Furn.Apartments}, {Iraklion, Rooms}, {Ammoudara, Furn.Apartments}, {Ammoudara, Rooms}, {Hersonissos, Furn.Apartments}, {Ammoudara, Bungalows, Jacuzzi}, {Hersonissos, Rooms, Indoor}, {Hersonissos, Bungalows, Outdoor}}
|P| = 8
Using a minus-product followed by a plus-product: $(Location \ominus_N Accommodation) \oplus_P Facilities$
N = {{Iraklion, Bungalows}}
P = {{Hersonissos, Rooms, Indoor}, {Hersonissos, Bungalows, Outdoor}, {Ammoudara, Bungalows, Jacuzzi}}
|P| + |N| = 4
Checking the Validity of a Compound Term
Let $S_e$ be the compound terminology defined by an algebraic expression e.
We provide an algorithm for checking whether $s \in S_e$ without having to compute (and store) the entire $S_e$.
The time complexity of this algorithm is $O(|T|^3 \cdot |P \cup N|)$.
=> Only F and e have to be stored
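To illustrate why the whole of $S_e$ never has to be materialized, here is a sketch for the simplest case only: a single, un-nested plus-product or minus-product over the Sports and Location facets. It is not the authors' algorithm for arbitrary expressions. The parameter sets `N` and `P` below are illustrative choices made here, and Br(P) is again limited to compound terms with at most one term per facet.

```python
# Sports and Location facets from the earlier slides (child -> parent).
PARENT = {"SeaSports": "Sports", "WinterSports": "Sports",
          "Islands": "Location", "Mainland": "Location",
          "Crete": "Islands", "Pilio": "Mainland", "Olympus": "Mainland"}


def root(t):
    """Top term of the facet that t belongs to."""
    while t in PARENT:
        t = PARENT[t]
    return t


def leq(t, u):
    """Term subsumption t <= u: u is t itself or one of its ancestors."""
    while t is not None:
        if t == u:
            return True
        t = PARENT.get(t)
    return False


def compound_leq(s, s2):
    """s is narrower than s2: each term of s2 has a narrower-or-equal term in s."""
    return all(any(leq(t, t2) for t in s) for t2 in s2)


def in_product(s):
    """At most one term per facet (terms are assumed to come from the taxonomy)."""
    roots = [root(t) for t in s]
    return len(roots) == len(set(roots))


def valid_minus(s, N):
    """s is in the minus-product: in the product and not narrower than any n in N."""
    return in_product(s) and not any(compound_leq(s, n) for n in N)


def valid_plus(s, P):
    """s is in the plus-product: a single-facet term, or broader than some p in P."""
    return in_product(s) and (len(s) <= 1 or any(compound_leq(p, s) for p in P))


# Illustrative parameters: two negative ones, or four positive ones.
N = [frozenset({"SeaSports", "Olympus"}), frozenset({"WinterSports", "Islands"})]
P = [frozenset({"SeaSports", "Crete"}), frozenset({"SeaSports", "Pilio"}),
     frozenset({"WinterSports", "Pilio"}), frozenset({"WinterSports", "Olympus"})]

print(valid_minus(frozenset({"WinterSports", "Crete"}), N))   # False
print(valid_plus(frozenset({"WinterSports", "Crete"}), P))    # False
print(valid_minus(frozenset({"SeaSports", "Mainland"}), N))   # True
print(valid_plus(frozenset({"SeaSports", "Mainland"}), P))    # True
```

Each check touches only the taxonomy and the parameter sets, which is the point of storing just F and e. For this example both parameter sets describe the same 15 valid and 3 invalid two-facet terms.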
Generating Navigation Trees
Objective: given an expression e, dynamically generate a navigation tree
– with nodes that correspond to valid compound terms only
– for use during object indexing and browsing
The navigation tree also contains nodes for facet crossing.
[Figure: navigation tree over the Sports and Location facets. Under Sports and its subterms SeaSports and WinterSports, byLocation nodes lead to the valid Location terms; under Location and its subterms, bySports nodes lead to the valid Sports terms.]
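One way to picture the generation step is a function that, for a given node, returns only its valid successors. The sketch below is an assumption-laden illustration rather than the authors' procedure: validity is decided by a single minus-product with illustrative parameters `N`, a node is a compound term, and a successor is obtained either by narrowing one term or by crossing into a facet not yet used (the byLocation / bySports nodes).

```python
# Sports and Location facets (parent -> children), as in the earlier slides.
CHILDREN = {"Sports": ["SeaSports", "WinterSports"],
            "Location": ["Islands", "Mainland"],
            "Islands": ["Crete"],
            "Mainland": ["Pilio", "Olympus"]}
PARENT = {c: p for p, cs in CHILDREN.items() for c in cs}
ROOTS = ["Sports", "Location"]

# Illustrative negative parameters of a minus-product over the two facets.
N = [frozenset({"SeaSports", "Olympus"}), frozenset({"WinterSports", "Islands"})]


def root(t):
    """Top term of the facet that t belongs to."""
    while t in PARENT:
        t = PARENT[t]
    return t


def leq(t, u):
    """Term subsumption t <= u: u is t itself or one of its ancestors."""
    while t is not None:
        if t == u:
            return True
        t = PARENT.get(t)
    return False


def compound_leq(s, s2):
    """s is narrower than s2: each term of s2 has a narrower-or-equal term in s."""
    return all(any(leq(t, t2) for t in s) for t2 in s2)


def is_valid(s):
    """Valid unless s is narrower than one of the negative parameters."""
    return not any(compound_leq(s, n) for n in N)


def expand(node):
    """Valid successors of a node, grouped by the step that reaches them."""
    steps = {}
    for t in sorted(node):                       # narrow one term of the node
        for c in CHILDREN.get(t, []):
            cand = (node - {t}) | {c}
            if is_valid(cand):
                steps.setdefault("narrow " + t, []).append(cand)
    used = {root(t) for t in node}
    for r in ROOTS:                              # cross into an unused facet
        if r not in used:
            valid = [node | {c} for c in CHILDREN[r] if is_valid(node | {c})]
            if valid:
                steps["by" + r] = valid
    return steps


for label, nodes in expand(frozenset({"WinterSports"})).items():
    print(label, [sorted(n) for n in nodes])
# byLocation [['Mainland', 'WinterSports']]
```

Under WinterSports the byLocation step offers Mainland only, since {WinterSports,Islands} is invalid; the user never sees a dead end.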
Application in Web Catalogues
Taxonomies of existing catalogs: big, incomplete, scalability problems
Faceted Taxonomies + Algebra (P|N): small, clear, scalable
Navigation Trees: generated dynamically
Prototype Implementation using an RDBMS
Three tables are used for storing the faceted taxonomy and the expression e:
• TERMS (name, id)
• SUBSUMPTION (term1, term2)
• PARAMETERS (F1, F2, ..., Fk)
[Figure: architecture. Actors: Designer, Indexer/User. Components: Expression Builder, Validity Checker, Nav. Tree Generator, Storage Manager, RDBMS.]
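The three tables could look roughly as follows in SQLite. Only the table and column names come from the slide; the column types, the reading of SUBSUMPTION as "term1 is narrower than term2", the two-facet PARAMETERS layout, the sample rows, and the helper `broader_terms` are assumptions made for this sketch. How the prototype distinguishes P from N parameters is not shown on the slide and is left out here.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE TERMS (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE SUBSUMPTION (term1 INTEGER REFERENCES TERMS(id),   -- narrower term
                          term2 INTEGER REFERENCES TERMS(id));  -- broader term
CREATE TABLE PARAMETERS (F1 INTEGER REFERENCES TERMS(id),       -- Sports facet
                         F2 INTEGER REFERENCES TERMS(id));      -- Location facet
""")

names = ["Sports", "SeaSports", "WinterSports",
         "Location", "Islands", "Mainland", "Crete", "Pilio", "Olympus"]
con.executemany("INSERT INTO TERMS (name) VALUES (?)", [(n,) for n in names])
ids = dict(con.execute("SELECT name, id FROM TERMS"))

edges = [("SeaSports", "Sports"), ("WinterSports", "Sports"),
         ("Islands", "Location"), ("Mainland", "Location"),
         ("Crete", "Islands"), ("Pilio", "Mainland"), ("Olympus", "Mainland")]
con.executemany("INSERT INTO SUBSUMPTION VALUES (?, ?)",
                [(ids[a], ids[b]) for a, b in edges])

# One row per parameter compound term, one column per facet.
con.executemany("INSERT INTO PARAMETERS VALUES (?, ?)",
                [(ids["SeaSports"], ids["Olympus"]),
                 (ids["WinterSports"], ids["Islands"])])


def broader_terms(name):
    """The term itself plus all its ancestors, via a recursive query."""
    rows = con.execute("""
        WITH RECURSIVE up(id) AS (
            SELECT id FROM TERMS WHERE name = ?
            UNION
            SELECT s.term2 FROM SUBSUMPTION AS s JOIN up ON s.term1 = up.id
        )
        SELECT TERMS.name FROM TERMS JOIN up ON TERMS.id = up.id
    """, (name,))
    return {r[0] for r in rows}


print(sorted(broader_terms("Crete")))  # ['Crete', 'Islands', 'Location']
```

Storing only direct subsumption links and computing the closure on demand keeps the stored data proportional to the size of the faceted taxonomy and the expression.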
Concluding Remarks
Faceted Taxonomies:
[+] conceptual clarity (they are easier to understand)
[+] compactness (they take less space)
[+] scalability (update operations can be formulated more easily and performed more efficiently)
[-] invalid compound terms may appear
The Proposed Algebra:
[+] provides a solution to the problem of invalid compound terms
[+] aids indexing and browsing (and prevents errors)