Identifying Meaningful Return Information for XML Keyword Search
Ziyang Liu, Yi Chen
Arizona State University
SIGMOD 2007
Searching XML Data
Information need: find the position of the player with name "Mutombo"
XQuery
for $x in doc("DB.xml")//player,
    $y in $x/name
where $y = "Mutombo"
return $x/position
Keyword Search
Mutombo, position
[Figure: sample XML tree. A league element contains team elements; the team named Rockets (founded 1967, stadium Toyota, division southwest) has a players element whose player children include Mutombo (position center, nationality Congo) and Wells (position guard, nationality U.S.).]
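The structured query above can be mirrored in a few lines of Python. This is only a sketch: the XML string is a hypothetical stand-in for DB.xml modeled on the sample tree, and `positions_of` is an illustrative helper name, not part of XSeek.

```python
import xml.etree.ElementTree as ET

# Hypothetical DB.xml content, modeled on the sample tree in the slides.
SAMPLE = """
<league>
  <team>
    <name>Rockets</name>
    <founded>1967</founded>
    <stadium>Toyota</stadium>
    <division>southwest</division>
    <players>
      <player><name>Mutombo</name><position>center</position><nationality>Congo</nationality></player>
      <player><name>Wells</name><position>guard</position><nationality>U.S.</nationality></player>
    </players>
  </team>
</league>
"""


def positions_of(root, player_name):
    """Return the position text of every player whose name equals player_name."""
    # ElementTree's XPath subset supports the [child='text'] predicate,
    # which plays the role of the XQuery where clause.
    path = f".//player[name='{player_name}']/position"
    return [node.text for node in root.findall(path)]


root = ET.fromstring(SAMPLE)
print(positions_of(root, "Mutombo"))
```

The structured form needs the user to know the element names and nesting; the keyword query "Mutombo, position" carries the same intent without that knowledge.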
Challenges in XML Keyword Search
How to select relevant keyword matches and connect them?
• Inferring for clauses (with variable bindings) and where clauses in XQuery
• Has been much studied: XRank [Guo et al 03], XSEarch [Cohen et al 03], Meaningful LCA [Li et al 04], Smallest LCA [Xu et al 05]
How to identify meaningful return information?
• Inferring return clauses in XQuery
• Limited research has been done
  • Users or system administrators specify [Hristidis et al 03, Li et al 04]
  • Whole document [Carmel et al 02]
  • Subtree Return [Cohen et al 03, Guo et al 03, Xu et al 05]
  • Path Return variants [Hristidis et al 06]
XSeek: automatically and intelligently identifies return information
Selecting and Connecting Keyword Matches
Identify relevant matches using variants of LCA concepts [Cohen et al 03, Li et al 04, Xu et al 05]
Q1: Mutombo, position
[Figure: the sample league/team/player XML tree, shown for Q1.]
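To make the LCA idea concrete, here is a brute-force sketch of a smallest-LCA computation for Q1. It is not the algorithm of any cited paper: the sample XML, the Dewey-style labels, and the helper names (`index_paths`, `matches`, `smallest_lcas`) are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from itertools import product

# Hypothetical document modeled on the sample tree in the slides.
SAMPLE = """
<league><team>
  <name>Rockets</name><founded>1967</founded>
  <stadium>Toyota</stadium><division>southwest</division>
  <players>
    <player><name>Mutombo</name><position>center</position><nationality>Congo</nationality></player>
    <player><name>Wells</name><position>guard</position><nationality>U.S.</nationality></player>
  </players>
</team></league>
"""


def index_paths(root):
    """Label each element with its child-index path from the root (Dewey style)."""
    paths = {root: ()}
    stack = [root]
    while stack:
        node = stack.pop()
        for i, child in enumerate(node):
            paths[child] = paths[node] + (i,)
            stack.append(child)
    return paths


def matches(root, keyword):
    """Elements whose tag or own text equals the keyword, ignoring case."""
    kw = keyword.lower()
    return [n for n in root.iter()
            if n.tag.lower() == kw or (n.text or "").strip().lower() == kw]


def lca_path(labels):
    """The longest common prefix of Dewey labels is the label of their LCA."""
    prefix = labels[0]
    for label in labels[1:]:
        k = 0
        while k < min(len(prefix), len(label)) and prefix[k] == label[k]:
            k += 1
        prefix = prefix[:k]
    return prefix


def smallest_lcas(root, keywords):
    """LCAs of one-match-per-keyword combinations that contain no other such LCA."""
    paths = index_paths(root)
    per_keyword = [[paths[n] for n in matches(root, kw)] for kw in keywords]
    if any(not found for found in per_keyword):
        return []
    candidates = {lca_path(list(combo)) for combo in product(*per_keyword)}
    return sorted(c for c in candidates
                  if not any(o != c and o[:len(c)] == c for o in candidates))


def node_at(root, path):
    """Follow a Dewey label from the root back to its element."""
    node = root
    for i in path:
        node = node[i]
    return node


root = ET.fromstring(SAMPLE)
result = [node_at(root, p) for p in smallest_lcas(root, ["Mutombo", "position"])]
print([(n.tag, n.find("name").text) for n in result])
```

Taking the LCA of all matches at once would give the players node, because "position" also matches under Wells; restricting to the smallest LCAs keeps only the player that contains Mutombo.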
Given relevant matches, what should be returned?
Example I: Subtree Return
Q1: Mutombo, position
Q2: Mutombo, center
[Figure: the sample league/team/player XML tree, shown for Q1 and Q2.]
Example I: Path Return
Q1: Mutombo, position
Q2: Mutombo, center
[Figure: the sample league/team/player XML tree, shown for Q1 and Q2.]
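The two baselines can be contrasted with a small sketch. It assumes the usual reading of the terms: Subtree Return outputs the whole subtree rooted at the LCA of the matches, and Path Return outputs only the nodes on the paths from that LCA to the matches. The sample XML and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document modeled on the sample tree in the slides.
SAMPLE = """
<league><team>
  <name>Rockets</name>
  <players>
    <player><name>Mutombo</name><position>center</position><nationality>Congo</nationality></player>
    <player><name>Wells</name><position>guard</position><nationality>U.S.</nationality></player>
  </players>
</team></league>
"""


def subtree_return(lca):
    """Assumed Subtree Return: every node in the subtree rooted at the LCA."""
    return [n.tag for n in lca.iter()]


def path_return(lca, matched):
    """Assumed Path Return: only nodes on the paths from the LCA to each match."""
    parent = {child: node for node in lca.iter() for child in node}
    keep = {lca}
    for m in matched:
        node = m
        while node is not lca:
            keep.add(node)
            node = parent[node]
    return [n.tag for n in lca.iter() if n in keep]


root = ET.fromstring(SAMPLE)
player = root.find(".//player[name='Mutombo']")          # LCA of the Q1 matches
q1_matches = [player.find("name"), player.find("position")]

print(subtree_return(player))
print(path_return(player, q1_matches))
```

On this sample the subtree variant also returns nationality, which Q1 did not ask for, while the path variant returns exactly the matched nodes and the player that connects them.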
Example I: XSeek
Q1: Mutombo, position
Q2: Mutombo, center
[Figure: the sample league/team/player XML tree, shown for Q1 and Q2.]
Example II: Subtree Return, Path Return
Q3: Rockets
[Figure: the sample league/team/player XML tree, shown for Q3.]
Example II: XSeek
Q3: Rockets
[Figure: the sample league/team/player XML tree, shown for Q3.]
Contributions
XSeek: automatically infers meaningful return information for XML keyword search
• No elicitation from users or system administrators is required
• No schema information is required
Inferring search semantics
• Analyzing XML data structure
• Analyzing keyword match patterns
• Determining search results based on node types and match types
Efficient implementation of the search semantics
Experimental verification of effectiveness and efficiency
Roadmap
Motivation
Inferring search semantics
• Analyzing keyword match patterns
• Analyzing XML data structure
• Identifying search results
XSeek architecture
Experiments
Conclusions
Analyzing Keyword Match Patterns
Identifying search predicates and return nodes in keywords
Examples of keyword searches
• Q1: Mutombo, position
• Q2: Mutombo, center
• Q3: Rockets
Examples of structured queries
SQL:
select position from Player where name = "Mutombo"
XQuery:
for $x in doc("DB.xml")//player
where $x/name = "Mutombo"
return $x/position
In the structured queries, the select/return target (position) is the return node and the where condition (name = "Mutombo") is the search predicate.
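The slides give only examples of this split, so the following sketch uses an assumed simplification: a keyword that matches an element name is treated as a return node, and a keyword that matches only a text value is treated as a search predicate. The sample XML and `analyze_keywords` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document modeled on the sample tree in the slides.
SAMPLE = """
<league><team>
  <name>Rockets</name>
  <players>
    <player><name>Mutombo</name><position>center</position><nationality>Congo</nationality></player>
    <player><name>Wells</name><position>guard</position><nationality>U.S.</nationality></player>
  </players>
</team></league>
"""


def analyze_keywords(root, keywords):
    """Split keywords into return nodes and search predicates (assumed rule)."""
    tags = {n.tag.lower() for n in root.iter()}
    values = {(n.text or "").strip().lower() for n in root.iter()} - {""}
    result = {"return_nodes": [], "search_predicates": [], "unmatched": []}
    for kw in keywords:
        key = kw.lower()
        if key in tags:
            # Matches an element name: the user names what to return.
            result["return_nodes"].append(kw)
        elif key in values:
            # Matches a data value: the user constrains which data to find.
            result["search_predicates"].append(kw)
        else:
            result["unmatched"].append(kw)
    return result


root = ET.fromstring(SAMPLE)
for query in (["Mutombo", "position"], ["Mutombo", "center"], ["Rockets"]):
    print(query, analyze_keywords(root, query))
```

Under this rule Q1 has one return node (position) and one predicate (Mutombo), while Q2 and Q3 consist only of predicates and therefore have no explicit return node.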
Analyzing XML Data Structure
Three types of data nodes
• Entity nodes
• Attribute nodes
• Connection nodes
Related work on identifying node types [Xu et al 06]
[Figure: the sample league/team/player XML tree.]
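The slides name the three node types without stating how they are recognized. One plausible schema-free heuristic, assumed here purely for illustration, is: a tag that repeats under a single parent is an entity, a tag whose instances are all leaves holding a text value is an attribute, and everything else is a connection node. The second team in the sample is a hypothetical stand-in for the elided teams.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical document; "OtherTeam" stands in for the teams elided in the figure.
SAMPLE = """
<league>
  <team>
    <name>Rockets</name><founded>1967</founded>
    <players>
      <player><name>Mutombo</name><position>center</position></player>
      <player><name>Wells</name><position>guard</position></player>
    </players>
  </team>
  <team><name>OtherTeam</name></team>
</league>
"""


def classify_tags(root):
    """Assign each tag a node type using the assumed heuristic described above."""
    repeated = set()   # tags occurring more than once under one parent
    all_leaf = {}      # tag -> True while every instance is a leaf with text
    for node in root.iter():
        counts = Counter(child.tag for child in node)
        repeated.update(tag for tag, n in counts.items() if n > 1)
        is_leaf = len(node) == 0 and bool((node.text or "").strip())
        all_leaf[node.tag] = all_leaf.get(node.tag, True) and is_leaf
    kinds = {}
    for tag, leaf in all_leaf.items():
        if tag in repeated:
            kinds[tag] = "entity"
        elif leaf:
            kinds[tag] = "attribute"
        else:
            kinds[tag] = "connection"
    return kinds


kinds = classify_tags(ET.fromstring(SAMPLE))
for tag, kind in kinds.items():
    print(f"{tag}: {kind}")
```

On the sample this labels team and player as entities, name, founded and position as attributes, and league and players as connection nodes.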
Identifying Search Results
Search results consist of
• Matches to search predicates
  • This allows users to verify the relevance of search results
• Matches to return nodes
  • This is what the user is searching for
  • Matches are output according to node types
    • Attribute node: display name, value
    • Entity node: display name, attributes, optionally entity and connection descendants
    • Connection node: display name, optionally entity and connection descendants
• Nodes that connect these matches
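The per-type output rules can be sketched as a small formatting function. The node-type table is hand-written for the sample (an assumption, since the slides do not fix how types are stored), the optional descendants are left out, and `display` is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Hypothetical document modeled on the sample tree in the slides.
SAMPLE = """
<league><team>
  <name>Rockets</name><founded>1967</founded>
  <stadium>Toyota</stadium><division>southwest</division>
  <players>
    <player><name>Mutombo</name><position>center</position><nationality>Congo</nationality></player>
  </players>
</team></league>
"""

# Hand-assigned node types for the sample (assumed, not computed here).
KINDS = {
    "league": "connection", "players": "connection",
    "team": "entity", "player": "entity",
    "name": "attribute", "founded": "attribute", "stadium": "attribute",
    "division": "attribute", "position": "attribute", "nationality": "attribute",
}


def display(node, kinds=KINDS):
    """Render a node according to its type, omitting the optional descendants."""
    kind = kinds[node.tag]
    if kind == "attribute":
        # Attribute node: name and value.
        return f"{node.tag}: {node.text.strip()}"
    if kind == "entity":
        # Entity node: name plus its attribute children.
        attrs = ", ".join(f"{c.tag}={c.text.strip()}"
                          for c in node if kinds[c.tag] == "attribute")
        return f"{node.tag} ({attrs})"
    # Connection node: name only.
    return node.tag


root = ET.fromstring(SAMPLE)
print(display(root.find(".//position")))
print(display(root.find(".//player")))
print(display(root.find(".//players")))
```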
A Search Result Example
Q1: Mutombo, position
[Figure: the sample league/team/player XML tree, shown for Q1.]
What if Return Nodes Are Absent?
Explicit return nodes: nodes that are explicitly identified in input keywords
Inferring implicit return nodes if the input keywords contain no explicit return nodes
• Users may be interested in general information about entities that are relevant to the search
• Master entity: the lowest ancestor-or-self entity of the LCA node, or the XML tree root if there is none
• Relevant entity: the entities on a path from a master entity to a relevant keyword match, inclusive
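The two definitions above translate directly into upward walks in the tree. In this sketch the node-type table is hand-assigned and the LCA is passed in rather than computed; both are assumptions, as are the function names.

```python
import xml.etree.ElementTree as ET

# Hypothetical document modeled on the sample tree in the slides.
SAMPLE = """
<league><team>
  <name>Rockets</name>
  <players>
    <player><name>Mutombo</name><position>center</position></player>
  </players>
</team></league>
"""

# Hand-assigned node types for the sample (assumed, not computed here).
KINDS = {"league": "connection", "players": "connection",
         "team": "entity", "player": "entity",
         "name": "attribute", "position": "attribute"}


def master_entity(root, lca, kinds=KINDS):
    """Lowest ancestor-or-self entity of the LCA node, else the tree root."""
    parent = {child: node for node in root.iter() for child in node}
    node = lca
    while node is not None:
        if kinds[node.tag] == "entity":
            return node
        node = parent.get(node)
    return root


def relevant_entities(root, master, match, kinds=KINDS):
    """Entities on the path from the master entity down to a match, inclusive."""
    parent = {child: node for node in root.iter() for child in node}
    path = []
    node = match
    while node is not None:
        path.append(node)
        if node is master:
            break
        node = parent.get(node)
    return [n for n in reversed(path) if kinds[n.tag] == "entity"]


root = ET.fromstring(SAMPLE)

# Q3 "Rockets": the only match is the team's name node, which is also the LCA.
rockets = root.find("team/name")
print(master_entity(root, rockets).tag)

# Q2 "Mutombo, center": the LCA of the two matches is the player node.
player = root.find(".//player")
print(master_entity(root, player).tag)
print([n.tag for n in relevant_entities(root, root.find("team"), player.find("name"))])
```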
Search with Implicit Return Nodes (I)
Q2: Mutombo, center
[Figure: the sample league/team/player XML tree, shown for Q2.]
Search with Implicit Return Nodes (II)
Q3: Rockets
[Figure: the sample league/team/player XML tree, shown for Q3.]
Architecture of XSeek
[Figure: architecture diagram. Inputs: XML, Keywords. Stored structure: Indexes. Output: Search Result.]
Components
• Data Analyzer: entities, attributes, connection nodes
• Index Builder
• Keyword Matcher
• Match Grouper
• Keyword Analyzer: search predicates, return nodes
• Return Node Recognizer: explicit return nodes, implicit return nodes
• Result Generator
Experimental Setup
Compare the performance of
• XSeek
• Subtree Return
• Path Return
Measurements
• Search quality
• Speed
• Scalability
Data sets: Mondial, WSU, XMark benchmark
Query sets: eight queries for each data set
Search Quality: Precision
Precision: measures the soundness of search results
$\text{precision} = \frac{|\text{relevant} \cap \text{return}|}{|\text{return}|}$
XSeek in general has a precision as good as Path Return
[Figure: bar chart of precision (axis 0 to 100) for Subtree Return, Path Return, and XSeek on queries QA1 to QA8. Queries called out on the chart: "open auction, person257" and "seller, person179, buyer, price, date".]
Search Quality: Recall
Recall: measures the completeness of search results
$\text{recall} = \frac{|\text{relevant} \cap \text{return}|}{|\text{relevant}|}$
XSeek in general has a recall as good as Subtree Return
[Figure: bar chart of recall (axis 0 to 100) for Subtree Return, Path Return, and XSeek on queries QA1 to QA8.]
Search Quality: F-Measure
F-Measure is a weighted harmonic mean of precision and recall
$F = \frac{(1+\alpha) \cdot \text{precision} \cdot \text{recall}}{\alpha \cdot \text{precision} + \text{recall}}$
XSeek has the best F-Measure
[Figure: bar chart of F-Measure (axis 0 to 100) for Subtree Return, Path Return, and XSeek at α=0.5, α=1.0, and α=2.0.]
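A short worked example of the three quality measures follows. It assumes the weighted form F = (1 + α) · precision · recall / (α · precision + recall), and the relevant and returned node sets are made up for illustration, not taken from the experiments.

```python
def precision(relevant, returned):
    """Fraction of returned nodes that are relevant (soundness)."""
    return len(relevant & returned) / len(returned) if returned else 0.0


def recall(relevant, returned):
    """Fraction of relevant nodes that are returned (completeness)."""
    return len(relevant & returned) / len(relevant) if relevant else 0.0


def f_measure(p, r, alpha=1.0):
    """Weighted harmonic mean of precision and recall (assumed weighting)."""
    denom = alpha * p + r
    return (1 + alpha) * p * r / denom if denom else 0.0


# Hypothetical result for a query like Q1: one irrelevant node is returned.
relevant = {"name", "position"}
returned = {"name", "position", "nationality"}

p = precision(relevant, returned)   # 2 of the 3 returned nodes are relevant
r = recall(relevant, returned)      # both relevant nodes are returned
for alpha in (0.5, 1.0, 2.0):
    print(alpha, round(f_measure(p, r, alpha), 3))
```

With α = 1 the formula reduces to the familiar 2 · precision · recall / (precision + recall), which gives 0.8 for this example.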
Speed: Benchmark Data
[Figure: bar chart of time in seconds (axis 0 to 1.5) for Subtree Return, Path Return, and XSeek on queries QA1 to QA8; off-scale bars are labeled 2.0, 4.2, and 3.7. Queries called out on the chart: "seller, person179, buyer, price, date" and "person257, person133".]
Conclusions
The first work that automatically infers meaningful return information for XML keyword search
• No elicitation from users or system administrators and no schema information is required
Analyzing keyword match patterns
• Search predicates
• Return nodes
Analyzing XML node types
• Entities
• Attributes
• Connection nodes
Identifying two types of return information
• Explicit return nodes
• Implicit return nodes
Outputting an XML node based on its match type and node type
Experiments verify XSeek's effectiveness and efficiency
Thank You!
Questions?
You are welcome to visit the XSeek demo at VLDB 07