COMP 630L Paper Presentation
Presenter: Le Jianwei (jianwei@cse.ust.hk)

Posted: 22-Dec-2015



Page 2

Presentation Paper

Personalizing XML Search in PIMENTO

Amer-Yahia, S., Fundulaki, I., Lakshmanan, L.V.S.

ICDE 2007: IEEE 23rd International Conference on Data Engineering

Istanbul, Turkey, April 2007

Page 3

Outline
Authors & Background
Main Contribution
The Need for Personalization
Novelty of This Paper
Class of Queries and User Profile
Scoping Rules (SRs)
Ordering Rules (ORs)
Detect and resolve conflicting SRs
Detect and resolve ambiguous ORs
Experiments

Page 4

Authors & Background

Sihem Amer-Yahia: Yahoo! Research, USA

Irini Fundulaki: University of Edinburgh, UK

Laks V.S. Lakshmanan: UBC, Canada

PIMENTO project: a project that aims at improving the relevance of searching structured and unstructured content.

Page 5

Main Contribution

1. Formalizing user profiles in terms of scoping rules (SRs) and ordering rules (ORs). Query personalization is then defined as the process of rewriting a user query using SRs and ranking query answers using ORs.

2. Describing methods to detect and resolve conflicting SRs and ambiguous ORs.

Page 6

The Need for Personalization

None of the existing XML search approaches leverages user information to determine relevant query answers.

Example 1: a painter who searches for “black paint” would receive the same results as a home builder.

Example 2: a user looking for a used car would receive the same listing regardless of his car preferences (color, make, mileage).

Page 7

Novelty of This Paper

This work is the first attempt to apply query personalization to XML search.

In this paper, we take the first necessary steps to achieve this goal by modeling user profiles and enforcing them efficiently and effectively in XML search.

Page 8

Query personalization basics

In web search, the ranking of query answers may be modified based on the user’s navigational behavior during a session.

Query personalization through user profiles has different aspects that restrict or expand its applicability.

Enforcing a user profile ranges from simply modifying the original ranking of query answers to returning a substantially different set of answers.

Page 9

Example: cars for sale

Information on each car may include the manufacturing date, owner information, price, horsepower, make, and color (structured content).

Page 10

Example: cars for sale

Suppose a user wants to buy a car that:

1. is in good condition,
2. has low mileage, and
3. costs less than $2000.

That user can formulate the following XQuery Full-Text query:

Q: //car[./description[ftcontains(., "good condition") and ftcontains(., "low mileage")] and ./price < 2000]
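The next sketch evaluates Q in Python over a tiny, hypothetical XML document using xml.etree.ElementTree. It assumes that each car has description and price children, as the path conditions in Q suggest. It replaces the full-text predicate ftcontains with a plain case-insensitive substring test, so tokenization, stemming and relevance scoring are all ignored.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; element names follow the paths used in Q.
CARS_XML = """
<cars>
  <car id="c1"><description>Good condition, low mileage.</description><price>1800</price></car>
  <car id="c2"><description>Good condition but high mileage.</description><price>1500</price></car>
  <car id="c3"><description>Good condition, low mileage.</description><price>2500</price></car>
</cars>
"""

def ftcontains(text, phrase):
    """Crude stand-in for XQuery Full-Text ftcontains: case-insensitive substring test."""
    return phrase.lower() in (text or "").lower()

def evaluate_q(root):
    """Return ids of cars whose description contains both phrases and whose price is below 2000."""
    result = []
    for car in root.iter("car"):
        desc = car.findtext("description")
        price = car.findtext("price")
        if (ftcontains(desc, "good condition")
                and ftcontains(desc, "low mileage")
                and price is not None
                and float(price) < 2000):
            result.append(car.get("id"))
    return result

print(evaluate_q(ET.fromstring(CARS_XML)))  # ['c1']
```

In this sample, c2 fails the "low mileage" predicate and c3 fails the price condition.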

Page 11

Example: cars for sale

The answers to this query may be ranked by their relevance to "good condition" and to "low mileage". However, users may have extra conditions:

Example 1: A user located in New York prefers red cars. Whatever scoring function the underlying query engine uses, it is natural to expect this user to receive red cars in New York ranked higher than other cars.

Example 2: A user is willing to buy a car located in a different state, provided it has higher horsepower than the cars in New York.

Page 12

Both users love red cars.

Example 1 (user: "I am in NY!")
Search result 1:
1. Red cars in New York
2. Red cars outside New York
…

Example 2 (user: "I am in NY too, but I love powerful cars!")
Search result 2:
1. Red cars outside New York, but with larger horsepower
2. Red cars in New York
…

Page 13

Conclusion

1. The process of query personalization may either expand or restrict the original set of query answers.

2. Some ranking preferences may be enforced when returning query results.

Page 14

Our Idea

In our formalization, a user profile is composed of two kinds of preference rules:

1. Scoping rules (SRs): used to expand or restrict the original result.
expand: e.g., I am willing to drop the low mileage requirement.
restrict: e.g., I only want to see cars for sale located in my area.

2. Ordering rules (ORs): used to enforce ranking preferences.
enforce: e.g., I prefer red cars or cars with higher horsepower.

Page 15

Class of Queries and User Profile

The XQuery Full-Text family of languages augments keyword search with two components:

(i) full-text predicates, such as proximity and order between keywords;

(ii) path conditions, which narrow the search scope.

Page 16

Modeling a user profile

We model a user profile using two orthogonal and complementary components.

First, we use Scoping Rules (SRs) to let the user change the scope of her query. The search is broadened by relaxing query predicates and narrowed by tightening them.

Second, we use Ordering Rules (ORs) to specify how to rank answers obtained from the system.

Page 17

This paper defines query personalization as the process of rewriting a user query using SRs and ranking query answers using ORs.

Page 18

Scoping Rules

There are two kinds of SRs (see Fig. 2).

1. Narrowing the search is accomplished by add rules, which restrict the user query by adding predicates.

2. Broadening the search is accomplished in one of two ways. Delete rules remove existing query predicates. Replace rules replace existing query predicates with weaker ones.

Page 19

Scoping Rules

An add/delete rule is of the form:

if (condition) then (action, conclusion)

where:
(i) condition is either a conjunction of structural and constraint predicates, or the value true;
(ii) action is one of add and delete;
(iii) conclusion is a conjunction of structural and constraint predicates.

If a user query Q subsumes the condition of a rule, then apply the conclusion of that rule to the query as specified by action.

Page 20

Scoping Rules

A replace rule is of the form:

if (condition) then replace E with E'

where:
(i) condition is either a conjunction of structural and constraint predicates, or the value true;
(ii) E and E' are conjunctions of predicates.

If a user query Q subsumes the condition of the rule, then replace E, if present in the query, with E'.
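The three kinds of SRs can be sketched in a few lines of Python. This is a deliberate simplification, not the paper's formalism:

- A query is modeled as a set of atomic predicate strings.
- "Q subsumes the condition" is approximated as "Q contains every predicate of the condition."
- The empty set plays the role of the condition true.
- Real subsumption over structural and full-text predicates is richer than set containment.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union

# Simplification: queries, conditions and conclusions are sets of predicate strings.
Query = FrozenSet[str]

@dataclass(frozen=True)
class AddDeleteRule:
    condition: Query     # empty set stands for the condition "true"
    action: str          # "add" narrows the query, "delete" broadens it
    conclusion: Query

@dataclass(frozen=True)
class ReplaceRule:
    condition: Query
    old: Query           # E: predicates to be replaced
    new: Query           # E': typically weaker predicates

def subsumes(query: Query, condition: Query) -> bool:
    # Assumption: Q subsumes a condition when it contains every condition predicate.
    return condition <= query

def apply_rule(rule: Union[AddDeleteRule, ReplaceRule], query: Query) -> Query:
    if not subsumes(query, rule.condition):
        return query                          # rule not applicable: query unchanged
    if isinstance(rule, ReplaceRule):
        if rule.old <= query:                 # replace E only if it is present
            return (query - rule.old) | rule.new
        return query
    if rule.action == "add":
        return query | rule.conclusion
    if rule.action == "delete":
        return query - rule.conclusion
    raise ValueError(f"unknown action: {rule.action!r}")

q = frozenset({"tag=car", "low mileage", "price<2000"})
# Restrict: "I only want to see cars for sale located in my area."
restrict = AddDeleteRule(frozenset({"tag=car"}), "add", frozenset({"location=NY"}))
# Expand: "I am willing to drop the low mileage requirement."
expand = AddDeleteRule(frozenset({"tag=car"}), "delete", frozenset({"low mileage"}))
print(sorted(apply_rule(restrict, q)))
print(sorted(apply_rule(expand, q)))
```

The predicate strings such as "location=NY" are hypothetical labels chosen for illustration.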

Page 21

Example: SR

if (condition) then (action, conclusion)

Page 22

Ordering Rules: VOR

ORs are of two kinds. A value-based OR (VOR) specifies that a user might prefer answers satisfying a specific property to other answers, where the property is the value of an attribute/element.

The general form of a VOR involves:
1. C, a conjunction of conditions on x and y that equate their common properties (we call C the common conditions), and c, a constant.
2. relOp, one of the relational operators {<, >}.
3. prefRel, a binary relation on the domain of x.attr (y.attr) which is a strict partial order.

Page 23

Example

For example, the common condition C on x and y may be that x and y are both cars with the same make. relOp is one of {<, >}, e.g., x.mileage < y.mileage.
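Read as pairwise preference tests, two of the VOR forms can be sketched in Python as follows. Cars are plain dicts with hypothetical attribute names, and each rule is a function pref(x, y) that returns True when x is preferred to y.

The constant form is an assumption on our part: we read it as preferring x when x.attr = c and y.attr differs from c. The prefRel form is omitted.

```python
import operator
from typing import Any, Callable, Dict, Sequence

Car = Dict[str, Any]
Pref = Callable[[Car, Car], bool]   # pref(x, y) is True when x is preferred to y

_REL_OPS = {"<": operator.lt, ">": operator.gt}

def relop_vor(common: Sequence[str], attr: str, rel_op: str) -> Pref:
    """relOp form: x preferred to y if they agree on the common attributes (C)
    and x.attr relOp y.attr holds."""
    cmp = _REL_OPS[rel_op]
    def pref(x: Car, y: Car) -> bool:
        return all(x[a] == y[a] for a in common) and cmp(x[attr], y[attr])
    return pref

def constant_vor(common: Sequence[str], attr: str, c: Any) -> Pref:
    """Assumed reading of the constant form: prefer x when x.attr == c and y.attr != c."""
    def pref(x: Car, y: Car) -> bool:
        return all(x[a] == y[a] for a in common) and x[attr] == c and y[attr] != c
    return pref

# Among cars of the same make, prefer lower mileage (the slide's example).
lower_mileage = relop_vor(["make"], "mileage", "<")
a = {"make": "Honda", "mileage": 50_000, "color": "red"}
b = {"make": "Honda", "mileage": 80_000, "color": "blue"}
print(lower_mileage(a, b), lower_mileage(b, a))  # True False
```

Cars of different makes are incomparable under this rule, because the common condition fails.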


Page 24

Ordering Rules: KOR

A keyword-based OR (KOR) uses the common conditions C, as before. It says that, between answers x and y, x is preferred to y provided x contains an occurrence of the keyword k.

Page 25

Example

One KOR says that, among all cars, the user prefers those that contain an occurrence of "best bid".

Another says that, among all cars, the user prefers those that contain an occurrence of "NYC".
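The KOR examples above can be sketched as pairwise tests in Python, under these assumptions:

- Each answer's text content lives in a hypothetical "text" field.
- Keyword occurrence is approximated by a case-insensitive substring search.
- The rule additionally requires that y does not contain the keyword, so the preference stays strict. This last condition is our reading and is not stated on the slide.

```python
from typing import Any, Callable, Dict, Sequence

Answer = Dict[str, Any]

def contains_keyword(answer: Answer, keyword: str) -> bool:
    # Crude stand-in for a full-text match over the answer's text content.
    return keyword.lower() in answer["text"].lower()

def kor_prefers(keyword: str, common: Sequence[str] = ()) -> Callable[[Answer, Answer], bool]:
    """x preferred to y when they agree on the common attributes and x contains the keyword.
    Assumption: y must NOT contain it, so the relation is strict."""
    def pref(x: Answer, y: Answer) -> bool:
        return (all(x[a] == y[a] for a in common)
                and contains_keyword(x, keyword)
                and not contains_keyword(y, keyword))
    return pref

best_bid = kor_prefers("best bid")
nyc = kor_prefers("NYC")
x = {"text": "Red sedan, best bid wins, pickup in NYC"}
y = {"text": "Blue coupe, pickup in NYC"}
print(best_bid(x, y), nyc(x, y))  # True False
```

Both answers mention "NYC", so the second rule does not separate them.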


Page 26

Answer Ranking

Each query answer acquires a score based on its query match. It also gets a KOR score based on any KORs in the user profile. The VORs in the user profile may impose an ordering on the query answers independently of the above two.

Question: how, then, should answers be ordered?

We consider two possibilities.

1. The order K, V, S: answers are ordered by their KOR scores first and then by the VOR preferences. When two answers tie on both their KOR score and their VOR properties, they are ordered by query score.

2. The order V, K, S: answers are ordered by the VOR preferences first, then by KOR score, then by query score.
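Both orders are lexicographic comparisons, which the following Python sketch makes explicit. It assumes each answer already carries three precomputed, hypothetical values:

- a KOR score, where higher is better (for example, the number of satisfied KORs);
- a VOR position, where lower is better;
- a query score, where higher is better.

Real VORs induce only a strict partial order, so reducing them to a single position is a simplification.

```python
from typing import Any, Dict, List

Answer = Dict[str, Any]

# Hypothetical precomputed fields per answer:
#   "kor":   KOR score (higher is better)
#   "vor":   position in an order derived from the VORs (lower is better);
#            a simplification, since VORs only give a partial order
#   "score": query-match score (higher is better)
SORT_KEYS = {
    "K": lambda a: -a["kor"],
    "V": lambda a: a["vor"],
    "S": lambda a: -a["score"],
}

def rank(answers: List[Answer], order: str = "KVS") -> List[str]:
    """Lexicographic ranking: compare by order[0], break ties by order[1], then order[2]."""
    ranked = sorted(answers, key=lambda a: tuple(SORT_KEYS[c](a) for c in order))
    return [a["id"] for a in ranked]

answers = [
    {"id": "a1", "kor": 1, "vor": 1, "score": 0.90},
    {"id": "a2", "kor": 1, "vor": 0, "score": 0.50},
    {"id": "a3", "kor": 0, "vor": 0, "score": 0.95},
    {"id": "a4", "kor": 1, "vor": 0, "score": 0.70},
]
print(rank(answers, "KVS"))  # ['a4', 'a2', 'a1', 'a3']
print(rank(answers, "VKS"))  # ['a4', 'a2', 'a3', 'a1']
```

Under K, V, S the answer a3 drops to last despite the best query score, because it satisfies no KOR. Under V, K, S it overtakes a1, whose VOR position is worse.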

Page 27

Problems Studied

Our goal is to assist the user in enhancing her query answering experience when searching XML documents. We have proposed two complementary components for configuring an effective user profile:

(i) the scoping rules (SRs); (ii) the value-based and keyword-based ordering rules (ORs).

However, this method raises the following problems:

Page 28

Two Problems of This Method

1. Given a set of SRs and a query Q, the intended effect of the SRs is that they are used to rewrite Q before it is evaluated. If few or no answers satisfy the query rewritten with the SRs, we should still consider answers satisfying the original query. Thus, query answering w.r.t. a set of SRs really entails evaluating a flock of related queries.

Problem: how do we pin down (and rank) this query flock exactly?

2. Value-based ORs may sometimes result in ambiguity. For example, there may be a pair of answers x, y such that x is preferable to y according to some ORs, and y is preferable to x according to others.

Problem: how do we reduce ambiguity?

Page 29

Scoping Rules and Query Flocks

1. The main issue that arises in rewriting a query w.r.t. a set of SRs is that one rule's application may render another rule inapplicable. We say that a rule p is applicable to a query Q if the condition in p is subsumed by Q. As a consequence, different orders of applying SRs to a query can result in different rewritten queries.

2. A second issue is that a rule may "conflict" with another. For an SR p and a query Q, we denote by p(Q) the result of applying p to Q. Given a set of SRs and a query Q, we say that a rule p1 conflicts with p2 w.r.t. Q provided:

(i) both p1 and p2 are applicable to Q, and
(ii) p2 is not applicable to p1(Q).

Page 30

Example: Conflicting SRs

Both p1 and p2 are applicable to the query, as it subsumes their conditions.

p1 conflicts with p2, since p2 is not applicable to the result of applying p1 to the query.
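The conflict definition translates directly into a check, sketched here in Python. As before, queries are simplified to sets of predicate strings, and applicability is approximated by set containment.

The paper's rules for this example are in a figure that is not preserved here. The two rules below, p1 and p2, are hypothetical stand-ins with the same shape: p1 drops a predicate that p2's condition needs.

```python
from typing import FrozenSet, NamedTuple

Query = FrozenSet[str]  # simplification: a query is a set of predicate strings

class SR(NamedTuple):
    condition: Query    # the rule applies when the query contains all of these
    action: str         # "add" or "delete"
    conclusion: Query

def applicable(p: SR, q: Query) -> bool:
    return p.condition <= q

def apply_sr(p: SR, q: Query) -> Query:
    if not applicable(p, q):
        return q
    return q | p.conclusion if p.action == "add" else q - p.conclusion

def conflicts(p1: SR, p2: SR, q: Query) -> bool:
    """p1 conflicts with p2 w.r.t. q: both apply to q, but p2 does not apply to p1(q)."""
    return applicable(p1, q) and applicable(p2, q) and not applicable(p2, apply_sr(p1, q))

q = frozenset({"tag=car", "low mileage"})
# Hypothetical p1 drops the low-mileage requirement; p2 only fires on low-mileage car queries.
p1 = SR(frozenset({"tag=car"}), "delete", frozenset({"low mileage"}))
p2 = SR(frozenset({"tag=car", "low mileage"}), "add", frozenset({"price<5000"}))
print(conflicts(p1, p2, q), conflicts(p2, p1, q))  # True False
```

The conflict is directional: applying p2 first leaves p1 applicable, so p2 does not conflict with p1.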

Page 31

Detect and resolve conflicting SRs

Conflicts among SRs can be captured using a directed graph in which each node is an SR and there is an arc (pi, pj) iff pi conflicts with pj. If this conflict graph is acyclic, then we can topologically sort the nodes and apply the SRs to a query in the topological sort order.
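Python's standard graphlib module (Python 3.9+) can compute a topological order for the conflict graph. It also reports a cycle when one exists.

This sketch makes an interpretation explicit that the slide leaves open. An arc (pi, pj) means that applying pi first would make pj inapplicable, so we schedule pj before pi. In other words, we take a topological order of the graph with arcs reversed. This reading is our assumption. Rule names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Iterable, List, Tuple

def application_order(rules: Iterable[str],
                      conflict_arcs: Iterable[Tuple[str, str]]) -> List[str]:
    """Order SRs so that each rule runs before any rule that conflicts with it.

    An arc (pi, pj) means pi conflicts with pj. Assumption: pj is scheduled before pi.
    Raises graphlib.CycleError when the conflict graph has a cycle.
    """
    ts = TopologicalSorter()
    for r in rules:
        ts.add(r)              # register every rule, even unconstrained ones
    for pi, pj in conflict_arcs:
        ts.add(pi, pj)         # pj is a predecessor of pi
    return list(ts.static_order())

print(application_order(["p1", "p2", "p3"], [("p1", "p2")]))
try:
    application_order(["p1", "p3"], [("p1", "p3"), ("p3", "p1")])
except CycleError as err:
    print("conflict cycle:", err.args[1])
```

The second call fails with CycleError, which is the situation where user-assigned priorities are needed.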

Page 32

Detect and resolve conflicting SRs

However, there may be cycles in the conflict graph; in the example shown, p1 and p3 conflict with each other. To mitigate this problem, we require the user to assign priorities to rules. Different orders of rule application may produce different rewritten queries, so the user should have a say in which order is used.

Page 33

Detect and resolve conflicting SRs

Rule priorities resolve the problem of conflict cycles by forcing a specific order of rule application. We assume one of two cases: either the set of SRs is conflict-free, or user-assigned priorities force a specific order of rule application. Given a query Q and a set of SRs, possibly together with rule priorities, the query flock associated with Q consists of the family of queries:

Q, p1(Q), p2(p1(Q)), ..., pn(pn-1(...(p1(Q))...)),

where we assume that the order of rule application imposed by the priorities is p1, ..., pn.

The idea is that all the queries in the query flock must be evaluated and their answers ranked according to the ORs.
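The flock Q, p1(Q), p2(p1(Q)), ... is a running fold of the rules over Q, sketched here in Python. Rules are simplified to functions from query to query, and a toy evaluator treats a car as an answer when it satisfies every predicate of the query.

How answers from the different flock queries are merged is not specified on the slide. The policy used below is an assumption, not the paper's method:

- Answers to the most rewritten query come first.
- Less rewritten queries, down to the original Q, then contribute only answers not already returned.

```python
from typing import Callable, Dict, FrozenSet, List

Query = FrozenSet[str]
Rule = Callable[[Query], Query]  # an SR viewed as a query rewriter (simplification)

def query_flock(q: Query, rules_by_priority: List[Rule]) -> List[Query]:
    """Return [Q, p1(Q), p2(p1(Q)), ..., pn(...p1(Q)...)]."""
    flock = [q]
    for p in rules_by_priority:
        flock.append(p(flock[-1]))
    return flock

def evaluate(q: Query, cars: Dict[str, FrozenSet[str]]) -> List[str]:
    # Toy semantics: a car answers q if it satisfies every predicate in q.
    return [cid for cid, facts in cars.items() if q <= facts]

def answer_flock(q: Query, rules_by_priority: List[Rule],
                 cars: Dict[str, FrozenSet[str]]) -> List[str]:
    """Assumed merge policy: most rewritten query first, then new answers
    from less rewritten queries, ending with the original Q."""
    seen, ordered = set(), []
    for fq in reversed(query_flock(q, rules_by_priority)):
        for cid in evaluate(fq, cars):
            if cid not in seen:
                seen.add(cid)
                ordered.append(cid)
    return ordered

cars = {
    "c1": frozenset({"car", "low mileage", "NY", "red"}),
    "c2": frozenset({"car", "low mileage", "NY"}),
    "c3": frozenset({"car", "low mileage"}),
    "c4": frozenset({"car"}),
}
add_ny = lambda q: q | {"NY"}      # hypothetical add rule p1
add_red = lambda q: q | {"red"}    # hypothetical add rule p2
print(answer_flock(frozenset({"car", "low mileage"}), [add_ny, add_red], cars))
# ['c1', 'c2', 'c3']
```

Car c4 never appears, because even the original query requires low mileage.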

Page 34

Detect and resolve ambiguous ORs

Consider the two value-based ORs in the figure, expressing preferences among cars. The rules appear quite reasonable.

However, consider a pair of cars c, d such that c has color red but a higher mileage than d. Then, according to the first rule, c is preferred to d, while according to the second, d is preferred to c.
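The following Python sketch reproduces this situation on a concrete pair of cars. The two rules are hypothetical stand-ins for the figure's rules: one prefers red cars, the other prefers lower mileage.

The check only inspects the given instance for pairs ordered both ways. It does not decide whether a rule set is ambiguous over all possible databases.

```python
from typing import Any, Callable, Dict, List, Tuple

Car = Dict[str, Any]
Pref = Callable[[Car, Car], bool]

def prefer_red(x: Car, y: Car) -> bool:
    """Hypothetical VOR: among cars, prefer red ones."""
    return x["color"] == "red" and y["color"] != "red"

def prefer_low_mileage(x: Car, y: Car) -> bool:
    """Hypothetical VOR: among cars, prefer lower mileage."""
    return x["mileage"] < y["mileage"]

def conflicting_pairs(cars: List[Car], rules: List[Pref]) -> List[Tuple[str, str]]:
    """Pairs (x, y) where some rule prefers x to y and some rule prefers y to x."""
    out = []
    for x in cars:
        for y in cars:
            if x is not y and any(r(x, y) for r in rules) and any(r(y, x) for r in rules):
                out.append((x["id"], y["id"]))
    return out

c = {"id": "c", "color": "red", "mileage": 90_000}
d = {"id": "d", "color": "blue", "mileage": 40_000}
print(conflicting_pairs([c, d], [prefer_red, prefer_low_mileage]))  # [('c', 'd'), ('d', 'c')]
```

The pair appears in both directions, which is exactly the contradiction described above.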

Page 35

Detect and resolve ambiguous ORs

Thus, there are database instances where the intended preference among elements is not clear. We consider such ORs "ambiguous".

It is important to detect whether a set of value-based ORs is ambiguous and, if so, to have the user assign priorities to ORs to make them unambiguous.

Page 36

Detect and resolve ambiguous ORs

We now discuss a method for dealing with ambiguity. A value-based OR is written in terms of two kinds of constraints:

local(x) denotes constraints involving only the variable x (e.g., x.tag = car, x.color = red).

comp(x, y) denotes constraints involving both x and y (e.g., x.mileage < y.mileage).

Page 37

Detect and resolve ambiguous ORs

Define a constraint graph G(Π) for a set Π of value-based ORs as follows:
1. Its nodes are the variables occurring in the rules of Π.
2. Whenever a rule relates x to y on its right-hand side, G has a directed arc (x, y).
3. Whenever x, y are variables appearing in different rules and are compatible, G has an undirected edge {x, y} labeled =.

By an alternating cycle, we mean a cycle of the form (v1, ..., vk, v1), where k > 2 and k is even, such that:
(1) the edges (vi, vi+1) are directed arcs for odd i;
(2) the edges {vi, vi+1} are undirected edges labeled = for even i;
(3) the edge (vk, v1) is undirected and labeled =.

Then we have the following result.

Page 38

Detect and resolve ambiguous ORs

Lemma (Ambiguity): Let Π be a set of value-based ORs and G(Π) the associated constraint graph. Then Π is ambiguous iff G(Π) contains an alternating cycle.

Using a straightforward adaptation of depth-first search, we can therefore detect ambiguity in time O(#edges).

Suppose a set of value-based ORs defined by a user is ambiguous. Then, by assigning priorities to the rules, alternating cycles can be broken.
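The following Python sketch searches for alternating cycles. It uses our own reduction, not necessarily the paper's adaptation of depth-first search:

1. Build a composite graph with a step u to w whenever there is a directed arc u to v and an undirected = edge between v and w.
2. Run an ordinary DFS cycle check on that composite graph.

A cycle in the composite graph is a closed walk that alternates arcs and = edges. This sketch treats such a walk as an alternating cycle and does not claim the paper's O(#edges) bound. The variable names and edges in the example are hypothetical: they encode one rule preferring red cars (arc x1 to y1) and one preferring lower mileage (arc x2 to y2), with all cross-rule variables assumed compatible.

```python
from collections import defaultdict
from typing import Iterable, Tuple

def has_alternating_cycle(arcs: Iterable[Tuple[str, str]],
                          eq_edges: Iterable[Tuple[str, str]]) -> bool:
    """arcs: directed (x, y) pairs; eq_edges: undirected {x, y} pairs labeled '='."""
    eq = defaultdict(set)
    for a, b in eq_edges:
        eq[a].add(b)
        eq[b].add(a)
    # Composite step u => w: one directed arc u -> v followed by one '=' edge v - w.
    step = defaultdict(set)
    for u, v in arcs:
        step[u] |= eq[v]

    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)          # every node starts WHITE

    def dfs(u: str) -> bool:
        color[u] = GREY
        for w in step[u]:
            if color[w] == GREY:      # back edge: closed alternating walk found
                return True
            if color[w] == WHITE and dfs(w):
                return True
        color[u] = BLACK
        return False

    return any(color[u] == WHITE and dfs(u) for u in list(step))

arcs = [("x1", "y1"), ("x2", "y2")]
eq_edges = [("x1", "x2"), ("x1", "y2"), ("y1", "x2"), ("y1", "y2")]
print(has_alternating_cycle(arcs, eq_edges))  # True: x1 -> y1 = x2 -> y2 = x1
```

If a priority ordering retains only one rule's arc, the composite graph loses its cycle and the check returns False.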


Page 39

Experiments

PIMENTO is implemented as a collection of Java classes. We performed experiments on a 1.6 GHz Pentium 4 PC with 512 MB RAM running Fedora Core 3.

We used the INEX document collection for an empirical evaluation of our approach, and the XMark data to show performance results.

Page 40

Performance results

The experiment measures PushtopKPrune query time as the document size and the number of KORs (1 to 4) increase. In particular, the growth in query response time from a 1 MB document to a 5.7 MB document is sub-linear.

Page 41

Summary

We presented a novel approach to XML search that leverages user information to return more relevant query answers.

This approach is based on a formalization of user profiles in terms of two kinds of rules:

- scoping rules, which are used to rewrite an input query;
- ordering rules, which are combined with query scoring to customize the ranking of query answers to specific users.

Page 42

Thank you!

Le Jianwei, VisGraph Lab

jianwei@cse.ust.hk