The Boolean Model Simple model based on set theory Queries specified as boolean expressions...
The Boolean Model
•Simple model based on set theory
•Queries specified as Boolean expressions
  – precise semantics
  – neat formalism
  – $q = k_a \wedge (k_b \vee \neg k_c)$
•Terms are either present or absent. Thus, $w_{ij} \in \{0,1\}$
•Consider
  – $q = k_a \wedge (k_b \vee \neg k_c)$
  – $\vec{q}_{dnf} = (1,1,1) \vee (1,1,0) \vee (1,0,0)$
  – $\vec{q}_{cc} = (1,1,0)$ is a conjunctive component
•Each query can be transformed into disjunctive normal form (DNF)
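As a minimal sketch of the DNF transformation, the conjunctive components of the example query can be enumerated with a truth table over the three terms. The function names `query` and `dnf_components` are illustrative and not part of the slides.

```python
from itertools import product

def query(ka: bool, kb: bool, kc: bool) -> bool:
    """The example query q = ka AND (kb OR NOT kc)."""
    return ka and (kb or not kc)

def dnf_components(q, n_terms):
    """Return every binary term vector (ka, kb, kc) for which q is true.

    Each returned vector is one conjunctive component of the DNF.
    """
    return [
        bits
        for bits in product((1, 0), repeat=n_terms)
        if q(*(bool(b) for b in bits))
    ]

components = dnf_components(query, 3)
print(components)
```

The enumeration is exponential in the number of query terms, which is acceptable for the short queries considered here.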
The Boolean Model
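•$q = k_a \wedge (k_b \vee \neg k_c)$
•$sim(q,d_j) = 1$ if the document satisfies the Boolean query, 0 otherwise
  – no in-between, only 0 or 1
[Figure: Venn diagram of the term sets Ka, Kb, and Kc with the regions (1,1,1), (1,1,0), and (1,0,0) marked]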
Exercise
D1 = “computer information retrieval”
D2 = “computer retrieval”
D3 = “information”
D4 = “computer information”
Q1 = “information retrieval”
Q2 = “information ¬computer”
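One way to check the exercise is to treat each document as a set of terms. The sketch below assumes that adjacent query terms are joined by an implicit AND, which the slide does not state; `matches` is a hypothetical helper.

```python
# Documents from the exercise, as sets of terms.
docs = {
    "D1": {"computer", "information", "retrieval"},
    "D2": {"computer", "retrieval"},
    "D3": {"information"},
    "D4": {"computer", "information"},
}

def matches(terms, required=(), forbidden=()):
    """True if all required terms are present and no forbidden term is."""
    return (all(t in terms for t in required)
            and not any(t in terms for t in forbidden))

# Q1: information AND retrieval (implicit AND is an assumption).
q1 = [name for name, terms in docs.items()
      if matches(terms, required=("information", "retrieval"))]

# Q2: information AND NOT computer.
q2 = [name for name, terms in docs.items()
      if matches(terms, required=("information",), forbidden=("computer",))]

print(q1, q2)
```

Under that reading each query retrieves a single document, and no ordering among results is possible.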
Exercise
Given the index terms of each document:
D1 = {love, need, person, possess, understand}
D2 = {heart, listen, love, practice, suffer}
D3 = {compassion, love, mind, person, practice}
D4 = {death, health, languor, life, suffer}
D5 = {energy, love, nourish, practice, teach}
Q = love ∧ suffer
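A conjunctive query such as this one is commonly answered by intersecting posting sets. The sketch below builds a small inverted index for the five documents; the inverted index is an assumed implementation detail, since the slide only lists the index terms.

```python
from collections import defaultdict

docs = {
    "D1": {"love", "need", "person", "possess", "understand"},
    "D2": {"heart", "listen", "love", "practice", "suffer"},
    "D3": {"compassion", "love", "mind", "person", "practice"},
    "D4": {"death", "health", "languor", "life", "suffer"},
    "D5": {"energy", "love", "nourish", "practice", "teach"},
}

# Inverted index: term -> set of documents containing that term.
index = defaultdict(set)
for name, terms in docs.items():
    for term in terms:
        index[term].add(name)

# love AND suffer: intersect the two posting sets.
answer = sorted(index["love"] & index["suffer"])
print(answer)
```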
Drawbacks of the Boolean Model
• Retrieval based on binary decision criteria with no notion of partial matching
• No ranking of the documents is provided (absence of a grading scale)
• Information need has to be translated into a Boolean expression which most users find awkward
• The Boolean queries formulated by the users are most often too simplistic
• As a consequence, the Boolean model frequently returns either too few or too many documents in response to a user query
Drawbacks of the Boolean Model
The Boolean model imposes a binary criterion for deciding relevance
The question of how to extend the Boolean model to accommodate partial matching and ranking has attracted considerable attention in the past
Two extensions of the Boolean model:
– Fuzzy Set Model
– Extended Boolean Model
Set Theoretic Models
IR Models
[Figure: taxonomy of IR models, organized by user task]
•User task: Retrieval (ad hoc, filtering)
  – Classic models: Boolean, Vector, Probabilistic
  – Set theoretic: Fuzzy, Extended Boolean
  – Algebraic: Generalized Vector, Latent Semantic Indexing, Neural Networks
  – Probabilistic: Inference Network, Belief Network
  – Structured models: Non-Overlapping Lists, Proximal Nodes
•User task: Browsing
  – Flat, Structure Guided, Hypertext
Set Theoretic Models
The Boolean model imposes a binary criterion for deciding relevance
The question of how to extend the Boolean model to accommodate partial matching and ranking has attracted considerable attention in the past
Two set theoretic models for this:
– Fuzzy Set Model
– Extended Boolean Model
Fuzzy Set Model
•Queries and docs are represented by sets of index terms: matching is approximate from the start
•This vagueness can be modeled using a fuzzy framework, as follows:
  – with each term is associated a fuzzy set
  – each doc has a degree of membership in this fuzzy set
•This interpretation provides the foundation for many models for IR based on fuzzy theory
•Here, we discuss the model proposed by Ogawa, Morita, and Kobayashi (1991)
Fuzzy Set Theory
Framework for representing classes whose boundaries are not well defined
Key idea is to introduce the notion of a degree of membership associated with the elements of a set
This degree of membership varies from 0 to 1 and allows modeling the notion of marginal membership
Thus, membership is now a gradual notion, contrary to the crisp notion enforced by classic Boolean logic
Fuzzy Set Theory
•Model
  – A query term: a fuzzy set
  – A document: a degree of membership in this set
•Membership function
  – Associates a degree of membership with each element of the class (the documents)
  – 0: no membership in the set
  – 1: full membership
  – between 0 and 1: marginal elements of the set
Fuzzy Set Theory
A fuzzy subset $A$ of a universe of discourse $U$ is characterized by a membership function $\mu_A: U \to [0,1]$ which associates with each element $u$ of $U$ a number $\mu_A(u)$ in the interval $[0,1]$
– complement: $\mu_{\bar{A}}(u) = 1 - \mu_A(u)$
– union: $\mu_{A \cup B}(u) = \max(\mu_A(u), \mu_B(u))$
– intersection: $\mu_{A \cap B}(u) = \min(\mu_A(u), \mu_B(u))$
Here $A$ is a class (fuzzy set) for a query term, and $U$ is the document collection.
Examples
Assume U = {d1, d2, d3, d4, d5, d6}. Let A and B be {d1, d2, d3} and {d2, d3, d4}, respectively.
Assume A = {d1:0.8, d2:0.7, d3:0.6, d4:0, d5:0, d6:0} and B = {d1:0, d2:0.6, d3:0.8, d4:0.9, d5:0, d6:0}
– $\bar{A}$ = {d1:0.2, d2:0.3, d3:0.4, d4:1, d5:1, d6:1}, using $\mu_{\bar{A}}(u) = 1 - \mu_A(u)$
– $A \cup B$ = {d1:0.8, d2:0.7, d3:0.8, d4:0.9, d5:0, d6:0}, using $\mu_{A \cup B}(u) = \max(\mu_A(u), \mu_B(u))$
– $A \cap B$ = {d1:0, d2:0.6, d3:0.6, d4:0, d5:0, d6:0}, using $\mu_{A \cap B}(u) = \min(\mu_A(u), \mu_B(u))$
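The three results in this example can be reproduced with a few lines of code. This is a sketch that stores each fuzzy set as a dictionary from document to membership degree; the dictionary representation and the rounding in `complement` (to hide floating-point noise) are choices made here, not part of the slides.

```python
A = {"d1": 0.8, "d2": 0.7, "d3": 0.6, "d4": 0.0, "d5": 0.0, "d6": 0.0}
B = {"d1": 0.0, "d2": 0.6, "d3": 0.8, "d4": 0.9, "d5": 0.0, "d6": 0.0}

def complement(a):
    """Complement: 1 - mu_A(u); rounded to suppress floating-point noise."""
    return {u: round(1 - m, 10) for u, m in a.items()}

def union(a, b):
    """Union: max(mu_A(u), mu_B(u)) for every u in the universe."""
    return {u: max(a[u], b[u]) for u in a}

def intersection(a, b):
    """Intersection: min(mu_A(u), mu_B(u)) for every u in the universe."""
    return {u: min(a[u], b[u]) for u in a}

print(complement(A))
print(union(A, B))
print(intersection(A, B))
```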
Fuzzy Information Retrieval
•Basic idea
  – Expand the set of index terms in the query with related terms (from the thesaurus) such that additional relevant documents can be retrieved
  – A thesaurus can be constructed by defining a term-term correlation matrix $c$ (the keyword connection matrix) whose rows and columns are associated with the index terms in the document collection
Fuzzy Information Retrieval (Continued)
•Normalized correlation factor $c_{i,l}$ between two terms $k_i$ and $k_l$ (between 0 and 1):
$$c_{i,l} = \frac{n_{i,l}}{n_i + n_l - n_{i,l}}$$
where
  – $n_i$ is the number of documents containing term $k_i$
  – $n_l$ is the number of documents containing term $k_l$
  – $n_{i,l}$ is the number of documents containing both $k_i$ and $k_l$
•In the fuzzy set associated with each index term $k_i$, a document $d_j$ has a degree of membership $\mu_{i,j}$:
$$\mu_{i,j} = 1 - \prod_{k_l \in d_j} (1 - c_{i,l})$$
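The correlation factor and the degree of membership can be sketched directly from the document counts. The three-document collection below is hypothetical and chosen only to keep the arithmetic easy to follow; `correlation` and `membership` are illustrative names.

```python
from math import prod

# Hypothetical toy collection: document -> set of index terms.
docs = {
    "d1": {"gold", "shipment", "fire"},
    "d2": {"silver", "delivery", "truck"},
    "d3": {"gold", "shipment", "truck"},
}

def correlation(ki, kl):
    """Normalized correlation c_{i,l} = n_il / (n_i + n_l - n_il)."""
    n_i = sum(ki in terms for terms in docs.values())
    n_l = sum(kl in terms for terms in docs.values())
    n_il = sum(ki in terms and kl in terms for terms in docs.values())
    denom = n_i + n_l - n_il
    return n_il / denom if denom else 0.0

def membership(ki, doc):
    """Degree of membership mu_{i,j} = 1 - prod over k_l in d_j of (1 - c_{i,l})."""
    return 1 - prod(1 - correlation(ki, kl) for kl in docs[doc])

print(correlation("gold", "truck"))   # gold and truck co-occur in d3 only
print(membership("gold", "d2"))       # d2 lacks "gold" but contains "truck"
```

In this toy collection, d2 does not contain "gold" yet still receives a non-zero membership in the fuzzy set for "gold", because "truck" co-occurs with "gold" elsewhere. This is the query-expansion effect the thesaurus is meant to provide.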
Fuzzy Information Retrieval (Continued)
•Physical meaning
  – A document $d_j$ belongs to the fuzzy set associated with the term $k_i$ if its own terms are related to $k_i$, i.e., $\mu_{i,j} = 1$.
  – If there is at least one index term $k_l$ of $d_j$ which is strongly related to the index $k_i$, then $\mu_{i,j} \approx 1$: $k_i$ is a good fuzzy index.
  – When all index terms of $d_j$ are only loosely related to $k_i$, $\mu_{i,j} \approx 0$: $k_i$ is not a good fuzzy index.
Example
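$q = k_a \wedge (k_b \vee \neg k_c) = (k_a \wedge k_b \wedge k_c) \vee (k_a \wedge k_b \wedge \neg k_c) \vee (k_a \wedge \neg k_b \wedge \neg k_c) = cc_1 + cc_2 + cc_3$
[Figure: Venn diagram of the fuzzy sets $D_a$, $D_b$, and $D_c$ with the regions $cc_1$, $cc_2$, and $cc_3$ marked]
– $D_a$: the fuzzy set of documents associated with the index $k_a$; each $d_j \in D_a$ has a degree of membership $\mu_{a,j}$ greater than a predefined threshold $K$
– $\bar{D}_a$: the fuzzy set of documents associated with the index $\neg k_a$ (the negation of index term $k_a$)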
Example
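Query $q = k_a \wedge (k_b \vee \neg k_c)$
Disjunctive normal form $\vec{q}_{dnf} = (1,1,1) \vee (1,1,0) \vee (1,0,0)$
$$\mu_{q,j} = \mu_{cc_1 + cc_2 + cc_3,\, j} = 1 - \prod_{i=1}^{3} (1 - \mu_{cc_i,j})$$
$$= 1 - (1 - \mu_{a,j}\mu_{b,j}\mu_{c,j}) \times (1 - \mu_{a,j}\mu_{b,j}(1 - \mu_{c,j})) \times (1 - \mu_{a,j}(1 - \mu_{b,j})(1 - \mu_{c,j}))$$
(1) The degree of membership in a disjunctive fuzzy set is computed using an algebraic sum (instead of the max function), which varies more smoothly.
(2) The degree of membership in a conjunctive fuzzy set is computed using an algebraic product (instead of the min function).
Recall $\mu_{\bar{A}}(u) = 1 - \mu_A(u)$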
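The expression for the query membership can be written out as a short function. This sketch hard-codes the three conjunctive components of the example query; `mu_query` is an illustrative name, and the sample values are made up.

```python
def mu_query(mu_a, mu_b, mu_c):
    """Membership of a document in the fuzzy set for q = ka AND (kb OR NOT kc).

    Conjunctions use the algebraic product, negation uses 1 - mu,
    and the disjunction of the components uses the algebraic sum.
    """
    cc1 = mu_a * mu_b * mu_c                 # (1,1,1)
    cc2 = mu_a * mu_b * (1 - mu_c)           # (1,1,0)
    cc3 = mu_a * (1 - mu_b) * (1 - mu_c)     # (1,0,0)
    remaining = 1.0
    for cc in (cc1, cc2, cc3):
        remaining *= 1 - cc
    return 1 - remaining

print(mu_query(0.5, 0.5, 0.5))
```

With crisp memberships (0 or 1) the function agrees with the Boolean query, and with intermediate values it produces a graded score that can be used for ranking.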
Fuzzy Set Model
– Q: “gold silver truck”
  D1: “Shipment of gold damaged in a fire”
  D2: “Delivery of silver arrived in a silver truck”
  D3: “Shipment of gold arrived in a truck”
– IDF (select keywords)
  • a = in = of = 0 = log 3/3
  • arrived = gold = shipment = truck = 0.176 = log 3/2
  • damaged = delivery = fire = silver = 0.477 = log 3/1
– 8 keywords (dimensions) are selected
  • arrived(1), damaged(2), delivery(3), fire(4), gold(5), silver(6), shipment(7), truck(8)
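The IDF values and the keyword selection can be reproduced as follows. This sketch assumes whitespace tokenization with lowercasing, and uses a base-10 logarithm, which is what the values 0.176 and 0.477 imply.

```python
from math import log10

docs = {
    "D1": "Shipment of gold damaged in a fire",
    "D2": "Delivery of silver arrived in a silver truck",
    "D3": "Shipment of gold arrived in a truck",
}

# Assumed tokenization: lowercase, split on whitespace, one set per document.
tokens = {name: set(text.lower().split()) for name, text in docs.items()}
vocabulary = sorted(set().union(*tokens.values()))

# idf(t) = log10(N / n_t), where n_t is the number of documents containing t.
idf = {
    t: log10(len(docs) / sum(t in terms for terms in tokens.values()))
    for t in vocabulary
}

# Terms that occur in every document have idf 0 and are dropped.
keywords = [t for t in vocabulary if idf[t] > 0]

for t in keywords:
    print(f"{t}: {idf[t]:.3f}")
```

The code lists the keywords alphabetically, so "shipment" precedes "silver", whereas the slide numbers silver as dimension 6 and shipment as dimension 7.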
Fuzzy Set Model
•Sim(q,d): Alternative 1
  – Sim(q,d3) > Sim(q,d2) > Sim(q,d1)
•Sim(q,d): Alternative 2
  – Sim(q,d3) > Sim(q,d2) > Sim(q,d1)
Extended Boolean Model
• Disadvantages of the Boolean model:
  – No term weights are used
  – Counterexample: for the query q = Kx AND Ky, a document containing just one of the terms, e.g., Kx, is considered as irrelevant as a document containing neither term.
  – The size of the output might be too large or too small
Extended Boolean Model
• The Extended Boolean model was introduced in 1983 by Salton, Fox, and Wu
• The idea is to make use of term weights, as in the vector space model.
• Strategy: combine Boolean queries with the vector space model.
• Why not just use the vector space model?
• Advantage: it is easy for the user to provide a query.
Extended Boolean Model
• Each document is represented by a vector (similar to the vector space model).
• Remember the formula:
$$w_{x,j} = f_{x,j} \times \frac{idf_x}{\max_i idf_i}$$
• The query is a Boolean formula.
• How to rank the documents?
Fig. Extended Boolean logic considering the space composed of two terms kx and ky only.
[Figure: two panels, "kx and ky" and "kx or ky"; each plots the documents dj and dj+1 in the unit square with axes kx and ky and corners (0,0), (1,0), (0,1), and (1,1)]
Extended Boolean Model
• For query q = Kx or Ky, (0,0) is the point we try to avoid. Thus, we can use
$$sim(q_{or}, d) = \sqrt{\frac{x^2 + y^2}{2}}$$
to rank the documents.
• The bigger the better.
Extended Boolean Model
• For query q = Kx and Ky, (1,1) is the most desirable point.
• We use
$$sim(q_{and}, d) = 1 - \sqrt{\frac{(1 - x)^2 + (1 - y)^2}{2}}$$
to rank the documents.
• The bigger the better.
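The two-term similarities for the OR and AND queries can be compared side by side. The sketch below evaluates both at a few points of the unit square, where x and y are the weights of kx and ky in the document; the sample points are chosen here for illustration.

```python
from math import sqrt

def sim_or(x, y):
    """Normalized distance from (0, 0), the point an OR query tries to avoid."""
    return sqrt((x**2 + y**2) / 2)

def sim_and(x, y):
    """One minus the normalized distance from (1, 1), the ideal AND point."""
    return 1 - sqrt(((1 - x)**2 + (1 - y)**2) / 2)

for point in [(0, 0), (1, 0), (0.5, 0.5), (1, 1)]:
    print(point, round(sim_or(*point), 3), round(sim_and(*point), 3))
```

A document containing only kx, at (1, 0), now scores above zero for the AND query and below one for the OR query, which is the partial matching that the plain Boolean model lacks.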
Extend the idea to m terms
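• $q_{or} = k_1 \vee^p k_2 \vee^p \dots \vee^p k_m$
• $q_{and} = k_1 \wedge^p k_2 \wedge^p \dots \wedge^p k_m$
$$sim(q_{or}, d_j) = \left( \frac{x_1^p + x_2^p + \dots + x_m^p}{m} \right)^{1/p}$$
$$sim(q_{and}, d_j) = 1 - \left( \frac{(1 - x_1)^p + (1 - x_2)^p + \dots + (1 - x_m)^p}{m} \right)^{1/p}$$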
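A minimal sketch of the m-term p-norm similarities, taking the document's term weights as a list. The names `pnorm_or` and `pnorm_and` and the sample weights are illustrative.

```python
def pnorm_or(xs, p):
    """sim(q_or, d_j) = ((x_1^p + ... + x_m^p) / m)^(1/p)."""
    m = len(xs)
    return (sum(x**p for x in xs) / m) ** (1 / p)

def pnorm_and(xs, p):
    """sim(q_and, d_j) = 1 - (((1-x_1)^p + ... + (1-x_m)^p) / m)^(1/p)."""
    m = len(xs)
    return 1 - (sum((1 - x)**p for x in xs) / m) ** (1 / p)

weights = [0.8, 0.4, 0.0]   # hypothetical weights of k1, k2, k3 in one document
for p in (1, 2, 5):
    print(p, round(pnorm_or(weights, p), 3), round(pnorm_and(weights, p), 3))
```

The parameter p must be at least 1. Larger values push the OR score toward the largest weight and the AND score toward the smallest.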
Properties
• The p-norm as defined above enjoys a couple of interesting properties. First, when $p = 1$ it can be verified that
$$sim(q_{or}, d_j) = sim(q_{and}, d_j) = \frac{x_1 + \dots + x_m}{m}$$
• Second, when $p = \infty$ it can be verified that
  – $sim(q_{or}, d_j) = \max(x_i)$
  – $sim(q_{and}, d_j) = \min(x_i)$
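Both properties follow from the m-term definitions in a few steps. The verification below is a sketch and assumes all weights $x_i$ lie in $[0,1]$.

$$
\begin{aligned}
p = 1:\quad & sim(q_{and}, d_j) = 1 - \frac{(1 - x_1) + \dots + (1 - x_m)}{m} = 1 - \frac{m - \sum_i x_i}{m} = \frac{x_1 + \dots + x_m}{m} = sim(q_{or}, d_j) \\
p \to \infty:\quad & \left(\tfrac{1}{m}\right)^{1/p} \max_i x_i \;\le\; \left( \frac{x_1^p + \dots + x_m^p}{m} \right)^{1/p} \;\le\; \max_i x_i, \quad \text{and } \left(\tfrac{1}{m}\right)^{1/p} \to 1, \text{ so } sim(q_{or}, d_j) \to \max_i x_i \\
& sim(q_{and}, d_j) \to 1 - \max_i (1 - x_i) = \min_i x_i
\end{aligned}
$$

So $p = 1$ behaves like a vector-style average of the weights, while $p = \infty$ recovers the fuzzy-set max and min operators.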
Example
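• For instance, consider the query $q = (k_1 \wedge^p k_2) \vee^p k_3$. The similarity $sim(q, d_j)$ between a document $d_j$ and this query is then computed as
$$sim(q, d_j) = \left( \frac{\left( 1 - \left( \frac{(1 - x_1)^p + (1 - x_2)^p}{2} \right)^{1/p} \right)^p + x_3^p}{2} \right)^{1/p}$$
• Any Boolean expression can be expressed as a numerical formula.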
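Nested queries like this one can be scored by applying the p-norm formulas recursively. The sketch below represents a query as nested tuples, a representation chosen here for illustration; the document weights are hypothetical.

```python
def evaluate(expr, weights, p=2.0):
    """Recursively score a query tree against one document's term weights.

    expr is either a term name (str) or a tuple ("and" | "or", operand, ...).
    """
    if isinstance(expr, str):                # leaf: weight of an index term
        return weights.get(expr, 0.0)
    op, *operands = expr
    scores = [evaluate(e, weights, p) for e in operands]
    m = len(scores)
    if op == "or":
        return (sum(s**p for s in scores) / m) ** (1 / p)
    if op == "and":
        return 1 - (sum((1 - s)**p for s in scores) / m) ** (1 / p)
    raise ValueError(f"unknown operator: {op}")

# q = (k1 AND k2) OR k3
query = ("or", ("and", "k1", "k2"), "k3")
doc = {"k1": 0.9, "k2": 0.5, "k3": 0.2}   # hypothetical weights x1, x2, x3
score = evaluate(query, doc, p=2.0)
print(round(score, 3))
```

Negation is not handled in this sketch; only the AND and OR operators that appear in the example are covered.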