
9/4/2001 Information Organization and Retrieval

Introduction to Information Retrieval

University of California, Berkeley

School of Information Management and Systems

SIMS 202: Information Organization and Retrieval

Lecture authors: Marti Hearst & Ray Larson


Review: Information Overload

• “The world's total yearly production of print, film, optical, and magnetic content would require roughly 1.5 billion gigabytes of storage. This is the equivalent of 250 megabytes per person for each man, woman, and child on earth.” (Varian & Lyman)

• “The greatest problem of today is how to teach people to ignore the irrelevant, how to refuse to know things, before they are suffocated. For too many facts are as bad as none at all.” (W.H. Auden)


Information Organization and Retrieval

• To organize is to (1) furnish with organs, make organic, make into living tissue, become organic; (2) form into an organic whole; give orderly structure to; frame and put into working order; make arrangements for.

• Knowledge is knowing, familiarity gained by experience; person’s range of information; a theoretical or practical understanding of; the sum of what is known.

• To retrieve is to (1) recover by investigation or effort of memory, restore to knowledge or recall to mind; regain possession of; (2) rescue from a bad state, revive, repair, set right.

• Information is (1) informing, telling; thing told, knowledge, items of knowledge, news.

The Oxford English Dictionary, cf. Rowley


Information Life Cycle

[Figure: Information Life Cycle diagram. Labels: Creation, Utilization, Searching; Active, Semi-Active, Inactive; Retention/Mining, Disposition, Discard; stages Authoring/Modifying, Organizing/Indexing, Storing/Retrieval, Distribution/Networking, Accessing/Filtering, Using/Creating.]

Note: This version of the Life cycle is based on the report of a conference on the Social Aspects of Digital Libraries held at UCLA. - C. Borgman, PI


Authoring/Modifying

• Converting Data+Information+Knowledge to New Information.

• Creating information from observation, thought.

• Editing and Publication.

• Gatekeeping


Organizing/Indexing

• Collecting and Integrating information.

• Affects Data, Information and Metadata.

• “Metadata” describes data and information.
  – More on this later.

• Organizing Information.
  – Types of organization?

• Indexing


Storing/Retrieving

• Information Storage
  – How and where is information stored?

• Retrieving Information.
  – How is information recovered from storage?
  – How to find needed information
  – Linked with Accessing/Filtering stage


Distribution/Networking

• Transmission of information
  – How is information transmitted?

• Networks vs Broadcast.


Accessing/Filtering

• Using the organization created in the O/I stage to:
  – Select desired (or relevant) information
  – Locate that information
  – Retrieve the information from its storage location (often via a network)


Using/Creating

• Using Information.

• Transformation of Information to Knowledge.

• Knowledge to New Data and New Information.


Key Issues in This Course

• How to find the appropriate information resources or information-bearing objects for someone’s (or your own) needs.
  – Retrieving

• How to describe information resources or information-bearing objects so that they can be used effectively by those who need them.
  – Organizing


Key Issues

[Figure: the Information Life Cycle diagram, repeated.]


This Week

• Introduction to IR
  – Modern IR textbook topics

• The Information Seeking Process


Textbook Topics


More Detailed View


What We’ll Cover

[Figure labels: “A Lot”, “A Little”]


Search and Retrieval: Outline of Part I of SIMS 202

• The Search Process
• Information Retrieval Models
• Content Analysis/Zipf Distributions
• Evaluation of IR Systems
  – Precision/Recall
  – Relevance
  – User Studies
• System and Implementation Issues
• Web-Specific Issues
• User Interface Issues
• Special Kinds of Search


What is an Information Need?


The Standard Retrieval Interaction Model


Standard Model

• Assumptions:
  – Maximizing precision and recall simultaneously
  – The information need remains static
  – The value is in the resulting document set


Problem with Standard Model:

• Users learn during the search process:
  – Scanning titles of retrieved documents
  – Reading retrieved documents
  – Viewing lists of related topics/thesaurus terms
  – Navigating hyperlinks

• Some users don’t like long disorganized lists of documents


IR is an Iterative Process

[Figure labels: Repositories, Workspace, Goals]


IR is a Dialog

– The exchange doesn’t end with first answer

– User can recognize elements of a useful answer

– Questions and understanding change as the process continues.


“Berry-Picking” as an Information Seeking Strategy (Bates 90)

• Standard IR model
  – assumes the information need remains the same throughout the search process

• Berry-picking model
  – interesting information is scattered like berries among bushes
  – the query is continually shifting


A sketch of a searcher… “moving through many actions towards a general goal of satisfactory completion of research related to an information need.” (after Bates 89)

[Figure: a searcher’s path through successive queries Q0, Q1, Q2, Q3, Q4, Q5]


Berry-picking model (cont.)

• The query is continually shifting

• New information may yield new ideas and new directions

• The information need
  – is not satisfied by a single, final retrieved set
  – is satisfied by a series of selections and bits of information found along the way.


Information Seeking Behavior

• Two parts of a process:
  – search and retrieval
  – analysis and synthesis of search results

• This is a fuzzy area; we will look at several different working theories.


Search Tactics and Strategies

• Search Tactics
  – Bates 79

• Search Strategies
  – Bates 89
  – O’Day and Jeffries 93


Tactics vs. Strategies

• Tactic: short-term goals and maneuvers
  – operators, actions

• Strategy: overall planning
  – link a sequence of operators together to achieve some end


Information Search Tactics (after Bates 79)

• Monitoring tactics
  – keep search on track

• Source-level tactics
  – navigate to and within sources

• Term and Search Formulation tactics
  – designing search formulation
  – selection and revision of specific terms within search formulation


Term Tactics

• Move around the thesaurus
  – superordinate, subordinate, coordinate
  – neighbor (semantic or alphabetic)
  – trace: pull out terms from information already seen as part of search (titles, etc.)
  – morphological and other spelling variants
  – antonyms (contrary)


Source-level Tactics

• “Bibble”:
  – look for a pre-defined result set
  – e.g., a good link page on the web

• Survey:
  – look ahead, review available options
  – e.g., don’t simply use the first term or first source that comes to mind

• Cut:
  – eliminate a large proportion of the search domain
  – e.g., search on the rarest term first
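Inside a Boolean engine, the Cut tactic has a familiar counterpart: when intersecting posting lists for an AND query, starting with the rarest term keeps every intermediate result small. A minimal sketch, assuming a toy in-memory index (the `INDEX` contents and the `and_query` name are hypothetical):

```python
# Toy inverted index: term -> set of document IDs (hypothetical data).
INDEX = {
    "concrete": {1, 2, 3, 4, 5, 6, 7, 8},
    "beams": {2, 3, 5, 7, 9},
    "cracks": {3, 7},
}


def and_query(terms, index):
    """AND together the posting sets of terms, rarest term first (the "Cut")."""
    if not terms:
        return set()
    # Shortest posting list first, so the running result starts small.
    ordered = sorted(terms, key=lambda t: len(index.get(t, set())))
    result = set(index.get(ordered[0], set()))
    for term in ordered[1:]:
        if not result:  # nothing left to cut; stop early
            break
        result &= index.get(term, set())
    return result


print(sorted(and_query(["concrete", "beams", "cracks"], INDEX)))  # [3, 7]
```

The answer is the same in any processing order; only the amount of work changes, which is the point of cutting the search domain early.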


Source-level Tactics (cont.)

• Stretch
  – use source in unintended way
  – e.g., use patents to find addresses

• Scaffold
  – take an indirect route to goal
  – e.g., when looking for references to an obscure poet, look up contemporaries

• Cleave
  – binary search in an ordered file
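Cleave is ordinary binary search. A sketch over a sorted list of hypothetical index entries (the `cleave` name and the entries are illustrative); each comparison discards half of the remaining range:

```python
def cleave(sorted_entries, target):
    """Return the index of target in sorted_entries, or -1 if it is absent."""
    lo, hi = 0, len(sorted_entries)
    while lo < hi:
        mid = (lo + hi) // 2
        if sorted_entries[mid] < target:
            lo = mid + 1  # target can only be in the upper half
        else:
            hi = mid  # target is at mid or in the lower half
    if lo < len(sorted_entries) and sorted_entries[lo] == target:
        return lo
    return -1


authors = ["Auden", "Bates", "Borgman", "Hearst", "Larson", "Luhn", "Salton"]
print(cleave(authors, "Luhn"))     # 5
print(cleave(authors, "Soergel"))  # -1
```

Python’s standard `bisect` module provides the same search (`bisect.bisect_left`); it is written out here so each halving step is visible.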


Monitoring Tactics (strategy-level)

• Check
  – compare original goal with current state

• Weigh
  – make a cost/benefit analysis of current or anticipated actions

• Pattern
  – recognize common strategies

• Correct Errors

• Record
  – keep track of (incomplete) paths


Additional Considerations (Bates 79)

• Add a Sort tactic!

• More detail is needed about short-term cost/benefit decision rule strategies

• When to stop?
  – How to judge when enough information has been gathered?
  – How to decide when to give up an unsuccessful search?
  – When to stop searching in one source and move to another?


Implications

• Interfaces should make it easy to store intermediate results

• Interfaces should make it easy to follow trails with unanticipated results

• This also makes evaluation more difficult.


• Later in the course:
  – More on Search Process and Strategies
  – User interfaces to improve IR process
  – Incorporation of Content Analysis into better systems


Restricted Form of the IR Problem

• The system has available only pre-existing, “canned” text passages.

• Its response is limited to selecting from these passages and presenting them to the user.

• It must select, say, 10 or 20 passages out of millions or billions!


Information Retrieval

• Revised Task Statement:

Build a system that retrieves documents that users are likely to find relevant to their queries.

• This set of assumptions underlies the field of Information Retrieval.


Some IR History

– Roots in the scientific “Information Explosion” following WWII

– Interest in computer-based IR from the mid-1950s
  • H.P. Luhn at IBM (1958)
  • Probabilistic models at Rand (Maron & Kuhns) (1960)
  • Boolean system development at Lockheed (’60s)
  • Vector Space Model (Salton at Cornell 1965)
  • Statistical weighting methods and theoretical advances (’70s)
  • Refinements and advances in application (’80s)
  • User interfaces, large-scale testing and application (’90s)


Structure of an IR System

[Figure: Information Storage and Retrieval System. Search line: interest profiles & queries; formulating query in terms of descriptors; storage of profiles (Store 1: profiles/search requests). Storage line: documents & data; indexing (descriptive and subject); storage of documents (Store 2: document representations). Comparison/matching of the two stores yields potentially relevant documents. Rules of the game = rules for subject indexing + thesaurus (which consists of lead-in vocabulary and indexing language).]

Adapted from Soergel, p. 19


Relevance (introduction)

• In what ways can a document be relevant to a query?
  – Answer precise question precisely.
    e.g., Who is buried in Grant’s tomb? Grant.
  – Partially answer question.
    e.g., Where is Danville? Near Walnut Creek.
  – Suggest a source for more information.
    e.g., What is lymphedema? Look in this Medical Dictionary.
  – Give background information.
  – Remind the user of other knowledge.
  – Others ...


Query Languages

• A way to express the question (information need)

• Types:
  – Boolean
  – Natural Language
  – Stylized Natural Language
  – Form-Based (GUI)


Simple query language: Boolean

– Terms + Connectors (or operators)

– terms
  • words
  • normalized (stemmed) words
  • phrases
  • thesaurus terms

– connectors
  • AND
  • OR
  • NOT


Boolean Queries

• Cat

• Cat OR Dog

• Cat AND Dog

• (Cat AND Dog)

• (Cat AND Dog) OR Collar

• (Cat AND Dog) OR (Collar AND Leash)

• (Cat OR Dog) AND (Collar OR Leash)


Boolean Queries

• (Cat OR Dog) AND (Collar OR Leash)
  – Each of the following combinations works:

[Table: example combinations of Cat, Dog, Collar, and Leash marked with x; each includes at least one of Cat/Dog and at least one of Collar/Leash]


Boolean Queries

• (Cat OR Dog) AND (Collar OR Leash)
  – None of the following combinations work:

[Table: example combinations of Cat, Dog, Collar, and Leash marked with x; each lacks both Cat and Dog, or lacks both Collar and Leash]
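The working and failing combinations can be checked exhaustively: with four terms there are 2^4 = 16 present/absent combinations. A small sketch (function names are illustrative) that enumerates them and evaluates (Cat OR Dog) AND (Collar OR Leash) against each:

```python
from itertools import product

TERMS = ("Cat", "Dog", "Collar", "Leash")


def matches(doc_terms):
    """Evaluate (Cat OR Dog) AND (Collar OR Leash) for one document's terms."""
    return ("Cat" in doc_terms or "Dog" in doc_terms) and (
        "Collar" in doc_terms or "Leash" in doc_terms
    )


def all_combinations(terms):
    """Every subset of terms (each present or absent), as frozensets."""
    return [
        frozenset(t for t, present in zip(terms, flags) if present)
        for flags in product([False, True], repeat=len(terms))
    ]


combos = all_combinations(TERMS)
working = [c for c in combos if matches(c)]
failing = [c for c in combos if not matches(c)]
print(len(working), len(failing))  # 9 7
```

Nine combinations satisfy the query (three non-empty choices from each facet, 3 × 3), and the remaining seven fail.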


Boolean Logic

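[Figure: Venn diagrams of two sets A and B illustrating $C = A \land B$, $C = A \lor B$, and complements such as $C = \overline{A}$]

De Morgan’s Laws:

$$\overline{A \land B} = \overline{A} \lor \overline{B} \qquad \overline{A \lor B} = \overline{A} \land \overline{B}$$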


Boolean Queries

– Usually expressed as INFIX operators in IR
  • ((a AND b) OR (c AND b))

– NOT is a UNARY PREFIX operator
  • ((a AND b) OR (c AND (NOT b)))

– AND and OR can be n-ary operators
  • (a AND b AND c AND d)

– Some rules (De Morgan revisited)
  • NOT(a) AND NOT(b) = NOT(a OR b)
  • NOT(a) OR NOT(b) = NOT(a AND b)
  • NOT(NOT(a)) = a
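These rules can be checked by evaluating queries as set operations over a collection: AND is intersection, OR is union, and NOT is the complement within the collection. A minimal sketch; the nested-tuple query form, the `DOCS` collection, and `evaluate` are hypothetical, chosen only for illustration:

```python
# Hypothetical mini-collection: doc ID -> set of terms it contains.
DOCS = {
    1: {"a", "b"},
    2: {"b", "c"},
    3: {"c"},
    4: {"a", "b", "c"},
    5: {"d"},
}
ALL_IDS = frozenset(DOCS)


def evaluate(query):
    """Return the frozenset of doc IDs matching a Boolean query.

    A query is a term (str) or a tuple (op, *operands): "AND"/"OR" take
    any number of operands (n-ary); "NOT" takes exactly one (unary prefix).
    """
    if isinstance(query, str):
        return frozenset(d for d, terms in DOCS.items() if query in terms)
    op, *args = query
    if op == "NOT":
        (operand,) = args
        return ALL_IDS - evaluate(operand)  # complement within the collection
    sets = [evaluate(arg) for arg in args]
    if op == "AND":
        return frozenset.intersection(*sets)
    if op == "OR":
        return frozenset.union(*sets)
    raise ValueError(f"unknown operator: {op}")


# ((a AND b) OR (c AND (NOT b)))
q = ("OR", ("AND", "a", "b"), ("AND", "c", ("NOT", "b")))
print(sorted(evaluate(q)))  # [1, 3, 4]

# De Morgan: NOT(a) AND NOT(b) = NOT(a OR b)
print(evaluate(("AND", ("NOT", "a"), ("NOT", "b"))) == evaluate(("NOT", ("OR", "a", "b"))))  # True
```

Because NOT is taken relative to the whole collection, the result of a negated query depends on which documents are in `ALL_IDS`; this is why systems usually restrict NOT to appear alongside a positive term.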


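Boolean Logic

[Figure: Venn diagram of three terms t1, t2, t3 over documents D1–D11, dividing the space into eight regions m1–m8]

Each region m1, …, m8 is a minterm: a conjunction of all three terms in which each term appears either plain or negated, for example $t_1 \land t_2 \land t_3$ or $\overline{t_1} \land t_2 \land \overline{t_3}$. The $2^3 = 8$ minterms are disjoint and together cover every document.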


Boolean Searching

“Measurement of the width of cracks in prestressed concrete beams”

Formal Query: cracks AND beams AND Width_measurement AND Prestressed_concrete

[Figure: Venn diagram of the four concepts Cracks (C), Beams (B), Width measurement (W), and Prestressed concrete (P)]

Relaxed Query: (C AND B AND P) OR (C AND B AND W) OR (C AND W AND P) OR (B AND W AND P)
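The relaxed query accepts any document that matches at least three of the four concepts. A sketch (function names are illustrative) that generates that disjunction mechanically and states the equivalent “at least k of n” test:

```python
from itertools import combinations

# C = Cracks, B = Beams, W = Width measurement, P = Prestressed concrete
CONCEPTS = ("C", "B", "W", "P")


def relaxed_query(concepts, k):
    """OR together one AND-clause for every choice of k concepts."""
    clauses = ["(" + " AND ".join(combo) + ")" for combo in combinations(concepts, k)]
    return " OR ".join(clauses)


def satisfies_relaxed(doc_concepts, concepts, k):
    """Equivalent test: the document contains at least k of the concepts."""
    return sum(c in doc_concepts for c in concepts) >= k


print(relaxed_query(CONCEPTS, 3))
# (C AND B AND W) OR (C AND B AND P) OR (C AND W AND P) OR (B AND W AND P)
print(satisfies_relaxed({"C", "B", "P"}, CONCEPTS, 3))  # True
print(satisfies_relaxed({"C", "W"}, CONCEPTS, 3))       # False
```

The clauses are the same four as in the relaxed query, in a different order. Stating the query as “k of n” avoids writing out every clause by hand when there are more concepts.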


Pseudo-Boolean Queries

• A new notation, from web search
  – +cat dog +collar leash

• Does not mean the same thing!

• Need a way to group combinations.

• Phrases:
  – “stray cat” AND “frayed collar”
  – +“stray cat” +“frayed collar”
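The difference can be made concrete under one common reading of the web notation, assumed here rather than taken from any particular engine: a leading + marks a required term, and unmarked terms are optional and only raise a document’s rank. A sketch with hypothetical data (phrases are not handled):

```python
def parse(query):
    """Split a +term query into (required, optional) term lists.

    Assumption: '+' marks a required term; every other term is optional.
    """
    required, optional = [], []
    for token in query.split():
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:])
        else:
            optional.append(token)
    return required, optional


def search(query, docs):
    """Doc IDs containing all required terms, ranked by optional-term hits."""
    required, optional = parse(query)
    hits = []
    for doc_id, terms in docs.items():
        if all(t in terms for t in required):
            score = sum(t in terms for t in optional)
            hits.append((score, doc_id))
    hits.sort(key=lambda pair: (-pair[0], pair[1]))  # best first, ties by ID
    return [doc_id for _, doc_id in hits]


DOCS = {
    1: {"cat", "collar"},
    2: {"cat", "dog", "collar", "leash"},
    3: {"dog", "leash"},
    4: {"cat", "leash"},
}
print(search("+cat dog +collar leash", DOCS))  # [2, 1]
```

Under (Cat OR Dog) AND (Collar OR Leash) all four toy documents would match; this reading of the + notation keeps only documents 1 and 2 and ranks 2 first.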


[Figure: the retrieval process. Components: information need, text input, Parse, Query, Collections, Pre-process, Index, Rank.]


Result Sets

• Run a query, get a result set

• Two choices
  – Reformulate query, run on entire collection
  – Reformulate query, run on result set

• Example: Dialog query
  • (Redford AND Newman)
    -> S1 1450 documents
  • (S1 AND Sundance)
    -> S2 898 documents
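The two choices can be compared directly. A toy sketch of a session that numbers each result set (S1, S2, …) in the style of the Dialog example; the `Session` class and its data are hypothetical, not Dialog’s actual interface:

```python
class Session:
    """Toy search session that names each result set S1, S2, ..."""

    def __init__(self, docs):
        self.docs = docs  # doc ID -> set of terms (hypothetical data)
        self.sets = {}    # result-set name -> set of doc IDs

    def _store(self, ids):
        name = f"S{len(self.sets) + 1}"
        self.sets[name] = ids
        return name

    def search_all(self, *terms):
        """Run an AND query over the entire collection."""
        ids = {d for d, t in self.docs.items() if all(x in t for x in terms)}
        return self._store(ids)

    def refine(self, set_name, *terms):
        """AND extra terms against an earlier result set only."""
        ids = {d for d in self.sets[set_name] if all(x in self.docs[d] for x in terms)}
        return self._store(ids)


docs = {
    1: {"redford", "newman", "sundance"},
    2: {"redford", "newman"},
    3: {"redford", "sundance"},
    4: {"newman"},
}
session = Session(docs)
s1 = session.search_all("redford", "newman")              # S1 -> {1, 2}
s2 = session.refine(s1, "sundance")                       # S2 -> {1}, checks 2 docs
s3 = session.search_all("redford", "newman", "sundance")  # S3 -> {1}, checks 4 docs
print(session.sets[s2] == session.sets[s3])               # True
```

For a conjunctive refinement, running on the result set returns the same documents as rerunning on the whole collection while examining fewer of them; a reformulation that adds OR alternatives would not be equivalent, since it can reach documents outside S1.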


[Figure: the retrieval process with reformulation. Components: information need, text input, Parse, Query, Collections, Pre-process, Index, Rank, Reformulated Query, Re-Rank.]


Ordering of Retrieved Documents

• Pure Boolean has no ordering

• In practice:
  – order chronologically
  – order by total number of “hits” on query terms

• What if one term has more hits than others?

• Is it better to have one of each term or many of one term?

• Fancier methods have been investigated
  – p-norm is most famous
    • usually impractical to implement
    • usually hard for user to understand
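The “one of each term or many of one term” question shows up as soon as hit counts are used for ordering. A sketch with two hypothetical documents, comparing a total-hits score with a count of distinct query terms (both function names are illustrative):

```python
from collections import Counter


def rank_by_hits(query_terms, docs):
    """Order doc IDs by total occurrences of query terms (ties by ID)."""
    scores = {}
    for doc_id, text in docs.items():
        counts = Counter(text.lower().split())
        scores[doc_id] = sum(counts[t] for t in query_terms)
    return sorted(scores, key=lambda d: (-scores[d], d))


def rank_by_distinct_terms(query_terms, docs):
    """Order doc IDs by how many different query terms they contain."""
    scores = {}
    for doc_id, text in docs.items():
        words = set(text.lower().split())
        scores[doc_id] = sum(t in words for t in query_terms)
    return sorted(scores, key=lambda d: (-scores[d], d))


DOCS = {
    "d1": "cat cat cat cat",
    "d2": "cat dog collar",
}
query = ["cat", "dog", "collar"]
print(rank_by_hits(query, DOCS))            # ['d1', 'd2']
print(rank_by_distinct_terms(query, DOCS))  # ['d2', 'd1']
```

The two scores disagree on the same query: total hits favors d1 (four hits on one term), while distinct terms favors d2 (one hit on each term).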


Boolean

• Advantages
  – simple queries are easy to understand
  – relatively easy to implement

• Disadvantages
  – difficult to specify what is wanted
  – too much returned, or too little
  – ordering not well determined

• Dominant language in commercial systems until the WWW


Faceted Boolean Query

• Strategy: break query into facets (polysemous with earlier meaning of facets)
  – conjunction of disjunctions
    (a1 OR a2 OR a3) AND (b1 OR b2) AND (c1 OR c2 OR c3 OR c4)
  – each facet expresses a topic
    (“rain forest” OR jungle OR amazon) AND (medicine OR remedy OR cure) AND (Smith OR Zhou)
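A faceted query maps naturally onto a list of term sets: each set is one facet (an OR of alternatives), and the query holds when every facet does. A minimal sketch using the example facets; it assumes phrases such as “rain forest” have already been recognized as single terms, and the function names are illustrative:

```python
# Each facet is a set of alternative terms (an OR); the query ANDs the facets.
# Assumption: phrases such as "rain forest" were already recognized as single terms.
FACETS = [
    {"rain forest", "jungle", "amazon"},
    {"medicine", "remedy", "cure"},
    {"smith", "zhou"},
]


def facet_satisfied(facet, doc_terms):
    """A facet (disjunction) holds if the document has any one of its terms."""
    return not facet.isdisjoint(doc_terms)


def faceted_match(facets, doc_terms):
    """Strict faceted Boolean match: every facet must hold."""
    return all(facet_satisfied(facet, doc_terms) for facet in facets)


print(faceted_match(FACETS, {"jungle", "remedy", "zhou", "botany"}))  # True
print(faceted_match(FACETS, {"jungle", "remedy", "botany"}))          # False
```

The second document covers two of the three topics but is still rejected, because the author facet is missing.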


Faceted Boolean Query

• Query still fails if one facet is missing

• Alternative: Coordination level ranking
  – Order results in terms of how many facets (disjuncts) are satisfied
  – Also called Quorum ranking, Overlap ranking, and Best Match

• Problem: Facets still undifferentiated

• Alternative: assign weights to facets
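Coordination level ranking replaces the all-or-nothing facet test with a count of satisfied facets, and facet weights generalize the count. A sketch with hypothetical documents and arbitrary weights; with equal weights the two schemes coincide:

```python
# Facets as in the faceted query; each is a set of alternative terms (hypothetical data).
FACETS = [
    {"rain forest", "jungle", "amazon"},
    {"medicine", "remedy", "cure"},
    {"smith", "zhou"},
]

DOCS = {
    "d1": {"jungle", "cure"},
    "d2": {"smith"},
    "d3": {"amazon", "remedy", "zhou"},
    "d4": {"smith", "zhou", "botany"},
}


def coordination_level(facets, doc_terms):
    """Number of facets (disjuncts) the document satisfies."""
    return sum(not facet.isdisjoint(doc_terms) for facet in facets)


def weighted_score(facets, weights, doc_terms):
    """Sum of the weights of the facets the document satisfies."""
    return sum(w for facet, w in zip(facets, weights) if not facet.isdisjoint(doc_terms))


def rank(facets, docs, weights=None):
    """Rank doc IDs best-first; equal weights reduce to coordination level ranking."""
    if weights is None:
        weights = [1] * len(facets)
    return sorted(docs, key=lambda d: (-weighted_score(facets, weights, docs[d]), d))


print(rank(FACETS, DOCS))             # ['d3', 'd1', 'd2', 'd4']
print(rank(FACETS, DOCS, [1, 1, 5]))  # ['d3', 'd2', 'd4', 'd1']
```

Unlike the strict faceted match, d1, d2, and d4 are still returned even though each misses at least one facet. The weights [1, 1, 5] are arbitrary; they only show how emphasizing one facet (here the author) reorders the list.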


Proximity Searches

• Proximity: terms occur within K positions of one another
  – pen w/5 paper

• A “Near” function can be more vague
  – near(pen, paper)

• Sometimes order can be specified

• Also, Phrases and Collocations
  – “United Nations” “Bill Clinton”

• Phrase Variants
  – “retrieval of information” “information retrieval”
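Proximity operators need word positions, not just the presence of terms. A sketch of a “w/K” test over a tokenized document, assuming distance is counted as the difference in word positions (real systems differ on this convention); the function names are illustrative:

```python
def positions(tokens, term):
    """All 0-based positions at which term occurs in the token list."""
    return [i for i, tok in enumerate(tokens) if tok == term]


def within(tokens, a, b, k, ordered=False):
    """True if a and b occur within k positions of each other.

    With ordered=True, a must come before b, as in an ordered proximity operator.
    """
    for i in positions(tokens, a):
        for j in positions(tokens, b):
            distance = j - i
            if ordered and distance <= 0:
                continue  # b does not follow a at this pair of positions
            if 0 < abs(distance) <= k:
                return True
    return False


tokens = "a pen and some scraps of paper".split()
print(within(tokens, "pen", "paper", 5))                # True (positions 1 and 6)
print(within(tokens, "pen", "paper", 4))                # False
print(within(tokens, "paper", "pen", 5, ordered=True))  # False
```

A phrase such as “United Nations” is the special case of an ordered proximity of 1, for example `within(tokens, "united", "nations", 1, ordered=True)`.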


Filters

• Filters: Reduce set of candidate docs

• Often specified simultaneously with query

• Usually restrictions on metadata
  – restrict by:
    • date range
    • internet domain (.edu .com .berkeley.edu)
    • author
    • size
    • limit number of documents returned
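The listed restrictions are simple predicates on metadata. A sketch applying them to a hypothetical list of candidate records; the `Doc` fields, `apply_filters`, and its parameter names are illustrative assumptions, not any particular system’s interface:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    """Hypothetical metadata record for one candidate document."""
    doc_id: int
    host: str
    author: str
    published: date
    size_kb: int


def in_domain(host, suffix):
    """True if host is the domain (e.g. '.berkeley.edu') or one of its subdomains."""
    suffix = suffix.lstrip(".")
    return host == suffix or host.endswith("." + suffix)


def apply_filters(docs, start=None, end=None, domain=None, author=None,
                  max_size_kb=None, limit=None):
    """Keep docs whose metadata pass every given restriction, then cap the count."""
    kept = []
    for doc in docs:
        if start is not None and doc.published < start:
            continue
        if end is not None and doc.published > end:
            continue
        if domain is not None and not in_domain(doc.host, domain):
            continue
        if author is not None and doc.author != author:
            continue
        if max_size_kb is not None and doc.size_kb > max_size_kb:
            continue
        kept.append(doc)
    return kept if limit is None else kept[:limit]


candidates = [
    Doc(1, "sims.berkeley.edu", "Hearst", date(2001, 8, 30), 40),
    Doc(2, "example.com", "Larson", date(2001, 9, 1), 12),
    Doc(3, "berkeley.edu", "Larson", date(1999, 1, 15), 300),
]
print([d.doc_id for d in apply_filters(candidates, domain=".edu")])  # [1, 3]
print([d.doc_id for d in apply_filters(candidates, domain=".berkeley.edu",
                                       start=date(2000, 1, 1))])     # [1]
```

Here the filters run over an already-retrieved candidate list; a system that accepts them together with the query could instead apply them while matching, so filtered-out documents are never scored.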


Next

• Statistical Properties of Text

• Preparing information for search: Lexical analysis

• Introduction to the Vector Space model of IR.