IR1 Introduction
-
8/2/2019 IR1 Introduction
Salah Hammami
2008
Lecture 1: Introduction
CSC483: Information Retrieval
-
Course Goals
dimensions of the IR "problem"
functions of an IR system
components of an IR system
factors which optimize the IR process
examine current research issues in IR
explore examples of industrial IR applications
form a broad picture of the IR field
-
Course Material
Modern Information Retrieval by R. Baeza-Yates and B. Ribeiro-Neto, 2001
Information Retrieval: A Survey by Ed Greengrass, 2000
Information Discovery by Theo van der Weide, 2001
Introduction to Information Retrieval by C. D. Manning, P. Raghavan and H. Schütze (in preparation)
Lecture slides & notes
Additional study material & Web links
-
Course Lectures Overview
IR introduction
1. IR Models
2. IR Query Languages & Operations
3. Searcher Feedback
4. Language Modeling for IR
5. Search Engines
6. Semantics in IR
8. Multimedia IR
9. Structured Content
IR research issues
Applications of IR
http://tech.groups.yahoo.com/group/csc483/
-
IR Related Areas
Database Management
Library and Information Science
Artificial Intelligence
Natural Language Processing
Machine Learning
-
IR Related Areas
Database Management
structured data in relational tables vs. free-form text
well-defined queries in formal language (SQL)
Recent move towards semi-structured data (XML)
Library and Information Science
user aspects of IR
categorization of human knowledge
citation analysis
bibliometrics (structure of information)
Recent work on digital libraries
Artificial Intelligence (AI)
Knowledge representation, reasoning, formalisms, e.g. first-order predicate logic, Bayesian networks
Recent work on Web ontologies and Intelligent Information Agents
-
IR Related Areas
Natural Language Processing (NLP)
Syntactic, semantic, and pragmatic analysis of text & discourse
Retrieval based on meaning rather than keywords
analyzing the syntax (phrase structure) and semantics
Determining sense of ambiguous words (context-based)
Identifying specific pieces of information in a document
Answering specific NL questions
Recent work in GATE (General Architecture for Text Engineering, http://gate.ac.uk/)
Machine learning (ML)
computational systems that improve their performance through experience
automated classification of examples (supervised learning)
automated clustering of examples (unsupervised learning)
-
IR is not databases
-
Current situation: The Information Age
increasing amount of information
distributed information repositories
various information
dynamic user demands
complex user goals
customization
demand: speed, precision
understand, manage
Main task: Information retrieval
-
Information Retrieval: Definition
Information Retrieval (IR) is the task of finding relevant texts within a large amount of unstructured data.
Relevant = texts matching some specific criteria.
Examples of IR tasks: searching for emails from a given person, searching the Internet for an event that occurred on a given date, etc.
Examples of IR systems: WWW search engines, specialized search engines (laws, medical documents), etc.
NB: Database Management Systems (DBMS) are different from IR systems (data stored in a DB are structured!)
-
Information Retrieval
The goal of IR is to retrieve all and only the relevant documents in a collection for a particular user with a particular information need.
Relevance is a central concept in IR theory.
How does an IR system work when the collection is all documents available on the Web?
Web search engines are stress-testing the traditional IR models.
-
Information Retrieval
The goal is to search large document collections (millions of documents) to retrieve small subsets relevant to the user's information need.
Examples are:
Internet search engines
Digital library catalogues
Some application areas within IR:
Cross-language retrieval
Speech/broadcast retrieval
Text categorization
Text summarization
Subject to objective testing and evaluation:
hundreds of queries
millions of documents (the TREC set and conference)
-
Information Retrieval
IR in general ...
IR is the discipline that deals with:
retrieval
representation
storage organization
access
of structured, semi-structured and unstructured data (information objects)
in response to a query (topic statement), which can be:
structured (e.g. boolean expression)
unstructured (e.g. sentence, document)
-
Information Retrieval
in other words ...
The process of applying algorithms over unstructured, semi-structured or structured data in order to satisfy a given information (explicit) query
Efficiency with respect to:
algorithms
query building
data organization/structure
-
Information Retrieval
and in other words ...
[Figure: Data, Algorithm and Query linked through a CM (content model). Labels: how to organize, what structures, what data; optimal; what CM attributes, how to build; what attributes, what structure, what rules]
-
Data vs. Information Retrieval
Data retrieval
which docs contain a set of keywords?
well-defined semantics
a single erroneous object implies failure!
Information retrieval
information about a subject or topic
semantics is frequently loose
small errors are tolerated
the IR system interprets the contents of information items
generates a ranking which reflects relevance
the notion of relevance is most important
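To make the contrast concrete, here is a minimal Python sketch on a made-up three-document collection. The hypothetical `data_retrieval` function uses exact Boolean AND semantics, so one missing keyword means no answer at all; the hypothetical `information_retrieval` function tolerates partial matches and returns a ranked list. Scoring by the number of shared keywords is a deliberately crude stand-in for relevance, not a model taken from these slides.

```python
# Toy collection (made-up data): document id -> text.
DOCS = {
    "d1": "information retrieval ranks documents by relevance",
    "d2": "database systems answer structured queries",
    "d3": "web search is large scale information retrieval",
}


def tokens(text):
    """Very crude index terms: lower-cased, whitespace-separated words."""
    return set(text.lower().split())


def data_retrieval(docs, keywords):
    """Well-defined semantics: a document qualifies only if it contains
    every keyword; anything else is a non-answer."""
    wanted = set(keywords)
    return sorted(d for d, text in docs.items() if wanted <= tokens(text))


def information_retrieval(docs, keywords):
    """Loose semantics: rank every document sharing at least one keyword,
    best first (score = number of shared keywords, ties broken by id)."""
    wanted = set(keywords)
    scores = {d: len(wanted & tokens(text)) for d, text in docs.items()}
    return [d for d in sorted(scores, key=lambda d: (-scores[d], d)) if scores[d] > 0]


if __name__ == "__main__":
    query = ["web", "relevance"]
    print(data_retrieval(DOCS, query))         # [] : no document has both words
    print(information_retrieval(DOCS, query))  # ['d1', 'd3'] : partial matches, ranked
```

The same query that makes the exact-match function fail outright still yields a useful ranked answer under the looser semantics.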
-
IR Systems
[Figure: the User sends a Query to the IR System, which returns a Ranked list of documents]
interpret contents of information objects
generate a ranking which reflects relevance
-
IR Systems
IR System
disclosure for a collection O of n information objects
user is interested in information objects
interest model as a partial order on the collection:
a set of relevant documents
a set of irrelevant documents
produces a (total) ordering resembling the user's interest
comparative model to user info need
Information Need
The Information Need has qualitative and quantitative aspects
expressed in a query
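One way to read "a (total) ordering resembling the user's interest" is sketched below in Python, under the simplifying assumption that the user's partial order is just "every relevant object precedes every irrelevant one". The hypothetical `ordering_violations` function counts (relevant, irrelevant) pairs that the system's ranking puts the wrong way round; this pair-counting measure is an illustration chosen here, not a definition from the slides.

```python
from itertools import product


def ordering_violations(ranking, relevant):
    """Count (relevant, irrelevant) pairs that a total ordering puts the wrong way round.

    The user's interest model is taken to be the partial order
    'every relevant object precedes every irrelevant object'. A ranking that
    fully resembles it scores 0; the worst possible score is
    (#relevant) * (#irrelevant).
    """
    position = {obj: i for i, obj in enumerate(ranking)}
    rel = [o for o in ranking if o in relevant]
    irr = [o for o in ranking if o not in relevant]
    return sum(1 for r_obj, i_obj in product(rel, irr) if position[r_obj] > position[i_obj])


if __name__ == "__main__":
    relevant = {"a", "b"}
    print(ordering_violations(["a", "b", "c", "d"], relevant))  # 0: agrees with the user
    print(ordering_violations(["c", "a", "d", "b"], relevant))  # 3
    print(ordering_violations(["c", "d", "a", "b"], relevant))  # 4: fully reversed
```

Note that the order among relevant objects (or among irrelevant ones) never counts, which mirrors the fact that the user's interest model is only a partial order.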
-
Basic Concepts
-
Basic Concepts: The User Task
Pull actions
User requests information in an interactive manner
Push actions
Software agents push the information towards the users
-
Basic Concepts: The User Task
Document
single unit of information
typically text in a digital form; other media
a complete logical unit (e.g. book, article)
a part of a larger text (e.g. passage, section, entry in a dictionary)
any physical unit (e.g. file, email, web page)
-
The Standard Retrieval Interaction Model
-
Standard Model of IR
Assumptions:
The goal is maximizing precision and recall simultaneously
The information need remains static
The value is in the resulting document set
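Precision and recall are not defined on this slide; the sketch below uses their standard definitions: precision is the fraction of retrieved documents that are relevant, recall is the fraction of relevant documents that are retrieved. Scoring an empty set as 0.0 is a convention assumed for this sketch. The second call shows why the two must be balanced: returning the whole (hypothetical 10-document) collection trivially maximizes recall but lowers precision.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = |retrieved ∩ relevant| / |retrieved|
    recall    = |retrieved ∩ relevant| / |relevant|
    Empty sets are scored 0.0 here (a convention chosen for this sketch).
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    relevant = {"d1", "d3", "d5"}
    # A short answer list: 2 of its 4 documents are relevant.
    print(precision_recall(["d1", "d2", "d3", "d4"], relevant))  # (0.5, 0.666...)
    # Returning the whole 10-document collection maximizes recall but hurts precision.
    everything = [f"d{i}" for i in range(10)]
    print(precision_recall(everything, relevant))  # (0.3, 1.0)
```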
-
Problems with Standard Model
Users learn during the search process:
Scanning titles of retrieved documents
Reading retrieved documents
Viewing lists of related topics/thesaurus terms
Navigating hyperlinks
Some users don't like long (apparently) disorganized lists of documents
-
IR is an Iterative Process
[Figure: Repositories, Workspace, Goals]
-
IR is a Dialog
The exchange doesn't end with the first answer
Users can recognize elements of a useful answer, even when incomplete
Questions and understanding change as the process continues
-
Information Retrieval
Revised Task Statement:
Build a system that retrieves documents that users are likely to find relevant to their queries
This set of assumptions underlies the field of Information Retrieval
-
The Retrieval Process ...
[Figure: the retrieval process in ten numbered steps, linking User Interface, Text Operations, Query Operations, Indexing, Searching, Ranking, the Index (inverted file), the Text Database and the DB Manager Module. Flow labels: text defines logical view; user need specified; index (inverted file) built; query generated; retrieved docs; ranked docs; user feedback changes the query]
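The main stages of the retrieval process can be sketched in a few lines of Python. This is a toy under stated assumptions, not the system in the figure: the "text operations" are just lower-casing, tokenizing and dropping a tiny hypothetical stop list; "indexing" builds an inverted file mapping each term to the documents (and term frequencies) where it occurs; "searching" and "ranking" look up the query terms and order documents by summed term frequency, a deliberately simple scoring rule. All function names are illustrative.

```python
import re
from collections import defaultdict

# Hypothetical stop list; real systems use much larger ones.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "is", "to"}


def text_operations(text):
    """Logical view of a text: lower-cased index terms, stop words removed."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def build_inverted_file(docs):
    """Indexing: map each index term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text_operations(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def search_and_rank(index, query):
    """Searching + ranking: collect documents containing any query term and
    order them by summed term frequency (ties broken by id)."""
    scores = defaultdict(int)
    for term in text_operations(query):  # the query gets the same text operations
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    docs = {
        "d1": "The index of an IR system is an inverted file.",
        "d2": "Ranking orders the retrieved documents.",
        "d3": "An inverted file maps each term to documents; the inverted file is built once.",
    }
    index = build_inverted_file(docs)
    print(search_and_rank(index, "inverted file"))  # ['d3', 'd1']
```

Building the index once and answering many queries against it is what makes this layout suit a relatively static collection.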
-
The Retrieval Process ...
[Figure: Documents and the Information Need are both represented by Index Terms (document and query); the representations are matched to produce a ranking]
-
The Retrieval Process ...
Matching index terms is quite imprecise
Users are frequently unsatisfied
Users have no training in query formulation
Frequent dissatisfaction of Web users
Relevance is critical for IR systems: ranking
The ordering of retrieved documents reflects their relevance to the user query
Fundamental premises for relevance:
common sets of index terms
sharing of weighted terms
likelihood of relevance
Each set of premises leads to a distinct IR model
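The first two premises can be contrasted in a short Python sketch on already-tokenized toy documents. `overlap_ranking` scores documents by the size of the common set of index terms; `weighted_ranking` scores them by cosine similarity of tf·idf-weighted term vectors, the idea behind the classic Vector model. The tf·idf formula with idf = log(N/df) is one common weighting chosen for this sketch, not one prescribed by the slides; the third premise (likelihood of relevance) belongs to probabilistic models and is not shown. Function names and data are hypothetical.

```python
import math
from collections import Counter


def overlap_ranking(docs, query_terms):
    """Premise 1: score = size of the common set of index terms."""
    q = set(query_terms)
    scores = {d: len(set(terms) & q) for d, terms in docs.items()}
    return sorted(scores, key=lambda d: (-scores[d], d))


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def weighted_ranking(docs, query_terms):
    """Premise 2: score = cosine of tf*idf vectors, idf = log(N / df).
    A term present in every document gets idf 0; unseen query terms get 0."""
    n = len(docs)
    df = Counter(t for terms in docs.values() for t in set(terms))
    idf = {t: math.log(n / df[t]) for t in df}

    def vec(terms):
        return {t: tf * idf.get(t, 0.0) for t, tf in Counter(terms).items()}

    q = vec(query_terms)
    scores = {d: cosine(vec(terms), q) for d, terms in docs.items()}
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    docs = {
        "d1": ["web", "search", "engine"],
        "d2": ["library", "catalogue", "search"],
        "d3": ["web", "web", "ontology"],
    }
    print(overlap_ranking(docs, ["web", "search"]))   # ['d1', 'd2', 'd3']: d2, d3 tie
    print(weighted_ranking(docs, ["web", "search"]))  # ['d1', 'd3', 'd2']
```

Under plain overlap, d2 and d3 tie; once terms are weighted, the repeated "web" in the shorter d3 lifts it above d2, so the two premises produce different rankings from the same index terms.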
-
IR Taxonomy
User Task: Retrieval (Ad hoc, Filtering); Browsing
Classic Models: Boolean, Vector, Probabilistic
Set Theoretic: Fuzzy, Extended Boolean
Algebraic: Generalized Vector, Latent Semantic Index, Neural Networks
Probabilistic: Inference Network, Belief Network
Structured Models: Non-Overlapping Lists, Proximal Nodes
Browsing: Flat, Structure Guided, Hypertext
-
IR Taxonomy: Ad Hoc Retrieval
[Figure: queries Q1 to Q5 submitted against a collection of fixed size]
collection remains relatively static
new queries are submitted to the system
a person having a need for information
a set of information objects to satisfy the need
models to formalize the information need
stable (fixed) info collection
user interest is valid during some period of time
query only expresses the information need at some point in time
-
IR Taxonomy: Filtering
[Figure: a documents stream is matched against the User 1 and User 2 profiles, producing docs for User 1 and docs filtered for User 2]
Queries remain relatively static
New documents come into the system
(continuous) stream of documents
e.g. newsgroups
decision for each document
no preprocessing of all documents
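Filtering inverts the ad hoc setting: the profiles stay fixed and the documents move. The Python sketch below assumes each user profile is simply a set of terms and that a document is delivered when it shares at least `threshold` terms with a profile; both the profile representation and the threshold rule are hypothetical simplifications. Each document gets its yes/no decision as it arrives, and no index over the whole stream is ever built.

```python
def tokens(text):
    """Crude index terms: lower-cased, whitespace-separated words."""
    return set(text.lower().split())


def filter_stream(stream, profiles, threshold=1):
    """Yield (user, doc_id) for every profile that an incoming document matches.

    Profiles (the static 'queries') are fixed; documents are examined one at a
    time, each with an immediate decision per user and no preprocessing of
    the whole collection.
    """
    for doc_id, text in stream:
        terms = tokens(text)
        for user, profile in profiles.items():
            if len(terms & profile) >= threshold:
                yield user, doc_id


if __name__ == "__main__":
    profiles = {"user1": {"retrieval", "ranking"}, "user2": {"football"}}
    stream = iter([
        ("n1", "New ranking model for retrieval"),
        ("n2", "Football results"),
        ("n3", "Weather today"),
    ])
    for user, doc_id in filter_stream(stream, profiles):
        print(user, doc_id)  # user1 n1, then user2 n2
```

Because `filter_stream` is a generator, it works on an unbounded stream: documents are consumed and routed lazily, one at a time.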