January 4th, 2016 - IIT Kharagpurpawang/courses/IR16/lec1.pdf · 2017. 1. 5. · Email:...

Page 1: January 4th, 2016 - IIT Kharagpurpawang/courses/IR16/lec1.pdf · 2017. 1. 5. · Email: pawang@cse.iitkgp.ernet.in Office: CSE - 308 Webpage: pawang/ Teaching Assistants Amrith Krishna

Information Retrieval: Course Introduction

Pawan Goyal

CSE, IITKGP

January 4th, 2016

Pawan Goyal (IIT Kharagpur) Information Retrieval: Course Introduction January 4th, 2016 1 / 23

Course Info

Course Website: http://cse.iitkgp.ac.in/~pawang/courses/IR16.html

Shared with Prof. Animesh Mukherjee

Meeting Times

Regular Hours:

Monday - 17:00 - 18:00 (NR - 221)

Thursday - 17:00 - 18:00 (NR - 221)

Friday - 17:00 - 18:00 (NR - 221)

Office Hour:

Friday - 18:00 - 19:00 (CSE - 308)

Course Info

My Contact

Email: pawang@cse.iitkgp.ernet.in

Office: CSE - 308

Webpage: http://cse.iitkgp.ac.in/~pawang/

Teaching Assistants

Amrith Krishna

Koustav Rudra

Suman Kalyan Maity

Abhishek Sikchi

Books and Materials

Reference Books

Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. 2008. Introduction to Information Retrieval. Cambridge University Press.

Lecture Material

Additional Readings

Lecture Slides

Course Evaluation Plan: Tentative

Mid-Sem : 25%

End-Sem : 45%

Term Project: 30%

What is Information Retrieval?

Information Retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections.

What is a document?

Web pages, email, books, news stories, scholarly papers, text messages, PowerPoint, PDF, forum postings, patents, IM sessions, tweets, question-answer postings, etc.

Document vs. Database Records

Database records (or tuples in relational databases) are typically made up of well-defined fields (or attributes),

e.g., bank records with account numbers, balances, names, addresses, social security numbers, dates of birth, etc.

It is easy to compare fields with well-defined semantics to queries in order to find matches.

Document vs. Database Records

Example bank database query

Find records with balance > $50,000 in branches located in Amherst, MA.

Matches easily found by comparison with field values of records

Example search engine query

bank scandals in western mass

This text must be compared to the text of entire news stories
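
The contrast between the two queries can be sketched in a few lines of Python. This is only an illustration: the field names (`balance`, `branch_city`, `branch_state`), the sample records, and the two news snippets are made up here; only the two queries come from the slide.

```python
# Hypothetical bank records with well-defined fields (illustrative data only).
records = [
    {"account": 1, "balance": 72000, "branch_city": "Amherst", "branch_state": "MA"},
    {"account": 2, "balance": 12000, "branch_city": "Amherst", "branch_state": "MA"},
    {"account": 3, "balance": 95000, "branch_city": "Boston", "branch_state": "MA"},
]

# Database-style query: compare field values directly.
db_matches = [
    r["account"]
    for r in records
    if r["balance"] > 50000
    and r["branch_city"] == "Amherst"
    and r["branch_state"] == "MA"
]

# Hypothetical news stories: unstructured text with no fields to compare against.
stories = [
    "Regulators investigate a bank scandal in western Massachusetts.",
    "A river bank collapsed after heavy rain in the west of the state.",
]

query = "bank scandals in western mass"
query_terms = set(query.lower().split())


def overlap(text):
    """Count query terms that occur as whole words in the text (naive matching)."""
    words = set(text.lower().replace(".", "").split())
    return len(query_terms & words)


text_scores = [overlap(s) for s in stories]
```

The field comparison is exact, while the naive text comparison already shows the difficulty: `scandals` does not match `scandal`, `mass` does not match `Massachusetts`, and the unrelated river-bank story still shares words with the query.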

So, what do we do in IR?

The indexing and retrieval of textual documents.

Concerned firstly with retrieving documents relevant to a query.

Concerned secondly with retrieving from large sets of documents efficiently.

What is the “killer” app?

Searching for pages on the WWW
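
To make "indexing and retrieval" concrete, here is a minimal sketch of an inverted index, a standard structure for efficient lookup over a collection. The slide does not name a data structure, so the inverted index, the function names `build_index` and `lookup`, and the toy documents are assumptions for illustration.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to the sorted list of ids of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def lookup(index, query):
    """Return ids of documents that contain every query term (Boolean AND)."""
    postings = [set(index.get(term, [])) for term in query.lower().split()]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


# Toy collection, made up for illustration.
docs = {
    1: "web search engines index pages",
    2: "search for pages on the web",
    3: "database records have fields",
}
index = build_index(docs)
result = lookup(index, "web pages")
```

The index is built once; each query then touches only the postings of its own terms rather than scanning every document.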

Typical IR tasks

Given:

A corpus of textual natural-language documents.

A user query in the form of a textual string.

Find:

A ranked set of documents that are relevant to the query.
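
The task statement maps naturally onto a function from (corpus, query) to a ranked list. The sketch below assumes a deliberately crude score, the total count of query-term occurrences in a document; the slide does not specify a scoring function, and the name `rank` and the toy corpus are hypothetical.

```python
from collections import Counter


def rank(corpus, query):
    """Return (doc_id, score) pairs with score > 0, best first.

    The score is the total number of occurrences of query terms in the
    document, a crude stand-in for a real relevance model.
    """
    terms = query.lower().split()
    scored = []
    for doc_id, text in corpus.items():
        counts = Counter(text.lower().split())
        score = sum(counts[t] for t in terms)
        if score > 0:
            scored.append((doc_id, score))
    # Sort by descending score, breaking ties by document id.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Toy corpus, made up for illustration.
corpus = {
    "d1": "information retrieval finds relevant documents",
    "d2": "retrieval of documents and more documents",
    "d3": "cooking recipes for the weekend",
}
ranking = rank(corpus, "documents retrieval")
```

Documents with no query terms are left out of the result, and the rest are ordered by score, which is the "ranked set" the task asks for.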

IR System

The system should be able to retrieve the relevant docs efficiently

So, what is relevance?

A relevant document contains the information that a person was looking for when they submitted the query. This may include:

Being on the proper subject.

Being timely (recent information).

Being authoritative (from a trusted source).

Satisfying the goals of the user and their intended use of the information (information need).

Simplest notion of Relevance from Retrieval Models’ Perspective

Keyword Search

The simplest notion of relevance is that the query string appears verbatim in the document.

A slightly less strict notion is that (most of) the words in the query appear frequently in the document, in any order (bag of words).
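
The two notions can be compared side by side. In this sketch the function names `verbatim_match` and `bag_of_words_score` and the sample text are assumptions; tokenization is plain whitespace splitting, which a real system would refine.

```python
def verbatim_match(query, document):
    """Strict notion: the query string appears as-is in the document."""
    return query.lower() in document.lower()


def bag_of_words_score(query, document):
    """Looser notion: total occurrences of query words, ignoring order."""
    words = document.lower().split()
    return sum(words.count(term) for term in set(query.lower().split()))


# Sample text, made up for illustration.
doc = "retrieval of information is the goal of information science"
query = "information retrieval"

strict = verbatim_match(query, doc)
loose = bag_of_words_score(query, doc)
```

The document never contains the exact phrase, so the strict test fails, yet both query words occur in it and the bag-of-words score is positive.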

Problems with Keyword Search

Term mismatch

May not retrieve relevant documents that include synonymous terms

PRC vs. China

car vs. automobile

Ambiguity

May retrieve irrelevant documents that include ambiguous terms (due to polysemy)

‘Apple’ (company vs. fruit)

‘Java’ (programming language vs. island)
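
Both failure modes show up even in a tiny exact-match search. The helper `keyword_hits` and the three documents below are hypothetical, written only to reproduce the car/automobile and Apple examples from the slide.

```python
def keyword_hits(query, docs):
    """Return ids of documents containing the query word (exact word match)."""
    q = query.lower()
    return [doc_id for doc_id, text in docs.items() if q in text.lower().split()]


# Toy documents, made up for illustration.
docs = {
    "a": "the automobile industry reported record sales",
    "b": "apple released a new phone",
    "c": "an apple a day keeps the doctor away",
}

# Term mismatch: the relevant document uses a synonym, so nothing is retrieved.
mismatch = keyword_hits("car", docs)

# Ambiguity: both senses of 'apple' are retrieved, whichever one the user meant.
ambiguous = keyword_hits("apple", docs)
```

The first query misses a relevant document, while the second returns documents about both the company and the fruit.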

An Intelligent IR system will

Take into account the meaning of the words used.

Adapt to the user based on direct or indirect feedback.

Take into account the importance of the page.

...

Active Areas of Research

Compiled from the most recent papers at SIGIR; indicative, not exhaustive.

What to retrieve

Leveraging User Reviews to Improve Accuracy for Mobile App Retrieval. Dae Hoon Park, Mengwen Liu, ChengXiang Zhai, Haohong Wang

Retrieval of Relevant Opinion Sentences for New Products. Dae Hoon Park, Hyun Duk Kim, ChengXiang Zhai, Lifan Guo

Temporal Feedback for Tweet Search with Non-Parametric Density Estimation. Miles Efron, Jimmy Lin, Jiyin He, Arjen P. de Vries

Query Completion

Analyzing User’s Sequential Behavior in Query Auto-Completion via Markov Processes. Liangda Li, Hongbo Deng, Anlei Dong, Yi Chang, Hongyuan Zha, Ricardo Baeza-Yates

adaQAC: Adaptive Query Auto-Completion via Implicit Negative Feedback. Aston Zhang, Amit Goyal, Weize Kong, Hongbo Deng, Anlei Dong, Yi Chang, Carl A. Gunter, Jiawei Han

Search experience contd ...

Different Users, Different Opinions: Predicting Search Satisfaction with Mouse Movement Information. Yiqun Liu, Ye Chen, Jinhui Tang, Jiashen Sun, Min Zhang, Shaoping Ma, Xuan Zhu

An Eye-Tracking Study of Query Reformulation. Carsten Eickhoff, Sebastian Dungs, Vu Tran

Search Experience

How many results per page? A Study of SERP Size, Search Behavior and User Experience. Diane Kelly, Leif Azzopardi

Influence of Vertical Result in Web Search Examination. Liu Zeyang, Yiqun Liu, Ke Zhou, Min Zhang, Shaoping Ma

Unconscious Physiological Effects of Search Latency on Users and Their Click Behaviour. Miguel Barreda-Angeles, Ioannis Arapakis, Xiao Bai, B. Barla Cambazoglu, Alexandre Pereda-Banos

Context-Aware Web Search Abandonment Prediction. Yang Song, Xiaolin Shi, Ryen W. White, Ahmed Hassan

What do we cover in this course

IR Basics - PG

Boolean retrieval

The term vocabulary & postings lists

Dictionaries and tolerant retrieval

Index construction

Index compression

Scoring, term weighting & the vector space model

Computing scores in a complete search system

Evaluation in information retrieval

Relevance feedback & query expansion

Probabilistic information retrieval

Language models for information retrieval

Course Contents

Classification, clustering and Web - AM

Text classification & Naive Bayes

Vector space classification

Flat clustering

Hierarchical clustering

Matrix decompositions & latent semantic indexing

Web crawling and indexes

Link analysis
