Presented by Sotiris Gkountelitsas
description
Transcript of Presented by Sotiris Gkountelitsas
![Page 1: Presented by Sotiris Gkountelitsas](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816192550346895dd12f7c/html5/thumbnails/1.jpg)
Personalizing Search via Automated Analysis of Interests and Activities
Jaime Teevan Susan T.Dumains Eric Horvitz MIT,CSAIL Microsoft Researcher Microsoft Researcher
One Microsoft Way One Microsoft Way
Presented by Sotiris Gkountelitsas
![Page 2: Presented by Sotiris Gkountelitsas](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816192550346895dd12f7c/html5/thumbnails/2.jpg)
Motivation Example
![Page 3: Presented by Sotiris Gkountelitsas](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816192550346895dd12f7c/html5/thumbnails/3.jpg)
Initial Results
![Page 4: Presented by Sotiris Gkountelitsas](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816192550346895dd12f7c/html5/thumbnails/4.jpg)
Relevance Feedback
![Page 5: Presented by Sotiris Gkountelitsas](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816192550346895dd12f7c/html5/thumbnails/5.jpg)
Revised Results
![Page 6: Presented by Sotiris Gkountelitsas](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816192550346895dd12f7c/html5/thumbnails/6.jpg)
Defining the Problem• Search engines satisfy information intents but do
not discern people.
• Detailed specification of information goals.▫ People are lazy.▫ People are not good at specifying detailed
information goals.
![Page 7: Presented by Sotiris Gkountelitsas](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816192550346895dd12f7c/html5/thumbnails/7.jpg)
Solution• Use of implicit information about the user to
create profile and rerank results locally.▫ Previously issued queries.▫ Previously visited Web pages.▫ Documents or emails the user has read, created or
sent.BM25Profile
![Page 8: Presented by Sotiris Gkountelitsas](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816192550346895dd12f7c/html5/thumbnails/8.jpg)
BM25• The method ranks documents by summing over
terms of interest the product of the term weight (wi) and the frequency with which that term appears in the document (tfij).
• Result = wi tfij • No Relevance information available:
• Relevance information available:
![Page 9: Presented by Sotiris Gkountelitsas](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816192550346895dd12f7c/html5/thumbnails/9.jpg)
Traditional vs Personal Profile FB
• N’ = Ν + R• ni ’ = ni + ri
![Page 10: Presented by Sotiris Gkountelitsas](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816192550346895dd12f7c/html5/thumbnails/10.jpg)
Corpus Representation (N,ni)• We need information about two parameters:
▫ ni number of documents in the web that contain term i. Issue one word queries
▫ N number of documents in the Web Issue query with the word the.
• The Corpus can be query focused or not.
• Practical Issues:▫ Cannot always issue a query for every term (inefficient).
Approximate corpus using statistics from the title and snippet of every document (efficient).
![Page 11: Presented by Sotiris Gkountelitsas](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816192550346895dd12f7c/html5/thumbnails/11.jpg)
User representation (R, ri)• For the representation of the user a rich index of
personal content was used that captured the user’s interests and computational activities.
• Index included:▫ Web pages.▫ Email messages.▫ Calendar items.▫ Documents stored on the client machine.
• The user representation can be query focused or not.• Time sensitivity
▫ Documents indexed in the last month vs the full index of documents
• User Interests:▫ Query terms issued in the past.
![Page 12: Presented by Sotiris Gkountelitsas](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816192550346895dd12f7c/html5/thumbnails/12.jpg)
Document and Query Representation• Document representation is important in
determining both what terms (i) are included and how often they occur (tfi).
• Terms can be obtained from:▫ Full text▫ Snippets▫ Words at different distances from the query terms
![Page 13: Presented by Sotiris Gkountelitsas](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816192550346895dd12f7c/html5/thumbnails/13.jpg)
Evaluation Framework• 15 computer literate participants.• Evaluation for the top 50 search results.• 3 possible evaluations:
▫ Highly relevant▫ Relevant▫ Not relevant
• Queries:▫ Personal formulated queries.▫ Queries of general interest (Busch, cancer, Web
search).• Discounted Cumulative Gain (DCG):
![Page 14: Presented by Sotiris Gkountelitsas](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816192550346895dd12f7c/html5/thumbnails/14.jpg)
Results (1/2) Best Combination
![Page 15: Presented by Sotiris Gkountelitsas](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816192550346895dd12f7c/html5/thumbnails/15.jpg)
Results (2/2)PS Algorithm
Combined Result
![Page 16: Presented by Sotiris Gkountelitsas](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816192550346895dd12f7c/html5/thumbnails/16.jpg)
Conclusion•Automatically constructed user profile to
be used as Relevance Feedback is feasible.
•Performs better than explicit Relevance Feedback.
•Combined with Web Ranking improves even more the performance.
![Page 17: Presented by Sotiris Gkountelitsas](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816192550346895dd12f7c/html5/thumbnails/17.jpg)
Further Exploration•Better tuning of the profile parameters:
▫Time.▫Automate best parameter combination
selection.▫Additional classes of text and non text
based content.▫Location.
![Page 18: Presented by Sotiris Gkountelitsas](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816192550346895dd12f7c/html5/thumbnails/18.jpg)
•Thank you for your attention!!!
•Questions