Clustering and Exploring Search Results using Timeline Constructions
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.
Clustering and Exploring Search Results using Timeline Constructions
Presenter: Tsai Tzung Ruei
Authors: Omar Alonso, Michael Gertz, Ricardo Baeza-Yates
CIKM 2009
National Yunlin University of Science and Technology
Outline
Motivation
Objective
Time annotated document model
Methodology
Experiments
Conclusion
Comments
Motivation
Current search engines do not exploit the temporal information embedded in documents.
1. Do you think current timelines for organizing or clustering search results (such as in Google’s timeline) are useful for some of your daily search activities?
2. Do you use (or would use) timelines to explore search results?
3. Please indicate some search scenarios where you use timelines or would like to use timelines to organize search results.
4. Please give some examples of search scenarios where current search engines do not sufficiently support the concept of timelines to organize and explore search results.
5. What other features would you like to see in the context of timelines?
Timeline
Objective
To present an add-on to traditional information retrieval applications that exploits the temporal information associated with documents to present and cluster them along timelines.
Time Annotated Document Model
Time and timelines
Temporal expressions
Temporal document profiles
Our base timeline, denoted Td, is an interval of consecutive day chronons. Example: “March 12, 2002; March 13, 2002; March 14, 2002”.
Explicit temporal expressions. Example: “December 2004”.
Implicit temporal expressions. Example: “Valentine's Day 2006”.
Relative temporal expressions. Example: “today”.
(Figure labels: explicit, implicit, relative, timestamps)
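The three kinds of temporal expression can all be anchored on the day-chronon timeline. The following is a minimal Python sketch of that idea, not code from the paper: the function names, the IMPLICIT lookup table, and the choice to resolve relative expressions against a reference date (such as the document timestamp) are assumptions made for illustration.

```python
from datetime import date, timedelta
import calendar


def day_chronons(start: date, end: date) -> list[date]:
    """Return the consecutive day chronons from start to end, inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def resolve_explicit_month(year: int, month: int) -> tuple[date, date]:
    """Anchor a month-level explicit expression (e.g. "December 2004")
    as a (first day, last day) interval on the day timeline."""
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, 1), date(year, month, last_day)


# Hypothetical lookup table: implicit expressions need outside knowledge
# (here, the fixed month and day of a named event).
IMPLICIT = {"Valentine's Day": (2, 14)}


def resolve_implicit(name: str, year: int) -> date:
    """Anchor an implicit expression such as "Valentine's Day 2006"."""
    month, day = IMPLICIT[name]
    return date(year, month, day)


def resolve_relative(expr: str, reference: date) -> date:
    """Anchor a relative expression against a reference date
    (assumed here to be the document timestamp)."""
    offsets = {"yesterday": -1, "today": 0, "tomorrow": 1}
    return reference + timedelta(days=offsets[expr])
```

For instance, day_chronons(date(2002, 3, 12), date(2002, 3, 14)) yields the three consecutive days used in the example above, and "December 2004" maps to the interval from December 1 to December 31, 2004.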
Methodology
Prototype: Process Overview
(Process diagram components: Corpora; Alembic (POS tagger); GUTime temporal tagger; Oracle; XML document (tdp, the temporal document profile))
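To make the idea of a temporal document profile concrete, here is a toy Python sketch that extracts two kinds of explicit expression from raw text. It is a stand-in for illustration only: it does not use or reproduce Alembic, GUTime, or the Oracle storage layer, and the build_tdp function, its regular expression, and the dictionary fields are all hypothetical.

```python
import re

MONTHS = {name: i for i, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}

# Matches "March 12, 2002" (day granularity) or "December 2004" (month granularity).
PATTERN = re.compile(rf"\b({'|'.join(MONTHS)})(?: (\d{{1,2}}),)? (\d{{4}})\b")


def build_tdp(text: str) -> list[dict]:
    """Return a toy temporal document profile: one entry per explicit
    temporal expression, with a normalized value and its position."""
    profile = []
    for m in PATTERN.finditer(text):
        month, day, year = m.group(1), m.group(2), m.group(3)
        if day:
            value = f"{year}-{MONTHS[month]:02d}-{int(day):02d}"
            granularity = "day"
        else:
            value = f"{year}-{MONTHS[month]:02d}"
            granularity = "month"
        profile.append({
            "text": m.group(0),          # surface form found in the document
            "value": value,              # normalized ISO-style value
            "granularity": granularity,  # "day" or "month"
            "offset": m.start(),         # character position in the text
        })
    return profile
```

A real temporal tagger also handles implicit and relative expressions; this sketch covers only the two explicit patterns named in its comment.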
Methodology
TCluster
1. Constructing a time outline for the documents in the hit list Lq
2. Document clustering
3. Ranking documents in a cluster
A hit list Lq = [d1, d2, ..., dk] of k documents
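The three steps (time outline, clustering, ranking) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' TCluster algorithm: the hit list is assumed to be a list of (document id, normalized temporal values) pairs, ranking by mention count is a stand-in for whatever ranking function the paper uses, and the name tcluster_sketch is hypothetical.

```python
from collections import defaultdict

# Length of an ISO-style prefix at each granularity: "YYYY", "YYYY-MM", "YYYY-MM-DD".
PREFIX_LENGTH = {"year": 4, "month": 7, "day": 10}


def to_chronon(value, granularity):
    """Truncate a normalized value to the requested granularity.
    Values coarser than the granularity cannot be refined and are skipped."""
    n = PREFIX_LENGTH[granularity]
    return value[:n] if len(value) >= n else None


def tcluster_sketch(hit_list, granularity="year"):
    """Group documents by the chronons they mention and rank each group.

    Returns a dict whose keys are the chronons in chronological order
    (the time outline) and whose values are ranked document ids.
    """
    counts = defaultdict(lambda: defaultdict(int))  # chronon -> doc id -> mentions
    for doc_id, values in hit_list:
        for value in values:
            chronon = to_chronon(value, granularity)
            if chronon is not None:
                counts[chronon][doc_id] += 1

    # Assumed ranking: more mentions first, ties broken by hit-list order.
    position = {doc_id: i for i, (doc_id, _) in enumerate(hit_list)}
    return {
        chronon: sorted(docs, key=lambda d: (-docs[d], position[d]))
        for chronon, docs in sorted(counts.items())
    }


# Made-up hit list for illustration only.
hits = [
    ("d1", ["2006-06-09", "2006-07-09", "2002-06-30"]),
    ("d2", ["2006-07"]),
    ("d3", ["2002-05-31", "2002-06-30"]),
]
by_year = tcluster_sketch(hits, "year")
by_day = tcluster_sketch(hits, "day")
```

In this sketch a document that mentions several chronons appears in several clusters, and changing the granularity argument regroups the same hit list by year, month, or day.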
Experiments
DMOZ: a multilingual open-content directory
World Cup documents: 2010, 2006, 2002, 1998 and 1994
Document clusters: 5 pre-defined categories < 21 clusters found by TCluster
Each World Cup document has a single event as the main theme.
Result: documents are well classified by users in terms of the actual event.
Experiments
The TimeBank 1.2 corpus: news articles annotated using TimeML with temporal expressions related to events, times, and temporal links between events and times.
Result: a 50% increase in the number of clusters discovered by TCluster.
Experiments
Relevance evaluation using AMT (Amazon Mechanical Turk), a crowdsourcing platform
Result: the average response was 4.04 (with an 80% agreement level).
Conclusion
Major contributions
The TCluster algorithm provides great flexibility and allows users to explore clusters of search result documents that are organized along well-defined timelines, supporting different levels of time granularity.
It shows the utility of time-based clustering over existing approaches that cluster documents based only on document timestamps.
Future work
Study the weighting of relative temporal expressions, as well as different sentence distance functions for determining the rank of documents in a cluster.
Comments
Advantage: provides a new method for time-based searching
Drawback: some mistakes
Applications: information retrieval, clustering