Clustering and Exploring Search Results using Timeline Constructions

Intelligent Database Systems Lab

N.Y.U.S.T.

I. M.

Clustering and Exploring Search Results using Timeline Constructions

Presenter: Tsai Tzung Ruei
Authors: Omar Alonso, Michael Gertz, Ricardo Baeza-Yates

CIKM 2009

National Yunlin University of Science and Technology


Outline

Motivation
Objective
Time annotated document model
Methodology
Experiments
Conclusion
Comments

Motivation

None of the current search engines exploits the temporal information embedded in documents.


1. Do you think current timelines for organizing or clustering search results (such as in Google’s timeline) are useful for some of your daily search activities?

2. Do you use (or would use) timelines to explore search results?

3. Please indicate some search scenarios where you use timelines or would like to use timelines to organize search results.

4. Please give some examples of search scenarios where current search engines do not sufficiently support the concept of timelines to organize and explore search results.

5. What other features would you like to see in the context of timelines?

Timeline


Objective

To present an add-on to traditional information retrieval applications in which we exploit various temporal information associated with documents to present and cluster documents along timelines.

Time Annotated Document Model

Time and Timelines
Temporal Expressions
Temporal Document Profiles


Our base timeline, denoted Td, is an interval of consecutive day chronons. Example: “March 12, 2002; March 13, 2002; March 14, 2002”.

Implicit temporal expressions. Example: “Valentine's Day 2006”.
Explicit temporal expressions. Example: “December 2004”.
Relative temporal expressions. Example: “today”.

[Figure labels: explicit, implicit, timestamps, relative]
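To make the three kinds of temporal expressions concrete, the following is a minimal sketch of how each might be anchored to an interval of day chronons. It is an illustration only, not the authors' implementation: the function names, the `HOLIDAYS` lookup table, and the choice to resolve “today” against a reference date such as the document timestamp are all assumptions made for this example.

```python
from calendar import monthrange
from datetime import date, timedelta

# Hypothetical lookup table for implicit expressions: holiday name -> (month, day).
HOLIDAYS = {"valentine's day": (2, 14)}


def explicit_month(year, month):
    """An explicit month such as "December 2004" covers every day of that month."""
    last_day = monthrange(year, month)[1]
    return date(year, month, 1), date(year, month, last_day)


def implicit_holiday(name, year):
    """An implicit expression maps to a single day once the holiday is looked up."""
    month, day = HOLIDAYS[name.lower()]
    anchored = date(year, month, day)
    return anchored, anchored


def relative_today(reference):
    """A relative expression needs a reference date (assumed here: the document timestamp)."""
    return reference, reference


def chronons(start, end):
    """Enumerate the consecutive day chronons in the closed interval [start, end]."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]
```

With these helpers, `chronons(date(2002, 3, 12), date(2002, 3, 14))` enumerates the three days of the base-timeline example above, and `explicit_month(2004, 12)` gives the first and last day of December 2004.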


Methodology

Prototype: Process Overview


[Figure: process overview. Components: Corpora; Alembic (POS tagger); GUTime temporal tagger; Oracle; XML document (tdp)]


Methodology

TCluster

Constructing a time outline for the documents in the hit list Lq
Document clustering
Ranking documents in a cluster


Input: a hit list Lq = [d1, d2, ..., dk] of k documents.
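The three TCluster stages named on this slide (time outline, document clustering, ranking within a cluster) can be illustrated with a small sketch. This is not the TCluster algorithm from the paper: the document layout (`id` plus a list of already anchored `dates`), the function names, and the ranking rule (number of temporal expressions that fall into the cluster, ties broken by document id) are assumptions chosen to keep the example short.

```python
from collections import Counter, defaultdict
from datetime import date


def time_label(day, granularity):
    """Map a day chronon to a coarser timeline label (assumed granularities)."""
    if granularity == "year":
        return f"{day.year:04d}"
    if granularity == "month":
        return f"{day.year:04d}-{day.month:02d}"
    return day.isoformat()


def cluster_hit_list(hit_list, granularity="year"):
    """Group documents along a timeline and rank them inside each cluster.

    hit_list: list of dicts with an "id" and a list of anchored "dates".
    Returns a dict ordered by time label; each value is a ranked list of ids.
    """
    mentions = defaultdict(Counter)
    for doc in hit_list:
        for day in doc["dates"]:
            mentions[time_label(day, granularity)][doc["id"]] += 1

    clusters = {}
    for label in sorted(mentions):  # the sorted labels act as the time outline
        ranked = sorted(mentions[label].items(), key=lambda kv: (-kv[1], kv[0]))
        clusters[label] = [doc_id for doc_id, _ in ranked]
    return clusters


hits = [
    {"id": "d1", "dates": [date(2006, 6, 9), date(2006, 7, 9), date(2002, 6, 30)]},
    {"id": "d2", "dates": [date(2006, 7, 9)]},
]
by_year = cluster_hit_list(hits, "year")
by_month = cluster_hit_list(hits, "month")
```

In this sketch a document that mentions several periods appears in several clusters, and switching `granularity` from "year" to "month" refines the same hit list into finer clusters, which mirrors the idea of exploring results at different levels of time granularity.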


Experiments

DMOZ: a multilingual open content directory


World Cup documents: 2010, 2006, 2002, 1998 and 1994

Document clusters: pre-defined categories (5) < TCluster (21)

Each World Cup document has a single event as the main theme.

Result: documents are well classified by users in terms of the actual event.


Experiments

The TimeBank 1.2 corpus

It contains news articles that have been annotated using TimeML with temporal expressions related to events, times, and temporal links between events and times.


Result: a 50% increase in the number of clusters discovered by TCluster.


Experiments

Relevance evaluation using AMT

AMT (Amazon Mechanical Turk) is a crowdsourcing platform.


Result: the average response was 4.04 (with an 80% agreement level).


Conclusion

Major contribution

The TCluster algorithm provides great flexibility and allows users to explore clusters of search result documents that are organized along well-defined timelines, supporting different levels of time granularity.

The authors demonstrate the utility of time-based clustering over existing approaches that cluster documents based only on document timestamps.

Future work

The authors want to study the weighting of relative temporal expressions as well as different sentence distance functions for determining the rank of documents in a cluster.

Comments

Advantage: provides a new method of time-based searching.

Drawback: some mistakes.

Applications: information retrieval, clustering.
