BuzzTrack Topic Detection and Tracking in Email

BuzzTrack: Topic Detection and Tracking in Email. IUI – Intelligent User Interfaces, January 2007. Keno Albrecht (ETH Zurich), Roger Wattenhofer (ETH Zurich), Gabor Cselle (Google).


Page 1: BuzzTrack Topic Detection and Tracking in Email

BuzzTrack
Topic Detection and Tracking in Email

IUI – Intelligent User Interfaces
January 2007

Keno Albrecht
ETH Zurich
[email protected]

Roger Wattenhofer
ETH Zurich
[email protected]

Gabor Cselle
Google
[email protected]

Page 2: BuzzTrack Topic Detection and Tracking in Email

Email Overload

• Email clients were not designed to handle the volume and variety of messages users are dealing with today:
  • Large volumes of email
  • Task management
  • Personal archiving or filing
  • Keeping context

[Whittaker and Sidner, 1996]

Page 3: BuzzTrack Topic Detection and Tracking in Email

Search vs. Inbox Browsing

• Fast full-text search is today's solution to finding past emails.
• But the flat inbox view of newly incoming emails hasn’t changed.

In our work, we focus on the problem of sensibly structuring emails in the inbox.

Page 4: BuzzTrack Topic Detection and Tracking in Email

Today's Email Clients: The Three-Pane View

No sense of context: unrelated messages are shown together.

Important emails may drop off the “first screen”.

“Thread-based” tree views are unsophisticated and may not pull in all relevant messages.

Page 5: BuzzTrack Topic Detection and Tracking in Email

BuzzTrack

Email client extension for Mozilla Thunderbird for displaying email grouped by topic.

Page 6: BuzzTrack Topic Detection and Tracking in Email


Related Work

Page 7: BuzzTrack Topic Detection and Tracking in Email

Visualizations: Conversations

Gmail (Google)

• Common conversation title
• One entry per email, folds out on click

Page 8: BuzzTrack Topic Detection and Tracking in Email

Automatic Foldering

• Using machine learning techniques to automatically move emails into folders upon arrival
• Low accuracy rates [Bekkerman et al., 2005] and conceptual problems:
  • Users need to manually create folders and seed them with data.

Page 9: BuzzTrack Topic Detection and Tracking in Email

People-Centered Email Clients

• Bifrost [Bälter and Sidner, 2002]
• ContactMap [Whittaker et al., 2004]

Page 10: BuzzTrack Topic Detection and Tracking in Email

Task-based Email

Example: TaskMaster [Bellotti et al., 2003]

[Screenshot labels: thrasks; thrask contents; item contents (emails, documents, etc.)]

Page 11: BuzzTrack Topic Detection and Tracking in Email


BuzzTrack

Page 12: BuzzTrack Topic Detection and Tracking in Email

BuzzTrack

• Mozilla Thunderbird extension to automatically group related emails into topics.
• Will be distributed through the website www.buzztrack.net
• Provides a view on the user’s inbox.

Page 13: BuzzTrack Topic Detection and Tracking in Email

What’s a Topic?

• Topics are groups of emails that relate to the same idea, action, event, task, or question.
• Examples:
  • A conversation about buying a digital camera.
  • Referring a candidate for a job.
  • All emails belonging to the same newsgroup.

Page 14: BuzzTrack Topic Detection and Tracking in Email

Clustering Process

• For every new incoming email:

[Diagram boxes: Preprocessing; Clustering; Label generation; Cluster store; BuzzTrack View in Thunderbird]

Page 15: BuzzTrack Topic Detection and Tracking in Email

Preprocessing

• Tokenization (remove HTML tags, style sheets, punctuation, and numbers)
• Language detection
• Stemming
• For topic labelling:
  • Identify parts of speech
  • Remember popular original word forms
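
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not BuzzTrack's actual code: the function names, the regular expressions, and the toy suffix-stripping stemmer are all assumptions, the tokenizer only handles ASCII English text, and language detection and part-of-speech tagging are left out.

```python
import html
import re
from collections import Counter, defaultdict

# Hypothetical suffix list for a toy stemmer; a real system would more
# likely use an established algorithm such as Porter stemming.
SUFFIXES = ("ing", "ed", "es", "s")


def strip_markup(text):
    """Remove style sheets and HTML tags, then decode HTML entities."""
    text = re.sub(r"(?is)<style.*?>.*?</style>", " ", text)
    text = re.sub(r"(?s)<[^>]+>", " ", text)
    return html.unescape(text)


def tokenize(text):
    """Return lowercase alphabetic tokens; punctuation and numbers are dropped."""
    return re.findall(r"[a-z]+", strip_markup(text).lower())


def crude_stem(token):
    """Strip at most one common English suffix (illustrative only)."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Return the list of stems and, per stem, its most frequent original form."""
    forms = defaultdict(Counter)
    stems = []
    for token in tokenize(text):
        stem = crude_stem(token)
        stems.append(stem)
        forms[stem][token] += 1
    popular = {stem: counts.most_common(1)[0][0] for stem, counts in forms.items()}
    return stems, popular
```

Keeping the most frequent original form per stem means a topic label can show a readable word such as "cameras" rather than the stem "camera".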

Page 16: BuzzTrack Topic Detection and Tracking in Email

Clustering

• Single-link clustering: newly incoming emails are compared to every email in existing topics:
  • Similarity value > threshold: assigned to topic
  • Similarity value <= threshold: email starts new topic

[Diagram: a new email compared against Topic 1, Topic 2, and Topic 3]
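
A minimal sketch of this incremental single-link step, under stated assumptions: `assign_to_topic` and its `similarity` callable are hypothetical names, and picking the highest-scoring topic when several exceed the threshold is an assumption the slide does not spell out.

```python
def assign_to_topic(new_email, topics, similarity, threshold):
    """Add new_email to the best-matching topic, or start a new topic.

    topics: list of lists of emails (mutated in place).
    similarity: hypothetical callable (email, email) -> float.
    Returns the index of the topic the email ended up in.
    """
    best_index, best_score = None, float("-inf")
    for index, topic in enumerate(topics):
        # Single link: a topic scores as well as its closest member does.
        score = max(similarity(new_email, member) for member in topic)
        if score > best_score:
            best_index, best_score = index, score
    if best_index is not None and best_score > threshold:
        topics[best_index].append(new_email)
        return best_index
    # Similarity <= threshold for every topic: the email starts a new one.
    topics.append([new_email])
    return len(topics) - 1
```

Because every incoming email is compared with every stored email, the cost per message grows with the number of emails kept in the topics.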

Page 17: BuzzTrack Topic Detection and Tracking in Email

Features - 1

• How do we generate similarity values between emails?
• Via a linear combination of several similarity features.
• Examples:
  • Text similarity (TF-IDF values, cosine similarity metric)
  • People similarity (comparing sets of people in the From / To / Cc lines of email headers)
  • Thread membership
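
Two of these features could be computed along the following lines. This is a sketch only: the smoothed IDF formula and the use of Jaccard overlap for people similarity are assumptions, since the slides name the features but not their exact definitions, and all function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF (an assumption) keeps every weight strictly positive.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def people_similarity(a, b):
    """Jaccard overlap of two sets of addresses (From / To / Cc combined)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0
```

Both functions return values between 0 and 1 for non-negative weights, which makes them easy to combine with other features on a common scale.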

Page 18: BuzzTrack Topic Detection and Tracking in Email

Features - 2

Other features for deriving similarities:
• Subject similarity
• Sender domain overlaps
• Sender rank and percentage
• % of email from sender that is answered
• Time passed since last email in topic
• People and reference count for email
• Known people and reference %
• Cluster size
• Has attachment

Page 19: BuzzTrack Topic Detection and Tracking in Email

Decision Score

Similarities are combined into a decision score for each email / cluster pair through a linear combination of feature values:

$\mathrm{dec}_{i,j} = w_a \cdot \mathrm{sim}_a(m_i, C_j) + w_b \cdot \mathrm{sim}_b(m_i, C_j) + \dots$

We tested two sets of weights $w_x$, both trained on a development set of emails:

• Empirical
• Linear SVM
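
As a small worked example of the linear combination, the sketch below scores one email / cluster pair. The feature names, feature values, and weights are made up for illustration and are not the weights from the talk.

```python
def decision_score(features, weights):
    """Linear combination: sum of weight * similarity over all named features."""
    return sum(weights[name] * value for name, value in features.items())


# Hypothetical weights and similarity values for one email / cluster pair.
WEIGHTS = {"text": 0.5, "people": 0.3, "thread": 0.2}
features = {"text": 0.4, "people": 0.5, "thread": 1.0}

score = decision_score(features, WEIGHTS)  # 0.5*0.4 + 0.3*0.5 + 0.2*1.0
```

With a linear SVM, the learned coefficient vector would play the role of `WEIGHTS`; the scoring function itself stays the same.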

Page 20: BuzzTrack Topic Detection and Tracking in Email

Evaluation

• How do we evaluate clustering quality?
• Topic Detection and Tracking (TDT) competitions by NIST, aimed at clustering news articles.
• Corpus:

Page 21: BuzzTrack Topic Detection and Tracking in Email

Clustering Tasks

• The clustering task is split into subtasks:
  • New Topic Detection (NTD): Given a stream of emails, which ones start new topics?
  • Topic Tracking (TT): Given a fixed topic, which newly incoming emails belong to it?
• DET curves plot miss rate vs. false alarm rate for each possible threshold on the decision scores
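
The points of a DET curve can be computed by sweeping the threshold over the observed decision scores. The sketch below assumes a tracking-style decision (a pair is accepted when its score is above the threshold) and that both target and non-target pairs are present; `det_points` is a hypothetical helper name.

```python
def det_points(scores, labels):
    """Return (threshold, miss_rate, false_alarm_rate) for each candidate threshold.

    scores: decision scores, one per trial.
    labels: True where the trial is a real match (target), False otherwise.
    A trial is accepted when its score is strictly greater than the threshold.
    """
    targets = [s for s, y in zip(scores, labels) if y]
    nontargets = [s for s, y in zip(scores, labels) if not y]
    points = []
    for threshold in sorted(set(scores)):
        misses = sum(1 for s in targets if s <= threshold)
        false_alarms = sum(1 for s in nontargets if s > threshold)
        points.append(
            (threshold, misses / len(targets), false_alarms / len(nontargets))
        )
    return points
```

Raising the threshold trades false alarms for misses, so each threshold yields one point on the curve and the curve as a whole shows the available trade-offs.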

Page 22: BuzzTrack Topic Detection and Tracking in Email

Results NTD

• TDT New Topic Detection Task

[DET curve plot. Annotation: Miss: 3%, False alarm: 30%. Axis arrows labeled "better".]

Page 23: BuzzTrack Topic Detection and Tracking in Email

Results TT

• TDT Topic Tracking Task

[DET curve plot. Annotation: Miss: 8%, False alarm: 2%. Axis arrows labeled "better".]

Page 24: BuzzTrack Topic Detection and Tracking in Email

Comparison

• Comparable quality to TDT for news articles [NIST 2004]
• News has less metadata; email has worse text quality.
• A wide body of work exists on improving clustering performance on news; we haven’t tapped into that yet.

Page 25: BuzzTrack Topic Detection and Tracking in Email

BuzzTrack View

• Mozilla Thunderbird plugin that provides a useful view on inbox data “for free”
• Topics contain email from the last 60 days
  • We’re interested in current email only
  • Reduces initial clustering time
• Each email is shown in one topic
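
The 60-day window amounts to a simple recency filter applied before clustering. A minimal sketch, assuming emails are available as (received_at, message) pairs; that representation and the name `recent_emails` are hypothetical.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(days=60)  # from the slide: topics cover the last 60 days


def recent_emails(emails, now):
    """Keep (received_at, message) pairs received within WINDOW before `now`."""
    cutoff = now - WINDOW
    return [(ts, msg) for ts, msg in emails if ts >= cutoff]
```

Filtering first keeps the number of pairwise comparisons bounded, which is one way the initial clustering time can stay short.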

Page 26: BuzzTrack Topic Detection and Tracking in Email


Page 27: BuzzTrack Topic Detection and Tracking in Email


Demo 1: BuzzTrack

Page 28: BuzzTrack Topic Detection and Tracking in Email

BuzzTrack Panes

Topic pane:
• Provides additional info
• Starred topics

Email pane:
• Topics sorted by last incoming email

Page 29: BuzzTrack Topic Detection and Tracking in Email

Future Work

• Distribute plugin to Thunderbird users
  • Input on possible UI improvements
  • Input on clustering quality
• Different clustering styles
  • People-based
  • Thread-based
• We hope BuzzTrack will be a valuable tool for real-world users

Page 30: BuzzTrack Topic Detection and Tracking in Email

Questions?

Contact: Gabor Cselle, [email protected]

Website: www.buzztrack.net