Intelligent Email: Reply and Attachment Prediction Mark Dredze, Tova Brooks, Josh Carroll Joshua...

13
Intelligent Email: Reply and Attachment Prediction Mark Dredze, Tova Brooks, Josh Carroll Joshua Magarick, John Blitzer, Fernando Pereira Presented by Nareg Torosian

description

Intelligent? How?  Prediction tasks treated as binary classification problems Binary vector, where each dimension represents a feature  Learning performed with logistic regression  System evaluated using F 1, harmonic mean of precision and recall  Single-user (adaptive) and cross-user (adaptable) settings

Transcript of Intelligent Email: Reply and Attachment Prediction Mark Dredze, Tova Brooks, Josh Carroll Joshua...

Page 1: Intelligent Email: Reply and Attachment Prediction Mark Dredze, Tova Brooks, Josh Carroll Joshua Magarick, John Blitzer, Fernando Pereira Presented by.

Intelligent Email: Reply and Attachment PredictionMark Dredze, Tova Brooks, Josh CarrollJoshua Magarick, John Blitzer, Fernando Pereira

Presented by Nareg Torosian

Page 2: Intelligent Email: Reply and Attachment Prediction Mark Dredze, Tova Brooks, Josh Carroll Joshua Magarick, John Blitzer, Fernando Pereira Presented by.

What’s the use?

Whittaker & Sidner’s “email overload” Task management Personal archiving Asynchronous communication

Assist overwhelmed email users Support enhanced email interface

Page 3: Intelligent Email: Reply and Attachment Prediction Mark Dredze, Tova Brooks, Josh Carroll Joshua Magarick, John Blitzer, Fernando Pereira Presented by.

Intelligent? How?

Prediction tasks treated as binary classification problems Binary vector , where each

dimension represents a feature Learning performed with logistic regression System evaluated using F1, harmonic mean

of precision and recall Single-user (adaptive) and cross-user

(adaptable) settings

Page 4: Intelligent Email: Reply and Attachment Prediction Mark Dredze, Tova Brooks, Josh Carroll Joshua Magarick, John Blitzer, Fernando Pereira Presented by.

Reply prediction

Indicate which messages require reply Allow user to manage these messages

Page 5: Intelligent Email: Reply and Attachment Prediction Mark Dredze, Tova Brooks, Josh Carroll Joshua Magarick, John Blitzer, Fernando Pereira Presented by.

Reply prediction features

Relational features Based on user profile

# of sent and received messages, address book, email address and domain

I appear in the CC list, I frequently reply to this user, etc.

200 in Dredze et al.’s experiment Document features

Presence of question marks and question words TF-IDF (term frequency – inverse document

frequency) scores Presence of attachments 14,800 in Dredze et al.’s experiment

Page 6: Intelligent Email: Reply and Attachment Prediction Mark Dredze, Tova Brooks, Josh Carroll Joshua Magarick, John Blitzer, Fernando Pereira Presented by.

The grand experiment

Evaluated on 4 user mailboxes Users manually tagged messages as

either needs reply or does not need reply “It is not surprising that overwhelmed users

acknowledge that a message did require their reply even though they failed to do so; classifiers trained on actual user reply behavior are thus very poor.”

2,391 total emails, excluding spam 80/20 train/test split

Page 7: Intelligent Email: Reply and Attachment Prediction Mark Dredze, Tova Brooks, Josh Carroll Joshua Magarick, John Blitzer, Fernando Pereira Presented by.

The single-user results

Page 8: Intelligent Email: Reply and Attachment Prediction Mark Dredze, Tova Brooks, Josh Carroll Joshua Magarick, John Blitzer, Fernando Pereira Presented by.

The cross-user results

Only relational features were effective, so others omitted

Page 9: Intelligent Email: Reply and Attachment Prediction Mark Dredze, Tova Brooks, Josh Carroll Joshua Magarick, John Blitzer, Fernando Pereira Presented by.

Attachment prediction

“See attachment…hey, wait a minute…” Possible UI considerations

Document sidebar Alert user before sending

Indicate which messages need attachments

Page 10: Intelligent Email: Reply and Attachment Prediction Mark Dredze, Tova Brooks, Josh Carroll Joshua Magarick, John Blitzer, Fernando Pereira Presented by.

Attachment prediction features

Relational features Based on user profile

# of sent and received messages, # of attachments, email address and domain

Conjunctions between volume of messages/attachments and TO/CC fields

72 in Dredze et al.’s experiment Document features

Presence and placement of “attach” Presence of attachments 39,308 in Dredze et al.’s experiment

Page 11: Intelligent Email: Reply and Attachment Prediction Mark Dredze, Tova Brooks, Josh Carroll Joshua Magarick, John Blitzer, Fernando Pereira Presented by.

The grander experiment

Evaluated on publicly available Enron email corpus 150 users and 250,000 emails Lots of cleanup needed

Users manually tagged messages as needs attachment Only popular document formats Forwarded messages excluded

Subset of 15,000 messages from 144 users 1,020 with attachments

10-fold cross validation

Page 12: Intelligent Email: Reply and Attachment Prediction Mark Dredze, Tova Brooks, Josh Carroll Joshua Magarick, John Blitzer, Fernando Pereira Presented by.

The results

Page 13: Intelligent Email: Reply and Attachment Prediction Mark Dredze, Tova Brooks, Josh Carroll Joshua Magarick, John Blitzer, Fernando Pereira Presented by.

GUEPs and CDs GUEPs

Mental model Improvement Consistency

CDs Premature commitment Hidden dependencies Abstraction Consistency Provisionality