Semi Supervised Recognition of Sarcastic Sentences in Twitter and Amazon Dmitry Davidov Oren Tsur Ari Rappoport

Sarcasm: Definition

• “Sarcasm is a sophisticated form of speech act in which the speakers convey their message in an implicit way.”

• “The activity of saying or writing the opposite of what you mean, or of speaking in a way intended to make someone else feel stupid or angry.” – Macmillan English Dictionary (2007)

Examples

• Twitter:

– “This is what I get to study tonight…! Yippy #sarcasm”

– “Ahhhh the feeling you get while driving back to boarding school. The best. #sarcasm”

• Amazon:

– “Finally pens for women! I don’t know what I have been doing all my life writing with men’s pens.”

– “Defective by Design.”

SASI – Semi-supervised Algorithm for Sarcasm Identification

• Trains a classifier to recognize sarcastic patterns in a semi-supervised setting.

• Uses the classifier to assign each sentence a sarcasm score from 1 (absence of sarcasm) to 5 (clearly sarcastic).

Seed data for Training (Amazon)

• The seed of 80 positive and 505 negative examples was extended to 471 positive and 5020 negative examples using the Yahoo! BOSS API.

• The data was preprocessed to replace occurrences of author names, product names, company names, book titles, usernames, and links with [AUTHOR], [PRODUCT], [COMPANY], [TITLE], [USER], and [LINK].

• This replacement makes the recognized patterns less specific to individual products or authors.
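
The placeholder replacement described above can be sketched as follows. This is a minimal illustration, not the authors' preprocessing: the function name `mask_entities` and the idea of supplying known names from review metadata are assumptions, since the slides do not say how the names were located.

```python
import re


def mask_entities(text, entities):
    """Replace known surface strings with generic placeholders.

    `entities` maps a placeholder such as "[PRODUCT]" to the names it should
    replace. Where those names come from (for example review metadata) is an
    assumption of this sketch.
    """
    for placeholder, names in entities.items():
        # Longest names first, so "Kindle Fire" is masked before "Kindle".
        for name in sorted(names, key=len, reverse=True):
            pattern = re.compile(
                r"(?<!\w)" + re.escape(name) + r"(?!\w)", re.IGNORECASE
            )
            text = pattern.sub(lambda _m, ph=placeholder: ph, text)
    return text


masked = mask_entities(
    "The Kindle by Amazon is great",
    {"[PRODUCT]": ["Kindle"], "[COMPANY]": ["Amazon"]},
)
print(masked)
```

Masking the names before pattern extraction means that two reviews of different products can still yield the same pattern.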

Seed data for Training (Twitter)

• The positive examples were the same ones used for Amazon (a cross-domain setting); the negative examples were hand-annotated.

• The data was preprocessed to replace occurrences of usernames, links, and hashtags with [USER], [LINK], and [HASHTAG].
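
For tweets, the same kind of normalization can be approximated with regular expressions. The three expressions below are hypothetical; the slides do not give the rules actually used to detect usernames, links, and hashtags.

```python
import re

# Hypothetical detection rules, chosen for illustration only.
LINK_RE = re.compile(r"https?://\S+|www\.\S+")
USER_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


def normalize_tweet(text: str) -> str:
    """Replace links, usernames, and hashtags with generic placeholders."""
    # Links go first so that '#' or '@' inside a URL is not rewritten separately.
    text = LINK_RE.sub("[LINK]", text)
    text = USER_RE.sub("[USER]", text)
    text = HASHTAG_RE.sub("[HASHTAG]", text)
    return text


print(normalize_tweet("@bob The best. #sarcasm http://t.co/x"))
```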

Testing data

• 66000 Amazon product reviews for 120 products

• 5.9 million tweets

Pattern extraction

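• Each word is classified as a high-frequency word (HFW) or a content word (CW) according to its corpus frequency.

• A word counts as an HFW if its frequency is at least 100 per million and as a CW if its frequency is at most 1000 per million, so a word between the two thresholds can be treated as either.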

• Patterns such as “[COMPANY] CW does not CW much” and “about CW CW or CW CW” are extracted.
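
A rough sketch of how a sentence might be turned into such a pattern is shown below. The two thresholds come from the slide; everything else is an assumption made for illustration: the function names, treating bracketed tags as HFWs, resolving the overlap between the thresholds in favor of HFW, and emitting one pattern per sentence rather than several sub-patterns.

```python
from collections import Counter

F_H = 100   # HFW threshold from the slide: at least 100 per million
F_C = 1000  # CW threshold from the slide: at most 1000 per million
            # (unused below, because overlaps are resolved in favor of HFW)


def per_million(tokens):
    """Corpus frequency of each token, scaled to occurrences per million."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count * 1_000_000 / total for word, count in counts.items()}


def to_pattern(tokens, freq_pm):
    """Keep HFWs and bracketed tags literally; replace other tokens with 'CW'."""
    out = []
    for tok in tokens:
        is_tag = tok.startswith("[") and tok.endswith("]")
        if is_tag or freq_pm.get(tok.lower(), 0.0) >= F_H:
            out.append(tok)
        else:
            out.append("CW")  # unknown or rare words fall back to a CW slot
    return " ".join(out)


# Hypothetical frequencies (per million), for illustration only.
freqs = {"does": 2000.0, "not": 5000.0, "much": 1500.0, "phone": 50.0, "work": 80.0}
print(to_pattern("[COMPANY] phone does not work much".split(), freqs))
```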

Pattern extraction (contd.)

• To reduce the number of patterns:

– Remove patterns that occur in only one review.

– Remove ambivalent patterns, that is, patterns that appear in both clearly sarcastic and clearly non-sarcastic sentences.
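
The two filtering rules can be sketched as below. The input layout (a mapping from each pattern to the reviews and labels it was seen with) is hypothetical, and "ambivalent" is taken here to mean a pattern seen with both label 1 and label 5, which is an assumption of this sketch.

```python
def filter_patterns(occurrences):
    """Apply the two reduction rules to a pattern inventory.

    `occurrences` maps each pattern to a list of (review_id, label) pairs,
    one pair per sentence in which the pattern was found.
    """
    kept = []
    for pattern, hits in occurrences.items():
        reviews = {review_id for review_id, _ in hits}
        labels = {label for _, label in hits}
        if len(reviews) < 2:
            continue  # rule 1: the pattern occurs in only one review
        if 1 in labels and 5 in labels:
            continue  # rule 2: ambivalent pattern (assumed definition)
        kept.append(pattern)
    return kept


inventory = {
    "[COMPANY] CW does not CW much": [("r1", 5), ("r2", 5)],
    "about CW CW or CW CW": [("r1", 1), ("r3", 5)],
    "CW is the CW": [("r4", 5)],
}
print(filter_patterns(inventory))
```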

Feature Vectors

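• Each pattern corresponds to one element of the feature vector.

• F = [p_1, p_2, p_3, …, p_n], where

$$
p_i =
\begin{cases}
1 & \text{exact match} \\
\alpha & \text{sparse match} \\
\gamma \cdot n / N & \text{incomplete match} \\
0 & \text{no match}
\end{cases}
$$

• In the incomplete case, n of the pattern's N components appear in the sentence.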
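
One way to compute a single feature value is sketched below. The matching semantics are assumptions of this sketch, not taken from the slides: an exact match is a contiguous occurrence of the pattern, a sparse match has all components in order with extra words in between, an incomplete match has more than one component in order, and a CW slot matches any single token. The values chosen for `ALPHA` and `GAMMA` are also assumed.

```python
ALPHA = 0.1  # assumed weight for a sparse match
GAMMA = 0.1  # assumed weight for an incomplete match


def _fits(component, token):
    """A 'CW' slot accepts any token (a simplification); others must be equal."""
    return component == "CW" or component == token


def exact_match(pattern, sentence):
    """True if the pattern occurs as a contiguous run of tokens."""
    n = len(pattern)
    return any(
        all(_fits(p, t) for p, t in zip(pattern, sentence[i:i + n]))
        for i in range(len(sentence) - n + 1)
    )


def longest_ordered_match(pattern, sentence):
    """Largest number of pattern components found in order (an LCS table)."""
    rows, cols = len(pattern), len(sentence)
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if _fits(pattern[i - 1], sentence[j - 1]):
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[rows][cols]


def feature_value(pattern, sentence):
    """Value of p_i for one pattern and one tokenized sentence."""
    total = len(pattern)
    if exact_match(pattern, sentence):
        return 1.0
    matched = longest_ordered_match(pattern, sentence)
    if matched == total:
        return ALPHA                    # sparse match
    if matched > 1:
        return GAMMA * matched / total  # incomplete match
    return 0.0


pattern = "[COMPANY] CW does not CW much".split()
print(feature_value(pattern, "[COMPANY] phone does not work much".split()))
```

Because a CW slot accepts any token here, short sentences can reach the incomplete case through slots alone; a stricter variant would require at least one literal component to match.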

Classification Algorithm

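• Feature vectors are built for the seed data and the test data and then compared.

• For a vector v in the test set,

$$
\mathrm{Label}(v) = \frac{1}{k} \sum_{i=1}^{k} \frac{\mathrm{Count}(\mathrm{Label}(t_i)) \cdot \mathrm{Label}(t_i)}{\sum_{j=1}^{k} \mathrm{Count}(\mathrm{Label}(t_j))}
$$

where t_1, …, t_k are the k seed vectors with the lowest Euclidean distance from v.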
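
The labelling rule can be sketched as a small nearest-neighbour routine. Several details are assumptions of this sketch: Count(l) is taken to be the fraction of seed vectors carrying label l (the slide does not define it), k defaults to 5, and the 1/k factor is applied exactly as it appears in the formula, so multiplying the result by k gives the weighted average on the 1 to 5 scale.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(v, seeds, k=5):
    """Score vector `v` against `seeds`, a list of (vector, label) pairs."""
    total = len(seeds)
    # Assumption: Count(l) is the share of seed vectors with label l.
    count = {
        label: n / total
        for label, n in Counter(label for _, label in seeds).items()
    }
    # The k seed vectors closest to v.
    nearest = sorted(seeds, key=lambda seed: euclidean(v, seed[0]))[:k]
    denom = sum(count[label] for _, label in nearest)
    weighted = sum(count[label] * label for _, label in nearest) / denom
    return weighted / len(nearest)  # the 1/k factor, as written in the formula


seeds = [([0.0, 0.0], 1), ([0.0, 1.0], 1), ([5.0, 5.0], 5)]
print(classify([0.0, 0.2], seeds, k=2))
```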

Baseline and Evaluation

• Baseline for the Amazon set: reviews that combine a low star rating with a high proportion of positive words.

• For the Twitter set, 1500 tweets tagged #sarcasm served as a (noisy) gold standard.

• Five-fold cross-validation was performed.

• A random sample of 90 positively and 90 negatively ranked sentences from the test data was annotated with the help of Mechanical Turk (inter-annotator agreement κ = 0.34 for Amazon, κ = 0.41 for Twitter).
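
The agreement values quoted above are kappa statistics. Assuming they are Fleiss' kappa over several annotators per sentence (the slide does not name the variant), the computation can be sketched as follows; the input layout is hypothetical.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a table of annotation counts.

    `ratings[i][j]` is the number of annotators who put item i in category j.
    Every row must sum to the same number of annotators.
    """
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    n_cats = len(ratings[0])
    # Share of all assignments that went to each category.
    p_cat = [
        sum(row[j] for row in ratings) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    # Observed agreement per item, then averaged over items.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]
    p_bar = sum(p_item) / n_items
    # Agreement expected by chance.
    p_exp = sum(p * p for p in p_cat)
    return (p_bar - p_exp) / (1 - p_exp)


# Three annotators, two categories (sarcastic / not sarcastic), four sentences.
print(fleiss_kappa([[3, 0], [0, 3], [3, 0], [0, 3]]))
```

A value of 1 means perfect agreement and 0 means agreement no better than chance.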

Five Fold Evaluation (Amazon)

Five Fold Evaluation (Twitter)

Final evaluation results