Sentiment Analysis + extras (transcript of slides from courses.cs.ut.ee)
Sentiment Analysis + extrasNatural Language Processing: Lecture 10
09.11.2017
Kairit Sirts
Homework 2 – POS tagging
Score Name Features
97.3 Faiz word +/-2 words; prefixes, suffixes; numeric, cap, all caps, first and last word, hyphen, lower
96.6 Maksym word and stem +/-1 stems; prefixes, suffixes, last 2 words; cap, P and N punct; word pos, sent len
96.5 Olha word/stem -2 words/stems, +1 word/stem; prefixes, suffixes
95.4 Ian word and stem -2 words/stems; previous POS tag
94.5 Andre stem +1 word -2 words; prefixes, suffixes; cap, punct
94.1 Alina word -1 word, stem -2 stems; prefixes, suffixes; contains num, punct; cap, is punct, is num
93.9 Liza word; prefixes, suffixes; num, first, last, 2nd last, alpha, cap, all caps, punct, article (a, an, the)
93.4 Viacheslav stem; prefixes, suffixes; characters (by frequency)
93.2 Aytaj word -2 words; prefixes, suffixes; cap, all caps, lower, contains .
92.9 Vladyslav word/stem -2 words/stems; prefixes, suffixes; cap, punct; word length
89.5 Artem stem
89.2 Yevhen, Ivan, Hendrik stem
79.4 Yurii word, stem; bigram of word and previous word 2
Topics
• Sentiment analysis
• Sentiment lexicons
• Extras
  • Topic modeling
  • More about evaluation
    • Cross-validation
    • Precision and recall in the multiclass setting
• Slides partially based on:
  • https://web.stanford.edu/~jurafsky/slp3/slides/7_NB.pdf
  • https://lct-master.org/files/MullenSentimentCourseSlides.pdf
  • https://web.stanford.edu/~jurafsky/slp3/slides/21_SentLex.pdf
3
Sentiment Analysis
4
Positive or negative movie review
• unbelievably disappointing
• Full of zany characters and richly applied satire, and some great plot twists
• this is the greatest screwball comedy ever filmed
• It was pathetic. The worst part about it was the boxing scenes.
5
What is sentiment?
• Sentiment = feelings
  • Attitudes
• Emotions
• Opinions
• Subjective impressions, not facts
6
What is sentiment?
• Generally, a binary opposition in opinions is assumed
• For/against, like/dislike, good/bad etc
• Some sentiment analysis jargon:
  • “Semantic orientation”
• “Polarity”
7
What is sentiment analysis?
• Using NLP/statistics/machine learning methods to extract, identify or otherwise characterize the sentiment content of a text unit
• Also referred to as:
  • Opinion extraction
• Opinion mining
• Sentiment mining
• Subjectivity analysis
8
Why sentiment analysis?
• Movie: is this review positive or negative?
• Products: what do people think about the new iPhone?
• Public sentiment: how is consumer confidence? Is despair increasing?
• Politics: what do people think about this candidate or issue?
• Prediction: predict election outcomes or market trends from sentiment
• Customer service: is this customer email satisfied or dissatisfied?
• Marketing: how are people responding to this ad/campaign/product release/news item?
9
Scherer Typology of Affective States
• Emotion: brief organically synchronized … evaluation of a major event
• angry, sad, joyful, fearful, ashamed, proud, elated
• Mood: diffuse non-caused low-intensity long-duration change in subjective feeling
• cheerful, gloomy, irritable, listless, depressed, buoyant
• Interpersonal stances: affective stance toward another person in a specific interaction
• friendly, flirtatious, distant, cold, warm, supportive, contemptuous
• Attitudes: enduring, affectively colored beliefs, dispositions towards objects or persons
• liking, loving, hating, valuing, desiring
• Personality traits: stable personality dispositions and typical behavior tendencies
• nervous, anxious, reckless, morose, hostile, jealous
10
What is sentiment analysis?
• Sentiment analysis is the detection of attitudes: “enduring, affectively colored beliefs, dispositions towards objects or persons”
1. Holder (source) of attitude
2. Target (aspect) of attitude
3. Type of attitude
  • From a set of types
    • Like, love, hate, value, desire, etc.
  • Or (more commonly) simple weighted polarity:
    • positive, negative, neutral, together with strength
4. Text containing the attitude
  • Sentence or entire document
12
Sentiment analysis
• Simplest task:
  • Is the attitude of this text positive or negative?
• More complex:
  • Rank the attitude of this text from 1 to 5
• Advanced:
  • Detect the target, source, or complex attitude types
13
Other related tasks
• Information extraction (discarding subjective information)
• Question answering (recognising opinion-oriented questions)
• Summarisation (accounting for multiple viewpoints)
• “Flame” detection
• Identifying child-suitability of videos based on comments
• Bias identification in news sources, fake news
• Identifying (in)appropriate content for ad placement
14
Applications in Business Intelligence
• Question: “Why aren’t consumers buying our laptop?”
• We know the concrete data: price, specs, competition etc
• We want to know subjective data:
  • “the design is tacky”
• “customer service was condescending”
• Misperceptions are also important, e.g. “updated drivers aren’t available” (even though they really are)
15
Challenges in Sentiment Analysis
• People express opinions in complex ways
• In opinion texts, lexical content alone can be misleading
• Negations and topic changes are common, at both the text and the sentence level
• Rhetorical devices such as sarcasm, irony, implication etc
16
A letter to a hardware store
“Dear <hardware store>
Yesterday I had occasion to visit <your competitor>. They had an excellent selection, friendly and helpful salespeople, and the lowest prices in town.
You guys suck.
Sincerely,”
17
Amazon.com 5 star
“The characters are so real and handled so carefully, that being trapped inside the Overlook is no longer just a freaky experience. You run along with them, filled with dread, from all the horrible personifications of evil inside the hotel's awful walls. There were several times where I actually dropped the book and was too scared to pick it back up. Intellectually, you know it's not real. It's just a bunch of letters and words grouped together on pages. Still, whenever I go into the bathroom late at night, I have to pull back the shower curtain just to make sure.”
18
Amazon.com 1 star
“The original Star Wars trilogy was a defining part of my childhood. Born as I was in 1971, I was just the right age to fall headlong into this amazing new world Lucas created. I was one of those kids that showed up early at toy stores [...] anxiously awaiting each subsequent installment of the series. I'm so glad that by my late 20s, the old thrill had faded, or else I would have been EXTREMELY upset over Episode I: The Phantom Menace... perhaps the biggest let-down in film history.”
19
Pitchfork.com (0.0 out of 10)
“Ten years on from Exile, Liz has finally managed to achieve what seems to have been her goal ever since the possibility of commercial success first presented itself to her: to release an album that could have just as easily been made by anybody else.”
20
Amazon.com 1 star
“It took a couple of goes to get into it, but once the story hooked me, I found it difficult to put the book down -- except for those moments when I had to stop and shriek at my friends, "SPARKLY VAMPIRES!" or "VAMPIRE BASEBALL!" or "WHY IS BELLA SO STUPID?" These moments came increasingly often as I reached the climactic chapters, until I simply reached the point where I had to stop and flail around laughing.”
21
What to classify
• There are many possibilities of what we might want to classify
  • Users
• Texts
• Sentences / paragraphs / chunks of texts
• Predetermined descriptive phrases
  • <ADJ NOUN>, <NOUN NOUN>, <ADV ADJ>, etc
• Words
• Tweets
22
SA as document classification
• Extract features from text
  • Ngrams
• POS tags, parse structures
• Negation words, emoticons, exclamation marks etc
• Distributed features
• Sentiment lexicons
• Build a classifier
  • Naïve Bayes
• Logistic regression
• SVM
23
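The pipeline above can be sketched end to end. Below is a minimal pure-Python Naïve Bayes over unigram count features with add-one smoothing; the four training documents and their labels are invented for illustration, and real systems would add the richer features listed on the slide (negation, emoticons, lexicons).

```python
# Sketch of the feature-extraction + classifier pipeline: unigram
# counts fed into a multinomial Naive Bayes with add-one smoothing.
import math
from collections import Counter

def train_nb(docs):
    """docs: list of (token_list, label). Returns log-priors, log-likelihoods, vocab."""
    label_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in label_counts}
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    log_prior = {l: math.log(c / len(docs)) for l, c in label_counts.items()}
    log_lik = {}
    for l in label_counts:
        total = sum(word_counts[l].values()) + len(vocab)  # add-one smoothing
        log_lik[l] = {w: math.log((word_counts[l][w] + 1) / total) for w in vocab}
    return log_prior, log_lik, vocab

def predict_nb(tokens, log_prior, log_lik, vocab):
    """Score each class; unknown words are simply skipped."""
    scores = {}
    for l in log_prior:
        scores[l] = log_prior[l] + sum(log_lik[l][w] for w in tokens if w in vocab)
    return max(scores, key=scores.get)

train = [("great fun film".split(), "pos"),
         ("boring and pathetic".split(), "neg"),
         ("richly applied satire great plot".split(), "pos"),
         ("the worst film".split(), "neg")]
lp, ll, v = train_nb(train)
print(predict_nb("great satire".split(), lp, ll, v))  # → pos
```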
Sentiment Lexicons
25
Polarity keywords
• There seems to be some relation between positive words and positive reviews
• Can we come up with a set of keywords by hand to identify polarity?
• Results from (Pang et al., 2002)
26
                Proposed word list                                                 Accuracy  Ties
Human 1         Pos: dazzling, brilliant, phenomenal, excellent, fantastic         58%       75%
                Neg: suck, terrible, awful, unwatchable, hideous
Human 2         Pos: gripping, mesmerizing, riveting, spectacular, cool,           64%       39%
                awesome, thrilling, badass, excellent, moving, exciting
                Neg: bad, cliched, sucks, boring, stupid, slow
Human 3 + stats Pos: love, wonderful, best, great, superb, still, beautiful        69%       16%
                Neg: bad, worst, stupid, waste, boring, ?, !
Sentiment/Affective Lexicons
• The General Inquirer
• LIWC - Linguistic Inquiry and Word Count
• MPQA Subjectivity Cues Lexicon
• Bing Liu Opinion Lexicon
• SentiWordNet
27
The General Inquirer
P. J. Stone et al., 1966. The General Inquirer: A Computer Approach to Content Analysis.
• Home page: http://www.wjh.harvard.edu/~inquirer
• List of Categories: http://www.wjh.harvard.edu/~inquirer/homecat.htm
• Spreadsheet: http://www.wjh.harvard.edu/~inquirer/inquirerbasic.xls
• Categories:
  • Positiv (1915 words) and Negativ (2291 words)
• Strong vs Weak, Active vs Passive, Overstated versus Understated
• Pleasure, Pain, Virtue, Vice, Motivation, Cognitive Orientation, etc
• Free for Research Use
28
LIWC – Linguistic Inquiry and Word Count
Pennebaker et al., 2007. Linguistic Inquiry and Word Count: LIWC
• Home page: http://www.liwc.net/
• 2300 words, >70 classes
• Affective Processes
  • negative emotion (bad, weird, hate, problem, tough)
  • positive emotion (love, nice, sweet)
• Cognitive Processes
  • Tentative (maybe, perhaps, guess), Inhibition (block, constraint)
• Pronouns, Negation (no, never), Quantifiers (few, many)
• Academic version $90, 30-day rental $10
29
MPQA Subjectivity Cues Lexicon
T. Wilson et al., 2005. Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis.
Riloff and Wiebe, 2003. Learning extraction patterns for subjective expressions.
• Home page: http://mpqa.cs.pitt.edu/lexicons/subj_lexicon/
• 6885 words
  • 2718 positive
• 4912 negative
• Each word annotated for intensity (strong, weak)
• GNU GPL
30
Bing Liu Opinion Lexicon
M. Hu and B. Liu, 2004. Mining and Summarizing Customer Reviews.
• Bing Liu's Page on Opinion Mining
• http://www.cs.uic.edu/~liub/FBS/opinion-lexicon-English.rar
• 6786 words
  • 2006 positive
• 4783 negative
31
SentiWordNet
S. Baccianella et al., 2010. SENTIWORDNET 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining.
• Home page: http://sentiwordnet.isti.cnr.it/
• All WordNet synsets automatically annotated for degrees of positivity, negativity, and neutrality/objectiveness
• [estimable(J,3)] “may be computed or estimated” • Pos 0 Neg 0 Obj 1
• [estimable(J,1)] “deserving of respect or high regard” • Pos .75 Neg 0 Obj .25
32
Semi-supervised Sentiment Lexicon Extraction
• Use a small amount of information
  • A few labeled examples
• A few hand-built patterns
• Then bootstrap a lexicon
33
Hatzivassiloglou and McKeown intuition for identifying word polarity
V. Hatzivassiloglou and K. R. McKeown, 1997. Predicting the Semantic Orientation of Adjectives.
• Adjectives conjoined by “and” have the same polarity
  • Fair and legitimate, corrupt and brutal
• *fair and brutal, *corrupt and legitimate
• Adjectives conjoined by “but” do not
  • fair but brutal
34
Hatzivassiloglou and McKeown 1997: Step 1
• Label a seed set of 1336 adjectives (all occurring more than 20 times in a 21-million-word WSJ corpus)
• 657 positive: adequate central clever famous intelligent remarkable reputed sensitive slender thriving…
• 679 negative: contagious drunken ignorant lanky listless primitive strident troublesome unresolved unsuspecting…
35
Hatzivassiloglou and McKeown 1997: Step 2
• Expand seed set to conjoined adjectives
36
[Graph of conjoined adjective pairs from the corpus, e.g. “nice, helpful”, “nice, classy”]
Hatzivassiloglou and McKeown 1997: Step 3
• A supervised classifier assigns a “polarity similarity” to each word pair, resulting in a graph:
37
Hatzivassiloglou and McKeown 1997: Step 3
• Clustering for partitioning the graph into two
38
Output Polarity Lexicon
• Positive• bold decisive disturbing generous good honest important large mature
patient peaceful positive proud sound stimulating straightforward strange talented vigorous witty…
• Negative• ambiguous cautious cynical evasive harmful hypocritical inefficient insecure
irrational irresponsible minor outspoken pleasant reckless risky selfish tedious unsupported vulnerable wasteful…
39
Turney Algorithm
Turney, 2002. Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews.
1. Extract a phrasal lexicon from reviews
2. Learn polarity of each phrase
3. Rate a review by the average polarity of its phrases
41
Extract two-word phrases with adjectives and adverbs
First word      Second word          Third word (not extracted)
JJ              NN or NNS            anything
RB, RBR, RBS    JJ                   not NN or NNS
JJ              JJ                   not NN or NNS
NN or NNS       JJ                   not NN or NNS
RB, RBR, RBS    VB, VBD, VBN, VBG    anything
42
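The extraction rules in the table can be sketched as follows. The tags are Penn Treebank POS tags, and the example sentence and its tagging are made up; a real system would get the tags from a POS tagger.

```python
# Sketch of Turney's two-word phrase extraction over a POS-tagged
# sentence, following the pattern table above.
NOUN = {"NN", "NNS"}
ADV = {"RB", "RBR", "RBS"}
VERB = {"VB", "VBD", "VBN", "VBG"}

def match(t1, t2, t3):
    """True if tags (t1, t2), with following tag t3, fit one of the patterns."""
    if t1 == "JJ" and t2 in NOUN:
        return True
    if t1 in ADV and t2 == "JJ" and t3 not in NOUN:
        return True
    if t1 == "JJ" and t2 == "JJ" and t3 not in NOUN:
        return True
    if t1 in NOUN and t2 == "JJ" and t3 not in NOUN:
        return True
    if t1 in ADV and t2 in VERB:
        return True
    return False

def extract_phrases(tagged):
    """tagged: list of (word, tag) pairs. Returns the two-word candidate phrases."""
    out = []
    for i in range(len(tagged) - 1):
        t3 = tagged[i + 2][1] if i + 2 < len(tagged) else ""
        if match(tagged[i][1], tagged[i + 1][1], t3):
            out.append((tagged[i][0], tagged[i + 1][0]))
    return out

sent = [("this", "DT"), ("is", "VBZ"), ("a", "DT"),
        ("very", "RB"), ("nice", "JJ"), ("camera", "NN")]
print(extract_phrases(sent))  # → [('nice', 'camera')]
```

Note that “very nice” is not extracted here: the RB+JJ pattern is blocked because the following tag is a noun, exactly as the third column of the table specifies.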
How to measure polarity of a phrase?
• Positive phrases co-occur more with “excellent”
• Negative phrases co-occur more with “poor”
• But how to measure co-occurrence?
43
PMI - Pointwise Mutual Information
• How much more do events x and y co-occur than if they were independent?
• PMI between words: How much more do two words co-occur than if they were independent?
44
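The two questions above can be written out as formulas (reconstructing the standard PMI definition, which appears as an image on the original slide):

```latex
\mathrm{PMI}(x, y) = \log_2 \frac{P(x, y)}{P(x)\,P(y)}
\qquad
\mathrm{PMI}(w_1, w_2) = \log_2 \frac{P(w_1, w_2)}{P(w_1)\,P(w_2)}
```

PMI is 0 when the events are independent, positive when they co-occur more than chance, and negative when they co-occur less.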
How to estimate Pointwise Mutual Information?
• Query a search engine
• Turney used Altavista because it had NEAR operator
• P(word) estimated by hits(word)
• P(word1, word2) estimated by hits(word1 NEAR word2)
45
Does word occur more with “poor” or “excellent”?
46
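Putting the hit-count estimates together gives Turney's semantic orientation score (as given in Turney, 2002; N is the number of indexed pages, and it cancels in the final difference):

```latex
\mathrm{PMI}(w_1, w_2) \approx \log_2 \frac{N \cdot \mathrm{hits}(w_1 \ \mathrm{NEAR}\ w_2)}{\mathrm{hits}(w_1)\,\mathrm{hits}(w_2)}
```

```latex
\mathrm{SO}(\mathrm{phrase})
= \mathrm{PMI}(\mathrm{phrase}, \text{``excellent''}) - \mathrm{PMI}(\mathrm{phrase}, \text{``poor''})
= \log_2 \frac{\mathrm{hits}(\mathrm{phrase}\ \mathrm{NEAR}\ \text{``excellent''}) \cdot \mathrm{hits}(\text{``poor''})}
               {\mathrm{hits}(\mathrm{phrase}\ \mathrm{NEAR}\ \text{``poor''}) \cdot \mathrm{hits}(\text{``excellent''})}
```

A positive SO means the phrase keeps closer company with “excellent”; a negative SO, with “poor”.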
Phrases from a positive review
47
Phrases from a negative review
48
Results of Turney algorithm
• 410 reviews from Epinions
  • 170 (41%) negative
• 240 (59%) positive
• Majority class baseline: 59%
• Turney algorithm: 74%
• Phrases rather than words
• Learns domain-specific information
49
Using WordNet to learn polarity
S.M. Kim and E. Hovy, 2004. Determining the sentiment of opinions.
M. Hu and B. Liu, 2004. Mining and summarizing customer reviews.
• WordNet: online thesaurus
• Create positive (“good”) and negative seed-words (“terrible”)
• Find synonyms and antonyms
  • Positive set: add synonyms of positive words (“well”) and antonyms of negative words
  • Negative set: add synonyms of negative words (“awful”) and antonyms of positive words (“evil”)
• Repeat, following chains of synonyms
• Filter
50
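The propagation loop above can be sketched with a hand-made toy thesaurus standing in for WordNet. The SYN and ANT tables below are invented for illustration, not real WordNet data, and the filtering step is omitted.

```python
# Sketch of seed-set expansion: grow each polarity set by adding
# synonyms of its own words and antonyms of the opposite set's words.
SYN = {"good": {"well", "fine"}, "terrible": {"awful", "dreadful"},
       "well": {"good"}, "awful": {"terrible"}}
ANT = {"good": {"bad", "evil"}, "terrible": {"great"}}

def expand(seeds, opposite_seeds, syn, ant, iterations=2):
    """Repeat the expansion step a fixed number of times."""
    own, other = set(seeds), set(opposite_seeds)
    for _ in range(iterations):
        new = set()
        for w in own:
            new |= syn.get(w, set())       # synonyms of our own words
        for w in other:
            new |= ant.get(w, set())       # antonyms of the opposite words
        own |= new
    return own

positive = expand({"good"}, {"terrible"}, SYN, ANT)
negative = expand({"terrible"}, {"good"}, SYN, ANT)
print(sorted(positive))  # synonyms of "good" plus antonyms of "terrible"
print(sorted(negative))
```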
Summary on semi-supervised lexicon learning
• Intuition:
  • Start with a seed set of words (‘good’, ‘poor’)
  • Find other words that have similar polarity:
    • Using “and” and “but”
    • Using words that occur nearby in the same document
    • Using WordNet synonyms and antonyms
• Advantages:
  • Can be domain-specific
  • Can be more robust (more words)
51
Using lexicon to detect document sentiment
Simplest unsupervised method
• Count the words with positive sentiment in a document
• Count the words with negative sentiment
• Choose whichever value (positive or negative) has the higher sum
52
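A minimal sketch of this count-and-compare method; the tiny lexicon below is invented, standing in for a real resource such as the Bing Liu lists.

```python
# Unsupervised lexicon method: count positive vs negative words
# and pick whichever side has more hits.
POS_WORDS = {"great", "excellent", "wonderful", "best"}
NEG_WORDS = {"worst", "pathetic", "boring", "awful"}

def lexicon_sentiment(tokens):
    pos = sum(t in POS_WORDS for t in tokens)
    neg = sum(t in NEG_WORDS for t in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"  # tie: the simple method cannot decide

print(lexicon_sentiment("it was pathetic the worst boxing scenes".split()))
# → negative
```

The weakness is obvious from the hardware-store letter earlier in the lecture: positive words about a competitor still count as positive hits.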
Using lexicon to detect document sentiment
Simplest supervised method
• Build a classifier
• Predict sentiment (or emotion, or personality) given features
• Use counts of lexicon categories as features
• Sample features:
  • LIWC category “cognition” had a count of 7
  • NRC Emotion category “anticipation” had a count of 2
• Baseline
  • Instead use counts of all the words and bigrams in the training set
  • This is hard to beat
  • But only works if the training and test sets are very similar
53
Conclusion on Sentiment Analysis
• Sentiment analysis is about detecting the polarity of subjective opinions
• Related tasks, all related to subjective aspects:• Emotion detection, emotion intensity detection
• Irony, sarcasm, humor detection
• Personality analysis
• In the simplest case SA is a document classification task, trying to predict the binary sentiment: positive or negative
• Task or domain-specific sentiment lexicons are useful in SA
54
LDA topic modeling
55
LDA topic modeling
• LDA – Latent Dirichlet Allocation
• Unsupervised document clustering into topics or themes
• Each document will be represented as a mixture of K topics
• The topic vectors can be used as document representations in further classification task
56
57
David Blei, 2012. Probabilistic Topic Models
Intuition of LDA
• Each topic is a distribution over words in a fixed vocabulary
  • The genetics topic has words about genetics with high probability
  • The evolutionary biology topic has words about evolutionary biology with high probability
• Each document has a distribution over topics
  • In a two-topic situation (genetics and evolutionary biology) a document may be represented for instance with a vector [0.7, 0.3]
  • This is a mixture of topics
58
Generative story
LDA is a generative model, thus it has a generative story
• For each topic, generate a distribution over words
• For each document, generate a distribution over topics
• For each word position in the document:
  • Sample a topic from the document’s topic distribution
• Sample a word from that topic distribution
59
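The generative story can be written directly as code: sampling a toy document from hand-set (not learned) topic-word and document-topic distributions. The two topics, their vocabularies, and all probabilities below are made up for illustration.

```python
# The LDA generative story for one document: for each word position,
# sample a topic, then sample a word from that topic's distribution.
import random
random.seed(0)

# Two topics over a tiny fixed vocabulary (invented probabilities)
TOPICS = {
    "genetics": {"gene": 0.5, "dna": 0.4, "life": 0.1},
    "evolution": {"species": 0.5, "selection": 0.3, "life": 0.2},
}

def sample_doc(doc_topic_dist, n_words):
    """doc_topic_dist: {topic: probability}, e.g. the [0.7, 0.3] mixture."""
    words = []
    for _ in range(n_words):
        topic = random.choices(list(doc_topic_dist),
                               weights=list(doc_topic_dist.values()))[0]
        word_dist = TOPICS[topic]
        word = random.choices(list(word_dist),
                              weights=list(word_dist.values()))[0]
        words.append(word)
    return words

doc = sample_doc({"genetics": 0.7, "evolution": 0.3}, n_words=8)
print(doc)  # eight words, drawn mostly from the genetics topic
```

Training inverts this story: given only the words, infer the topic-word and document-topic distributions that most plausibly generated them.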
LDA model – plate diagram
60
[Plate diagram labels: hyperparameters; document-topic distributions; topic index for each word; words; topic-word distributions]
David Blei, 2012. Probabilistic Topic Models
Training an LDA model
• Learn the topic-word distributions
• For each document find the document-topic distributions
• Training equals inference because the model is unsupervised
  • By training the model on a text corpus we also obtain the topic vectors for these documents
Two main methods (both are beyond the scope of our course)
• Gibbs sampling
• Variational Bayes
61
LDA fit to 17000 articles from Science
62
David Blei, 2012. Probabilistic Topic Models
Tools for training topic models
• Gensim
• Mallet
• Stanford Topic Modeling Toolkit
• jLDADMM
• R topic modeling package
63
Evaluation
64
Evaluation protocols
• Train-dev-test split: 80/10/10, 70/20/10, 70/15/15
  • Train on the training set
• Tune hyper-parameters on development set
• Test the best model on test set
• What if we can’t afford to split the data?
  • Cross-validation
65
K-fold cross-validation
• Break the data up into K folds (here K = 5)
• For each fold:
  • Use that fold as a temporary test set
  • Train on the remaining 4 folds, evaluate on the test fold
• Report the average performance over the 5 runs
66
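The fold loop above can be sketched in plain Python (no library assumed); each item ends up in the test set exactly once:

```python
# K-fold split: partition n_items indices into k contiguous test folds;
# everything outside the current fold is training data.
def kfold_indices(n_items, k):
    """Yield (train_idx, test_idx) pairs for k folds over n_items items."""
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_items) if i < start or i >= start + size]
        yield train, test
        start += size

for train_idx, test_idx in kfold_indices(10, 5):
    print(test_idx)  # each run: a distinct 2-item test fold, 8 items for training
```

In practice the data should be shuffled (or stratified) before splitting so the folds are not biased by the original ordering.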
[Diagram: five iterations of 5-fold CV; in each iteration a different fold serves as the test set and the remaining four folds are used for training.]
Cross-validation
• Most commonly used: 10-fold CV and 5-fold CV
  • The extreme case is LOOCV – leave-one-out cross-validation
• Stratified K-fold CV – the label distribution is preserved in each fold
  • If the dataset contains 10 positive items and 30 negative items and the data is divided into 5 folds, how many positive and how many negative items will be in each fold?
• The performance on different folds can vary a lot, especially when the folds are small
• If possible, hold out a separate test set and tune the hyperparameters using CV on the training set
67
Aggregating CV results
• Micro-averaging
  • Sum the TP, FP and FN counts from all folds and then compute precision, recall and F-score
• Macro-averaging
  • Compute precision, recall and F-score for each fold and then average
68
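The difference between the two schemes is easy to see in code; the per-fold (TP, FP, FN) counts below are made up for illustration:

```python
# Micro- vs macro-averaging of precision, recall and F-score
# over per-fold confusion counts.
def prf(tp, fp, fn):
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

fold_counts = [(8, 2, 1), (5, 5, 3), (9, 1, 2)]  # (TP, FP, FN) per fold

# Micro: pool the raw counts first, then compute P/R/F once
tp = sum(c[0] for c in fold_counts)
fp = sum(c[1] for c in fold_counts)
fn = sum(c[2] for c in fold_counts)
micro = prf(tp, fp, fn)

# Macro: compute P/R/F per fold, then average the scores
per_fold = [prf(*c) for c in fold_counts]
macro = tuple(sum(s[i] for s in per_fold) / len(per_fold) for i in range(3))

print("micro P/R/F:", micro)
print("macro P/R/F:", macro)
```

Micro-averaging weights every item equally (large folds or classes dominate), while macro-averaging weights every fold, or class, equally; the same two schemes apply to per-class scores in the multiclass setting on the next slides.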
Precision and recall in multiclass classification
• Contingency table
69
Figure 6.5: https://web.stanford.edu/~jurafsky/slp3/6.pdf
Aggregating precision and recall
70
Figure 6.6: https://web.stanford.edu/~jurafsky/slp3/6.pdf
Conclusion for Extras
• LDA topic modeling – unsupervised method for clustering documents based on themes or topics
• It can provide useful features for document classification
• Use cross-validation when you can’t afford to split the data into train/dev/test
• Split data into train/test and perform CV with train set to choose features and hyperparameters
• With multiple classes evaluate precision and recall for every class and then aggregate with micro- or macro-averaging
71