Sentiment Classification using Machine Learning Techniques
• Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79–86.
Cited by 7650
• Turney (2002): Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews. Cited by 5313
Sentiment analysis
• Sentiment analysis is the detection of attitudes.
• Sentiment analysis has many other names: opinion extraction, opinion mining, sentiment mining, subjectivity analysis.
• Type of attitude
  • From a set of types: like, love, hate, value, desire, etc.
  • Or (more commonly) simple weighted polarity: positive, negative, neutral, together with strength.
DATASET
• Internet Movie Database (IMDb)
• Polarity detection: is an IMDb movie review positive or negative?
• They imposed a limit of fewer than 20 reviews per author per sentiment category, yielding a corpus of 752 negative and 1301 positive reviews, with a total of 144 reviewers represented.
• Data: polarity dataset v2.0:
  http://www.cs.cornell.edu/people/pabo/movie-review-data
IMDB data in the Pang and Lee database
when _star wars_ came out some twenty
years ago , the image of traveling throughout the stars has become a
commonplace image . […]
when han solo goes light speed , the stars change to bright lines , going towards the
viewer in lines that converge at an invisible point .
cool .
_october sky_ offers a much simpler image–that of a single white dot , traveling
horizontally across the night sky . [. . . ]
“ snake eyes ” is the most aggravating
kind of movie : the kind that shows so much potential then becomes
unbelievably disappointing .
it’s not just because this is a briandepalma film , and since he’s a great
director and one who’s films are always greeted with at least some fanfare .
and it’s not even because this was a film
starring nicolas cage and since he gives a brauvara performance , this film is
hardly worth his talents .
Baseline Algorithm
• Tokenization
• Feature extraction
• Classification using different classifiers:
  • Naïve Bayes
  • Maximum Entropy
  • SVM (Support Vector Machine)
Adapted from Pang and Lee
Tokenization
• Given a character sequence and a defined document unit, tokenization is the task of chopping it up into pieces, called tokens, perhaps at the same time throwing away certain characters, such as punctuation.
• E.g.
  • "I like computer science."
  • ['I', 'like', 'computer', 'science', '.']
Tokenization - Stanford NLP Group
https://nlp.stanford.edu/IR-book/html/htmledition/tokenization-1.html
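A minimal regex-based sketch of such a tokenizer (the pattern is an assumption; the slides do not specify one):

```python
import re

def tokenize(text):
    # Match runs of word characters, or any single non-space,
    # non-word character, so punctuation becomes its own token.
    return re.findall(r"\w+|[^\w\s]", text)

tokens = tokenize("I like computer science.")
# tokens == ['I', 'like', 'computer', 'science', '.']
```

Real tokenizers handle contractions, hyphens, and URLs; this sketch only separates words from punctuation.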
Extracting Features
• Sentiment can be expressed in a subtle manner:
  "How could anyone sit through this movie?"
• How to handle negation?
  "I didn't like this movie" vs. "I really like this movie"
• Which words to use?
  • Only adjectives
  • All words
  • All words turns out to work better, at least on this data.
Negation
Add NOT_ to every word between a negation word and the following punctuation:
didn't like this movie , but I
didn't NOT_like NOT_this NOT_movie , but I
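A minimal sketch of this negation-tagging rule (the set of negation words is illustrative, not the paper's exact list):

```python
import re

# Illustrative negation triggers; the real list would be larger.
NEGATIONS = {"not", "no", "never", "didn't", "isn't", "don't"}

def mark_negation(tokens):
    # Prefix NOT_ to every token between a negation word
    # and the next punctuation mark.
    out, negating = [], False
    for tok in tokens:
        if re.fullmatch(r"[.,!?;:]", tok):
            negating = False          # punctuation ends the scope
            out.append(tok)
        elif negating:
            out.append("NOT_" + tok)
        else:
            out.append(tok)
            if tok.lower() in NEGATIONS:
                negating = True       # start tagging after the trigger
    return out

mark_negation(["didn't", "like", "this", "movie", ",", "but", "I"])
# → ["didn't", 'NOT_like', 'NOT_this', 'NOT_movie', ',', 'but', 'I']
```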
Classification
using different classifiers:
• Naïve Bayes
• Maximum Entropy
• SVM
Bayes Theorem
P(A|B) = P(B|A) P(A) / P(B)
• P(A|B) is the "probability of A given B": the probability of A given that B happens.
• P(A) is the probability of A.
• P(B|A) is the "probability of B given A": the probability of B given that A happens.
• P(B) is the probability of B.
When P(Fire) means how often there is fire, and P(Smoke) means how often we see smoke, then:
• P(Fire|Smoke) means how often there is fire when we see smoke.
• P(Smoke|Fire) means how often we see smoke when there is fire.
Bayes Theorem
• Example: if dangerous fires are rare (1%) but smoke is fairly common (10%) due to factories, and 90% of dangerous fires make smoke, then:
  P(Fire|Smoke) = P(Fire) P(Smoke|Fire) / P(Smoke)
                = 1% × 90% / 10% = 9%
• In this case, when we see smoke we expect a dangerous fire 9% of the time.
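The arithmetic above can be checked with a one-line helper (the function name is my own):

```python
def bayes(p_a, p_b_given_a, p_b):
    # Bayes' theorem: P(A|B) = P(A) * P(B|A) / P(B)
    return p_a * p_b_given_a / p_b

# Fire/smoke example: P(Fire)=1%, P(Smoke|Fire)=90%, P(Smoke)=10%
p_fire_given_smoke = bayes(0.01, 0.90, 0.10)   # ≈ 0.09, i.e. 9%
```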
Naïve Bayes
c_NB = argmax over cj in C of P(cj) · Πi P(xi | cj)

where c_NB is the most likely class and P(cj) is the prior probability of class cj.
Naïve Bayes
• From the training corpus, extract Vocabulary
• Calculate the P(cj) terms
  • For each cj in C do
      docsj ← all docs with class = cj
      P(cj) ← |docsj| / |total # of documents|
• Calculate the P(wk | cj) terms
  • Textj ← single doc containing all docsj
  • For each word wk in Vocabulary
      nk ← # of occurrences of wk in Textj
      P(wk | cj) ← (nk + α) / (n + α · |Vocabulary|)
    where n is the total number of word tokens in Textj
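A minimal sketch of this training procedure with add-α smoothing, plus the argmax classification rule computed in log space (function and variable names are my own):

```python
import math
from collections import Counter

def train_nb(docs, alpha=1.0):
    # docs: list of (tokens, class_label) pairs.
    vocab = {w for tokens, _ in docs for w in tokens}
    classes = {c for _, c in docs}
    prior, cond = {}, {}
    for c in classes:
        class_docs = [tokens for tokens, lab in docs if lab == c]
        prior[c] = len(class_docs) / len(docs)        # P(cj)
        counts = Counter(w for tokens in class_docs for w in tokens)
        n = sum(counts.values())                      # tokens in Textj
        denom = n + alpha * len(vocab)
        # P(wk|cj) with add-alpha smoothing
        cond[c] = {w: (counts[w] + alpha) / denom for w in vocab}
    return prior, cond, vocab

def classify(tokens, prior, cond, vocab):
    # argmax_c  log P(c) + sum_i log P(w_i | c), ignoring unseen words
    def score(c):
        return math.log(prior[c]) + sum(
            math.log(cond[c][w]) for w in tokens if w in vocab)
    return max(prior, key=score)
```

Logs are used to avoid underflow when multiplying many small probabilities.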
Support Vector Machines
• A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. In other words, given labeled training data (supervised learning), the algorithm outputs an optimal hyperplane which categorizes new examples. In two-dimensional space this hyperplane is a line dividing the plane into two parts, with each class lying on either side.
Support Vector Machines
Let's assume the value of the points on the z plane is w = x² + y² (mapping the points into a higher dimension where they become linearly separable).
SUPPORT VECTOR MACHINES
Left: low regularization value; right: high regularization value.
SUPPORT VECTOR MACHINES
Letting cj ∈ {1, −1} (corresponding to positive and negative) be the correct class of document dj, the solution can be written as:

w⃗ = Σj αj cj d⃗j,  with αj ≥ 0,

where the αj's are obtained by solving a dual optimization problem. Those d⃗j such that αj is greater than zero are called support vectors, since they are the only document vectors contributing to w⃗. Classification of test instances consists simply of determining which side of w⃗'s hyperplane they fall on.
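A toy sketch of this classification step: given support vectors with their weights αj and labels cj, build w⃗ and check which side of the hyperplane a document vector falls on. The αj values below are made up for illustration, not solved from the dual problem:

```python
def weight_vector(alphas, labels, support_vecs):
    # w = sum_j alpha_j * c_j * d_j   (support vectors only, alpha_j > 0)
    dim = len(support_vecs[0])
    w = [0.0] * dim
    for a, c, d in zip(alphas, labels, support_vecs):
        for i in range(dim):
            w[i] += a * c * d[i]
    return w

def svm_classify(w, b, doc_vec):
    # Sign of w . x + b decides which side of the hyperplane x is on.
    s = sum(wi * xi for wi, xi in zip(w, doc_vec)) + b
    return 1 if s >= 0 else -1

# Hypothetical 2-D support vectors: one positive, one negative.
w = weight_vector([0.5, 0.5], [1, -1], [[2.0, 0.0], [0.0, 2.0]])
# w == [1.0, -1.0]; documents with x > y classify as positive.
```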
Unigrams and bigrams
• The n-grams typically are collected from a text or speech corpus.
• An n-gram of size 1 is referred to as a "unigram"; size 2 is a "bigram"; size 3 is a "trigram".
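A minimal sketch of collecting n-grams from a token sequence:

```python
def ngrams(tokens, n):
    # Slide a window of size n over the token sequence.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = ["this", "film", "is", "great"]
unigrams = ngrams(tokens, 1)  # [('this',), ('film',), ('is',), ('great',)]
bigrams = ngrams(tokens, 2)   # [('this', 'film'), ('film', 'is'), ('is', 'great')]
```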
Result comparison
• Unigrams: those appearing at least four times in the corpus
• Bigrams: those appearing at least seven times in the corpus
Conclusion
• Sentiment categorization vs. topic-based classification:
  Sentiment categorization is more difficult than topic-based classification. In topic-based classification, all three classifiers have been reported to achieve accuracies of 90%.
• NB vs. ME vs. SVM:
  MaxEnt and SVM tend to do better than Naïve Bayes.
• Feature frequency vs. presence:
  Better performance is achieved by accounting only for feature presence, not feature frequency.
• Bigram information does not improve performance beyond that of unigram presence.
• Compared to picking up only adjectives, picking up all words turns out to work better.
PROBLEM
• Subtlety:
  "How could anyone sit through this movie?"
• Thwarted expectations and ordering effects:
  "This film should be brilliant. It sounds like a great plot, the actors are first grade, and the supporting cast is good as well, and Stallone is attempting to deliver a good performance. However, it can't hold up."