Sentiment Classification using Machine Learning Techniques
• Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79–86.
Cited by 7650
• Turney (2002): Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews. Cited by 5313
Sentiment analysis
• Sentiment analysis is the detection of attitudes.
• Sentiment analysis has many other names: opinion extraction, opinion mining, sentiment mining, subjectivity analysis.
• Type of attitude
  • From a set of types: like, love, hate, value, desire, etc.
  • Or (more commonly) simple weighted polarity: positive, negative, neutral, together with strength.
DATASET
• Internet Movie Database (IMDb)
• Polarity detection: is an IMDb movie review positive or negative?
• They imposed a limit of fewer than 20 reviews per author per sentiment category, yielding a corpus of 752 negative and 1301 positive reviews, with a total of 144 reviewers represented.
• Data: polarity dataset v2.0:
  http://www.cs.cornell.edu/people/pabo/movie-review-data
IMDB data in the Pang and Lee database
when _star wars_ came out some twenty
years ago , the image of traveling throughout the stars has become a
commonplace image . […]
when han solo goes light speed , the stars change to bright lines , going towards the
viewer in lines that converge at an invisible point .
cool .
_october sky_ offers a much simpler image–that of a single white dot , traveling
horizontally across the night sky . [. . . ]
“ snake eyes ” is the most aggravating
kind of movie : the kind that shows so much potential then becomes
unbelievably disappointing .
it’s not just because this is a briandepalma film , and since he’s a great
director and one who’s films are always greeted with at least some fanfare .
and it’s not even because this was a film
starring nicolas cage and since he gives a brauvara performance , this film is
hardly worth his talents .
Baseline Algorithm
• Tokenization
• Feature extraction
• Classification using different classifiers:
  • Naïve Bayes
  • Maximum Entropy
  • SVM (Support Vector Machine)
Adapted from Pang and Lee
Tokenization
• Given a character sequence and a defined document unit, tokenization is the task of chopping it up into pieces, called tokens, perhaps at the same time throwing away certain characters, such as punctuation.
• E.g.
  • "I like computer science."
  • ['I', 'like', 'computer', 'science', '.']
Tokenization - Stanford NLP Group
https://nlp.stanford.edu/IR-book/html/htmledition/tokenization-1.html
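A minimal regex-based sketch of such a tokenizer (the pattern is an assumption; the slides do not specify one):

```python
import re

def tokenize(text):
    # Match runs of word characters, or any single non-space,
    # non-word character, so punctuation becomes its own token.
    return re.findall(r"\w+|[^\w\s]", text)

tokens = tokenize("I like computer science.")
# tokens == ['I', 'like', 'computer', 'science', '.']
```

Real tokenizers handle contractions, hyphens, and URLs; this sketch only separates words from punctuation.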
Extracting Features
• Sentiment can be expressed in a subtle manner:
  "How could anyone sit through this movie?"
• How to handle negation?
  "I didn't like this movie" vs. "I really like this movie"
• Which words to use?
  • Only adjectives
  • All words
  • All words turns out to work better, at least on this data.
Negation
Add NOT_ to every word between a negation word and the following punctuation:
didn't like this movie , but I
didn't NOT_like NOT_this NOT_movie , but I
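A minimal sketch of this negation-tagging rule (the set of negation words is illustrative, not the paper's exact list):

```python
import re

# Illustrative negation triggers; the real list would be larger.
NEGATIONS = {"not", "no", "never", "didn't", "isn't", "don't"}

def mark_negation(tokens):
    # Prefix NOT_ to every token between a negation word
    # and the next punctuation mark.
    out, negating = [], False
    for tok in tokens:
        if re.fullmatch(r"[.,!?;:]", tok):
            negating = False          # punctuation ends the scope
            out.append(tok)
        elif negating:
            out.append("NOT_" + tok)
        else:
            out.append(tok)
            if tok.lower() in NEGATIONS:
                negating = True       # start tagging after the trigger
    return out

mark_negation(["didn't", "like", "this", "movie", ",", "but", "I"])
# → ["didn't", 'NOT_like', 'NOT_this', 'NOT_movie', ',', 'but', 'I']
```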
Classification
using different classifiers:
• Naïve Bayes
• Maximum Entropy
• SVM
Bayes Theorem
P(A|B) = P(B|A) P(A) / P(B)
• P(A|B) is the "probability of A given B": the probability of A given that B happens.
• P(A) is the probability of A.
• P(B|A) is the "probability of B given A": the probability of B given that A happens.
• P(B) is the probability of B.
When P(Fire) means how often there is fire, and P(Smoke) means how often we see smoke, then:
• P(Fire|Smoke) means how often there is fire when we see smoke.
• P(Smoke|Fire) means how often we see smoke when there is fire.
Bayes Theorem
• Example: if dangerous fires are rare (1%) but smoke is fairly common (10%) due to factories, and 90% of dangerous fires make smoke, then:
  P(Fire|Smoke) = P(Fire) P(Smoke|Fire) / P(Smoke)
                = 1% × 90% / 10% = 9%
• In this case, when we see smoke we expect a dangerous fire 9% of the time.
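The arithmetic above can be checked with a one-line helper (the function name is my own):

```python
def bayes(p_a, p_b_given_a, p_b):
    # Bayes' theorem: P(A|B) = P(A) * P(B|A) / P(B)
    return p_a * p_b_given_a / p_b

# Fire/smoke example: P(Fire)=1%, P(Smoke|Fire)=90%, P(Smoke)=10%
p_fire_given_smoke = bayes(0.01, 0.90, 0.10)   # ≈ 0.09, i.e. 9%
```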
Naïve Bayes
c_NB = argmax over cj in C of P(cj) · Πi P(xi | cj)

where c_NB is the most likely class and P(cj) is the prior probability of class cj.
Naïve Bayes
• From the training corpus, extract Vocabulary
• Calculate the P(cj) terms
  • For each cj in C do
      docsj ← all docs with class = cj
      P(cj) ← |docsj| / |total # of documents|
• Calculate the P(wk | cj) terms
  • Textj ← single doc containing all docsj
  • For each word wk in Vocabulary
      nk ← # of occurrences of wk in Textj
      P(wk | cj) ← (nk + α) / (n + α · |Vocabulary|)
    where n is the total number of word tokens in Textj
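A minimal sketch of this training procedure with add-α smoothing, plus the argmax classification rule computed in log space (function and variable names are my own):

```python
import math
from collections import Counter

def train_nb(docs, alpha=1.0):
    # docs: list of (tokens, class_label) pairs.
    vocab = {w for tokens, _ in docs for w in tokens}
    classes = {c for _, c in docs}
    prior, cond = {}, {}
    for c in classes:
        class_docs = [tokens for tokens, lab in docs if lab == c]
        prior[c] = len(class_docs) / len(docs)        # P(cj)
        counts = Counter(w for tokens in class_docs for w in tokens)
        n = sum(counts.values())                      # tokens in Textj
        denom = n + alpha * len(vocab)
        # P(wk|cj) with add-alpha smoothing
        cond[c] = {w: (counts[w] + alpha) / denom for w in vocab}
    return prior, cond, vocab

def classify(tokens, prior, cond, vocab):
    # argmax_c  log P(c) + sum_i log P(w_i | c), ignoring unseen words
    def score(c):
        return math.log(prior[c]) + sum(
            math.log(cond[c][w]) for w in tokens if w in vocab)
    return max(prior, key=score)
```

Logs are used to avoid underflow when multiplying many small probabilities.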
Support Vector Machines
• A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. In other words, given labeled training data (supervised learning), the algorithm outputs an optimal hyperplane which categorizes new examples. In two-dimensional space this hyperplane is a line dividing the plane into two parts, with each class lying on either side.
Support Vector Machines
Let's assume the value of the points on the z plane is w = x² + y² (mapping the points into a higher dimension where they become linearly separable).
SUPPORT VECTOR MACHINES
Left: low regularization value; right: high regularization value.
SUPPORT VECTOR MACHINES
Letting cj ∈ {1, −1} (corresponding to positive and negative) be the correct class of document dj, the solution can be written as:

w⃗ = Σj αj cj d⃗j,  with αj ≥ 0,

where the αj's are obtained by solving a dual optimization problem. Those d⃗j such that αj is greater than zero are called support vectors, since they are the only document vectors contributing to w⃗. Classification of test instances consists simply of determining which side of w⃗'s hyperplane they fall on.
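A toy sketch of this classification step: given support vectors with their weights αj and labels cj, build w⃗ and check which side of the hyperplane a document vector falls on. The αj values below are made up for illustration, not solved from the dual problem:

```python
def weight_vector(alphas, labels, support_vecs):
    # w = sum_j alpha_j * c_j * d_j   (support vectors only, alpha_j > 0)
    dim = len(support_vecs[0])
    w = [0.0] * dim
    for a, c, d in zip(alphas, labels, support_vecs):
        for i in range(dim):
            w[i] += a * c * d[i]
    return w

def svm_classify(w, b, doc_vec):
    # Sign of w . x + b decides which side of the hyperplane x is on.
    s = sum(wi * xi for wi, xi in zip(w, doc_vec)) + b
    return 1 if s >= 0 else -1

# Hypothetical 2-D support vectors: one positive, one negative.
w = weight_vector([0.5, 0.5], [1, -1], [[2.0, 0.0], [0.0, 2.0]])
# w == [1.0, -1.0]; documents with x > y classify as positive.
```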
Unigrams and bigrams
• The n-grams typically are collected from a text or speech corpus.
• An n-gram of size 1 is referred to as a "unigram"; size 2 is a "bigram"; size 3 is a "trigram".
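A minimal sketch of collecting n-grams from a token sequence:

```python
def ngrams(tokens, n):
    # Slide a window of size n over the token sequence.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = ["this", "film", "is", "great"]
unigrams = ngrams(tokens, 1)  # [('this',), ('film',), ('is',), ('great',)]
bigrams = ngrams(tokens, 2)   # [('this', 'film'), ('film', 'is'), ('is', 'great')]
```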
Result comparison
• Unigrams: those appearing at least four times in the corpus
• Bigrams: those appearing at least seven times in the corpus
Conclusion
• Sentiment categorization vs. topic-based classification:
  Sentiment categorization is more difficult than topic-based classification. In topic-based classification, all three classifiers have been reported to achieve accuracies of 90%.
• NB vs. ME vs. SVM:
  MaxEnt and SVM tend to do better than Naïve Bayes.
• Feature frequency vs. presence:
  Better performance is achieved by accounting only for feature presence, not feature frequency.
• Bigram information does not improve performance beyond that of unigram presence.
• Compared to picking up only adjectives, picking up all words turns out to work better.
PROBLEM
• Subtlety:
  "How could anyone sit through this movie?"
• Thwarted expectations and ordering effects:
  "This film should be brilliant. It sounds like a great plot, the actors are first grade, and the supporting cast is good as well, and Stallone is attempting to deliver a good performance. However, it can't hold up."