Sentiment analysis
Or, how to find happiness.
Why do we want sentiment info?
• Useful input for detection
  – Brand sentiment
• Useful input for prediction
  – Stock market, box office revenues, political outcomes
  – Potentially for social uprisings, terrorist incidents
What do you really want to know?
Brand satisfaction
Quality of life
Abstract predictor
Three considerations for a sentiment analysis system
• Data cleaning
• One piece of the puzzle
• Simple works best
Data cleaning (Because it’s a dirty world)
Data cleaning: on Twitter…
• Spam accounts
• Bots (Weather, sport, etc…)
Answer: a) http://trst.me/ (from Infochimps) b) Make your own system
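If you take option b), a first pass can be a handful of rules. The sketch below is hypothetical: the `Tweet` fields, the function name and every threshold are illustrative guesses rather than part of any particular system, and would need tuning on your own data.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    # Hypothetical minimal record; real Twitter data has many more fields.
    user: str
    text: str
    followers: int
    following: int
    tweets_per_day: float


def looks_like_spam_or_bot(t: Tweet,
                           max_rate: float = 100.0,
                           max_follow_ratio: float = 20.0) -> bool:
    """Crude rule-of-thumb filter; the default thresholds are illustrative guesses."""
    # Accounts posting at machine-like rates (e.g. weather or sport bots).
    if t.tweets_per_day > max_rate:
        return True
    # Accounts following far more people than follow them back.
    if t.following > max_follow_ratio * max(t.followers, 1):
        return True
    # Tweets that are mostly links, a common spam pattern.
    words = t.text.split()
    links = sum(w.startswith("http") for w in words)
    if words and links / len(words) > 0.5:
        return True
    return False
```

Rules like these are cheap to run before any sentiment step; anything they flag can simply be dropped from the training and scoring data.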
Data cleaning: from sentences to words
1. Tokenize the sentence(s) into words. (This may not be as easy as it seems.)
2. Optionally apply stop-word removal and/or stemming, depending on the application.
3. Pick a minimum count: ignore any word seen fewer times than this in the training set.
4. Build a dictionary of words.
Answer: a) Twokenize.py b) Write your own
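A minimal sketch of steps 1 to 4 in Python. It assumes a rough regular-expression tokenizer as a stand-in for Twokenize.py (it will miss many cases a real tweet tokenizer handles); `STOP_WORDS`, `min_count` and the function names are illustrative choices, not from the original.

```python
import re
from collections import Counter

# Rough tweet tokenizer: a stand-in for Twokenize.py, not a replacement.
TOKEN_RE = re.compile(
    r"https?://\S+"          # URLs
    r"|[@#]\w+"              # @mentions and #hashtags
    r"|[:;=][-^]?[()DPp/]"   # simple emoticons such as :) ;-( :D :/
    r"|<3"                   # heart
    r"|\^_\^"                # ^_^
    r"|\w+(?:'\w+)?"         # words, with an optional apostrophe part
)

# Tiny illustrative stop list; real applications use a fuller list (or none).
STOP_WORDS = {"the", "a", "an", "and", "or", "is", "to", "of", "i"}


def tokenize(text: str, remove_stop_words: bool = False) -> list[str]:
    # Lowercase everything (so :D becomes :d, which is at least consistent).
    tokens = [t.lower() for t in TOKEN_RE.findall(text)]
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


def build_dictionary(texts, min_count: int = 2,
                     remove_stop_words: bool = False) -> dict[str, int]:
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text, remove_stop_words))
    # Keep only words seen at least min_count times; map each to an integer index.
    kept = sorted(w for w, c in counts.items() if c >= min_count)
    return {w: i for i, w in enumerate(kept)}
```

For example, `build_dictionary(["good day", "good night", "bad day"], min_count=2)` keeps only the words seen twice and returns `{"day": 0, "good": 1}`. Stemming is left out here; if you want it, apply it inside `tokenize` so training and prediction see the same tokens.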
One piece of the puzzle
Always make it part of a system
• When it’s wrong (and it quite often is), it will be very obviously wrong
• People don’t need to see this
• This doesn’t actually detract from the utility of the system
Success:
• Tracking political polls.
• Predicting box office revenues.
• Predicting the stock market.
Simple works best (for now)
The quick version
• Use a supervised or semi-supervised learning method.
• For most cases I would recommend Naïve Bayes on a bag-of-words representation: it is very simple to implement and gives near-best performance.
• If you don’t have any labelled happy/sad tweets for your purpose, use known keywords, such as emoticons, to label them:
:)
^_^
:(
<3
:/
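A minimal sketch of the quick version: label tweets using the emoticons above, strip the emoticons out, then train a multinomial Naïve Bayes classifier on bag-of-words counts with add-one smoothing. The polarity assigned to each emoticon (including treating :/ as negative), the toy tweets and all names are assumptions for illustration.

```python
import math
import re
from collections import Counter

# Assumed polarity of the emoticons listed above.
POSITIVE = {":)", "^_^", "<3"}
NEGATIVE = {":(", ":/"}
# Note: ":/" also matches inside "http://", so strip URLs first on real data.
EMOTICON_RE = re.compile(r":\)|:\(|:/|\^_\^|<3")
WORD_RE = re.compile(r"[a-z']+")


def emoticon_label(text):
    """Return 'pos', 'neg', or None (no emoticons, or both kinds)."""
    found = set(EMOTICON_RE.findall(text))
    pos, neg = bool(found & POSITIVE), bool(found & NEGATIVE)
    if pos == neg:
        return None
    return "pos" if pos else "neg"


def words(text):
    # Remove emoticons so the classifier must learn from the other words.
    return WORD_RE.findall(EMOTICON_RE.sub(" ", text.lower()))


class NaiveBayes:
    """Multinomial Naive Bayes on bag-of-words counts, with add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            self.word_counts[c].update(words(doc))
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_scores(self, doc):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            s = math.log(self.doc_counts[c] / n_docs)  # log prior
            for w in words(doc):
                if w in self.vocab:  # ignore words never seen in training
                    s += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            scores[c] = s
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


# Toy example: emoticons provide the labels, the words provide the features.
tweets = [
    "love this sunny day :)",
    "great game tonight <3",
    "so happy with my new phone ^_^",
    "awful traffic again :(",
    "hate waiting in the rain :/",
    "my phone broke, terrible day :(",
    "no emoticon here",
]
train = [(t, emoticon_label(t)) for t in tweets if emoticon_label(t) is not None]
model = NaiveBayes().fit([t for t, _ in train], [y for _, y in train])
print(model.predict("what a great sunny day"))  # → pos
```

Removing the emoticons before training matters: otherwise the model would simply learn that :) means positive, and would have nothing to say about tweets without emoticons.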
Things that don’t really help
• More advanced classifiers (e.g. SVMs)
• Part-of-speech tagging
• Parse trees
• Semi-supervised methods, if you have very large amounts of data
(Generally less than 2% improvement)
The formula for happiness
Basic positive/negative Twitter sentiment word list
• http://alexdavies.net/projects/twitter-sentiment-word-lists/
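One simple way to turn a positive/negative word list into a score, sketched in Python: count the listed words in a tweet and take (pos - neg) / (pos + neg). This formula is just one common choice, not necessarily the one behind the slide title, and the tiny inline sets are placeholders for the real lists linked above (their file format is not assumed here).

```python
import re

# Placeholder word lists; in practice, load the lists linked above.
POSITIVE_WORDS = {"love", "great", "happy", "good", "awesome"}
NEGATIVE_WORDS = {"hate", "awful", "sad", "bad", "terrible"}


def happiness(text: str) -> float:
    """Return (pos - neg) / (pos + neg) in [-1, 1], or 0.0 if no listed words appear."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(t in POSITIVE_WORDS for t in tokens)
    neg = sum(t in NEGATIVE_WORDS for t in tokens)
    if pos + neg == 0:
        return 0.0
    return (pos - neg) / (pos + neg)


def mean_happiness(texts) -> float:
    """Average score over a collection of tweets, e.g. one day's tweets about a brand."""
    scores = [happiness(t) for t in texts]
    return sum(scores) / len(scores) if scores else 0.0
```

Tracked over time, an aggregate such as `mean_happiness` is the kind of signal the earlier slides suggest feeding into a larger system, rather than trusting any single tweet's score.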
Thanks.