Predicting The Future With Social Media

Predicting the Future With Social Media. Sitaram Asur, Bernardo A. Huberman, Social Computing Lab. The Social Computing Lab focuses on methods for harvesting the collective intelligence of groups of people in order to realize greater value from the interaction between users and information. Published on arXiv, Cornell University – March 2010. Maurizio Napolitano, SoNet group, http://sonet.fbk.eu - April 2010. http://arxiv.org/abs/1003.5699

description

These slides were used for an internal presentation of the SoNet group - http://sonet.fbk.eu Every week, one member of the SoNet group presents a research paper to the other members. The mentioned paper(s) are hence written by other researchers. This is the abstract of the original paper by Sitaram Asur and Bernardo A. Huberman: In recent years, social media has become ubiquitous and important for social networking and content sharing. And yet, the content that is generated from these websites remains largely untapped. In this paper, we demonstrate how social media content can be used to predict real-world outcomes. In particular, we use the chatter from Twitter.com to forecast box-office revenues for movies. We show that a simple model built from the rate at which tweets are created about particular topics can outperform market-based predictors. We further demonstrate how sentiments extracted from Twitter can be further utilized to improve the forecasting power of social media.

Transcript of Predicting The Future With Social Media

Page 1: Predicting The Future With Social Media

Predicting the Future With Social Media

Sitaram Asur Bernardo A. Huberman

Social Computing Lab

The Social Computing Lab focuses on methods for harvesting the collective intelligence of groups of people in order to realize greater value from the interaction between users and information.

Published on arXiv Cornell University – March 2010

Maurizio Napolitano, SoNet group, http://sonet.fbk.eu - April 2010

http://arxiv.org/abs/1003.5699

Page 2: Predicting The Future With Social Media

SoNet Research Meetings

These slides were used for an internal presentation of the SoNet group.

Every week, one member of the SoNet group presents a research paper to the other members. The mentioned paper(s) are hence written by other researchers.

Being internal presentations, these slides might be a bit rough and unpolished.

You can find more information (including this presentation) about the SoNet group at http://sonet.fbk.eu

Page 3: Predicting The Future With Social Media

The question: how can social media content be used to predict real-world outcomes?

The case study: predicting box-office revenues for movies using the chatter from Twitter

Why Twitter?

Several tens of millions of users actively participate in the creation and propagation of content.

Why movies?

The topic of movies is of considerable interest among the social media user community.

The real-world outcomes can be easily observed from box-office revenue for movies.

Page 4: Predicting The Future With Social Media

Topics

Viral marketing

• How buzz and attention are created for different movies

• How buzz and attention change over time

Sentiments

• How they are created

• How positive and negative opinions propagate

• How they influence people

Will movies that are well talked about be well-watched?

Page 5: Predicting The Future With Social Media

What was discovered

• Social media feeds can be effective indicators of real-world performance

• The rate at which movie tweets are generated can be used to build a powerful model for predicting movie box-office revenue.

• The predictions are better than those produced by the Hollywood Stock Exchange, the gold standard in the industry

Page 6: Predicting The Future With Social Media

The dataset

Twitter Search API

• tweets

• @userid

• retweet

2.89 million tweets from 1.2 million users, referring to 24 different movies, collected over a period of 3 months (Nov-Feb) by using the movies' keywords

Armored (2009-12-04)

Daybreakers (2010-01-08)

Extraordinary Measures (2010-02-22)

Leap Year (2010-01-08)

Princess And The Frog (2009-11-13)

Tooth Fairy (2010-02-26)

Avatar (2009-12-18)

Dear John (2010-02-05)

From Paris With Love (2010-02-05)

Legion (2010-01-22)

Sherlock Holmes (2009-12-15)

Transylmania (2009-12-04)

The Blind Side (2009-11-15)

Did You Hear About The Morgans (2009-12-08)

The Imaginarium of Dr Parnassus (2010-01-08)

Twilight: New Moon (2009-11-20)

Spy Next Door (2010-01-15)

When in Rome (2010-01-29)

The Book of Eli (2010-01-15)

Edge of Darkness (2010-01-29)

Invictus (2009-12-11)

Pirate Radio (2009-11-13)

The Crazies (2010-02-26)

Youth in Revolt (2010-01-08)

critical period = the time from the week before a movie's release to two weeks after it

Page 7: Predicting The Future With Social Media

Dataset characteristics: number of tweets per unique author for different movies

y → tweets; x → days; lines → movies

LIKE the box-office trends!!!

Page 8: Predicting The Future With Social Media

Dataset characteristics: number of tweets per unique author for different movies

y → tweets per author; x → days; lines → movies

The ratio remains fairly consistent between 1 and 1.5

Page 9: Predicting The Future With Social Media

Dataset characteristics: log distribution of authors and tweets over the critical period

y → log(frequency of authors); x → log(number of tweets)

POWER LAW – Zipfian distribution: a few authors generate a large number of tweets
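A power law shows up as a roughly straight line on log-log axes, so its exponent can be estimated by fitting a line to log(frequency of authors) against log(number of tweets). The sketch below is only an illustration under that assumption: the function name `loglog_slope` and the sample counts are made up, and a least-squares fit on log-log data is a rough estimate rather than the method used in the paper.

```python
import math
from collections import Counter

def loglog_slope(tweets_per_author):
    """Estimate the slope of log(frequency) vs log(number of tweets).

    tweets_per_author: iterable of positive ints, one entry per author.
    Returns the least-squares slope; a power law gives a negative slope.
    """
    # frequency[k] = number of authors who wrote exactly k tweets
    frequency = Counter(tweets_per_author)
    xs = [math.log(k) for k in frequency]
    ys = [math.log(frequency[k]) for k in frequency]
    if len(xs) < 2:
        raise ValueError("need at least two distinct tweet counts")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Made-up data: many authors with one tweet, very few with many tweets.
sample = [1] * 1000 + [2] * 250 + [4] * 62 + [8] * 16
slope = loglog_slope(sample)  # close to -2 for this synthetic sample
```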

Page 10: Predicting The Future With Social Media

Dataset characteristics: distribution of total authors and the movies they comment on

y → authors; x → number of movies

POWER LAW: a majority of the authors talk about only a few movies

Page 11: Predicting The Future With Social Media

Attention and popularity: Twitter and the real world

“Prior to the release of a movie, media companies and producers generate promotional information in the form of trailer videos, news, blogs and photos. We expect the tweets for movies before the time of their release to consist primarily of such promotional campaigns, geared to promote word-of-mouth cascades”

In Twitter:

tweets and retweets referring to a particular URL (photos, trailers and other promotional material)

Page 12: Predicting The Future With Social Media

Attention and popularity: percentages of URLs in tweets for different movies

there is a greater percentage of tweets containing urls in the week prior to release than afterwards
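The percentage of tweets containing URLs can be computed per movie and per week with a simple pattern match. This is a minimal sketch: the regular expression, the function name `url_percentage` and the sample tweets are assumptions for illustration, not the procedure used in the paper.

```python
import re

# Assumed URL pattern; real tweets may need a more tolerant expression.
URL_PATTERN = re.compile(r"https?://\S+")

def url_percentage(tweets):
    """Return the percentage of tweets that contain at least one URL."""
    if not tweets:
        return 0.0
    with_url = sum(1 for text in tweets if URL_PATTERN.search(text))
    return 100.0 * with_url / len(tweets)

# Made-up tweets for illustration only.
week_before = [
    "Avatar trailer http://example.com/trailer",
    "can't wait for Avatar!",
    "new Avatar photos https://example.com/photos",
    "Avatar tickets booked",
]
pct = url_percentage(week_before)  # 50.0: two of the four tweets carry a URL
```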

Page 13: Predicting The Future With Social Media

Attention and popularity: tweets with URL vs retweets

Features   Week 0   Week 1   Week 2
url        39.5     25.5     22.5
retweet    12.1     12.1     11.66

URL and RETWEET PERCENTAGES FOR CRITICAL WEEK

Features   Correlation   R2
url        0.64          0.39
retweet    0.5           0.20

CORRELATION and COEFFICIENT OF DETERMINATION (R2) values for URLs and RETWEETs before release

“This result is quite surprising since we would expect promotional material to contribute significantly to a movie’s box-office income”
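The correlation values reported for URLs and retweets can be reproduced in spirit with a correlation coefficient between a per-movie feature and the box-office revenue. The sketch below assumes a Pearson coefficient is meant, which the slides do not state; the function name `pearson` is illustrative and no real data is included.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

For example, `pearson(url_percentages, revenues)` would take one URL percentage and one opening-weekend revenue per movie, both hypothetical lists here.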

Page 14: Predicting The Future With Social Media

Prediction: first weekend box-office revenues

“Using the tweets referring to movies prior to their release, can we accurately predict the box-office revenue generated by the movie in its opening weekend?”

How can we obtain a quantifiable measure from the tweets?

TWEETRATE: number of tweets referring to a particular movie per hour

\[ \text{Tweetrate}(mov) = \frac{|\text{tweets}(mov)|}{|\text{Time (in hours)}|} \]

“the correlation of the average tweetrate with the box-office gross for the 24 movies considered showed a strong positive correlation, with a correlation coefficient value of 0.90”
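The tweet-rate definition translates directly into code: count the tweets about a movie in a time window and divide by the window length in hours. A minimal sketch, assuming tweet timestamps are available as `datetime` objects; the function name `tweet_rate` and the sample timestamps are made up.

```python
from datetime import datetime

def tweet_rate(timestamps, start, end):
    """Tweets per hour for one movie in the window [start, end)."""
    hours = (end - start).total_seconds() / 3600.0
    if hours <= 0:
        raise ValueError("end must be after start")
    count = sum(1 for t in timestamps if start <= t < end)
    return count / hours

# Made-up timestamps: six tweets spread over one day.
start = datetime(2010, 1, 1)
end = datetime(2010, 1, 2)
stamps = [datetime(2010, 1, 1, h) for h in (0, 6, 6, 12, 18, 23)]
rate = tweet_rate(stamps, start, end)  # 6 tweets / 24 hours = 0.25
```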

Page 15: Predicting The Future With Social Media

Prediction: use regression analysis!

Predictions were compared with the real box-office revenue information extracted from the Box Office Mojo website => POSITIVE RESULTS

Regression analysis with:

• Time series values of the tweet rate for the 7 days before the release

• thent → number of theaters the movies were released in

• HSX index → the index of the Hollywood Stock Exchange
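A linear regression over features such as the daily tweet rates and thent can be sketched with ordinary least squares. The code below is an illustration only: it solves the normal equations in pure Python, the names `fit_linear_regression` and `predict` are hypothetical, and the toy data uses two features instead of the seven-day time series. It says nothing about the software the authors actually used.

```python
def fit_linear_regression(rows, targets):
    """Ordinary least squares via the normal equations.

    rows: list of feature lists (e.g. daily tweet rates plus thent).
    targets: list of observed revenues, one per row.
    Returns [intercept, coef_1, ..., coef_p].
    """
    # Prepend a constant 1.0 so the first coefficient is the intercept.
    design = [[1.0] + [float(v) for v in row] for row in rows]
    p = len(design[0])
    # Build the augmented matrix [X^T X | X^T y].
    aug = []
    for i in range(p):
        line = [sum(r[i] * r[j] for r in design) for j in range(p)]
        line.append(sum(r[i] * t for r, t in zip(design, targets)))
        aug.append(line)
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("features are collinear or there are too few rows")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]

def predict(coefficients, row):
    """Apply fitted coefficients to one feature row."""
    return coefficients[0] + sum(c * v for c, v in zip(coefficients[1:], row))

# Toy, noise-free data generated from y = 1 + 2*x1 + 3*x2.
rows = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]]
targets = [1 + 2 * a + 3 * b for a, b in rows]
coefficients = fit_linear_regression(rows, targets)
```

With features on very different scales (tweets per hour versus thousands of theaters), the normal equations can be poorly conditioned, so a numerical library would be a safer choice in practice.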

Page 16: Predicting The Future With Social Media

Prediction: linear regression, the results

Features                        Adjusted R2   p-value***
Avg Tweet-rate                  0.80          3.65e-09
Tweet-rate timeseries           0.93          5.279e-09
Tweet-rate timeseries + thent   0.973         9.14e-12
HSX timeseries + thent          0.963         1.030e-10
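As a reading aid, the adjusted R2 column can be related to the plain R2 through the standard definition, where n is the number of observations and p the number of predictors. Assuming this definition applies here, with n = 24 movies and a single predictor, the quoted correlation of 0.90 for the average tweet-rate gives a value consistent with the 0.80 in the table:

```latex
\[
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
\]
\[
R^2 = 0.90^2 = 0.81, \qquad
\bar{R}^2 = 1 - (1 - 0.81)\,\frac{24 - 1}{24 - 1 - 1}
          = 1 - 0.19 \cdot \frac{23}{22} \approx 0.80
\]
```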

Page 17: Predicting The Future With Social Media

Prediction: predicted vs actual box-office scores using tweet-rate and HSX predictors

Page 18: Predicting The Future With Social Media

Prediction: predicting prices

Predictor                       Adjusted R2   p-value***
HSX timeseries + thent          0.95          4.495e-10
Tweet-rate timeseries + thent   0.97          2.379e-11

Prediction of HSX end of opening weekend price

Week-end    Adjusted R2
Jan 15-17   0.92
Jan 22-24   0.97
Jan 29-31   0.92
Feb 05-07   0.95

Coefficient of determination (R2) values using tweet-rate timeseries for different week-ends

“The Hollywood Stock Exchange de-lists movie stocks after 4 weeks of release, which means that there is no timeseries available for movies after 4 weeks. In the case of tweets, people continue to discuss movies long after they are released”

Page 19: Predicting The Future With Social Media

Sentiment Analysis: investigate the importance of sentiments in predicting future outcomes

• For each tweet, assign the label Positive, Negative or Neutral

• Clean data (no stop-words, remove URLs and user ids, replace title, question marks and exclamation marks)

• Amazon Mechanical Turk (1000 workers)

• Use LingPipe – DynamicLMClassifier

• Obtained an accuracy of 98%
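The cleaning step listed above can be sketched as a small text-normalization function. Everything specific here is an assumption made for illustration: the placeholder tokens `MOV`, `QM` and `EX`, the tiny stop-word list, the removal of other special characters, and the function name `clean_tweet`. The slides do not give these details, and the classifier itself (LingPipe, a Java library) is not reproduced.

```python
import re

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "the", "is", "to", "of", "and", "i", "it"}

def clean_tweet(text, title):
    """Normalize one tweet before sentiment labelling (title must be non-empty)."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)      # remove URLs
    text = re.sub(r"@\w+", " ", text)              # remove user ids
    text = text.replace(title.lower(), " MOV ")    # replace the movie title
    text = text.replace("?", " QM ").replace("!", " EX ")
    text = re.sub(r"[^A-Za-z0-9 ]", " ", text)     # assumed: drop other special characters
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)

cleaned = clean_tweet("I loved Avatar! See http://example.com @bob", "Avatar")
# cleaned == "loved MOV EX see"
```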

Define two variables:

\[ \text{Subjectivity} = \frac{|\text{Positive and Negative Tweets}|}{|\text{Neutral Tweets}|} \]

\[ \text{PNratio} = \frac{|\text{Tweets with Positive Sentiment}|}{|\text{Tweets with Negative Sentiment}|} \]
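Both variables are simple ratios of label counts, so they can be computed directly once each tweet carries a Positive, Negative or Neutral label. A minimal sketch; the function name `sentiment_ratios`, the choice to return `None` when a denominator is zero, and the sample labels are assumptions.

```python
from collections import Counter

def sentiment_ratios(labels):
    """Return (subjectivity, pn_ratio) for a list of sentiment labels.

    A ratio is None when its denominator would be zero.
    """
    counts = Counter(labels)
    pos = counts["Positive"]
    neg = counts["Negative"]
    neu = counts["Neutral"]
    subjectivity = (pos + neg) / neu if neu else None
    pn_ratio = pos / neg if neg else None
    return subjectivity, pn_ratio

# Made-up labels: 6 positive, 2 negative, 4 neutral tweets.
labels = ["Positive"] * 6 + ["Negative"] * 2 + ["Neutral"] * 4
subjectivity, pn_ratio = sentiment_ratios(labels)  # (2.0, 3.0)
```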

Page 20: Predicting The Future With Social Media

Sentiment Analysis

x → movies; y → subjectivity

The subjectivity increases after release

Page 21: Predicting The Future With Social Media

Sentiment Analysis

x → movies; y → polarity

The positive and negative sentiments go in the same direction as the movies' success

Page 22: Predicting The Future With Social Media

Sentiment Analysis: regression analysis and polarity (PNRatio)

Predictor                         Adjusted R2   p-value
Avg Tweet-rate                    0.79          8.39e-09
Avg Tweet-rate + thent            0.83          7.93e-09
Avg Tweet-rate + PNRatio          0.92          4.31e-12
Tweet-rate timeseries             0.84          4.18e-06
Tweet-rate timeseries + thent     0.863         3.64e-06
Tweet-rate timeseries + PNRatio   0.94          1.84e-08

The sentiments do provide improvements, although they are not as important as the rate of tweets themselves

Page 23: Predicting The Future With Social Media

GENERAL PREDICTION MODEL FOR SOCIAL MEDIA

A: rate of attention seeking

P: polarity of sentiments and reviews

D: distribution parameter

y denotes the revenue to be predicted and ε the error; the β values correspond to the regression coefficients

\[ y = \beta_a \cdot A + \beta_p \cdot P + \beta_d \cdot D + \epsilon \]


Page 25: Predicting The Future With Social Media

These slides are released under

Creative Commons Attribution-ShareAlike 2.5

You are free:

● to copy, distribute, display, and perform the work

● to make derivative works

● to make commercial use of the work

Under the following conditions:

Attribution. You must attribute the work in the manner specified by the author or licensor.

Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under a license identical to this one.

● For any reuse or distribution, you must make clear to others the license terms of this work.

● Any of these conditions can be waived if you get permission from the copyright holder.

Your fair use and other rights are in no way affected by the above.

More info at http://creativecommons.org/licenses/by-sa/2.5/