Deciphering Social Media Messages for #GE2015

Dr. Giuseppe Di Fatta (SSE)

Associate Professor of Computer Science

Director MSc Advanced Computer Science

School of Systems Engineering, University of Reading

[email protected]

http://www.personal.reading.ac.uk/~sis06gd/

Dr. James Reade (SPEIR)

Lecturer in Economics

School of Politics, Economics and International Relations,

University of Reading

[email protected]

http://www.reading.ac.uk/economics/about/staff/j-j-reade.aspx

Henley Business School, University of Reading, Friday 24 April 2015

Workshop on Big Social Data and Interdisciplinary Analytics

by J. Reade and G. Di Fatta

Outline

• Introduction

– Motivation

– University of Reading initiative on Big Social Data

– Case study on the General Election 2015 (#GE2015)

• Nuts and bolts

– Twitter tracking and tweets gathering

– Tweets mining

– A knowledge discovery process

• Data analysis examples

– Analysis of some key moments during the Leaders’ TV debate

Introduction

• Social media has exploded in recent years.

Introduction

• Social media defined:

• Incredible numbers, incredible potential…

• We are the University of Reading Big Social Data Research Group.

• Formed in summer 2014, covering multiple disciplines across the university.

Form and Function

• Social media are of interest to social scientists:

– Social (and other) networks influence decision making.

– Favouritism, discrimination, bias, loyalty, etc. all influence allocations of resources and outcomes.

– Social media are social networks quantified.

• Social networks publicise and propagate information:

– Information availability is crucial in decision making.

– More information = better forecasting, better policy making?

• Social media present huge opportunities:

– But also huge challenges: collecting, processing and understanding the data.

– Cross-disciplinary collaboration is essential.

An Open Multidisciplinary Group

Our group consists of:

• Computer scientists (Di Fatta, Stahl)

– Data Mining and Knowledge Discovery in Databases (KDD): collecting, processing and extracting useful knowledge from data.

• Mathematicians (Vukadinović Greetham)

– Complex analysis of network dynamics.

• Applied Linguists (Jaworska)

– Extracting meaning from qualitative data.

• Economists (Reade, Nanda)

– Information is fundamental: where does it appear, and how is it propagated? Does it influence prices/voting behaviour, or vice versa?

• Social scientists

– What can we learn about social (and other types of) interaction and outcomes?

Reading and the General Election

• On 1 March we began collecting tweets related to politics and the general election

– General election related tweets: #GE2015, #Tories, #Labour, etc.

• In 53 days we’ve collected:

– 13M tweets: about 250K tweets/day, or 2.8 tweets/sec,

– with over 1.8M tweets during three TV debates alone.

• But what to do with this information?
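The daily and per-second rates quoted above follow from the totals by simple division. A quick check, assuming that 13M is a rounded total and that the 53 days are full 24-hour days:

```python
# Back-of-the-envelope check of the collection rates quoted on the slide.
# Assumptions: 13M is a rounded total; the 53 days are full 24-hour days.
TOTAL_TWEETS = 13_000_000
DAYS = 53
SECONDS_PER_DAY = 24 * 60 * 60

tweets_per_day = TOTAL_TWEETS / DAYS
tweets_per_second = tweets_per_day / SECONDS_PER_DAY

print(f"{tweets_per_day:,.0f} tweets/day")    # roughly 245K, which rounds to about 250K
print(f"{tweets_per_second:.1f} tweets/sec")  # about 2.8
```

These are averages over the whole period; the volume on debate days is far higher.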

[Figure: chart with labels “April 2” and “April 16”]

Sentiment Analysis?

• Simple volume of tweets may be interesting, but is it useful?

• Increasing focus on sentiment, or mood: what do people think?

– Does mood/sentiment yield predictive power?

– Academic papers have considered stock markets and sports events.

– During election time, sentiment hugely interesting…

• Who is ahead? Do big shifts occur?

• What messages stick? Persistence in sentiment?

Sentiment Analysis?

• Perhaps, however, we have skipped a step:

– Sentiment is a latent concept: We never observe its true value.

– We can try to estimate it but we have no true value to compare against.

Nuts and Bolts

Twitter

• Twitter, described as "the SMS of the Internet", is an online social networking service that enables users to send and read short 140-character messages called "tweets".

– launched in 2006

– photos and short videos can also be embedded

• In 2012: 100 million users and 340 million tweets per day.

• In December 2014: more than 500 million users, of whom more than 284 million are active.

• Record tweets: on February 3, 2013, Twitter announced that a record 24.1 million tweets were sent on the night of the Super Bowl.

Tweets Per Second (TPS) Records

1. 143K TPS: TV broadcast of the anime movie "Castle in the Sky" in Japan on Dec. 9, 2011

– At one point viewers joined forces, sending tweets at the same time to symbolically help the movie's characters cast a spell.

2. 15K TPS: Euro 2012 Finals

– as Spain scored the winning goal against Italy in the 2012 European Championship

3. 10K TPS: Last Minutes of Super Bowl 2012

– as the Giants took the lead on a touchdown with 57 seconds left

16. 5.5K TPS: Japanese Earthquake and Tsunami on March 11, 2011

– Twitter turned into an emergency service for many following an 8.9 magnitude earthquake and subsequent tsunami on Japan’s coast, while in Tokyo the phone system went down.

Gathering Tweets

• Three methods to retrieve tweets

– Search API

• Representational State Transfer (REST) requests

• max 3200 tweets for each request

• free

– Streaming API

• real-time streaming, OAuth for secure delegated access

• max 1% of the total volume of tweets

• free

– Firehose

• real-time streaming

• unlimited and guaranteed

• not free: only from Twitter commercial partners (e.g., DATASIFT)

Twitter Tracking and Tweets Collection

• Tracking terms on the Twitter Streaming API and gathering all tweets which match them.

– more than 30 tracked terms, e.g., ge2015, uklabour, conservative, votetories, ukip, voteukip, LibDems, GreenParty, SNP, etc.

• But what if you track “Cameron”?

Cameron Dallas is an 18-year-old Vine celebrity.

Vine is a short video sharing service and microblogging website.

Twitter Tracking and Tweets Collection

• And what if you track “Labour”?

Twitter Tracking and Tweets Collection

• New software has been developed.

– Objective: collect all tweets from March to May 2015 that are related to UK politics.

– The software tracks terms on the Twitter Streaming API and gathers all tweets which match them.

1. tracked terms (~30)

– e.g., ge2015, uklabour, votelabour, conservative, votetories, ukip, voteukip, LibDems, GreenParty, SNP, etc.

2. tracked terms that require a context check

– e.g., labour, greens, etc.

3. terms for context check (~50)

– e.g., government, politic, vote, election, parliament, economy, etc.

4. rejected terms

– e.g., USA, Canada, Clinton, TCOT, etc.

5. equivalent terms for aggregation of party references

– e.g., Tories, Tory, voteTories, Conservatives, etc.
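The five term lists above can be read as a filtering rule applied to each incoming tweet. The following is only a sketch of one possible reading, not the group's actual software: the sets hold just the examples given on the slide, the function names and the party label are hypothetical, and it assumes that a rejected term overrides everything else and that context terms match as word prefixes (so "politic" also covers "politics" and "political").

```python
import re

# Example terms copied from the slide; the real lists are longer (~30 and ~50 terms).
TRACKED = {"ge2015", "uklabour", "votelabour", "conservative", "votetories",
           "ukip", "voteukip", "libdems", "greenparty", "snp"}
NEEDS_CONTEXT = {"labour", "greens"}
CONTEXT = {"government", "politic", "vote", "election", "parliament", "economy"}
REJECTED = {"usa", "canada", "clinton", "tcot"}
# Equivalent terms mapped to a hypothetical canonical party label.
EQUIVALENT = {"tories": "Conservatives", "tory": "Conservatives",
              "votetories": "Conservatives", "conservatives": "Conservatives"}


def tokens(text):
    """Lower-case word tokens; '#GE2015' becomes 'ge2015'."""
    return re.findall(r"[a-z0-9]+", text.lower())


def is_political(text):
    """Decide whether a tweet should be kept (assumed precedence: reject, track, context)."""
    words = tokens(text)
    if any(w in REJECTED for w in words):
        return False
    if any(w in TRACKED for w in words):
        return True
    if any(w in NEEDS_CONTEXT for w in words):
        # Ambiguous terms such as "labour" count only next to a context term.
        return any(w.startswith(c) for w in words for c in CONTEXT)
    return False


def parties(text):
    """Aggregate equivalent terms into the set of parties a tweet refers to."""
    return {EQUIVALENT[w] for w in tokens(text) if w in EQUIVALENT}


print(is_political("Watching the #GE2015 debate"))       # True
print(is_political("She went into labour last night"))   # False
print(is_political("Labour would win the election"))     # True
print(parties("Tory or Tories, same party"))             # {'Conservatives'}
```

The context check is what separates "labour" as a party from "labour" as childbirth or work; a tracked term such as "Cameron" would need the same treatment.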

A Multi-Threaded Process

• There are three concurrent threads of execution which never stop:

1. the tweets consumer, which

• manages the stream of tweets for the tracked terms,

• receives and processes tweets from Twitter in real time and

• stores them in secondary storage

2. the controller, which

• checks that the tweets consumer is working properly

• and, if not, starts a new consumer

3. the observer, which

• generates and sends periodic summaries by email

• Further analytics are generated off-line by additional processing, such as the generation of

– counts, word clouds, co-occurrence of terms, sentiment index
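The consumer/controller/observer arrangement can be sketched with Python's standard `threading` module. This is a toy illustration under stated assumptions, not the group's software: the `Collector` class, its method names and the simulated stream are all hypothetical, a list stands in for secondary storage, summaries are recorded rather than emailed, and the demo stops after half a second whereas the real threads never stop.

```python
import threading
import time


class Collector:
    """Toy stand-in for the collection software; every name here is hypothetical."""

    def __init__(self, source):
        self.source = source          # callable returning the next tweet; may raise
        self.stored = []              # stand-in for secondary storage (disk, database)
        self.summaries = []           # stand-in for the emailed reports
        self.restarts = 0
        self.lock = threading.Lock()
        self.stop = threading.Event()
        self.consumer = None

    def start_consumer(self):
        self.consumer = threading.Thread(target=self.consume, daemon=True)
        self.consumer.start()

    def consume(self):
        # Thread 1: receive tweets and store them until the stream fails.
        while not self.stop.is_set():
            try:
                tweet = self.source()
            except ConnectionError:
                return                # the consumer dies; the controller will notice
            with self.lock:
                self.stored.append(tweet)

    def control(self, interval=0.01):
        # Thread 2: check that the consumer is alive and, if not, start a new one.
        while not self.stop.is_set():
            if not self.consumer.is_alive():
                self.restarts += 1
                self.start_consumer()
            time.sleep(interval)

    def observe(self, interval=0.05):
        # Thread 3: record a periodic summary (here just the running total).
        while not self.stop.is_set():
            with self.lock:
                self.summaries.append(len(self.stored))
            time.sleep(interval)


def make_flaky_source(fail_at=5):
    """Simulated stream that drops the connection once, on call number fail_at."""
    calls = 0

    def next_tweet():
        nonlocal calls
        calls += 1
        time.sleep(0.001)
        if calls == fail_at:
            raise ConnectionError("simulated dropped stream")
        return f"tweet {calls}"

    return next_tweet


collector = Collector(make_flaky_source())
collector.start_consumer()
helpers = [threading.Thread(target=collector.control, daemon=True),
           threading.Thread(target=collector.observe, daemon=True)]
for t in helpers:
    t.start()
time.sleep(0.5)                       # the real threads never stop; the demo does
collector.stop.set()
for t in helpers:
    t.join()
collector.consumer.join()
print(collector.restarts, len(collector.stored))
```

Keeping the controller separate from the consumer means a dropped connection costs only the tweets sent during the controller's polling interval, rather than stopping the collection until someone notices.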

Tweets Mining

• Term frequency

– Tweets as bag of words for computing

• Frequent tracked terms

• Frequent words

– Word clouds

• Twitter Sentiment Index

– A list of adjectives has been extracted from ‘political’ tweets.

– Each adjective has been classified as positive, negative or neutral by several team members.

– If a party or one of its equivalent terms is present in a tweet, positive and negative adjectives contribute to a sentiment index for the party.
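The slides do not give the exact formula for the Twitter Sentiment Index, so the following is a minimal sketch under an assumed scoring rule: each positive adjective in a tweet adds 1 and each negative adjective subtracts 1, and the tweet's net score is credited to every party it mentions. The adjective lists, the party labels and the sample tweets are invented for illustration.

```python
import re
from collections import defaultdict

# Hypothetical mini-lexicon; the team's hand-labelled adjective list is not in the slides.
POSITIVE = {"good", "strong", "brilliant", "honest"}
NEGATIVE = {"bad", "weak", "awful", "dishonest"}
# Equivalent terms mapped to hypothetical canonical party labels.
EQUIVALENT = {"tories": "Conservatives", "tory": "Conservatives",
              "conservatives": "Conservatives",
              "labour": "Labour", "uklabour": "Labour",
              "snp": "SNP"}


def sentiment_index(tweets):
    """Return {party: net score}, assuming +1 per positive and -1 per negative adjective."""
    index = defaultdict(int)
    for text in tweets:
        words = re.findall(r"[a-z0-9]+", text.lower())
        score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
        # A tweet contributes only to the parties it refers to.
        for party in {EQUIVALENT[w] for w in words if w in EQUIVALENT}:
            index[party] += score
    return dict(index)


sample = [
    "Strong performance from the SNP tonight",
    "Tories were weak and dishonest",
    "Labour and the Tories both awful",
    "What a brilliant evening",          # no party mentioned, so it is ignored
]
print(sentiment_index(sample))
```

With these sample tweets the SNP scores 1, Labour -1 and the Conservatives -3. A rule this simple ignores negation and sarcasm, which is one reason the index is only an estimate of a latent quantity.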

A Knowledge Discovery Process

• A process of knowledge discovery from social media data streams (Twitter)

[Diagram: Twitter Streaming API → data gathering, filtering and in-line analytics → data storage → off-line data analytics]

1h and 24h automatic reports sent to: [email protected]

To join the mailing list please contact <[email protected]>

Blog URL: http://blogs.reading.ac.uk/reading-general-election-blog/

From March 01: 13M tweets, currently 350K tweets per day

Number of Tweets per day

• as reported by the observer by email at midnight

• Important TV events:

– debate-1 on 26/03/2015: “Cameron & Miliband: The Battle for Number 10”

– debate-2 on 02/04/2015: “Leaders’ debate”

– debate-3 on 16/04/2015: “Challengers’ debate”

[Figure: number of tweets per day, with labels debate-1, debate-2 and debate-3]

Twitter Boom on April the 2nd

Leaders’ Debate (02-04-2015, 20:00-22:00)

– If you missed the TV debate, you can watch it on YouTube:

• https://www.youtube.com/watch?v=7Sv2AOQBd_s

• ‘political’ tweets over the entire day (24h)

– recorded: 800,350

– #leadersdebate: 438,944

– missed: 175,959 (18%, because of the Twitter track limit)

– ext. total: 976,309

• TV debate related/induced tweets from 19:00 to 24:00

– recorded: 614,800

– ext. total: 790,759
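The totals on this slide can be reproduced by adding the missed tweets to the recorded ones. The check below uses only the figures quoted above; reading "ext. total" as recorded plus missed is an assumption, as is the observation that the evening figure is consistent with all missed tweets falling between 19:00 and 24:00.

```python
# Figures copied from the slide.
recorded_day = 800_350
missed = 175_959
recorded_evening = 614_800          # 19:00 to 24:00

# Assumption: "ext. total" = recorded + missed.
ext_total_day = recorded_day + missed
missed_share = missed / ext_total_day

# Assumption: every missed tweet falls in the evening window, which is
# consistent with the slide's evening total.
ext_total_evening = recorded_evening + missed

print(ext_total_day)                # 976309
print(f"{missed_share:.0%}")        # 18%
print(ext_total_evening)            # 790759
```

The 18% is therefore the share of the extended total that was missed, not the share of the recorded tweets.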

Leaders’ Debate (02-04-2015)

• The number of tweets with a reference to a party

[Figure: number of tweets referring to each party, in 5-minute intervals, with the debate period marked]
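Counting party references in 5-minute intervals amounts to rounding each tweet's timestamp down to the start of its interval and counting per (interval, party) pair. A minimal sketch, assuming each tweet has already been reduced to a timestamp and a party label; the function names and the sample data are hypothetical.

```python
from collections import Counter
from datetime import datetime


def bucket_start(ts, minutes=5):
    """Round a timestamp down to the start of its interval (minutes should divide 60)."""
    return ts.replace(minute=ts.minute - ts.minute % minutes, second=0, microsecond=0)


def counts_per_interval(tweets, minutes=5):
    """tweets: iterable of (datetime, party) pairs -> Counter keyed by (interval start, party)."""
    return Counter((bucket_start(ts, minutes), party) for ts, party in tweets)


# Invented sample data for illustration only.
sample = [
    (datetime(2015, 4, 2, 20, 54, 10), "UKIP"),
    (datetime(2015, 4, 2, 20, 55, 3), "SNP"),
    (datetime(2015, 4, 2, 20, 58, 40), "SNP"),
    (datetime(2015, 4, 2, 21, 0, 0), "SNP"),
]
counts = counts_per_interval(sample)
print(counts[(datetime(2015, 4, 2, 20, 55), "SNP")])   # 2
print(counts[(datetime(2015, 4, 2, 20, 50), "UKIP")])  # 1
```

Calling `counts_per_interval(sample, minutes=1)` gives the finer 1-minute resolution.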

Leaders’ Debate (02-04-2015)

• Twitter Sentiment Index: before, during and after the debate

[Figure: Twitter Sentiment Index in 5-minute intervals, with the debate period marked]

Leaders’ Debate (02-04-2015)

• Two ‘interesting’ moments during the debate

– Two ‘interesting’ time intervals following those moments

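[Figure: Twitter Sentiment Index in 1-minute intervals, marking moment #1 at 20:55 (followed by a 10-minute interval) and moment #2 at 21:35 (followed by a 20-minute interval)]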

Leaders’ Debate (02-04-2015, 20:54)

#1 @20:54: Nigel Farage’s controversial statement (18-second video)

Leaders’ Debate (02-04-2015, 20:55)

#1 @20:55: Nicola Sturgeon’s reply to Nigel Farage (8-second video)

Leaders’ Debate (02-04-2015, #1)

• #1: word cloud for tweets referring to “SNP” from 21:02 to 21:12
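A word cloud like this one is driven by word frequencies over the tweets that mention the party within the chosen time window. The sketch below computes those frequencies; it is an illustration only, with a hypothetical stop-word list, invented sample tweets, and an assumed half-open window (start included, end excluded). Rendering the cloud itself is left out.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical stop-word list; a real one would be much longer.
STOPWORDS = {"the", "a", "and", "to", "of", "is", "in", "for", "rt"}


def window_word_counts(tweets, term, start, end):
    """Count words in tweets that mention `term` with start <= timestamp < end."""
    term = term.lower()
    counts = Counter()
    for ts, text in tweets:
        words = re.findall(r"[a-z0-9]+", text.lower())
        if start <= ts < end and term in words:
            # The tracked term itself and stop words would dominate the cloud, so skip them.
            counts.update(w for w in words if w != term and w not in STOPWORDS)
    return counts


# Invented sample tweets for illustration only.
sample = [
    (datetime(2015, 4, 2, 21, 3), "SNP leader is the clear winner"),
    (datetime(2015, 4, 2, 21, 5), "Clear answer from the SNP"),
    (datetime(2015, 4, 2, 21, 30), "SNP again"),               # outside the window
    (datetime(2015, 4, 2, 21, 6), "Nothing about them here"),  # no mention of the term
]
freq = window_word_counts(sample, "SNP",
                          datetime(2015, 4, 2, 21, 2), datetime(2015, 4, 2, 21, 12))
print(freq.most_common(1))   # [('clear', 2)]
```

The same function covers the second interval by changing the window to 21:40 to 22:00.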

Leaders’ Debate (02-04-2015, 21:35)

#2 @21:35: Nicola Sturgeon’s statement (12-second video)

Leaders’ Debate (02-04-2015, #2)

• #2: word cloud for tweets referring to “SNP” from 21:40 to 22:00

Conclusions

• We have been collecting Big Social Data

– tweets about UK politics and GE2015 from March 2015

• Simple real-time analysis and more complex off-line analytics can provide interesting insights.

• We will use the data in the future to test research ideas on

– Text mining

– Data visualisation

– Complex networks (social networks)

– Economics and Politics

Acknowledgments:

Prof. Steven Mithen (Deputy VC) for supporting this project, as well as HBS, SPEIR, SSE, SLL

Questions?