Misinformation on Twitter CS315 – Web Search and Data Mining.


Twitter Primer

Twitter is a short message service. It allows you to:

Tweet = send a message to those who follow what you say

Re-tweet = send your followers something you received

Reply = send a specific user a message, also seen by your followers (@user)

Direct-message = send a specific user a message that no one else sees

#hashtag = marks terms for easy search

Social Media in Search Results

October 24, 2009: Bing introduces real-time results inside search results

Dec. 7, 2009: Google adopts real-time results in search results (usually in third position)

Twitter’s visibility grows dramatically (reaching 6% of the population, yet small compared to Google etc.)

Information Web and Social Web start coming together

Interaction of Networks

Late December 2009: The major search engines use real-time data in their search results

Jan 19, 2010: The MA Special Senate Election

Martha Coakley (D) vs Scott Brown (R)

As the election nears, people google the candidates’ names

Searching for Political Candidates

When searching for Scott Brown vs Martha Coakley

The Twitter Corpus

Over 200,000 Tweets from January 13-20, 2010

“Coakley” and “Scott Brown”

Surprisingly large number (41%) of retweets (RT): Why?

Significant number (7%) of replies: Why?

One in 3 tweets (32%) are repeating tweets: Why?
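Percentages like these come from labeling every tweet in the corpus. The following is a minimal sketch of such a labeling pass; the rules it uses (a retweet starts with "RT @", a reply starts with "@", a repeat is the same user posting the same normalized text again) and the `Tweet` record are hypothetical stand-ins, not the study's published method.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Tweet:
    user: str
    text: str


def tweet_type_shares(tweets):
    """Return the share of retweets, replies and repeating tweets in a corpus.

    Hypothetical heuristics (not the study's published rules):
    - a retweet starts with "RT @" (the manual-retweet convention of 2010);
    - a reply starts with "@";
    - a tweet is repeating if the same user already posted the same text
      (compared after lowercasing and collapsing whitespace).
    The categories can overlap, so the shares need not sum to 1.
    """
    if not tweets:
        return {}
    counts = Counter()
    seen = set()  # (user, normalized text) pairs observed so far
    for t in tweets:
        text = t.text.strip()
        if text.startswith("RT @"):
            counts["retweet"] += 1
        elif text.startswith("@"):
            counts["reply"] += 1
        key = (t.user, " ".join(text.lower().split()))
        if key in seen:
            counts["repeat"] += 1
        seen.add(key)
    return {k: counts[k] / len(tweets) for k in ("retweet", "reply", "repeat")}
```

A repeated retweet counts both as a retweet and as a repeat, which is one reason the three figures on the slide do not add up to 100%.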

Huge number of retweets (RT). Why?

Hypothesis: You retweet a message if you agree with it.

Indeed, retweets reveal communities: behavioral patterns provide another way to determine the political affiliation of users.

Top200 group retweeting behavior
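If retweeting signals agreement, users who retweet each other should fall into the same community. A minimal sketch, assuming retweets arrive as hypothetical `(retweeter, original_author)` pairs and using plain connected components (union-find) as the grouping step; the study's actual clustering method may differ.

```python
from collections import defaultdict


def retweet_communities(retweets):
    """Group users into communities from (retweeter, original_author) pairs.

    Sketch only: each retweet is an undirected "agreement" edge, and the
    communities are the connected components of that graph (union-find).
    Returns the components as sorted lists, largest first.
    """
    parent = {}

    def find(u):
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving keeps trees shallow
            u = parent[u]
        return u

    def union(u, v):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv

    for retweeter, author in retweets:
        union(retweeter, author)

    groups = defaultdict(set)
    for user in parent:
        groups[find(user)].add(user)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))
```

A single cross-camp retweet merges two components here, so a real analysis would weight edges or use a modularity-based method instead.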

32% are repeating tweets: Why?

Your followers, with whom you largely agree, have already seen your tweet, so why repeat it? You assume a leadership role in the campaign:

Top200 group members were 70 times more likely to repeat a tweet.

You repeat to influence Google search:

Twitter-enabled Google bomb

Top200 group repeating behavior
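A claim such as "70 times more likely to repeat a tweet" is a ratio of two rates. A minimal sketch of that arithmetic, assuming rates are measured per tweet (the slide does not say whether per tweet or per user); `relative_likelihood` is a hypothetical helper and the counts in the demo are made up.

```python
def relative_likelihood(group_repeats, group_tweets, other_repeats, other_tweets):
    """Ratio of the repeat rate of one group to that of everyone else.

    Rates here are per tweet (an assumption); the counts in the demo
    below are made up for illustration.
    """
    if group_tweets <= 0 or other_tweets <= 0 or other_repeats <= 0:
        raise ValueError("both rates must be defined and the baseline non-zero")
    return (group_repeats / group_tweets) / (other_repeats / other_tweets)


if __name__ == "__main__":
    # Hypothetical: 35% of a group's tweets are repeats vs 0.5% for the rest.
    print(relative_likelihood(35, 100, 5, 1000))  # roughly 70
```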

Significant number of replies. Why?

Hypothesis: You reply to engage in a dialog or a fight with others.

Not always: top200 users were far less likely to reply, even though they spent a lot more time tweeting.

Except…

Top200 group replying behavior


The first Twitter-bomb

Account creation and tweet bombs, the signature of spamming: 9 accounts sent 929 reply-tweets to 573 users in 138 min.
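The numbers above amount to roughly 929 / 138 ≈ 6.7 replies per minute, or about 103 replies per account. A minimal sketch of a detector for this signature, assuming replies are available as hypothetical `(sender, target, timestamp)` tuples; the thresholds are illustrative defaults, not values from the study, and the check uses each account's whole active span rather than a sliding window.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def flag_reply_bombers(replies, min_replies=50, min_targets=30,
                       window=timedelta(hours=3)):
    """Flag accounts that send many replies to many distinct users in a short span.

    `replies` holds (sender, target, timestamp) tuples. The thresholds are
    hypothetical defaults chosen for illustration, not values from the study.
    Returns {sender: (n_replies, n_distinct_targets, active_span)}.
    """
    by_sender = defaultdict(list)
    for sender, target, ts in replies:
        by_sender[sender].append((ts, target))

    flagged = {}
    for sender, events in by_sender.items():
        events.sort()  # chronological order
        span = events[-1][0] - events[0][0]
        targets = {target for _, target in events}
        if (len(events) >= min_replies and len(targets) >= min_targets
                and span <= window):
            flagged[sender] = (len(events), len(targets), span)
    return flagged


if __name__ == "__main__":
    start = datetime(2010, 1, 15, 12, 0)
    # One bot replying to 40 different users once a minute, one ordinary user.
    demo = [("bot", f"user{i % 40}", start + timedelta(minutes=i)) for i in range(60)]
    demo += [("human", "friend", start), ("human", "friend", start + timedelta(hours=5))]
    print(flag_reply_bombers(demo))
```

Account age could be added as a further filter, since the accounts in this case were newly created.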

Where were the URLs linking?

Who was behind the Twitter-bomb?

Pre-Fabricated Tweet Factory targets News Media & Reporters

30 lists with tweets

2,758 tweets

180 media accounts targeted

DO YOUR JOB SHINE THE LIGHT ON ACORN http://bit.ly/DoYourJob @ACORN_Nat @SEIU @GlobeSenateRace @wwlp #masen

WE THE PEOPLE WANT A FAIR ELECTION http://bit.ly/acRNFraud @ACORN_Nat @SEIU @GlobeSenateRace @wwlp #masen

Is THIS http://bit.ly/CoakleyTHUGS Why YOU’RE AFRAID to Investigate ACORN? @CBSEveningNews @katiecouric @ACORN_Nat #ACORN

Is there defense against false rumors?

You hear a rumor on Twitter. “A plane is spotted on the sea!” Should you retweet it? “Two policemen are shot in Ferguson.” How do people react? “Terrorist warnings of attacks on London Tubes.” Is it true?

What can TwitterTrails do for you?


What questions can TwitterTrails.com answer? Rumor origin, spreading, crowd skepticism, polarization

How does TwitterTrails.com work?

Collects data on demand: bit.ly/TTrequest

Provides cool visualizations and ML to respond within minutes

Based on “Retweeting indicates interest, agreement, trust”

Harnesses the power of crowdsourcing

Stories Investigated (200+ so far)

ORIGINATOR, FIRST POSTER: Who made the rumor known? Who posted the rumor first?

TIMELINE OF SPREADING: When and how did the story break? Is the story still spreading?

PROPAGATORS: Who has been spreading the story through retweeting? Do they form a dense group, or are there disconnected networks?

RUMOR NEGATION: Are there any denials to the story?

MAIN ACTORS: Who are the most visible actors in the spreading, according to the audience?

Questions that TwitterTrails can answer

Story Summary (Overview)

Visualization of Initial Spreading

PROPAGATION GRAPH

Timeline of Spreading

TIME SERIES OF RELEVANT TWEETS

Retweet Network reveals PROPAGATORS

RETWEET NETWORK

Most Visible Actors and Polarization

Co-RETWEETED NETWORK
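In a co-retweeted network, two accounts are connected when the same users retweet both of them, so a dense cluster marks actors with a shared audience. A minimal sketch, assuming hypothetical `(retweeter, original_author)` pairs and a `min_shared` pruning threshold of our own choosing; TwitterTrails' actual construction and thresholds may differ.

```python
from collections import defaultdict
from itertools import combinations


def co_retweeted_edges(retweets, min_shared=2):
    """Build a co-retweeted network from (retweeter, original_author) pairs.

    Two authors are linked when the same users retweeted both of them; the
    edge weight is the number of such shared retweeters. `min_shared` is a
    hypothetical pruning threshold. Self-retweets are ignored.
    """
    audience = defaultdict(set)  # author -> users who retweeted them
    for retweeter, author in retweets:
        if retweeter != author:
            audience[author].add(retweeter)

    edges = {}
    for a, b in combinations(sorted(audience), 2):
        shared = len(audience[a] & audience[b])
        if shared >= min_shared:
            edges[(a, b)] = shared
    return edges
```

Comparing every pair of authors is quadratic, which is acceptable when the graph is restricted to the most retweeted accounts.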

Feature: Color indicates text similarity

Feature: Negation and keyword occurrence

blogs.wellesley.edu/twittertrails

Is it true? Is it false? Ask the crowd

SPREAD: Rate of all RTs

SKEPTICISM: negating RTs / promoting RTs
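The two crowd metrics can be computed directly from a story's retweets. A minimal sketch, assuming each retweet is a hypothetical `(timestamp, is_negating)` pair already labeled upstream, and assuming "rate" means retweets per hour over the observed window; TwitterTrails' exact normalization may differ.

```python
from datetime import datetime, timedelta


def spread_and_skepticism(retweets):
    """Compute SPREAD and SKEPTICISM for one story.

    `retweets` holds (timestamp, is_negating) pairs, one per retweet; deciding
    whether a retweet negates the story is assumed to happen upstream.
    Assumptions: SPREAD is measured as retweets per hour over the observed
    window (clamped to at least one hour), and SKEPTICISM is
    negating RTs / promoting RTs, or None when nothing promotes the story.
    """
    if not retweets:
        return 0.0, None
    times = [ts for ts, _ in retweets]
    hours = max((max(times) - min(times)).total_seconds() / 3600, 1.0)
    spread = len(retweets) / hours
    negating = sum(1 for _, is_negating in retweets if is_negating)
    promoting = len(retweets) - negating
    skepticism = negating / promoting if promoting else None
    return spread, skepticism


if __name__ == "__main__":
    t0 = datetime(2015, 1, 1, 9, 0)
    # Nine retweets, one every 15 minutes over two hours; the first three negate.
    story = [(t0 + timedelta(minutes=15 * i), i < 3) for i in range(9)]
    print(spread_and_skepticism(story))  # (4.5, 0.5)
```

Clamping the window to one hour keeps a burst of retweets posted in the same second from producing an unbounded rate.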

Request an investigation: bit.ly/TTrequest

bit.ly/3SocialTheorems

Why does TwitterTrails.com work?

SocThm 1: Retweeting a message indicates interest, trust, agreement

The sender matters less than the message

Some Reporters Want to Differ

Retrieved all 2,585 profiles containing “RT” and (“endorsement” or “agreement”)

53% belong to media people

13% belong to politicians
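The profile search above is a simple boolean query. A minimal sketch of it as a filter over profile bios; matching "RT" (or "RTs") case-sensitively as a whole word, matching the other terms case-insensitively, and counting plurals such as "endorsements" are all assumptions, and `disclaims_endorsement` is a hypothetical name.

```python
import re


def disclaims_endorsement(bio):
    """True if a bio mentions "RT" together with "endorsement" or "agreement".

    Assumed matching rules: "RT"/"RTs" as a whole, case-sensitive word, so
    words like "ART" do not count; the other two terms are matched
    case-insensitively as substrings, so "endorsements" also matches.
    """
    has_rt = re.search(r"\bRTs?\b", bio) is not None
    has_term = re.search(r"endorsement|agreement", bio, re.IGNORECASE) is not None
    return has_rt and has_term
```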

SocThm 2: Propagation vs Skepticism

Conjecture: On Twitter, claims with higher skepticism and lower propagation scores are more likely to be false; claims with lower skepticism and higher propagation scores are more likely to be true.

On Facebook, the conjecture may not hold

The “Rumor Cascades” paper finds that rumors on Facebook never die… Why?

bit.ly/3SocialTheorems

How do you know what you know?

Extrinsic reasons:

Trust in the entity supporting the information

The majority of people use it extensively

Technology can help here

Intrinsic reasons:

Own experience

Own ability to think critically, which means understanding the Scientific Method and applying it habitually to important matters

But this is tough and requires education

We also “know”…

What we learned as children

What we remember incorrectly

There is no database in our brain; we recreate memories every time we remember them

What we misunderstood: our brain is a pattern-matching machine, and we find similarities even where there are none

What we think under the influence of substances, of voices in sleep, of lack of sleep

What we think under fear, anger, passion, personal interest, using illogical processes

____________ (Add your own examples)

Are we so stupid?

Our brain is impressively complicated, but it is not perfect.

It is influenced by construction limitations and errors, our feelings, our senses, and our environment.

It was created through an ongoing evolutionary process. Some parts are old and are activated immediately; others are newer and demand great energy to get activated.

We need to feel that we are in control of our environment. We do not easily accept randomness in phenomena; we want to “discover” reasons explaining randomness.

Critical thinking uses the neocortex, which is large and requires lots of energy to operate. We try to avoid spending so much energy by creating heuristics, stereotypes, and personal ways of “thinking”.

Conclusion and Future Work

TwitterTrails.com: Use it to monitor your stories!

Blog your findings: blogs.wellesley.edu/twittertrails

Which metrics are more likely to signify a true rumor? a false rumor?

Better methods to detect negations of rumors?

“The Internet is full of lies.” Is it so? How “full”?

On Google Bombs

Online Political Spam: A Short History

The 2006 elections show potential for spam

Activists openly collaborating to Google-bomb search results of political opponents in 2006


Search results for Senatorial candidate John N. Kennedy, 2008 USA Elections


In 2008 Google takes things into its own hands

A more sophisticated effort

Will it work?

Did it work?

2008

2010