Disinformation on the Web: impact, characteristics and detection of Wikipedia hoaxes
Transcript of Disinformation on the Web: impact, characteristics and detection of Wikipedia hoaxes
Disinformation on the Web: Impact, Characteristics and Detection of Wikipedia Hoaxes
Srijan Kumar Univ. of MarylandRobert West Stanford Univ.Jure Leskovec Stanford Univ.
1Originally presented at the 25th International World Wide Web Conference, Montreal, Canada, April 2016
Web: Source of information
2
62% adults in U.S.A. rely on social media
for news
28% of 18-24 year olds use
social media as primary
news source
Web: Source of false information
3
Types of false information
4
Misinformationhonest mistake
Disinformationdeliberate lie to
misleadHoax“deliberately fabricated falsehood made to masquerade as truth”Wikipedia
Why Wikipedia?
The free encyclopedia that anyone can edit
5
Easy to add (false) information
• Freely accessible
• Large reach• Major source of
information for many
Hoaxes on Wikipedia
6
Data: Wikipedia Hoaxes
Hoax article vs hoax facts
7
Data: Wikipedia Hoaxes
Hoax article vs hoax facts
21,218 hoax articles
8
Hoax lifecycle:
Wikipedia hoaxes
9
Impactof hoaxes
Characteristics
of hoaxesDetectionof hoaxes
Quantify their impact?
What are the hoaxes like?
Can we find them?
Impact of hoaxes“The worst hoaxes are those which (a) last for a long time, (b) receive significant traffic, (c) are relied upon by credible news media.”Jimmy Wales on Quora
10
Impact of hoaxes“The worst hoaxes are those which (a) last for a long time”
11
Time t between patrolling and flagging
0.99
0.90
Impact of hoaxes“The worst hoaxes are those which (b) receive significant traffic”
12
10 100
500
Number n of pageviews per day
Impact of hoaxes“The worst hoaxes are those which (c) are relied upon by credible news media”
13
1.08 active inlinks
per hoax article, on average
7% of hoax articles have
at least 5 active inlinks
Wikipedia hoaxes
14
Impactof hoaxes
Characteristics
of hoaxesDetectionof hoaxes
Most hoaxes are caught
soon, but some hoaxes are impactful
What are the hoaxes like?
Can we find them?
15
Successful hoaxpass patrolsurvive for a monthviewed 100+/day
Failed hoaxflagged and deleted during patrol
Wrongly flagged temporarily flagged
Legitimate articlesnever flagged
Hoax
Non-hoax
Characteristics of hoaxes
16
Appearance:how the article looks
Link-network:how the article connects
Support:how other articles refer to it
Editor:how the article creator looks
Characteristics of hoaxes
17
Surprisingly, hoax articles are longer than non-hoax articles!
Features:o Plain-text length
Appearance:how the article looks
Link-network:how the article connects
Support:how other articles refer to it
Editor:how the article creator looks
Characteristics of hoaxes
18
Surprisingly, hoax articles are longer than non-hoax articles!butthey mostly have plain text and have fewer web and wiki links.
Appearance:how the article looks
Link-network:how the article connects
Support:how other articles refer to it
Editor:how the article creator looks
Features:o Plain-text lengtho Plain-text-to-markup
ratioo Wiki-link densityo Web-link density
Characteristics of hoaxes
19
Clustering coefficient = 0incoherent article
Clustering coefficient > 0coherent article
Legitimate articles are more coherent than successful hoaxes
Appearance:hoaxes mostly have text and few references.
Link-network:how the article connects
Support:how other articles refer to it
Editor:how the article creator looks
Characteristics of hoaxes
20
Hoax mentions are less in number.
Features:o Number of prior
mentions
Appearance:hoaxes mostly have text and few references.
Link-network:hoaxes have incoherent wikilinks.
Support:how other articles refer to it
Editor:how the article creator looks
Characteristics of hoaxes
21
Hoax mentions are less in number, mostly created by article creator or anonymously, and are more recently created.
Features:o Number of prior
mentionso Creator of first mentiono Time since first mention
Appearance:hoaxes mostly have text and few references.
Link-network:hoaxes have incoherent wikilinks.
Support:how other articles refer to it
Editor:how the article creator looks
Characteristics of hoaxes
22
Hoax creators are more recently registered, and have lesser editing experience.
Features:o Creator’s time since
registrationo Creator’s experience
Appearance:hoaxes mostly have text and few references.
Link-network:hoaxes have incoherent wikilinks.
Support:hoaxes have few, recent, suspicious mentions.
Editor:how the article creator looks
Wikipedia Hoaxes
23
Impactof hoaxes
Characteristics
of hoaxesDetectionof hoaxes
Hoaxes are different from non-hoaxes in many respects
Most hoaxes are caught
soon, but some hoaxes are impactful
Can we find them?
Detection of hoaxes
24
Will a hoax get past patrol?
Is an article a hoax?
Is an article flagged as hoax really one?
AUC = 71% Appearance features
AUC = 98% Editor and Network features
AUC = 86% Editor and support features
We discovered previously unknown hoaxes!
25
Flagged by us and deleted by Wikipedia administrators
Steve Moertel
Americanpopcorn
entrepreneur
Article survived over
6 years 11 months!
Can readers identify hoaxes?
26
Results
320 random hoax and non-hoax pairs 10 raters on Amazon Mechanical Turk rated each pair
Casual readers are gullible to hoaxes.Accurate detection needs non-appearance features.
50%Random
66%Human
86%Classifier
What fools humans?
27
Humans get fooled when article looks more “genuine”, and it is assumed to be credible.
Comparing easy- vs hard-to-identify hoaxes
How to identify misinformation on the web?
28
● Appearance○ How well referenced is the information source?○ What is the content of the article?
● Editor○ Who created the information?
● Network○ How related is this information to other
information it references to?● Support
○ Is there any evidence of the information, prior to its creation?
Wikipedia Hoaxes
29
Impactof hoaxes
Characteristics
of hoaxesDetectionof hoaxes
Hoaxes are different from non-hoaxes in many respects
Most hoaxes are caught
soon, but some hoaxes are impactful
Non-appearance features are important to
detect hoaxes
Thank you!
[email protected] http://cs.umd.edu/~srijan