Towards Automatic Moderation of Online Hate Speech - Emily Spahn, March 2016

Page 1

Online Hate Speech
Towards automated moderation

Emily Y. Spahn
Galvanize Data Science Immersive, Seattle, Mar 2016

Page 2

What is Hate Speech?

Definition: Speech advocating incitement to harm based on the target's membership in a group

Page 3

The Problem with Hate Speech Online

Alienates users & costs time

Page 4

What can we do about online hate speech?

AUTOMATE!

⇒ Build a model to predict whether a comment is hate speech and, if so, against which group.

Page 5

DATA SOURCE: May 2015 Reddit comments

Page 6

Data Used: subreddits & data labeling

Reddit Comments from May 2015
54.5+ million comments with metadata, from 50,138 subreddits

Hateful Subreddits
11 hateful subreddits
565,494 hateful comments:
● 56% body size
● 33.6% gender
● 9.4% race
● 1% religion

Not Hateful Subreddits
13 not hateful subreddits
1,012,052 not hateful comments:
● 75% sometimes controversial but well-moderated subreddits
● 11.2% gender
● 7.7% religion
● 5.4% body size
● 0.4% race
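The counts above suggest that each comment was labeled by the subreddit it came from. A minimal sketch of that idea follows. The label strings, the helper name `label_comment`, and the sample comments are assumptions for illustration; the `subreddit` and `body` keys follow the public Reddit comment dumps, and only a few of the subreddits named in the deck are listed.

```python
# Hypothetical mapping from subreddit name to label (only a few shown).
HATEFUL = {
    "fatpeoplehate": "size",
    "TheRedPill": "gender",
    "WhiteRights": "race",
    "IslamUnveiled": "religion",
}
NOT_HATEFUL = {"politics", "worldnews", "history", "lgbt", "fatlogic"}


def label_comment(comment):
    """Return a label for a comment dict, or None if its subreddit is unused."""
    subreddit = comment["subreddit"]
    if subreddit in HATEFUL:
        return HATEFUL[subreddit]
    if subreddit in NOT_HATEFUL:
        return "not_hateful"
    return None


# Placeholder comments, not real data.
comments = [
    {"subreddit": "fatpeoplehate", "body": "example text 1"},
    {"subreddit": "politics", "body": "example text 2"},
    {"subreddit": "aww", "body": "example text 3"},
]

# Keep only comments from subreddits that carry a label.
labeled = [
    (c["body"], label_comment(c))
    for c in comments
    if label_comment(c) is not None
]
```

Labeling by subreddit is cheap but noisy: not every comment in a hateful subreddit is hateful, and the reverse also holds.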

Page 7

Tools Used

Computing & Analysis

Natural Language Processing & Classification Modeling

NLTK

Page 8

Modeling

TF-IDF on 1.1 million comments

XGBoost multi-class classifier

Word2Vec for word embeddings

Page 9

TF-IDF: Term Frequency-Inverse Document Frequency

words in comments ⇒ matrix of numbers

i: the word
j: the document

Image from http://brandonrose.org/clustering

Bag of words + a factor that weights rarely occurring words more than common ones
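As a rough illustration of that weighting, here is a minimal sketch using the common textbook form, weight(i, j) = tf(i, j) × log(N / df(i)), where N is the number of documents and df(i) is the number of documents containing word i. The exact variant used in the project is not stated in the slides (libraries often add smoothing), and the function name and toy documents are assumptions.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {word: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts (the bag of words)
        weights.append({w: tf[w] * math.log(n_docs / df[w]) for w in tf})
    return weights


# Toy documents, not real data.
docs = ["the cat sat", "the dog sat", "the bird flew"]
w = tf_idf(docs)
```

In this toy corpus "the" occurs in every document and gets weight zero, while "cat", which occurs in only one, is weighted more heavily than "sat", which occurs in two.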

Page 10

Gradient Boosted Trees Classifier

From XGBoost Documentation

Decision trees:

Page 11

Gradient Boosted Trees Classifier

From XGBoost Documentation

Tree Ensembles

Page 12

Gradient Boosted Trees Classifier

Working on labeled data:
Create one tree & run model
Find residuals (differences between model result & labeled data)

Create 2nd tree to fit to the residuals
New results = results from 1st tree + those from 2nd tree
Find new residuals

Repeat, adding a tree to the model each time to fit the residuals, until you reach a cutoff criterion.
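The loop described above can be sketched in plain Python. This is an illustration under stated assumptions, not XGBoost itself: each "tree" is a one-split stump on a single numeric feature, the loss is squared error (so the residuals are exactly what each new tree fits), and the `learning_rate` shrinkage factor, the function names, and the toy data are additions not taken from the slides.

```python
def fit_stump(xs, residuals):
    """Find the one-split tree (stump) that best fits the residuals."""
    best = None
    # Try a split after every distinct x value except the largest.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_trees=20, learning_rate=0.5, tolerance=1e-6):
    """Add stumps one at a time, each fitted to the current residuals."""
    trees = []
    predictions = [0.0] * len(xs)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        if sum(r * r for r in residuals) < tolerance:  # cutoff criterion
            break
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        # New results = previous results + (shrunken) output of the new tree.
        predictions = [
            p + learning_rate * tree(x) for p, x in zip(predictions, xs)
        ]
    return lambda x: sum(learning_rate * t(x) for t in trees)


# Toy data: the target jumps from 0 to 1 between x = 3 and x = 4.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]
model = boost(xs, ys)
```

For classification, libraries such as XGBoost fit each tree to the gradient of a loss function rather than to raw residuals; with squared error the two coincide, which is why the residual description works as an intuition.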

Page 13

ROC Curve: Examine classification model success

Most important features

fat

like

peopl

just

white

dont

fuck

im

becaus

game

jew

women

weight

say
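For readers unfamiliar with the ROC curve named on this slide, the sketch below shows how its points can be computed for a binary case: sweep a threshold over the predicted scores and record the false positive rate and true positive rate at each step. The labels and scores are made up for illustration, the function names are assumptions, and a multi-class model like the one in this project would typically get one such curve per class.

```python
def roc_points(labels, scores):
    """Return (fpr, tpr) pairs, one per threshold, strictest threshold first."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoid rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Made-up labels (1 = hateful) and model scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
points = roc_points(labels, scores)
area = auc(points)
```

An area of 1.0 means the scores rank every hateful comment above every non-hateful one; 0.5 is what random scores would give.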

Page 14

Potential Use Cases for the Predictive Model

More time for the mods!

User posts hateful comment

Model flags comment as hateful

Comment is in limbo until a human moderator reads it

Human evaluates comment and publishes or deletes

Power to the People!

Indicate via user icons or status information those who have a recent history of hateful comments.

Let site users decide if they want to read what this person has to say.
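The first use case above (flag, hold in limbo, human decides) could be wired up roughly as follows. Everything here is hypothetical: the `ModerationQueue` class, the `toy_classifier` stand-in for the trained model, and the label strings are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ModerationQueue:
    classify: Callable[[str], str]  # hypothetical model: comment -> label
    published: list = field(default_factory=list)
    held: list = field(default_factory=list)

    def submit(self, comment):
        """Publish clean comments right away; hold flagged ones for review."""
        label = self.classify(comment)
        if label == "not_hateful":
            self.published.append(comment)
        else:
            self.held.append((comment, label))

    def review(self, approve):
        """A human decides each held comment: approve(comment, label) -> bool."""
        for comment, label in self.held:
            if approve(comment, label):
                self.published.append(comment)
        self.held.clear()


def toy_classifier(comment):
    # Stand-in for the trained model; flags a placeholder token.
    return "size" if "flaggedword" in comment else "not_hateful"


queue = ModerationQueue(classify=toy_classifier)
queue.submit("nice weather today")
queue.submit("something with flaggedword in it")
held_before_review = len(queue.held)

# The moderator rejects everything that was flagged.
queue.review(lambda comment, label: False)
```

Because a human still makes the final call, a false positive from the model costs a delay rather than a wrongly deleted comment.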

Page 15

Word2Vec: Most Similar Words

“fat”

skinny, ugly, lazy, lard, fatshit, fatass, slenderman, gtbanned, stupid, hamplanet

skinny, overweight, obese, underweight, and, muscular, that, body, is, anorexic
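"Most similar" in Word2Vec tooling is conventionally ranked by cosine similarity between word vectors. The sketch below shows that ranking on made-up three-dimensional vectors; real embeddings have far more dimensions, and the vectors, vocabulary, and function names here are assumptions, not output from the project's model.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, topn=3):
    """Rank every other word by cosine similarity to the query word."""
    query = vectors[word]
    scored = [
        (other, cosine(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


# Made-up vectors, purely for illustration.
vectors = {
    "fat": [0.9, 0.1, 0.0],
    "obese": [0.8, 0.2, 0.1],
    "skinny": [0.7, 0.0, 0.3],
    "game": [0.0, 0.9, 0.4],
}
neighbors = most_similar("fat", vectors)
```

Words that appear in similar contexts end up with similar vectors, which is why antonyms such as "skinny" can rank as close neighbors of "fat".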

Page 16

Thank You!

Emily Y. Spahn

[email protected]

@eyspahn

https://github.com/eyspahn/OnlineHateSpeech

Clip art in the presentation from https://openclipart.org/

Page 17

Example Comment

Page 18

Data Used: subreddits

Hateful Subreddits

Subreddit Name     Comment Count   Hate Type
CoonTown           51979           Race
WhiteRights        1352            Race
Transfags          2362            Gender
SlutJustice        209             Gender
TheRedPill         59145           Gender
KotakuInAction     128156          Gender
IslamUnveiled      110             Religion
GasTheKikes        919             Religion
AntiPOZi           4740            Religion
fatpeoplehate      311183          Size
TalesofFateHate    5239            Size

Not Hateful Subreddits

Subreddit Name
politics           DebateReligion
worldnews          religion
history            islam
blackladies        Judaism
lgbt               BodyAcceptance
TransSpace         fatlogic
TwoChromosomes     women