Topic and text analysis for sentiment, emotion, and computational social science

Topic and Text Analysis for Sentiment, Emotion, and Computational Social Science November 2012 Alice Oh [email protected] Users & Information Lab http://uilab.kaist.ac.kr 1 Thursday, December 6, 2012

Transcript of Topic and text analysis for sentiment, emotion, and computational social science

Page 1: Topic and text analysis for sentiment, emotion, and computational social science

Topic and Text Analysis for Sentiment, Emotion, and Computational Social Science

November 2012
Alice Oh [email protected]
Users & Information Lab
http://uilab.kaist.ac.kr


Page 2: Topic and text analysis for sentiment, emotion, and computational social science

Overview

• Topic modeling research

• CIKM 2011: Distance-dependent Chinese restaurant franchise (ddCRF)

• ICML 2012: Dirichlet process with mixed random measures (DP-MRM)

• CIKM 2012: Recursive Chinese restaurant process for modeling topic hierarchies (rCRP)

• NIPS Big Learning Workshop 2012: Distributed Online Learning for Latent Dirichlet Allocation (DoLDA)

• Computational social science research

• WSDM 2011: Aspect sentiment unification model for online review analysis

• ICWSM 2012: Social aspects of emotions in Twitter conversations

• ACL 2012: Self-disclosure and relationship strength in Twitter conversations


Page 3: Topic and text analysis for sentiment, emotion, and computational social science

Do you feel what I feel? Social Aspects of Emotions in Twitter Conversations

Suin Kim, JinYeong Bak, Alice Oh
ICWSM 2012


Page 6: Topic and text analysis for sentiment, emotion, and computational social science

Asking Research Questions

Human emotion is typically studied as a within-person, one-direction, non-repetitive phenomenon; focus has traditionally been on how one individual feels in reaction to various stimuli at a certain point of time. But people recognize and inevitably react emotionally and otherwise to expressions of emotion of other people. We propose that organizational dyads and groups inhabit emotion cycles: Emotions of an individual influence the emotions, thoughts and behaviors of others; others’ reactions can then influence their future interactions with the individual expressing the original emotion, as well as that individual’s future emotions and behaviors. People can mimic the emotions of others, thereby extending the social presence of a specific emotion, but can also respond to others’ emotions, extending the range of emotions present.


Page 7: Topic and text analysis for sentiment, emotion, and computational social science

Social Aspects of Emotions: Motivating Question

How are our emotions affected by others we talk to?


Page 8: Topic and text analysis for sentiment, emotion, and computational social science

Social Aspects of Emotions: Research Questions

• How do we communicate our emotions?

• Use a topic model on Twitter conversations to discover the “topics” that represent the eight emotions

• Analyze the proportions of the total tweets for the emotions

• How do we influence other people’s emotions?

• Analyze the emotion transitions of the tweets

• Look for topics that change the emotions of the conversation partners

• Find interesting patterns of emotion pairs


Page 9: Topic and text analysis for sentiment, emotion, and computational social science

Social Aspects of Emotions: Data

• Twitter conversation data: approx. 220k dyads who “reply” to each other, 1,670k conversational chains

Page 10: Topic and text analysis for sentiment, emotion, and computational social science

Seed Words (We Feel Fine by Harris & Kamvar)

Seed word stems per emotion:

anticipation: hope, wait, await, inspir, excit, bore, readi, expect, nervou, calm, motiv, prepar, certain, anxiou, optimist, forese
joy: awesom, amaz, wonder, excit, glad, fine, beauti, high, lucki, super, perfect, complet, special, bless, safe, proud
anger: shit, bitch, ass, mean, damn, mad, jealou, piss, annoi, angri, upset, moron, rage, screw, stuck, irrit
surprise: amaz, wow, wonder, weird, lucki, differ, awkward, confus, holi, strang, shock, odd, embarrass, overwhelm, astound, astonish
fear: scare, stress, horror, nervou, terror, alarm, behind, panic, fear, afraid, desper, threaten, tens, terrifi, fright, anxiou
sadness: sorri, bad, aw, sad, wrong, hurt, blue, dead, lost, crush, weak, depress, wors, low, terribl, lone
disgust: sick, wrong, evil, fat, ugli, horribl, gross, terribl, selfish, miser, pathet, disgust, worthless, aw, asham, fuck
acceptance: okai, ok, same, alright, safe, lazi, relax, peac, content, normal, secur, complet, numb, fulfil, comfort, defeat

Page 11: Topic and text analysis for sentiment, emotion, and computational social science

Dirichlet Forest Prior

• Dirichlet Forest Prior (Andrzejewski et al.)

• Mixture of Dirichlet tree distributions

• Dirichlet tree: Generalization of Dirichlet distribution

• Knowledge is expressed using Must-link and Cannot-link primitives

• Must-link (love, sweetheart)

• Cannot-link (exciting, bored)

DF-LDA


Page 13: Topic and text analysis for sentiment, emotion, and computational social science

Domain Knowledge in Dirichlet Forest Prior

Seed Words: the eight emotion classes and their stems, as listed on the Seed Words slide (Page 10)

• Must-link within a class
• Cannot-link between classes
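The rule "Must-link within a class, Cannot-link between classes" can be sketched in a few lines. This is an illustration only, not the authors' code: the function name `build_constraints` and the tiny `SEEDS` subset are hypothetical, and dropping stems that occur in more than one class (such as `amaz`, which the slide lists under both joy and surprise) is an assumption made here to avoid contradictory constraints.

```python
from collections import Counter
from itertools import combinations

# Small subset of the seed stems from the slide, for illustration only.
SEEDS = {
    "joy": ["awesom", "amaz", "glad"],
    "surprise": ["amaz", "wow", "weird"],
    "sadness": ["sorri", "sad", "hurt"],
}


def build_constraints(seeds):
    """Return (must_links, cannot_links) as sets of alphabetically sorted pairs."""
    # Assumption: a stem listed under several classes is left unconstrained.
    counts = Counter(w for words in seeds.values() for w in set(words))
    unique = {
        label: sorted(w for w in set(words) if counts[w] == 1)
        for label, words in seeds.items()
    }
    # Must-link: every pair of stems inside the same emotion class.
    must = {pair for words in unique.values() for pair in combinations(words, 2)}
    # Cannot-link: every pair of stems drawn from two different classes.
    cannot = set()
    for label_a, label_b in combinations(sorted(unique), 2):
        for a in unique[label_a]:
            for b in unique[label_b]:
                cannot.add(tuple(sorted((a, b))))
    return must, cannot


must_links, cannot_links = build_constraints(SEEDS)
```

The resulting pair sets are the kind of input a Dirichlet Forest prior expects as domain knowledge; how the talk's experiments handled stems shared between classes is not stated in the slides.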

Page 14: Topic and text analysis for sentiment, emotion, and computational social science

Dirichlet Forest vs. Dirichlet

Emotion | Model | Top words
Fear | DF-LDA | don’t think but know why even wanna care worry understand
Fear | LDA | good exam lol luck just school haha i’m xx worry tomorrow
Surprise | DF-LDA | that very really cool wow wonder just some differ amazing
Surprise | LDA | just rt holy got thank did shit new love lol awesome buy oh
Sadness | DF-LDA | bad my real feel life aw sad kill lost dead hurt wrong sick
Sadness | LDA | lol just know sorry isn’t oh tweet did haha don’t thought think

Page 15: Topic and text analysis for sentiment, emotion, and computational social science

Emotion Topics: How do we express emotions?

Joy, Anticipation, Anger
Topic 114: omg love haha thank really | Topic 107: love thank follow wow
Topic 159: good day hope morning thank | Topic 158: love thank miss hug
Topic 125: hope better feel thank soon | Topic 26: good thank hope miss
Topic 146: come wait week day june | Topic 146: good day time work
Topic 131: lmao fuck ass bitch shit | Topic 4: ass yo lmao nigga
Topic 19: lmao shit damn fuck oh | Topic 13: shit nigga smh yea

Fear
Topic 48: omg oh lmao shit scare | Topic 78: happen heart attack hospital
Topic 27: don’t come night sleep outside | Topic 140: time got work day

Surprise
Topic 172: yeag know think true funny | Topic 89: know don’t think look
Topic 15: think don’t know make really | Topic 94: haha dont think really

Numbers shown on slide: 29, 70, 21, 14, 5

Sadness, Disgust
Topic 6: oh sorry haha know didnt | Topic 59: hurt got good bad pain
Topic 106: tweet reply didn’t read sorry | Topic 155: oh really make feel
Topic 116: oh fuck don’t ye ew | Topic 116: look haha oh know
Topic 22: don’t oh think yeah lmao | Topic 174: don’t think say people

Acceptance
Topic 43: ok oh thank cool okay | Topic 102: know try let ok
Topic 199: xx thank good okay follow | Topic 8: night love good sleep

Numbers shown on slide: 17, 7, 18

Neutral
Topic 180: com www http check youtube | Topic 156: twitter facebook people account
Topic 184: account google app work email | Topic 67: food chicken cook rt

Number shown on slide: 19

Page 16: Topic and text analysis for sentiment, emotion, and computational social science

Emotion Topics: How do we express emotions?

Joy, Anticipation
Topic 114: omg love haha thank really | Topic 107: love thank follow wow
Topic 125: hope better feel thank soon | Topic 26: good thank hope miss

Sadness
Topic 6: oh sorry know didnt | Topic 59: hurt got good bad pain

Neutral
Topic 180: com www http check youtube | Topic 156: twitter facebook people account

Topic labels shown: Greeting, Caring, Sympathy, IT/Tech

Page 17: Topic and text analysis for sentiment, emotion, and computational social science

Emotion Transitions: Plutchik’s Wheel of Emotions

[Figure: wheel of emotions with a percentage per emotion and transition values on the arrows between emotions. The arrow values (0.11 to 0.51) cannot be matched to emotion pairs in this transcript.]

Emotion | Percentage
Joy | 39.7%
Acceptance | 10.4%
Fear | 2.6%
Surprise | 7.4%
Anticipation | 15.1%
Disgust | 2.9%
Sadness | 9.1%
Anger | 12.8%
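A minimal sketch of how emotion transition values of this kind could be estimated, assuming each conversation chain has already been reduced to a sequence of per-tweet emotion labels. The function name, the toy chains, and the choice of counting consecutive tweets are assumptions for illustration; the slides do not state how the values on the wheel were computed.

```python
from collections import Counter, defaultdict


def transition_probabilities(chains):
    """Row-normalized counts of (previous tweet emotion -> next tweet emotion)."""
    counts = defaultdict(Counter)
    for chain in chains:
        # Consecutive tweets in a chain form one transition each.
        for prev, nxt in zip(chain, chain[1:]):
            counts[prev][nxt] += 1
    return {
        prev: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for prev, row in counts.items()
    }


# Hypothetical toy data, not taken from the talk's corpus.
chains = [
    ["Sadness", "Acceptance", "Joy"],
    ["Joy", "Joy", "Anger"],
    ["Sadness", "Joy"],
]
probs = transition_probabilities(chains)
```

Each row of `probs` sums to 1, so an entry can be read as the estimated chance that a tweet with one emotion is followed by a tweet with another.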


Page 19: Topic and text analysis for sentiment, emotion, and computational social science

Defining “Influence”

[Figure: timeline of a conversation between User A and User B, with one tweet marked as the “emotion influencing tweet”]

Tweets, in time order:
• Having a tough day today. RIP Harrison. I’ll miss you a ton :/
• Just pray about it. God will help you.
• Not really religious, but thanks man. :)
• If you need talk you know I’m here.

Emotion labels shown: (Sadness), (Acceptance), (Anticipation)
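One way to operationalize the idea of an emotion influencing tweet is sketched below. It assumes, as a reading of the figure rather than a definition given in the slides, that a partner's tweet counts as influencing when it sits between two tweets by the same user whose emotion labels differ. The function name and the toy chain are hypothetical.

```python
def influencing_tweets(chain):
    """Find partner tweets that sit between an emotion change of the other user.

    chain: list of (user, emotion) tuples in time order.
    Returns a list of (index_of_partner_tweet, emotion_before, emotion_after).
    """
    results = []
    for i in range(len(chain) - 2):
        user, before = chain[i]
        partner, _ = chain[i + 1]
        user_again, after = chain[i + 2]
        # Assumption: influence means the same user speaks before and after
        # the partner's tweet, and the user's emotion label has changed.
        if user == user_again and partner != user and before != after:
            results.append((i + 1, before, after))
    return results


# Hypothetical labels for a four-tweet chain between users A and B.
chain = [("A", "Sadness"), ("B", "Joy"), ("A", "Joy"), ("B", "Joy")]
found = influencing_tweets(chain)
```

Counting these triples by the emotion of the partner's tweet and normalizing would give figures of the form "expressing X has an n% chance of changing the partner's emotion from Y to Z".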

Page 20: Topic and text analysis for sentiment, emotion, and computational social science

Emotion Influences: What can you say to make your partner feel better?

Transitions shown: Joy → Sadness, Sadness → Joy, Acceptance → Anger

Topic 117: tweet people don’t read post
Topic 59: hurt got bad pain feel
Topic 18: wear look think love black
Topic 24: love thank great new look
Topic 31: i’m got lmax shit da
Topic 13: lmao shit nigga smh yea

Topic labels shown: Greeting, Sympathizing, Swearing, Complaining

Page 21: Topic and text analysis for sentiment, emotion, and computational social science

[Figure: three bar charts titled “Emotion Influence: Sadness to Joy”, “Emotion Influence: Joy to Anger”, and “Emotion Influence: Anger to Joy”. x-axis categories: Anticipation, Joy, Surprise, Fear, Anger, Sadness, Disgust, Acceptance, Neutral. Individual bar values are not reliably recoverable from this transcript.]

• Expressing Anger has a 26.5% chance of changing the partner’s emotion from Joy to Anger.
• Expressing Joy has a 35.8% chance of changing the partner’s emotion from Sadness to Joy.

Page 22: Topic and text analysis for sentiment, emotion, and computational social science

Outliers

A: Sorry to hear about your bags. If you would like us to get someone to contact you DM us your reference and contact number.

B: it's on it's way to manch. If the woman on the check in desk in Miami hadn't been trying to be all smart! Been no problem.

A: Sorry about that. Pleased to hear they located it quickly for you though.

B: mistakes happen.

Page 23: Topic and text analysis for sentiment, emotion, and computational social science

Analyzing Self-Disclosure Behaviors in Twitter Conversations Using Text Mining Techniques (Presented at ACL 2012)

JinYeong Bak, Suin Kim, Alice Oh
{jy.bak, suin.kim}@kaist.ac.kr, [email protected]

Department of Computer Science, KAIST

Page 24: Topic and text analysis for sentiment, emotion, and computational social science

2012-07-11

Introduction

In social psychology
• Degree of self-disclosure in a relationship depends on the strength of the relationship
• Strategic self-disclosure can strengthen the relationship

[Illustration: speech bubbles “I like you too!” and “You’re my best friend!”]

Page 25: Topic and text analysis for sentiment, emotion, and computational social science

Hypothesis

Twitter conversations also show a similar pattern
• Dyads with high relationship strength show more self-disclosure behavior
• Dyads with low relationship strength show less self-disclosure behavior

[Illustration: speech bubbles “I like you too!” and “You’re my best friend!” alongside “Hello~” and “Hi”]

Page 26: Topic and text analysis for sentiment, emotion, and computational social science

Methodology

• Twitter Data
  • 131K users
  • 2M conversations
• Relationship Strength
  • Chain frequency (CF)
  • Chain length (CL)
• Self-Disclosure
  • Personal information
  • Open communication
  • Profanity
• Analysis with Topic Models
  • Latent Dirichlet allocation (LDA, [Blei, JMLR 2003])
  • Aspect and sentiment unification model (ASUM, [Jo, WSDM 2011])

Page 27: Topic and text analysis for sentiment, emotion, and computational social science

Twitter Conversation

• A Twitter conversation chain
  • 3 or more tweets
  • At least one reply by each user
• Our Twitter conversation data
  • Oct 2011 to Dec 2011
  • 131K users
  • 2M chains
  • 11M tweets

[Figure: Example of a conversation chain, https://twitter.com/#!/britneyspears]
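The chain definition above (3 or more tweets, at least one reply by each user) can be written as a small filter. This is a sketch under stated assumptions: a chain is represented as a list of `(user, text)` tuples in reply order, the first tweet is the one that starts the chain, and only two-person (dyad) chains are kept. The function name is hypothetical.

```python
def is_conversation_chain(tweets):
    """Check the slide's chain criteria on a list of (user, text) tuples."""
    # Criterion 1: three or more tweets.
    if len(tweets) < 3:
        return False
    participants = {user for user, _ in tweets}
    # Every tweet after the first is treated as a reply (assumption).
    repliers = {user for user, _ in tweets[1:]}
    # Criterion 2: exactly two users, and each has replied at least once.
    return len(participants) == 2 and repliers == participants


ok = is_conversation_chain([("A", "hi"), ("B", "hello"), ("A", "how are you?")])
too_short = is_conversation_chain([("A", "hi"), ("B", "hello")])
one_sided = is_conversation_chain([("A", "hi"), ("B", "hello"), ("B", "anyone?")])
```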

Page 28: Topic and text analysis for sentiment, emotion, and computational social science

Relationship Strength

• Social psychology literature states relationship strength can be measured by communication frequency and length [Granovetter, 1973; Levin and Cross, 2004]
• CF: chain frequency
  • The number of conversational chains between the dyad averaged per month
• CL: chain length
  • The length of conversational chains between the dyad averaged per month
• Relationship strength
  • A high CF or CL for a dyad means the relationship is strong
  • A low CF or CL for a dyad means the relationship is weak
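A minimal sketch of the two measures, assuming each chain is summarized as `(user_a, user_b, n_tweets)` and the observation window length in months is known. CF is computed as chains per month. CL is simplified here to the mean number of tweets per chain over the whole window, since the slides do not spell out how the per-month averaging of length was done. All names are hypothetical.

```python
from collections import defaultdict


def relationship_strength(chains, n_months):
    """Return {dyad: (CF, CL)} from a list of (user_a, user_b, n_tweets)."""
    totals = defaultdict(lambda: [0, 0])  # dyad -> [chain count, tweet count]
    for a, b, n_tweets in chains:
        dyad = tuple(sorted((a, b)))  # (A, B) and (B, A) are the same dyad
        totals[dyad][0] += 1
        totals[dyad][1] += n_tweets
    return {
        # CF: chains per month. CL: mean tweets per chain (simplifying assumption).
        dyad: (n_chains / n_months, n_tweets / n_chains)
        for dyad, (n_chains, n_tweets) in totals.items()
    }


# Hypothetical toy data over a three-month window.
chains = [("alice", "bob", 3), ("bob", "alice", 5), ("alice", "carol", 4)]
strength = relationship_strength(chains, n_months=3)
```

Dyads can then be ranked from weak to strong by either value, which is the axis used when relating relationship strength to self-disclosure.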

Page 29: Topic and text analysis for sentiment, emotion, and computational social science

Self-Disclosure

• Open communication - Openness
  • Negative openness
  • Nonverbal openness
  • Emotional openness
  • Receptive openness – difficult to find in tweets
  • General-style openness – not clearly defined in the literature
• Personal Information
  • Personally Identifiable Information (PII)
  • Personally Embarrassing Information (PEI)
• Profanity
  • nigga, ass, wtf, lmao

Page 30: Topic and text analysis for sentiment, emotion, and computational social science

Self-Disclosure - Openness

Negative openness

• Method
  • We use ASUM with emoticons as seed words [“Aspect and sentiment unification model for online review analysis”, Jo, WSDM’11]
  • ASUM is an LDA-based joint model of topic and sentiment
  • ASUM takes unannotated data and classifies each sentence (tweet) as positive/negative/neutral

Page 31: Topic and text analysis for sentiment, emotion, and computational social science

Self-Disclosure - Openness

Nonverbal openness

• Method
  • We look for emoticons, ‘lol’, ‘xxx’
  • Emoticons are like facial expressions -- :) :( :P
  • ‘lol’ (laughing out loud) and ‘xxx’ (kisses) are very frequently used in a similar manner to nonverbal openness
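The lookup described above could be approximated with a few regular expressions. This is a sketch only: the emoticon pattern covers a small set of Western-style emoticons chosen for illustration, and the exact lexicon used in the study is not given in the slides. The pattern and function names are hypothetical.

```python
import re

# Assumption: a small emoticon set such as :) :( :P ;-) =D, not the study's list.
EMOTICON = re.compile(r"(?<!\w)[:;=][-']?[()DPp](?!\w)")
LOL = re.compile(r"\blol\b", re.IGNORECASE)
KISSES = re.compile(r"\bxxx+\b", re.IGNORECASE)


def has_nonverbal_openness(tweet):
    """True if the tweet contains an emoticon, 'lol', or 'xxx'."""
    return any(p.search(tweet) for p in (EMOTICON, LOL, KISSES))


examples = [
    "Not really religious, but thanks man. :)",
    "Class was just boring lol and thanks!",
    "good night xxx",
    "Meeting moved to 5pm",
]
flags = [has_nonverbal_openness(t) for t in examples]
```

Word boundaries keep `lol` from matching inside longer words such as "lollipop".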

Page 32: Topic and text analysis for sentiment, emotion, and computational social science

Self-Disclosure - Openness

Emotional openness

• Method
  • Look for tweets that contain common expressions of feeling words [We feel fine (Harris, J, 2009)]

Page 33: Topic and text analysis for sentiment, emotion, and computational social science

Self-Disclosure – Personal Information

Personally Identifiable Information (PII)
• Ex) name, location, email address, job, social security number

Personally Embarrassing Information (PEI)
• Ex) clinical history, sexual life, job loss, family problem

Page 34: Topic and text analysis for sentiment, emotion, and computational social science

Self-Disclosure – Personal Information

Page 35: Topic and text analysis for sentiment, emotion, and computational social science

Self-Disclosure – Personal Information

Example of PII, PEI and Profanity topics
• Shown by high probability words in each topic

PII 1 | PII 2 | PEI 1 | PEI 2 | PEI 3 | Profanity
san | tonight | pants | teeth | family | nigga
live | time | wear | doctor | brother | lmao
state | tomorrow | boobs | dr | sister | shit
texas | good | naked | dentist | uncle | ass
south | ill | wearing | tooth | cousin | bitch

Page 36: Topic and text analysis for sentiment, emotion, and computational social science

Results

Page 37: Topic and text analysis for sentiment, emotion, and computational social science

[Figure: plots with x-axis running weak ↔ strong. Panels: sentiment, nonverbal, emotional, profanity, PII & PEI]

Page 38: Topic and text analysis for sentiment, emotion, and computational social science

[Figure: plots with x-axis running weak ↔ strong. Panels: emotional, PII & PEI]

Page 39: Topic and text analysis for sentiment, emotion, and computational social science

Results: Interpretation

• Emotional openness
  • When they are not very close, they express frequent encouragements, or polite reactions to baby or pets

Page 40: Topic and text analysis for sentiment, emotion, and computational social science

Results: Interpretation

• PII
  • When they meet new acquaintances, they use PII to introduce themselves

Page 41: Topic and text analysis for sentiment, emotion, and computational social science

Results

Analyzing outliers: a dyad that is weakly linked but shows high self-disclosure

Page 42: Topic and text analysis for sentiment, emotion, and computational social science

Distributed Online Learning for Latent Dirichlet Allocation

JinYeong Bak, Dongwoo Kim, and Alice Oh
NIPS 2012 Workshop on Big Learning

Page 43: Topic and text analysis for sentiment, emotion, and computational social science

Motivation

• Problem 1: Inference for LDA takes a long time

• Problem 2: Continuously expanding corpus necessitates continuous updates of model parameters

• But updating of model parameters is not possible with plain LDA

• Must re-train with the entire updated corpus

• Solution to 1: Distributed inference shortens inference time (Newman JMLR 2009, Wang WWW 2012)

• Solution to 2: Online (batch) learning enables updates to model parameters (Hoffman NIPS 2010)

• Our Approach: Combine distributed inference and online learning


Page 44: Topic and text analysis for sentiment, emotion, and computational social science

Distributed Online LDA

• Based on variational inference

• Mini-batch updates via stochastic learning (variational EM)

• Distribute variational EM using MapReduce
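The three bullets above can be sketched as follows. The step-size schedule and the blended update of the topic-word parameter follow the online LDA update of Hoffman et al. (NIPS 2010), which the talk cites. How DoLDA actually splits work between map and reduce is not described in the slides, so the split shown here, the function names, and the toy numbers are all assumptions. The per-document E-step is omitted and its output is passed in directly.

```python
def learning_rate(t, tau0=1.0, kappa=0.7):
    """Step size rho_t = (tau0 + t) ** -kappa (Hoffman et al., NIPS 2010)."""
    return (tau0 + t) ** (-kappa)


def reduce_stats(all_stats):
    """Reduce step (assumed): element-wise sum of per-worker K x V statistics."""
    total = [row[:] for row in all_stats[0]]
    for stats in all_stats[1:]:
        for k, row in enumerate(stats):
            for w, value in enumerate(row):
                total[k][w] += value
    return total


def online_update(lam, stats, t, corpus_size, batch_size, eta=0.01):
    """Blend the old topic-word parameter with the mini-batch estimate."""
    rho = learning_rate(t)
    scale = corpus_size / batch_size  # rescale the mini-batch to corpus size
    return [
        [(1 - rho) * lam[k][w] + rho * (eta + scale * stats[k][w])
         for w in range(len(lam[k]))]
        for k in range(len(lam))
    ]


# Hypothetical example: 2 topics, 2 words, statistics from 2 workers.
lam = [[1.0, 1.0], [1.0, 1.0]]
worker_stats = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 0.0]]]
stats = reduce_stats(worker_stats)
new_lam = online_update(lam, stats, t=0, corpus_size=100, batch_size=10)
```

Because the step size shrinks as `t` grows, later mini-batches change the parameters less, which is what lets the model keep absorbing new documents without retraining on the whole corpus.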


Page 45: Topic and text analysis for sentiment, emotion, and computational social science

Experimental Setup

• Data
  • 5.1M Twitter conversations
  • 4.8M English Wikipedia articles
• 60-node Hadoop system
  • Each node with 8 x 2.30GHz cores

Page 46: Topic and text analysis for sentiment, emotion, and computational social science

Wikipedia Results

Topic 0: relativity physics einstein quantum gravity
Topic 22: channel television tv cable news
Topic 42: milk chocolate sugar food cream
Topic 65: god bible moses chapter genesis
Topic 94: party election president member elected
Topic 170: season team league game football
Topic 232: album song band music released

Minibatch | oLDA | DoLDA | Speedup
16,384 | 238666.25 | 47994.03 | 4.97
32,768 | 188508.71 | 33470.03 | 5.63
65,536 | 206290.27 | 26788.53 | 7.70
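The Speedup column is the ratio of the oLDA figure to the DoLDA figure for the same mini-batch size, which can be checked directly. The slide does not state the unit of the two timing columns, so the sketch below treats them as unitless numbers. The variable names are hypothetical.

```python
# Values copied from the table: minibatch size -> (oLDA, DoLDA).
RUNTIMES = {
    16384: (238666.25, 47994.03),
    32768: (188508.71, 33470.03),
    65536: (206290.27, 26788.53),
}


def speedup(olda, dolda):
    """Ratio of single-machine online LDA time to distributed time."""
    return olda / dolda


speedups = {size: round(speedup(*times), 2) for size, times in RUNTIMES.items()}
```

The ratios grow with the mini-batch size, consistent with larger batches giving each node more work per MapReduce round.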

Page 47: Topic and text analysis for sentiment, emotion, and computational social science

Twitter Temporal Patterns of Topics

<Topic words: party vote people politics obama>

[Figure: line plot. x-axis: Day, ticks 10−10, 11−01, 11−04, 11−07, 11−10, 12−01, 12−04. y-axis: Document proportion, 0.005 to 0.015]

Conversation b1 on November 2, 2010
A: I wish I could vote today, but I have to work for 14 hours
B: is it legal for them not to give you time off to vote?
A: probably

Conversation b2 on March 31, 2012
A: Mitt Romney: "Obama should release the notes and transcripts of all his meetings with world leaders"
B: Why is he being held to higher standard than any other president.
A: did you see my Santorum 'slip' tweet? Is the media afraid to comment on it?
B: oh yes I did. I saw it mentioned yesterday also. disgusting and he should be raked over hot coals for it.

<Topic words: school mate class teacher grade>

[Figure: line plot. x-axis: Day, ticks 11−07, 11−10, 12−01. y-axis: Document proportion, 0.004 to 0.012]

Conversation c1 on September 5, 2011
A: Oh god, miss Waite ran over to me up the school just now! :L on the plus subjects are now picked! :D
B: what did you pick??
A: english, RE, art and psychology! :) was unsure between history and psych but found out bubbles was teaching it so nooo! :L

Conversation c2 on October 12, 2011
A: :) My day's been okay! It feels long! But school' was okayish. I hope you have an awesome day! :D
B: that's good then! Ahh hope it's not cause anything bad happened? Thanks! Have a great sleep :)
A: no! Class was just boring lol and thanks! :) i will! Even though i have to wake up early tomorrow for a midterm! :S

Page 48: Topic and text analysis for sentiment, emotion, and computational social science

CAVEAT

Big Data, social media data, do not always get the right answers! They contain much noise and much bias. Sentiment analysis is also full of problems at the big data-level because every small assumption can turn out to cause wide swings in the final interpretation of the data. They are valuable because they have opened up possibilities for analyses of naturally-occurring data in huge amounts.

We need better methods and tools that are tailored for social media.

We need to ask the right questions that can be answered well despite the biases of the social media data.

Page 49: Topic and text analysis for sentiment, emotion, and computational social science

For details, visit our webpage: http://uilab.kaist.ac.kr
Or email me: [email protected]