Swdm15

Classification Method for Shared Information on Twitter Without Text Data. The University of Tokyo, Japan. Seigo Baba, Fujio Toriumi, Takeshi Sakaki, Kosuke Shinoda, Kazuhiro Kazama, Satoshi Kurihara, Itsuki Noda. The 3rd International Workshop on Social Web for Disaster Management (SWDM'15) with WWW’15 (May 2015, Florence, Italy)

Transcript of Swdm15

Page 1: Swdm15

Classification Method for Shared Information on Twitter Without Text Data

The University of Tokyo, Japan

Seigo Baba, Fujio Toriumi, Takeshi Sakaki, Kosuke Shinoda, Kazuhiro Kazama, Satoshi Kurihara, Itsuki Noda

The 3rd International Workshop on Social Web for Disaster Management (SWDM'15) with WWW’15 (May 2015, Florence, Italy)

Page 2: Swdm15

Contents

• Introduction
• Proposed tweet Clustering method
• Subjective Experiments
• Linguistic Similarities in Clusters
• Conclusions

Page 4: Swdm15

Information in Disaster Situation

• Local information must be collected
  – For Victims
    • Shelter location
    • Tsunami, ...
  – For Rescuers
    • Donating money
    • Volunteer activities, ...

Page 5: Swdm15

How to collect information in disaster situation?

• From mass media?
  – General and public information only
  – Not personalized
• From social media?
  – They perform well
    • [10 Mendoza], [11 Miyabe], [10 Sakaki]
  – In particular, Twitter is useful
    • We also focus on Twitter

Page 6: Swdm15

Classification of Tweets is required

A lot of Tweets: 5,000 Tweets posted per sec in the 2011 Great East Japan Earthquake (Official Twitter Blog, Japan)

Collecting appropriate Tweets is difficult

Classification of Tweets is required!

Page 7: Swdm15

Weakness in Classification using Text Mining

Tweets to group: 「Shut off the gas」 「My head hurts」 「Wear shoes」 「Good morning!」 「A head office」 「Protect your head」 ...

Grouped by text mining:
Cluster①: 「My head hurts」 「A head office」 「Protect your head」
Cluster②: ...

• Are they topically similar?
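The weakness above can be illustrated with a small sketch. It assumes the simplest possible text matching (shared lowercase words after whitespace splitting); the function name `shared_words` is hypothetical and is not the text-mining method the slides have in mind.

```python
def shared_words(tweet_a, tweet_b):
    """Return the lowercase words that two tweets have in common."""
    return set(tweet_a.lower().split()) & set(tweet_b.lower().split())


tweets = ["Shut off the gas", "My head hurts", "Wear shoes",
          "Good morning !", "A head office", "Protect your head"]

# A naive keyword match groups every tweet containing the word "head",
# although the three tweets are about different topics.
head_group = [t for t in tweets if "head" in t.lower().split()]

# Two unrelated tweets share a word ...
overlap_text = shared_words("My head hurts", "Protect your head")
# ... while two pieces of earthquake advice share none.
overlap_advice = shared_words("Shut off the gas", "Protect your head")
```

Word overlap links 「My head hurts」 with 「Protect your head」, but finds nothing in common between the two advice tweets.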

Page 8: Swdm15

Focusing on Retweet

• RT (Retweet): suggests that a user has an interest in a Tweet [13 Toriumi]

Tweets to group: 「Shut off the gas」 「My head hurts」 「Wear shoes」 「Good morning!」 「A head office」 「Protect your head」 ...

Grouped by retweets (RT), which reflect user interest:
Cluster①: 「Shut off the gas」 「Wear shoes」 「Protect your head」
Cluster②: ...

Page 9: Swdm15

Purpose of this study

• Propose a novel tweet classification method focusing on retweets

(Figure: the tweet group is split by retweet (RT) interest into Cluster①: 「Shut off the gas」 「Wear shoes」 「Protect your head」 and Cluster②: ...)

Page 10: Swdm15

Contents

• Introduction
• Proposed tweet Clustering method
• Subjective Experiments
• Linguistic Similarities in Clusters
• Conclusions

Page 11: Swdm15

An outline of proposed method

• Calculate the similarity between two tweets
• Construct retweet network
• Network clustering

(Figure: example with Tweet1, Tweet2 and Tweet3 and the similarity values 0.15 and 0.03)

Page 12: Swdm15

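The Similarity of Retweet Users

• Similar tweets are retweeted by similar users
• Two tweets whose similarity of retweet users is high may share a topic
  – The similarity between two tweets is defined over the sets of users who retweeted them

(Figure: users linked to the tweets T1, T2, T3, T4, T5, ... that they retweeted; T = Tweet, link = Retweet)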
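The similarity formula did not survive in this transcript, so the sketch below is only an illustration. It assumes the Jaccard coefficient of the two retweeter sets as the measure; the function name, user IDs and tweet IDs are hypothetical.

```python
def retweet_user_similarity(users_a, users_b):
    """Jaccard coefficient of two retweeter sets (an assumed measure)."""
    union = users_a | users_b
    if not union:
        return 0.0
    return len(users_a & users_b) / len(union)


# Hypothetical data: tweet ID -> set of users who retweeted it.
retweeters = {
    "T1": {"u1", "u2", "u3"},
    "T2": {"u2", "u3", "u4"},
    "T3": {"u5"},
}

# T1 and T2 share two of four distinct retweeters.
sim_12 = retweet_user_similarity(retweeters["T1"], retweeters["T2"])
# T1 and T3 have no retweeter in common.
sim_13 = retweet_user_similarity(retweeters["T1"], retweeters["T3"])
```

No tweet text is read at any point; only the retweet records are used, which is the point of the approach.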

Page 14: Swdm15

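Construct retweet network

• Connect two tweets whose similarity satisfies the threshold
  – Threshold = 0.05
  – The similarities for all combinations of two tweets were calculated
  – Nodes in an obtained component may be topically similar to each other

(Figure: tweets T1 to T4 with pairwise similarities 0.06, 0.1, 0.03, 0.11, 0.02 and 0.01, and the network that remains after applying the threshold 0.05)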
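A minimal sketch of this step, under stated assumptions: an edge is kept when the similarity is at least 0.05 (the slide gives the value, but whether the comparison is strict is an assumption), and the assignment of the six similarity values to tweet pairs is hypothetical.

```python
THRESHOLD = 0.05  # value given on the slide


def build_retweet_network(similarity, threshold=THRESHOLD):
    """Return an adjacency dict linking tweets whose similarity >= threshold."""
    adjacency = {t: set() for pair in similarity for t in pair}
    for (a, b), value in similarity.items():
        if value >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def connected_components(adjacency):
    """Collect connected components with an iterative depth-first search."""
    seen, components = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical pair assignment for the six values shown in the figure.
similarity = {
    ("T1", "T2"): 0.06, ("T1", "T3"): 0.1, ("T2", "T3"): 0.11,
    ("T1", "T4"): 0.03, ("T2", "T4"): 0.02, ("T3", "T4"): 0.01,
}
network = build_retweet_network(similarity)
components = connected_components(network)
```

With this assignment, T1, T2 and T3 form one component and T4 stays isolated. Computing every pairwise similarity is quadratic in the number of tweets.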

Page 15: Swdm15

Data

• Tweets retweeted more than 100 times from March 5 to 24, 2011
  – The Great East Japan Earthquake occurred on the 11th
  – 34,860 tweets

Page 16: Swdm15

Retweet network


Page 17: Swdm15

Network Clustering

• Large components are assumed to contain various topics
• Apply a clustering method based on the Newman method [04 Newman] to the retweet network
  – To extract clusters that contain similar tweets
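The slides do not say which variant of the Newman method was used. The sketch below assumes the common reading: greedy agglomerative merging that maximizes modularity. It is a simplified illustration, not the authors' implementation; it recounts inter-community edges on every pass and stops as soon as no merge increases modularity.

```python
def greedy_modularity_clusters(edges):
    """Repeatedly merge the two linked communities with the largest
    positive modularity gain; stop when no merge improves modularity."""
    m = len(edges)
    community = {}  # node -> community id
    for a, b in edges:
        community.setdefault(a, a)
        community.setdefault(b, b)
    members = {c: {c} for c in community}
    degree = {c: 0 for c in community}  # summed node degrees per community
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    while True:
        # Count the edges running between each pair of distinct communities.
        links = {}
        for a, b in edges:
            ca, cb = community[a], community[b]
            if ca != cb:
                key = (min(ca, cb), max(ca, cb))
                links[key] = links.get(key, 0) + 1
        best_gain, best_pair = 0.0, None
        for (ca, cb), count in sorted(links.items()):
            # Modularity gain of merging ca and cb: 2 * (e_ab - a_a * a_b).
            gain = count / m - degree[ca] * degree[cb] / (2 * m * m)
            if gain > best_gain:
                best_gain, best_pair = gain, (ca, cb)
        if best_pair is None:
            break
        keep, drop = best_pair
        for node in members[drop]:
            community[node] = keep
        members[keep] |= members.pop(drop)
        degree[keep] += degree.pop(drop)
    return sorted(members.values(), key=lambda s: sorted(s))


# Hypothetical component: two triangles joined by a single bridge edge.
example_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
                 ("d", "e"), ("d", "f"), ("e", "f")]
example_clusters = greedy_modularity_clusters(example_edges)
```

On this toy component the two triangles come out as separate clusters, which mirrors the goal of splitting a large component that mixes topics.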

Page 18: Swdm15

Clustering Result

• 11,494 Tweets → 2,001 Clusters
• Following slides show some clusters

Page 19: Swdm15

Result Example 1

• Cluster about shelter

The Oura cafeteria on the Ueno Campus of the Tokyo University of the Arts is open. You can spend the night there.

[a quick report] Okumakodo is open! It looks like it has some blankets http://twitpic.com/48f6y2

Are you all right? [The Tokyo Bunka Kaikan just opened. It's getting dark and cold, so if you are around Ueno Station, please go there.]

Page 20: Swdm15

Result Example 2

• Cluster about advice for victims

If you are evacuating with a baby, wrap the baby in a blanket and carry it in a tote bag. No baby buggies! #jishin

[Please spread] If you use Twitter by mobile phone, turn off your icons to conserve battery life.

Page 21: Swdm15

Contents

• Introduction
• Proposed tweet Clustering method
• Subjective Experiments
• Linguistic Similarities in Clusters
• Conclusions

Page 22: Swdm15

Proposed Method’s Validity

• Conduct subjective experiments to clarify the proposed method’s validity
  – Are tweets in the same cluster similar to each other?
• The experiment consists of two-choice questions

Page 23: Swdm15

Example of a question in subjective experiment

(Figure: a Statement Tweet asks "Which tweet is more topic-similar to me?" and the examinee picks Choice Tweet A or Choice Tweet B. The three tweets shown are:)

• Twitter is a source of information
• Yahoo! Map shows the area of the rolling blackouts
• The site gives information about power plant and rolling blackouts

Page 24: Swdm15

How to Make Questions?

• Choice tweets consist of two tweets
  – Inner tweet
    • Belongs to the cluster to which the statement tweet belongs
  – Outer tweet
    • Belongs to a cluster to which the statement tweet does not belong

(Figure: two clusters of tweets; the Statement Tweet and the Inner Tweet are in one cluster, the Outer Tweet is in the other)

Page 25: Swdm15

How to Make Questions?

• Two clusters are selected randomly

(Figure: several clusters of tweets, two of which are selected)

Page 27: Swdm15

How to Make Questions?

• A statement Tweet is selected randomly

(Figure: one tweet in a selected cluster is marked as the Statement Tweet)

Page 28: Swdm15

How to Make Questions?

• Inner Tweet and outer Tweet are selected randomly

(Figure: the Inner Tweet is picked from the Statement Tweet's cluster and the Outer Tweet from the other selected cluster)
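The question-making steps can be sketched as follows. This is an illustration under assumptions: the function name and dictionary keys are hypothetical, the cluster holding the statement tweet must have at least two tweets, and the display order of the two choices is shuffled (the slides do not say how the choices were ordered).

```python
import random


def make_question(clusters, rng):
    """Build one two-choice question from a list of clusters (lists of tweets)."""
    # The statement cluster must supply both the statement and the inner tweet.
    eligible = [i for i, c in enumerate(clusters) if len(c) >= 2]
    statement_idx = rng.choice(eligible)
    other_idx = rng.choice([i for i in range(len(clusters)) if i != statement_idx])
    # Statement and inner tweet: two distinct tweets from the same cluster.
    statement, inner = rng.sample(clusters[statement_idx], 2)
    # Outer tweet: any tweet from the other selected cluster.
    outer = rng.choice(clusters[other_idx])
    choices = [inner, outer]
    rng.shuffle(choices)  # assumed: hide which choice is the inner tweet
    return {"statement": statement, "choices": choices,
            "inner": inner, "outer": outer}


# Hypothetical clusters with shortened tweet texts.
example_clusters = [
    ["shelter A is open", "shelter B is open", "blankets available"],
    ["turn off your icons", "wrap the baby in a blanket"],
    ["donation site"],
]
question = make_question(example_clusters, random.Random(0))
```

A seeded `random.Random` keeps the generated question set reproducible.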

Page 29: Swdm15

Example of a question in subjective experiment

(Figure: the same example question, now with the two choice tweets labeled Inner Tweet and Outer Tweet. The Statement Tweet asks "Which tweet is more topic-similar to me?"; the tweets shown are "Twitter is a source of information", "Yahoo! Map shows the area of the rolling blackouts" and "The site gives information about power plant and rolling blackouts".)

Page 30: Swdm15

Examinees and Questions

• 100 questions were selected randomly
  – Each examinee solved 50 of them
• Fourteen examinees
  – Seven examinees solved each question
  – If more than four examinees select an inner tweet, the result is labeled as ‘Correct’.

(Figure: the 100 questions divided into two sets of 50 Questions)
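The labeling rule can be written as a short sketch. It reads "more than four" literally, as at least five of the seven examinees; the label `Incorrect`, the function names and the vote counts are hypothetical.

```python
def label_question(inner_votes, min_votes=5):
    """Label a question 'Correct' when enough examinees chose the inner tweet."""
    return "Correct" if inner_votes >= min_votes else "Incorrect"


def correct_rate(votes_per_question, min_votes=5):
    """Fraction of questions labeled 'Correct'."""
    labels = [label_question(v, min_votes) for v in votes_per_question]
    return labels.count("Correct") / len(labels)


# Hypothetical counts of examinees (out of seven) who chose the inner tweet.
example_votes = [7, 6, 5, 4, 2]
example_rate = correct_rate(example_votes)
```

Raising `min_votes` gives a stricter agreement measure from the same vote counts.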

Page 31: Swdm15

Subjective Experiment Result

• 89% of all the questions were correct!

Page 32: Swdm15

Subjective Experiment Result

• The similarities are obvious
  – More than six examinees selected the inner tweet in 77% of questions

Page 33: Swdm15

The validity was confirmed

• We confirmed the validity of the proposed method
  – The proportion of clusters whose nodes are mutually similar is very high
  – The similarities of the nodes in each cluster are obvious

Page 34: Swdm15

Contents

• Introduction
• Proposed tweet Clustering method
• Subjective Experiments
• Linguistic Similarities in Clusters
• Conclusions

Page 35: Swdm15

Can classification based on text mining group them?

Cluster about advice for victims:

This tweet was posted by a volunteer center. Yesterday, more than 1000 people read it and learned about dangerous areas and shortages. What should we do? http://t.co/4JpWlXt #jishin

RT [please spread] Check that your car has a jack for changing tires. They are useful for rescuing victims from rubble. #jishin #jisin

• Some clusters have little linguistic similarity
  – These are difficult to group by using text mining

Page 36: Swdm15

Linguistic Similarities in Clusters

• A quantitative assessment of linguistic similarity is required
• Apply the Vector Space Model
  – Calculate the linguistic similarity between two documents based on TF-IDF
    • In this study, document = tweet
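A minimal sketch of a TF-IDF vector space comparison, treating each tweet as a document. Several details are assumptions: whitespace tokenization (Japanese tweets would need a morphological analyzer), the plain `tf * log(N / df)` weighting, and cosine similarity as the comparison measure, which the slides do not name.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({term: (count / len(tokens)) * math.log(n_docs / df[term])
                        for term, count in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


vectors = tfidf_vectors(["My head hurts", "Protect your head", "Shut off the gas"])
sim_shared_word = cosine(vectors[0], vectors[1])  # both contain "head"
sim_no_overlap = cosine(vectors[0], vectors[2])   # no common term
```

Tweets without a common term always score zero under this model, however close their topics are.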

Page 37: Swdm15

Apply Vector Space Model

• Calculate the linguistic similarity of two tweets for all combinations
  – Including linked and unlinked combinations
  – To calculate reference values

Page 38: Swdm15

Reference Values

• The result of the calculation for all combinations
  – Average = 0.0156
• When the similarity between two tweets is under that average (0.0156), their linguistic similarity is no higher than that of a random pair

Page 39: Swdm15

Linguistic Similarities in each cluster

• The linguistic similarities in each cluster were also calculated
  – Defined as the average of the similarities for all the combinations of the nodes that belong to the cluster

(Figure: a cluster of Tweet1, Tweet2 and Tweet3 has ${}_3C_2 = 3$ combinations, with pairwise similarities 0.5, 0.3 and 0.1; the linguistic similarity in the cluster is their average, $(0.5 + 0.3 + 0.1) / 3 = 0.3$)
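The per-cluster average can be sketched as below, using the three values from the figure. Which pair carries which value is an assumption, and the helper names are hypothetical; the reference value 0.0156 is the all-pairs average reported in the slides.

```python
from itertools import combinations

REFERENCE_VALUE = 0.0156  # average over all tweet pairs, from the slides


def cluster_linguistic_similarity(tweets, pair_similarity):
    """Average pairwise similarity over all 2-combinations in a cluster."""
    pairs = list(combinations(tweets, 2))
    if not pairs:
        return 0.0
    return sum(pair_similarity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical assignment of the figure's three values to tweet pairs.
table = {
    frozenset(["Tweet1", "Tweet2"]): 0.5,
    frozenset(["Tweet2", "Tweet3"]): 0.3,
    frozenset(["Tweet1", "Tweet3"]): 0.1,
}


def lookup(a, b):
    """Read a precomputed similarity regardless of argument order."""
    return table[frozenset([a, b])]


cluster_value = cluster_linguistic_similarity(["Tweet1", "Tweet2", "Tweet3"], lookup)
below_reference = cluster_value < REFERENCE_VALUE
```

In practice `pair_similarity` would be the TF-IDF based similarity; a cluster whose value falls below `REFERENCE_VALUE` is no more linguistically coherent than randomly paired tweets.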

Page 40: Swdm15

Linguistic Similarities in each cluster

• 8.25% of all clusters are under 0.0156
  – Some of the clusters are as low as randomly selected tweets
  – These are difficult to group by using text mining!

Page 41: Swdm15

Example of Clusters with low linguistic similarities 1

• Cluster about life in shelter
  – Linguistic similarity is 0.0108

I've experienced two big earthquakes. I spent a few nights in a car and saw many senior citizens who seemed to be suffering from economy class syndrome from remaining in the same posture for a long time. If you have to spend too much time in a car or a cramped shelter, don't forget to stretch your legs.

If children are shaking or suffering from fear, hug and comfort them.

Page 42: Swdm15

Example of Clusters with low linguistic similarities 2

• Cluster about advice for victims
  – Linguistic similarity is 0.0052

RT [Summarize the information] Open the door, Cook some rice, Place baggage at the entrance, Buy water, Snacks and a towel, Blankets, Wear shoes ....

My friend who survived the Great Hanshin Earthquake evacuated his house in pajamas. So tonight, sleep in clothes just in case you have to leave quickly.

Page 43: Swdm15

Contents

• Introduction
• Proposed tweet Clustering method
• Subjective Experiments
• Linguistic Similarities in Clusters
• Conclusions

Page 44: Swdm15

Conclusions

• We proposed a novel method for classifying tweets by focusing on retweets, without using text mining
• Most of the obtained clusters contain local information, which is very useful in disaster situations

Page 45: Swdm15

Conclusions

• A subjective experiment confirmed the validity of our method
  – Nodes are similar to each other in 89% of clusters
  – The similarities are obvious
• Clusters obtained by our method are topic-similar, even if they are not linguistically similar

Page 46: Swdm15

Future Works

• Apply a soft clustering method to the retweet network
  – Our proposed method is an either-or (hard) classification
  – A tweet can’t belong to multiple clusters

(Figure: Tsunami and Shelter fall under "Information for victims", Donating money and Volunteer under "Information for rescuers"; Donating supplies is marked "?")

Page 47: Swdm15

Future Works

• Apply a soft clustering method to the retweet network
  – When soft clustering is applied to the retweet network, a tweet can belong to multiple clusters

(Figure: the topics Tsunami, Shelter, Donating money, Volunteer and Donating supplies grouped under "Information for victims" and "Information for rescuers")

Page 48: Swdm15

Future Works

• Reduce the amount of calculation
  – Information must be provided quickly in a disaster situation

Page 49: Swdm15

Thank you!

• If you have a good idea for our study, please email me.

[email protected]