Swdm15

Classification Method for Shared Information on Twitter Without Text Data

The University of Tokyo, Japan

Seigo Baba, Fujio Toriumi, Takeshi Sakaki, Kosuke Shinoda, Kazuhiro Kazama, Satoshi Kurihara, Itsuki Noda

The 3rd International Workshop on Social Web for Disaster Management (SWDM'15), held with WWW’15 (May 2015, Florence, Italy)

Contents

• Introduction
• Proposed tweet clustering method
• Subjective experiments
• Linguistic similarities in clusters
• Conclusions


Information in Disaster Situation

• Local information must be collected
  – For victims
    • Shelter location
    • Tsunami, ...
  – For rescuers
    • Donating money
    • Volunteer activities, ...

How to collect information in disaster situation ?

• From mass media?
  – General and public information only
  – Not personalized
• From social media?
  – They perform well [10 Mendoza], [11 miyabe], [10 sakaki]
  – In particular, Twitter is useful
    • We also focus on Twitter

Classification of Tweets is required

• A lot of tweets
  – 5,000 tweets posted per second in the 2011 Great East Japan Earthquake (Official Twitter Blog, Japan)
• Collecting appropriate tweets is difficult
• Classification of tweets is required!

Weakness in Classification using Text Mining

• Example tweets: "Shut off the gas", "My head hurts", "Wear shoes", "Good morning!", "A head office", "Protect your head", ...
• Grouping them by text mining
  – Cluster ①: "My head hurts", "A head office", "Protect your head"
  – Cluster ②: ...
• Are they topic-similar?

Focusing on Retweet

• RT (Retweet): suggests that a user has an interest in a tweet [13 Toriumi]
• Grouping tweets by the interest shown through retweets
  – Cluster ①: "Shut off the gas", "Wear shoes", "Protect your head"
  – Cluster ②: ...

Purpose of this study


Propose a novel tweet classification method focusing on retweets


An outline of proposed method

• Calculate the similarity between each pair of tweets
• Construct the retweet network
• Network clustering

(Figure: Tweet1, Tweet2, and Tweet3 joined by edges weighted with similarity values such as 0.15 and 0.03.)

The Similarity of Retweet Users

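• Similar tweets are retweeted by similar users
• Two tweets whose similarity of retweet users is high may share a topic
  – The similarity of two tweets is calculated from the sets of users who retweeted each of them

(Figure: users linked to the tweets T1, T2, T3, T4, T5, ... that they retweeted.)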
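The slide's formula did not survive in this transcript, so the following is only a minimal sketch of one plausible measure: the Jaccard coefficient of the two sets of retweeting users. The choice of Jaccard, the function name `retweet_user_similarity`, and the toy data are assumptions made for illustration, not the authors' confirmed definition.

```python
def retweet_user_similarity(users_a, users_b):
    """Jaccard coefficient of two sets of retweeting users (assumed measure)."""
    users_a, users_b = set(users_a), set(users_b)
    union = users_a | users_b
    if not union:
        # Neither tweet was retweeted, so there is no evidence of shared interest.
        return 0.0
    return len(users_a & users_b) / len(union)


# Hypothetical toy data: tweet ID -> IDs of the users who retweeted it.
retweeters = {
    "T1": {"u1", "u2", "u3", "u4"},
    "T2": {"u2", "u3", "u4", "u5"},
    "T3": {"u6", "u7"},
}

# T1 and T2 share 3 of 5 distinct users; T1 and T3 share none.
sim_12 = retweet_user_similarity(retweeters["T1"], retweeters["T2"])
sim_13 = retweet_user_similarity(retweeters["T1"], retweeters["T3"])
```

Any set-overlap measure would fit the slide's description equally well; Jaccard is used here only because it stays between 0 and 1 regardless of how many users retweeted each tweet.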


Construct retweet network

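• Connect two tweets whose similarity satisfies the threshold
  – Threshold = 0.05
  – The similarities for all combinations of two tweets were calculated
  – Nodes in the same component may be topically similar to each other

(Figure: four tweets T1 to T4 with pairwise similarities 0.06, 0.1, 0.03, 0.11, 0.02, and 0.01; only the edges that satisfy the 0.05 threshold are kept.)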
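The network construction can be sketched as follows, under stated assumptions: the pairwise similarity is a Jaccard coefficient (not confirmed by the slides), an edge is kept when the similarity is at or above 0.05 (the slides do not show whether the comparison is strict), and all function names and data are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Assumed similarity: overlap of two user sets divided by their union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_retweet_network(retweeters, threshold=0.05):
    """Return edges (tweet_a, tweet_b, similarity) for every pair at or above the threshold."""
    edges = []
    for a, b in combinations(sorted(retweeters), 2):
        sim = jaccard(retweeters[a], retweeters[b])
        if sim >= threshold:
            edges.append((a, b, sim))
    return edges


def connected_components(nodes, edges):
    """Group nodes into connected components with a depth-first search."""
    adjacency = {n: set() for n in nodes}
    for a, b, _ in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical toy data: T4 shares one user with T1 but has many other retweeters,
# so its similarity to T1 (1/43) falls below the threshold.
retweeters = {
    "T1": {"u1", "u2", "u3"},
    "T2": {"u2", "u3", "u4"},
    "T3": {"u9"},
    "T4": {"u1"} | {f"x{i}" for i in range(40)},
}
edges = build_retweet_network(retweeters)
components = connected_components(retweeters, edges)
```

Comparing every pair is quadratic in the number of tweets, which is one reason the amount of calculation matters for a data set of tens of thousands of tweets.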

Data

• Tweets retweeted more than 100 times from March 5 to 24, 2011
  – The Great East Japan Earthquake occurred on March 11
  – 34,860 tweets

Retweet network


Network Clustering

• It is assumed that a large component contains various topics
• Apply a clustering method based on the Newman method [04 Newman] to the retweet network
  – To extract clusters that contain similar tweets
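As an illustration of what a Newman-style clustering step does, here is a naive sketch of greedy modularity agglomeration: start with every tweet in its own community and repeatedly merge the connected pair of communities that raises modularity the most. It assumes that [04 Newman] refers to this family of methods, ignores edge weights, and recomputes modularity from scratch for each candidate merge, so it is suitable for toy graphs only. The function names and the example graph are hypothetical.

```python
def modularity(edges, community_of):
    """Modularity Q of a partition of an unweighted, undirected graph."""
    m = len(edges)
    internal, degree = {}, {}
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())


def greedy_modularity_clustering(edges):
    """Merge connected communities greedily while modularity keeps increasing."""
    nodes = sorted({n for edge in edges for n in edge})
    community_of = {n: n for n in nodes}  # every node starts alone
    best_q = modularity(edges, community_of)
    while True:
        # Only pairs of communities joined by at least one edge are candidates.
        candidates = {
            tuple(sorted((community_of[u], community_of[v])))
            for u, v in edges
            if community_of[u] != community_of[v]
        }
        best_partition = None
        for a, b in sorted(candidates):
            trial = {n: (a if c == b else c) for n, c in community_of.items()}
            q = modularity(edges, trial)
            if q > best_q + 1e-12:
                best_q, best_partition = q, trial
        if best_partition is None:
            return community_of, best_q
        community_of = best_partition


# Hypothetical toy network: two triangles joined by a single bridge edge.
toy_edges = [
    ("T1", "T2"), ("T2", "T3"), ("T1", "T3"),
    ("T3", "T4"),
    ("T4", "T5"), ("T5", "T6"), ("T4", "T6"),
]
communities, q = greedy_modularity_clustering(toy_edges)
```

On this toy graph the procedure stops with the two triangles as separate communities, because merging them across the bridge would lower modularity.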

Clustering Result

• 11,494 tweets → 2,001 clusters
• The following slides show some clusters

Result Example 1

• Cluster about shelter
  – "The Oura cafeteria on the Ueno Campus of the Tokyo University of the Arts is open. You can spend the night there."
  – "[a quick report] Okumakodo is open! It looks like it has some blankets http://twitpic.com/48f6y2"
  – "Are you all right? [The Tokyo Bunka Kaikan just opened. It's getting dark and cold, so if you are around Ueno Station, please go there.]"

Result Example 2

• Cluster about advice for victims
  – "If you are evacuating with a baby, wrap the baby in a blanket and carry it in a tote bag. No baby buggies! #jishin"
  – "[Please spread] If you use Twitter by mobile phone, turn off your icons to conserve battery life."


Proposed Method’s Validity

• Conduct subjective experiments to clarify the proposed method’s validity
  – Are tweets in the same cluster similar to each other?
• The experiment consists of two-choice questions

Example of a question in subjective experiment

(Figure: a statement tweet asks "Which tweet is more topic-similar to me?" and the examinee picks Choice Tweet A or Choice Tweet B. The tweets shown in the example are:)
  – "Twitter is a source of information"
  – "Yahoo! Map shows the area of the rolling blackouts"
  – "The site gives information about power plant and rolling blackouts"

How to Make Questions ?

• Choice tweets consist of two tweets
  – Inner tweet
    • Belongs to the cluster to which the statement tweet belongs
  – Outer tweet
    • Belongs to a cluster to which the statement tweet does not belong

(Figure: the statement tweet and the inner tweet lie in one cluster; the outer tweet lies in another cluster.)


• Two clusters are selected randomly
• A statement tweet is selected randomly
• The inner tweet and the outer tweet are selected randomly
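The question-making steps above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `make_question`, the requirement that the statement tweet's cluster holds at least two tweets, the shuffling of the two choices, and the toy clusters are all assumptions.

```python
import random


def make_question(clusters, rng):
    """Build one two-choice question from a mapping of cluster ID to tweets."""
    # The statement tweet's cluster must also supply a different inner tweet.
    eligible = sorted(c for c, tweets in clusters.items() if len(tweets) >= 2)
    inner_cluster = rng.choice(eligible)
    outer_cluster = rng.choice(sorted(c for c in clusters if c != inner_cluster))
    statement, inner = rng.sample(clusters[inner_cluster], 2)
    outer = rng.choice(clusters[outer_cluster])
    choices = [("inner", inner), ("outer", outer)]
    rng.shuffle(choices)  # hide which choice is the inner tweet
    return {"statement": statement, "choices": choices}


# Hypothetical toy clusters with shortened tweet texts.
toy_clusters = {
    "shelter": ["Oura cafeteria is open", "Okumakodo is open", "Tokyo Bunka Kaikan just opened"],
    "advice": ["Carry the baby in a tote bag", "Turn off icons to save battery"],
    "blackout": ["Yahoo! Map shows the rolling blackout area"],
}
question = make_question(toy_clusters, random.Random(0))
```

Passing a seeded `random.Random` keeps the generated question reproducible, which helps when the same question set has to be shown to several examinees.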


Examinees and Questions

• 100 questions were selected randomly
  – Each examinee solved 50 of them
• Fourteen examinees
  – Seven examinees solved each question
  – If more than four examinees select the inner tweet, the result is labeled as ‘Correct’

(Figure: the 100 questions split into two sets of 50 questions.)
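The labeling rule can be sketched as a vote count per question. The slides say "more than four" of the seven examinees; the sketch reads that literally as at least five and exposes the cutoff as a parameter, since the exact boundary is not confirmed. The label "Incorrect", the function names, and the vote counts are hypothetical and are not the experiment's data.

```python
def label_question(inner_votes, min_inner_votes):
    """Label a question by how many examinees chose the inner tweet."""
    return "Correct" if inner_votes >= min_inner_votes else "Incorrect"


def correct_rate(inner_votes_per_question, min_inner_votes):
    """Fraction of questions labeled 'Correct' under the given cutoff."""
    labels = [label_question(v, min_inner_votes) for v in inner_votes_per_question]
    return labels.count("Correct") / len(labels)


# Hypothetical votes for ten questions, each answered by seven examinees.
toy_votes = [7, 6, 5, 3, 7, 2, 6, 7, 4, 7]

majority_rate = correct_rate(toy_votes, min_inner_votes=5)  # "more than four"
unanimous_rate = correct_rate(toy_votes, min_inner_votes=7)  # "more than six"
```

Raising the cutoff gives a stricter reading of the same votes, which is how a second, higher-agreement figure can be reported alongside the majority-based one.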

Subjective Experiment Result

• 89% of all the questions were correct!

Subjective Experiment Result

• The similarities are obvious
  – More than six examinees selected the inner tweet in 77% of questions

The validity was confirmed

• We confirmed the validity of the proposed method
  – The proportion of clusters whose nodes are mutually similar is very high
  – The similarities of the nodes in each cluster are obvious


Can classification based on text mining group them?

• Cluster about advice for victims
  – "This tweet was posted by a volunteer center. Yesterday, more than 1000 people read it and learned about dangerous areas and shortages. What should we do? http://t.co/4JpWlXt #jishin"
  – "RT [please spread] Check that your car has a jack for changing tires. They are useful for rescuing victims from rubble. #jishin #jisin"
• Some clusters have little linguistic similarity
  – These are difficult to group by using text mining

Linguistic Similarities in Clusters

• A quantitative assessment of linguistic similarity is required
• Apply the Vector Space Model
  – Calculate the linguistic similarity between two documents based on TF-IDF
    • In this study, document = tweet

Apply Vector Space Model

• Calculate the linguistic similarity of two tweets for all combinations
  – Including linked and unlinked combinations
  – To calculate reference values
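A vector space model comparison of two tweets can be sketched as TF-IDF weighting followed by cosine similarity. The slides name TF-IDF but not the exact weighting or tokenizer, so the smoothed IDF formula, the whitespace tokenization (Japanese tweets would need a morphological analyzer), the use of cosine similarity, and the toy tweets are all assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Turn each document into a sparse {term: weight} TF-IDF vector."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Smoothed IDF (assumed variant) so that no weight becomes zero.
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy tweets.
toy_tweets = [
    "shelter is open tonight",
    "the shelter is open now",
    "please donate money",
]
vectors = tfidf_vectors(toy_tweets)
sim_shelter = cosine_similarity(vectors[0], vectors[1])  # shared words
sim_unrelated = cosine_similarity(vectors[0], vectors[2])  # no shared words
```

Two tweets with no words in common score exactly zero here, which is the behavior that makes topic-similar but differently worded tweets hard to group with text alone.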

Reference Values

• The result of the calculation for all combinations
  – Average = 0.0156
• When the similarity between two tweets is under that average (0.0156), their linguistic similarity is at most that of a random pair

Linguistic Similarities in each cluster

• The linguistic similarity in each cluster was also calculated
  – Defined as the average similarity over all combinations of two tweets that belong to the cluster
• Example: a cluster of Tweet1, Tweet2, and Tweet3
  – All combinations in the cluster: 3 pairs, with similarities 0.5, 0.3, and 0.1
  – Linguistic similarity in the cluster = (0.5 + 0.3 + 0.1) / 3 = 0.3
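The per-cluster value can be sketched as the mean over all pairs, followed by a comparison with the reference value 0.0156. The function name is hypothetical, and the assignment of 0.5, 0.3, and 0.1 to specific pairs below is arbitrary, since the slide's figure does not make the pairing recoverable.

```python
from itertools import combinations

REFERENCE_VALUE = 0.0156  # average similarity over all tweet pairs, from the slides


def cluster_linguistic_similarity(tweets, pair_similarity):
    """Average pairwise similarity in a cluster; None if it has fewer than two tweets."""
    pairs = list(combinations(tweets, 2))
    if not pairs:
        return None
    return sum(pair_similarity(a, b) for a, b in pairs) / len(pairs)


# Toy lookup table reproducing the three similarity values of the example.
toy_similarity = {
    frozenset(("Tweet1", "Tweet2")): 0.5,
    frozenset(("Tweet2", "Tweet3")): 0.3,
    frozenset(("Tweet1", "Tweet3")): 0.1,
}

cluster_value = cluster_linguistic_similarity(
    ["Tweet1", "Tweet2", "Tweet3"],
    lambda a, b: toy_similarity[frozenset((a, b))],
)
below_reference = cluster_value < REFERENCE_VALUE
```

In practice `pair_similarity` would be whatever TF-IDF based measure is used for the reference value, so that the cluster average and the reference are comparable.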

Linguistic Similarities in each cluster

• 8.25% of all clusters are under 0.0156
  – Some of the clusters are as low as randomly selected tweets
  – These are difficult to group by using text mining!

Example of Clusters with low linguistic similarities 1

• Cluster about life in shelter
  – Linguistic similarity is 0.0108
  – "I've experienced two big earthquakes. I spent a few nights in a car and saw many senior citizens who seemed to be suffering from economy class syndrome from remaining in the same posture for a long time. If you have to spend too much time in a car or a cramped shelter, don't forget to stretch your legs."
  – "If children are shaking or suffering from fear, hug and comfort them."

Example of Clusters with low linguistic similarities 2

• Cluster about advice for victims
  – Linguistic similarity is 0.0052
  – "RT [Summarize the information] Open the door, Cook some rice, Place baggage in an entrance, Buy water, Snacks and a towel, Blankets, Wear shoes ...."
  – "My friend who survived the Great Hanshin Earthquake evacuated his house in pajamas. So tonight, sleep in clothes just in case you have to leave quickly."


Conclusions

• We proposed a novel method for classifying tweets that focuses on retweets and does not use text mining
• Most of the obtained clusters contain local information, which is very useful in disaster situations

Conclusions

• A subjective experiment confirmed the validity of our method
  – Nodes are similar to each other in 89% of clusters
  – The similarities are obvious
• Clusters obtained by our method are topic-similar, even if they are not linguistically similar

Future Works

• Apply a soft clustering method to the retweet network
  – Our proposed method is a hard (either-or) classification
  – A tweet cannot belong to multiple clusters

(Figure: "Tsunami" and "Shelter" are grouped as information for victims, and "Donating money" and "Volunteer" as information for rescuers; "Donating supplies" is marked "?".)

Future Works

• Apply a soft clustering method to the retweet network
  – When soft clustering is applied to the retweet network, a tweet can belong to multiple clusters

(Figure: the topics "Tsunami", "Shelter", "Donating money", "Volunteer", and "Donating supplies" shown with the groups "Information for victims" and "Information for rescuers".)

Future Works

• Reduce the amount of calculation
  – Information must be provided quickly in disaster situations

Thank you!

• If you have a good idea for our study, please email me.

[email protected]
