Bursty Event Detection from Text Streams for Disaster Management

2012-04-17. Sungjun Lee, Sangjin Lee, Kwanho Kim, and Jonghun Park. Information Management Lab., Dept. of Industrial Engineering, Seoul National University.


Transcript of Bursty Event Detection from Text Streams for Disaster Management

Page 1: Bursty Event Detection from Text Streams for Disaster Management

Bursty Event Detection from Text Streams for Disaster Management

2012-04-17

Sungjun Lee, Sangjin Lee, Kwanho Kim, and Jonghun Park

Information Management Lab., Dept. of Industrial Engineering

Seoul National University

Page 2: Bursty Event Detection from Text Streams for Disaster Management


Introduction

Identify disaster-related bursty events from multiple text streams.

Characterize bursty terms in terms of four features:

- Skewness, consistency, periodicity, and variation.

[Figure: real-world states ("normal" and "disaster") produce observations in streams 1 to K. Normal terms: "happy", "good", "nice", "fine". Disaster-related terms: "bad", "die", "catastrophe", "nightmare".]

Score each term to determine whether or not it is a bursty term.

Page 3: Bursty Event Detection from Text Streams for Disaster Management


Motivation example

The distribution of the frequency of terms observed in the AP news stream on Feb. 27, 2010 and Mar. 1, 2010.

On Mar. 1, 2010, the trial of a Bosnian politician, Radovan Karadzic, began.

[Figure: four bar charts of the top-ranked stemmed terms in the AP news stream, by weighted TF and by raw TF, for 27-Feb-2010 and 01-Mar-2010. Both 27-Feb-2010 charts are led by "chile", followed by terms such as "tsunami" and "earthquak". The 01-Mar-2010 weighted TF chart is led by "bosnian"; the 01-Mar-2010 raw TF chart is led by "mondai".]

On Feb. 27, 2010, an earthquake hit Chile.

Page 4: Bursty Event Detection from Text Streams for Disaster Management


Skewness feature

A bursty term appears intensively during the specific time period in which the corresponding event occurs.

[Figure: two plots of probability versus term frequency during L days, showing the change of the term frequency distribution of "tsunami".]

$$\mathit{skew}(t) = \frac{E\left[\left(X(t) - \mu(X(t))\right)^{3}\right]}{\sigma(X(t))^{3}}$$

where [definition missing in the transcript].
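The skewness feature can be illustrated with a minimal Python sketch. It assumes that X(t) is the series of daily frequencies of term t over L days, which is suggested by the figure axis but not stated in the transcript; the function name `skew` and the sample series are hypothetical.

```python
def skew(freqs):
    """Third standardized moment E[(X - mu)^3] / sigma^3 of a frequency series.

    Assumption: freqs holds the daily frequencies of one term over L days.
    """
    n = len(freqs)
    mu = sum(freqs) / n
    var = sum((x - mu) ** 2 for x in freqs) / n  # population variance
    if var == 0:
        return 0.0  # constant series: treated as not skewed (a choice of this sketch)
    third = sum((x - mu) ** 3 for x in freqs) / n
    return third / var ** 1.5


steady = [5, 6, 5, 6, 5, 6, 5, 6]   # term used at a stable rate
spike = [0, 0, 0, 0, 0, 0, 0, 40]   # term that appears on one day only

print(skew(steady))  # symmetric around the mean, so the skewness is zero
print(skew(spike))   # strongly positive for a one-day spike
```

A term whose mass sits in a single short window yields a large positive value, which matches the intuition on this slide.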

Page 5: Bursty Event Detection from Text Streams for Disaster Management


Consistency feature

The frequency of a bursty term soars across multiple streams.

[Figure: the change of the term appearance of "tsunami" across streams 1 to K, marking articles that contain "tsunami" and articles that do not. Example streams: a Twitterer focusing on tsunami research and a Twitterer focusing on travel.]

$$\mathit{cons}(t) = \sum_{c \in C} \sqrt{\sum_{j=1}^{L} \left( tf_{c,t,j} - \frac{\sum_{c' \in C} tf_{c',t,j}}{|C|} \right)^{2}}$$
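As a rough illustration, the consistency formula can be computed directly from per-stream daily frequencies. The sketch below assumes that tf_{c,t,j} is the frequency of term t in stream c on day j and that every stream covers the same L days; the function name `cons` and the input layout are hypothetical.

```python
from math import sqrt


def cons(tf_by_stream):
    """Sum over streams of the Euclidean distance to the cross-stream daily mean.

    tf_by_stream[c][j] is assumed to be the frequency of one term
    in stream c on day j (same number of days for every stream).
    """
    num_streams = len(tf_by_stream)
    num_days = len(tf_by_stream[0])
    # Mean frequency over all streams, computed separately for each day j.
    daily_mean = [
        sum(stream[j] for stream in tf_by_stream) / num_streams
        for j in range(num_days)
    ]
    return sum(
        sqrt(sum((stream[j] - daily_mean[j]) ** 2 for j in range(num_days)))
        for stream in tf_by_stream
    )


print(cons([[0, 0, 4], [0, 0, 4]]))  # identical streams give 0.0
print(cons([[0, 0, 4], [0, 0, 0]]))  # streams that disagree on day 3 give 4.0
```

As transcribed, the expression is zero when all streams report identical frequencies and grows as the streams diverge; how this value is scaled before it enters the final score is not stated in the slides.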

Page 6: Bursty Event Detection from Text Streams for Disaster Management


Periodicity feature

Periodic terms are less likely to be bursty terms. Penalize terms that exhibit periodicity.

[Figure: two plots of power density versus normalized frequency, one for the term "sundai" (periodicity of "Sunday") and one for the term "earthquak" (periodicity of "earthquake"). Annotated peaks: period=6.8966 and period=3.4843.]

$$\mathit{peri}(t) = \begin{cases} p & \text{if term } t \text{ is periodic} \\ 1 & \text{otherwise} \end{cases}$$
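The slides show power density plots but do not say how a term is judged periodic or what value p takes. The following Python sketch is therefore only one possible reading: it computes a plain periodogram and calls a series periodic when its strongest frequency bin is at least `min_ratio` times the average bin. The names `periodogram`, `peri`, and `min_ratio`, and the defaults `p=0.5` and `min_ratio=4.0`, are all assumptions made for illustration.

```python
import cmath
from math import pi


def periodogram(series):
    """Power at each positive frequency bin k = 1 .. n//2 of the mean-removed series."""
    n = len(series)
    mu = sum(series) / n
    centered = [x - mu for x in series]
    power = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * pi * k * i / n) for i, x in enumerate(centered)
        )
        power.append(abs(coeff) ** 2 / n)
    return power


def peri(series, p=0.5, min_ratio=4.0):
    """Return the penalty p for a periodic series and 1.0 otherwise.

    Hypothetical rule: periodic if the peak bin holds at least
    min_ratio times the mean power over all bins.
    """
    power = periodogram(series)
    mean_power = sum(power) / len(power)
    if mean_power == 0:
        return 1.0  # constant series: nothing to penalize
    return p if max(power) / mean_power >= min_ratio else 1.0


weekly = [10, 0, 0, 0, 0, 0, 0] * 4   # term that recurs every 7 days
one_off = [0] * 27 + [40]             # term that appears once in 28 days

print(peri(weekly))   # 0.5: power concentrates in a few bins, so the term is penalized
print(peri(one_off))  # 1.0: a single spike spreads power evenly over all bins
```

A single spike has a flat spectrum, so it is not penalized, while a weekly pattern concentrates its power in the bins tied to the 7-day period.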

Page 7: Bursty Event Detection from Text Streams for Disaster Management


Variation feature

Cope with different writing styles among streams.

Reduce the possibility of identifying a term that has high frequency only in a specific stream as a bursty term.

$$\mathit{vari}(c,t) = \frac{\alpha + \sigma_c(c,t)}{\mu_c(c,t)}$$

[Figure: the change of the term appearance of "AP" across streams 1 to K, marking articles that contain "AP" and articles that do not. The AP news stream starts to publish articles with a fixed signature "AP news".]
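The variation formula is garbled in the transcript, so the Python sketch below rests on two assumptions: that the formula reads as (alpha + sigma) / mu, and that sigma and mu are the standard deviation and mean of the term's daily frequency within one stream. The function name `vari`, the default `alpha=0.1`, and the sample series are hypothetical.

```python
def vari(freqs, alpha=0.1):
    """One reading of the variation score: (alpha + sigma) / mu for a single stream.

    Assumption: freqs holds the daily frequencies of one term in one stream.
    """
    n = len(freqs)
    mu = sum(freqs) / n
    if mu == 0:
        return 0.0  # term never appears in this stream (a choice of this sketch)
    sigma = (sum((x - mu) ** 2 for x in freqs) / n) ** 0.5
    return (alpha + sigma) / mu


signature = [20] * 10        # e.g. a fixed signature printed in every article
event = [0] * 9 + [20]       # a term that appears only on the last day

print(vari(signature))  # close to zero: frequent but never changing
print(vari(event))      # much larger: frequency varies strongly over time
```

Under this reading, a term such as a fixed stream signature receives a weight near zero even though its raw frequency is high, which is the behavior the slide describes.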

Page 8: Bursty Event Detection from Text Streams for Disaster Management


Putting them all together to measure burstiness

Combine the four feature scores, which are based on different rationales and scales.

The final term weighting scheme, burst, is as follows:

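$$\mathit{burst}(t,d) = \mathit{skew}(t)^{k_1} \times \mathit{cons}(t)^{k_2} \times \mathit{peri}(t)^{k_3} \times \sum_{c \in C} \left\{ \mathit{vari}(c,t) \times \mathit{nf}(c,t,d) \right\}$$

where [definition missing in the transcript].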
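The combination step can be sketched in Python as a direct transcription of the product above. The feature scores are taken as precomputed inputs; nf(c, t, d) is not defined in the transcript and is assumed here to be a normalized frequency of term t in stream c on day d. The function name `burst`, the argument layout, and the default exponents of 1.0 are hypothetical.

```python
def burst(skew_t, cons_t, peri_t, vari_by_stream, nf_by_stream,
          k1=1.0, k2=1.0, k3=1.0):
    """Combine the four feature scores for one term t on one day d.

    vari_by_stream[c] and nf_by_stream[c] are the per-stream values
    vari(c, t) and nf(c, t, d); both lists follow the same stream order.
    Note: with fractional exponents the base scores must be non-negative.
    """
    stream_sum = sum(v * f for v, f in zip(vari_by_stream, nf_by_stream))
    return (skew_t ** k1) * (cons_t ** k2) * (peri_t ** k3) * stream_sum


# Two streams; the values are made up for illustration only.
score = burst(
    skew_t=2.0,
    cons_t=3.0,
    peri_t=0.5,
    vari_by_stream=[1.0, 2.0],
    nf_by_stream=[0.1, 0.2],
)
print(score)
```

The exponents k1, k2, and k3 let the three term-level features be weighted against each other despite their different scales, while the per-stream sum keeps a term from scoring high on the strength of a single stream alone.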

Page 9: Bursty Event Detection from Text Streams for Disaster Management


Experiment setting

Six news channels are collected:

- Sources: CNN, AP, Reuters, Times Online, Wall Street Journal, New York Times
- Category: World news
- Period: 1 Oct. 2009 – 15 Mar. 2010
- Source type: RSS feed

[Figure: data collection pipeline with the components data channels, Google Reader repository, Google Reader API, and experiment DB.]

Page 10: Bursty Event Detection from Text Streams for Disaster Management


Example of bursty terms

1 A strong aftershock to Chile's deadly earthquake provoked a brief panic in the city of Concepcion, but no tsunami warning was issued and no injuries or damage have been reported....

2 Tsunami waves of up to 1.5 meters (5 feet) hit far-flung Pacific regions from the Russian far east and Japan to New Zealand's Chatham Islands on Sunday after a powerful earthquake struck Chile, but there were no reports of injuries or serious damage.

3 Former member of the Bosnian wartime presidency Ejup Ganic was arrested at London's Heathrow airport on Monday on behalf of Serbian authorities, British police said.

4 A tsunami generated by a 8.8 magnitude earthquake in Chile hit beaches in eastern Australia on Sunday, witnesses and officials said, but there were no initial reports of damage.

5 British police arrested a former senior Bosnian leader in London Monday on a Serbian warrant alleging he committed war crimes, to the outrage of Bosnian leaders who said the move undermined Bosnian sovereignty....

Page 11: Bursty Event Detection from Text Streams for Disaster Management


Experiment results

Comparison of bursty term detection results with methods proposed by Whitney et al. (2009), Fung et al. (2005), Chen et al. (2007), and He et al. (2005).

Bold terms: bursty terms assumed to be correct. Underlined terms: topical terms. Starred terms: general terms.

Page 12: Bursty Event Detection from Text Streams for Disaster Management


Experiment results

Comparison of the performance of retrieving documents related to bursty events.

Page 13: Bursty Event Detection from Text Streams for Disaster Management


Further work

Chi-Square

MICV

KL Divergence

Skewness

Self-Similarity

Chernoff Divergence

Union of “Statistically Sufficient” Conditions

Bursty terms

Page 14: Bursty Event Detection from Text Streams for Disaster Management


Conclusion

Focus on identifying bursty terms to detect disaster-related bursty events.

Bursty terms can help people react properly in decision-critical situations.

Bursty terms can be characterized from four perspectives:

- Skewness, consistency, periodicity, and variation.

A final scoring function to detect bursty terms is proposed.

The experiment results showed that the proposed approach is effective at detecting bursty terms compared to the existing alternatives.