PET: A Statistical Model for Popular Events Tracking in Social Communities

PET: A Statistical Model for Popular Events Tracking in Social Communities Cindy Xide Lin 1 , Bo Zhao 1 , Qiaozhu Mei 2 , Jiawei Han 1 1 University of Illinois at Urbana-Champaign, 2 University of Michigan KDD 2010 2010. 09. 16. Summarized and Presented by Sang-il Song, IDS Lab., Seoul National University


Page 1: PET: A Statistical Model for Popular Events Tracking in Social Communities

PET: A Statistical Model for Popular Events Tracking in Social Communities

Cindy Xide Lin1, Bo Zhao1, Qiaozhu Mei2, Jiawei Han1

1University of Illinois at Urbana-Champaign, 2University of Michigan

KDD 2010

2010. 09. 16.

Summarized and Presented by Sang-il Song, IDS Lab., Seoul National University

Page 2: PET: A Statistical Model for Popular Events Tracking in Social Communities

Copyright 2010 by CEBT

Contents

Introduction

Concept Definition

Problem Definition

Model

Interest model

Topic model

Experiment

Data Collection

Baseline and Gold standard

Analysis on Popularity Trend

Analysis on Content Evolution

Conclusions & Discussions

Page 3: PET: A Statistical Model for Popular Events Tracking in Social Communities

Introduction

Boom of online communities

e.g., Facebook, Blogger, Twitter, …

Facilitates information creation, sharing, and diffusion.

A popular topic or event can spread much faster.

Need to track the diffusion and evolution of a popular event

Hot topics emerge, prevail and die

It is desirable to monitor whether people like an event, what they like, and how their interests change over time

e.g., Who is still interested in watching Avatar 50 days after its release date?

Page 4: PET: A Statistical Model for Popular Events Tracking in Social Communities

Introduction

Tracking the evolution of a popular topic is challenging

Diffusion of an event is vague

e.g., You don’t know whether I am interested in an event

e.g., and even if you do, you don’t know from whom I got this interest.

Fortunately, a large volume of text data is generated from the social communities.

Besides communicating with friends, a web user also constantly generates text content such as blog posts.

A network structure and a text collection evolve simultaneously and interrelatedly.

Page 5: PET: A Statistical Model for Popular Events Tracking in Social Communities

Goal

Tracking popular events in a time-variant social community

A stream of text information

A stream of network structures

Modeling the interest of user

Modeling the change of topic

Page 6: PET: A Statistical Model for Popular Events Tracking in Social Communities

Concept Definition: Network Stream

[Figure: example network snapshot with six nodes v1, v2, …, v6]

Gk: the snapshot of the network at time tk

G = { G1, G2, …, Gn }

Page 7: PET: A Statistical Model for Popular Events Tracking in Social Communities


Concept Definition: Document Stream

Document collection stream D = {D1, D2, …, DT}

Document collection Dk = {dk,1, dk,2, …, dk,N}

[Figure: each node vi of the network snapshot is attached to its document dk,i, a sequence of words, e.g., dk,1 = (w1, w2, w3, w1, …)]

Page 8: PET: A Statistical Model for Popular Events Tracking in Social Communities


Concept Definition: Topic and Event

Topic

A topic θ is a multinomial distribution over words, {p(w|θ)}, w ∈ W

A topic has different versions over time; the version at time tk is denoted θk

Event

A stream of topics Θ^E = {θ_0^E, θ_1^E, θ_2^E, …, θ_T^E}

θ_0^E is the primitive topic of the event

θ_k^E corresponds to the version of θ_0^E at time tk

– Indicates the major aspects of the event in network Gk

Page 9: PET: A Statistical Model for Popular Events Tracking in Social Communities

Concept Definition: Interest

Interest

hk(i): the level of interest that node vi in Gk has in the particular event at time tk

Real value between 0 and 1

Hk = {hk(1), hk(2), …, hk(N)}

Page 10: PET: A Statistical Model for Popular Events Tracking in Social Communities

Problem: Popular Event Tracking

Inputs

Network Stream G

Document Stream D

Primitive topic of an event θ0

Task1: Popularity Tracking

Inferring the latent stream of interests (Hk)

– providing much richer information about how the interest e

Task2: Topic Tracking

Inferring the latent stream of topics about the event ΘE

– Keeping track of the new development about the event,

– Understanding event evolution

Page 11: PET: A Statistical Model for Popular Events Tracking in Social Communities

Intuitions

Observation 1. Interest and Connections

The behavior of each individual is usually influenced by his or her friends.

Observation 2. Interest and History

The behavior of each individual should be generally consistent over time.

Events should not change dramatically.

Observation 3. Content and Interest

When an individual has a higher level of interest in an event, the content she generates should be more likely to be related to the event

Page 12: PET: A Statistical Model for Popular Events Tracking in Social Communities

The General Model

Current interest and topic depend on

Current network

Current Documents

Previous history (Markovian simplification)

Formal representation

P(Hk, Θk | Gk, Dk, Hk-1)

Page 13: PET: A Statistical Model for Popular Events Tracking in Social Communities

Assumption

How to model P(Hk, Θk | Gk, Dk, Hk-1)?

Assumption 1.

Given the current network structure Gk and the previous interest status Hk-1,

the current interest status Hk is independent of the document collection Dk

Hk ⊥ Dk | Gk, Hk-1

People first become interested in the event and therefore generate discussion about it

Assumption 2.

Given the current interest status Hk and the document collection Dk,

the current topic model Θk is independent of Gk and Hk-1

Θk ⊥ Gk, Hk-1 | Hk, Dk

Once the author has developed an interest in the event, the content she writes will only depend on the event itself and the level of interest

P( Hk, Θk | Gk, Dk, Hk-1 ) = P(Hk | Gk, Hk-1) P(Θk|Hk, Dk)
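The factorization follows from the chain rule once the two assumptions are applied. A short derivation, written with the slide's symbols and given as a reading aid rather than as the authors' own presentation:

```latex
\begin{aligned}
P(H_k, \Theta_k \mid G_k, D_k, H_{k-1})
  &= P(H_k \mid G_k, D_k, H_{k-1})\,
     P(\Theta_k \mid H_k, G_k, D_k, H_{k-1})
     && \text{(chain rule)} \\
  &= P(H_k \mid G_k, H_{k-1})\,
     P(\Theta_k \mid H_k, D_k)
     && \text{(Assumptions 1 and 2)}
\end{aligned}
```

Assumption 1 removes D_k from the first factor, and Assumption 2 removes G_k and H_{k-1} from the second.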

Page 14: PET: A Statistical Model for Popular Events Tracking in Social Communities


Interest Model

Gibbs Random field

Great use in studying natural processes

(Gibbs distribution)

cf. The Gaussian distribution is a special member of the Gibbs distribution family

P(Hk | Gk, Hk-1)

h’(k) is the weighted sum of the friends’ interests

The first part is the transition energy of node i

The last part represents the neighbors’ expectation

[Figure: a node with three neighbors, annotated with the values 0.2, 0.3, 1, 0.2, 0.8, 0.1]

h’ = 1*0.2 + 0.3*0.8 + 0.2*0.1 = 0.46
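The weighted sum in the example can be checked with a few lines of Python. This is a minimal sketch: the function name is hypothetical, and which factor in each product is the edge weight and which is the neighbor's interest is an assumption, since the transcript does not preserve the figure labels. No normalization is applied, matching the slide's arithmetic.

```python
def neighbor_expectation(weights, interests):
    """Plain weighted sum of the neighbors' interest levels (no normalization)."""
    if len(weights) != len(interests):
        raise ValueError("weights and interests must have the same length")
    return sum(w * h for w, h in zip(weights, interests))


# Factor pairs copied from the slide's sum 1*0.2 + 0.3*0.8 + 0.2*0.1.
# Treating the first factor as the edge weight is an assumption.
example_weights = [1.0, 0.3, 0.2]
example_interests = [0.2, 0.8, 0.1]

h_prime = neighbor_expectation(example_weights, example_interests)
print(round(h_prime, 2))
```

Up to floating-point rounding, the result agrees with the 0.46 given on the slide.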

Page 15: PET: A Statistical Model for Popular Events Tracking in Social Communities


Topic Model

Each document is assumed to be generated by a mixture of two multinomial component models

Background model: θkB

– Modeling Common words

Latent event topic model: θkE

– Modeling discriminative and meaningful words

The probability of generating word

P(Θk|Hk, Dk)
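The word-generation formula itself did not survive in this transcript. The sketch below illustrates a generic two-component mixture of a background model and an event topic model. Using the author's interest level as the mixing weight is an assumption suggested by Observation 3, not a formula taken from the slides, and all names and toy probabilities are made up for illustration.

```python
import math


def word_probability(word, event_topic, background, mix):
    """p(w) = mix * p(w | event topic) + (1 - mix) * p(w | background)."""
    if not 0.0 <= mix <= 1.0:
        raise ValueError("mix must lie in [0, 1]")
    return mix * event_topic.get(word, 0.0) + (1.0 - mix) * background.get(word, 0.0)


def document_log_likelihood(words, event_topic, background, mix):
    """Log-likelihood of a document under the mixture, treating words as independent."""
    total = 0.0
    for w in words:
        p = word_probability(w, event_topic, background, mix)
        total += math.log(p) if p > 0.0 else float("-inf")
    return total


# Toy distributions over a four-word vocabulary (illustrative values only).
background = {"the": 0.5, "movie": 0.3, "avatar": 0.1, "pandora": 0.1}
event_topic = {"the": 0.1, "movie": 0.2, "avatar": 0.4, "pandora": 0.3}

# Assumed: a more interested author puts more weight on the event topic.
p_low = word_probability("avatar", event_topic, background, mix=0.1)
p_high = word_probability("avatar", event_topic, background, mix=0.9)
```

With these toy numbers, an event-specific word such as "avatar" becomes more probable as the mixing weight grows, which is the behavior Observation 3 describes.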

Page 16: PET: A Statistical Model for Popular Events Tracking in Social Communities

Twitter Data collection

Selecting 5000 users with follower-followee relationship

Considering each day as a time point (tk: the kth day)

Document dk,i is obtained by concatenating the tweets displayed by user i on day k

The weight of the relationship between users i and j equals the number of tweets displayed by user i by following user j during the period from tk-30 to tk.
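The document construction step can be sketched as follows. The input layout, a list of (day, user, text) tuples, and the function name are hypothetical. The slides do not describe the authors' preprocessing beyond concatenating each user's tweets per day.

```python
from collections import defaultdict


def build_documents(tweets):
    """Group tweets by (day, user) and join each group into one document."""
    grouped = defaultdict(list)
    for day, user, text in tweets:
        grouped[(day, user)].append(text)
    # Tweets are joined in input order, separated by a single space.
    return {key: " ".join(texts) for key, texts in grouped.items()}


# Made-up sample data: day index k, user id i, tweet text.
sample_tweets = [
    (1, "u1", "watched avatar today"),
    (1, "u1", "pandora looks amazing"),
    (1, "u2", "rainy morning"),
    (2, "u1", "still thinking about avatar"),
]

documents = build_documents(sample_tweets)
```

Each key (k, i) then plays the role of the index of document dk,i.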

Page 17: PET: A Statistical Model for Popular Events Tracking in Social Communities

Baseline and Gold standard

BOM: the daily box office extracted from Box Office Mojo

The box office earnings are a trustworthy criterion reflecting the movie’s popularity

GInt: Google Insight

PET

PET-: a special version of PET with the network structure removed

JonK / Cont

Page 18: PET: A Statistical Model for Popular Events Tracking in Social Communities


Analysis on Popularity Trend

Page 19: PET: A Statistical Model for Popular Events Tracking in Social Communities

Analysis on Popularity Trend

Page 20: PET: A Statistical Model for Popular Events Tracking in Social Communities

Analysis on Popularity Trend

PET always has the best performance

Historic, textual and structured information is reflected well

PET- cannot respond sufficiently to sudden changes

Page 21: PET: A Statistical Model for Popular Events Tracking in Social Communities

Analysis on Content Evolution

Page 22: PET: A Statistical Model for Popular Events Tracking in Social Communities

Conclusion & Discussion

Propose the novel problem of Popular Event Tracking

Propose popular event tracking model, PET

Unified probabilistic framework to model different factors

Covers classical models

Experimental studies show that PET outperforms existing models

PET is not a good framework for tracking interest

More accurate data exist, such as Google Insight.

Tracking topic changing is a novel problem.

PET detects and tracks topic evolution well.
