Finding bursty topics from microblogs

Qiming Diao, Jing Jiang, Feida Zhu, Ee-Peng Lim
Living Analytics Research Centre, School of Information Systems, Singapore Management University


Abstract

Goal: find topics that have bursty patterns on microblogs.

Two observations:
1. Posts published around the same time are more likely to have the same topic.
2. Posts published by the same user are more likely to have the same topic.

Introduction

Task: retrospective bursty event detection.
Burst detection: a state machine.
Topic discovery: LDA.

Two assumptions:
1. If a post is about a global event, it is likely to follow a global topic distribution that is time-dependent.
2. If a post is about a personal topic, it is likely to follow a personal topic distribution that is more or less stable over time.

Method

Preliminaries: each post $d_i$ is associated with a user $u_i$, a time point $t_i$, and words $w_{i,j}$. We define a bursty topic $b$ as a word distribution coupled with a bursty interval, denoted as $(\phi^b, t^b_s, t^b_e)$.

Our task: find meaningful bursty topics in the input text stream.

Our method has two steps: a topic discovery step and a burst detection step.
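To make the notation concrete, here is a minimal sketch of the two objects involved: a post $d_i$ with its user, time point, and words, and a bursty topic $(\phi^b, t^b_s, t^b_e)$. The class and field names are hypothetical, and treating $t^b_e$ as inclusive is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Post:
    """A microblog post d_i (field names are illustrative)."""
    user: int         # u_i: index of the author
    time: int         # t_i: discrete time point, e.g. a day index
    words: List[str]  # w_{i,1}, ..., w_{i,N_i}


@dataclass
class BurstyTopic:
    """A bursty topic b = (phi^b, t^b_s, t^b_e)."""
    phi: List[float]  # word distribution over the vocabulary
    t_start: int      # t^b_s: first time point of the bursty interval
    t_end: int        # t^b_e: last time point, treated as inclusive here

    def covers(self, t: int) -> bool:
        """True if time point t falls inside the bursty interval."""
        return self.t_start <= t <= self.t_end


post = Post(user=7, time=11, words=["storm", "warning"])
burst = BurstyTopic(phi=[0.6, 0.4], t_start=10, t_end=12)
print(burst.covers(post.time))  # True
```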

Our Topic Model

Assume:
1. There are $C$ (latent) topics in the text stream, where each topic $c$ has a word distribution $\phi^c$.
2. There is a background word distribution $\phi^B$.
3. A single post is most likely to be about a single topic.
4. There is a global topic distribution $\theta^t$ for each time point $t$.

Since our focus is on finding popular global events, we need to separate out these “personal” posts.

We therefore also assume a time-independent topic distribution $\eta^u$ for each user $u$ to capture her long-term topical interests.
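The assumptions above can be read as a generative story. The sketch below samples one post under that reading, and it is an illustration rather than the authors' exact model: a hypothetical switch probability `pi_personal` decides whether the post's single topic comes from the user's $\eta^u$ or the time point's $\theta^t$, and a hypothetical `lam_bg` decides whether each word comes from $\phi^B$ or from the chosen topic's $\phi^c$. All priors and sizes are toy values.

```python
import numpy as np


def generate_post(rng, user, t, theta, eta, phi, phi_B, pi_personal, lam_bg, n_words):
    """Sample one post under an assumed time/user topic-model story.

    pi_personal and lam_bg are hypothetical switch probabilities.
    """
    # Personal post: topic from the user's eta^u; global post: from theta^t.
    personal = rng.random() < pi_personal
    topic_dist = eta[user] if personal else theta[t]
    # A single topic for the whole post (assumption 3).
    c = int(rng.choice(len(topic_dist), p=topic_dist))
    words = []
    for _ in range(n_words):
        # Each word is either a background word or a word from topic c.
        dist = phi_B if rng.random() < lam_bg else phi[c]
        words.append(int(rng.choice(len(dist), p=dist)))
    return personal, c, words


rng = np.random.default_rng(0)
C, V, U, T = 3, 10, 2, 5                        # topics, vocabulary, users, time points
phi = rng.dirichlet(np.full(V, 0.5), size=C)    # phi^c, one row per topic
phi_B = rng.dirichlet(np.ones(V))               # background distribution phi^B
theta = rng.dirichlet(np.ones(C), size=T)       # theta^t, one row per time point
eta = rng.dirichlet(np.ones(C), size=U)         # eta^u, one row per user
personal, c, words = generate_post(rng, user=0, t=2, theta=theta, eta=eta, phi=phi,
                                   phi_B=phi_B, pi_personal=0.3, lam_bg=0.2, n_words=8)
print(personal, c, words)
```

Separating the per-post switch from the per-word switch mirrors assumption 3: the topic is chosen once per post, while background words can still appear anywhere in it.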

Learning

Learning uses Gibbs sampling. The sampling equations are expressed in terms of count variables, including $M^{(0)}$, $M^{(1)}$, $M^{(c)}$, $M^{(v)}$, $M^{(w_{i,j})}$, and $E^{(v)}$, together with the corresponding totals $M^{(\cdot)}$ and $E^{(\cdot)}$.
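The model's own sampling equations involve the count variables listed above. As a simplified stand-in, the sketch below shows the collapsed Gibbs update for a plain one-topic-per-post mixture (a Dirichlet-multinomial mixture), which captures assumption 3 but ignores the background distribution and the user/time switch. It is not the authors' update; the function names and toy counts are hypothetical, and only $\alpha = 50/C$ with $C = 80$ and $\beta = 0.01$ come from the slides.

```python
import math
import random
from collections import Counter


def topic_log_weights(words, m_topic, n_topic_word, n_topic, alpha, beta, V):
    """Unnormalized log p(z_i = c | rest) for a one-topic-per-post mixture.

    All counts must already exclude post i:
      m_topic[c]         posts assigned to topic c
      n_topic_word[c][w] occurrences of word w in topic c
      n_topic[c]         total words in topic c
    """
    logw = []
    for c in range(len(m_topic)):
        lw = math.log(m_topic[c] + alpha)
        seen = Counter()  # repeated words within the post raise the count
        for j, w in enumerate(words):
            lw += math.log(n_topic_word[c].get(w, 0) + beta + seen[w])
            lw -= math.log(n_topic[c] + V * beta + j)
            seen[w] += 1
        logw.append(lw)
    return logw


def sample_topic(rng, logw):
    """Draw a topic index from unnormalized log weights (max-shifted for stability)."""
    mx = max(logw)
    weights = [math.exp(lw - mx) for lw in logw]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


alpha, beta = 50 / 80, 0.01  # slide settings (C = 80); the toy state has 2 topics
m_topic = [5, 5]
n_topic_word = [{"goal": 20, "match": 10}, {"vote": 20, "poll": 10}]
n_topic = [30, 30]
logw = topic_log_weights(["goal", "match"], m_topic, n_topic_word, n_topic, alpha, beta, V=4)
z = sample_topic(random.Random(0), logw)
print(logw, z)
```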

Burst Detection

Assume that for each topic $c$ we have a series of counts $(m^c_1, m^c_2, \ldots, m^c_T)$ representing the intensity of the topic at different time points.

These counts are generated by two Poisson distributions, corresponding to a bursty state and a normal state.

We set $\sigma_0 = 0.9$ and $\sigma_1 = 0.6$ for all topics.

Finally, a burst is marked by a consecutive subsequence of bursty states.
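A minimal sketch of this state machine as a two-state Poisson hidden Markov model decoded with the Viterbi algorithm. Several details are assumptions for illustration: $\sigma_0$ and $\sigma_1$ are taken as the probabilities of staying in the normal and bursty state respectively, the normal rate $\mu_0$ is the mean count, the bursty rate is set to $\mu_1 = \mu_0 + 3\sqrt{\mu_0}$, and both states are equally likely at the start. Function names are hypothetical.

```python
import math


def poisson_logpmf(k, mu):
    """log P(k | Poisson(mu))."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def viterbi_states(counts, sigma0=0.9, sigma1=0.6):
    """Most likely state sequence (0 = normal, 1 = bursty) for a topic's counts."""
    mu0 = max(sum(counts) / len(counts), 1e-9)
    mu1 = mu0 + 3 * math.sqrt(mu0)           # assumed bursty rate
    mus = (mu0, mu1)
    trans = [[math.log(sigma0), math.log(1 - sigma0)],   # from normal
             [math.log(1 - sigma1), math.log(sigma1)]]   # from bursty
    score = [math.log(0.5) + poisson_logpmf(counts[0], mus[s]) for s in (0, 1)]
    back = []
    for k in counts[1:]:
        new, ptr = [], []
        for s in (0, 1):
            prev = max((0, 1), key=lambda p: score[p] + trans[p][s])
            ptr.append(prev)
            new.append(score[prev] + trans[prev][s] + poisson_logpmf(k, mus[s]))
        score, back = new, back + [ptr]
    state = max((0, 1), key=lambda s: score[s])
    states = [state]
    for ptr in reversed(back):               # follow back-pointers to time 0
        state = ptr[state]
        states.append(state)
    return states[::-1]


def bursts_from_states(states):
    """Maximal runs of consecutive bursty states as (start, end) index pairs."""
    bursts, start = [], None
    for t, s in enumerate(states):
        if s == 1 and start is None:
            start = t
        elif s == 0 and start is not None:
            bursts.append((start, t - 1))
            start = None
    if start is not None:
        bursts.append((start, len(states) - 1))
    return bursts


counts = [2, 3, 2, 2, 30, 35, 28, 3, 2, 2]   # toy daily counts for one topic
states = viterbi_states(counts)
print(bursts_from_states(states))  # [(4, 6)]
```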

Experiments

Data Set

We sampled 2,892 users from this dataset and extracted their tweets between September 1 and November 30, 2011 (91 days in total).

The final dataset contains 3,967,927 tweets and 24,280,638 tokens.
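As a quick sanity check on these figures, the snippet below recomputes the 91-day span and derives two averages that the slides do not state but that follow from the reported counts: roughly 1,372 tweets per user and roughly 6.1 tokens per tweet.

```python
from datetime import date

# Inclusive day count for September 1 to November 30, 2011.
days = (date(2011, 11, 30) - date(2011, 9, 1)).days + 1
tweets, tokens, users = 3_967_927, 24_280_638, 2_892

print(days)                      # 91
print(round(tweets / users, 1))  # 1372.0 tweets per user on average
print(round(tokens / tweets, 2)) # 6.12 tokens per tweet on average
```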

Ground Truth Generation

We took the top-30 bursty topics from each model and asked two human judges to judge their quality by assigning each topic a score of either 0 or 1.

Evaluation

We set the number of topics $C$ to 80, $\alpha$ to $50/C$ and $\beta$ to 0.01. Each model was run for 500 iterations of Gibbs sampling.

Sample Results and Discussions

Two case studies demonstrate the effectiveness of our model.

Effectiveness of Temporal Models: Both TimeLDA and TimeUserLDA tend to group posts published on the same day into the same topic.

Effectiveness of User Models: It is important to filter out users’ “personal” posts in order to find meaningful global events.

Conclusions

A new topic model that considers both the temporal information of microblog posts and users’ personal interests.

A Poisson-based state machine to identify bursty periods from the topics discovered by our model.


TM-LDA: EFFICIENT ONLINE MODELING OF THE LATENT TOPIC TRANSITIONS IN SOCIAL MEDIA

ABSTRACT

TM-LDA learns the transition parameters among topics by minimizing the prediction error on topic distributions in subsequent postings.

We develop an efficient updating algorithm to adjust the transition parameters as new documents stream in.

Challenges:
1. to model and analyze latent topics in social textual data;
2. to adaptively update the models as the massive social content streams in;
3. to facilitate temporal-aware applications of social media.

Contributions

First, we propose a novel temporally-aware topic language model, TM-LDA, which captures the latent topic transitions in temporally-sequenced documents.

Second, we design an efficient algorithm to update TM-LDA, which enables it to be applied to large-scale data.

Finally, we evaluate TM-LDA against the static topic modeling method (LDA).

METHODOLOGY

TM-LDA Algorithm

If we define the space of topic distributions as $X = \{ x \in \mathbb{R}^n_+ : \|x\|_1 = 1 \}$, TM-LDA can be considered as a function $f: X \to X$ that maps the topic distribution of a post to a predicted topic distribution for the subsequent post.

TM-LDA is modeled as a non-linear mapping, and its transition parameters are learned by minimizing the prediction error.
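As an illustration only, assume the mapping takes the form $f(x) = xT / \|xT\|_1$: a linear transition by an $n \times n$ matrix $T$ followed by L1 renormalization back onto $X$, which is one natural way to obtain a non-linear map whose output stays in $X$. The function name `tm_lda_predict` is hypothetical.

```python
import numpy as np


def tm_lda_predict(x, T):
    """Predict the next post's topic distribution (assumed form xT / ||xT||_1)."""
    y = x @ T
    return y / np.linalg.norm(y, ord=1)  # renormalize onto the simplex X


x = np.array([0.7, 0.3])          # topic distribution of the current post
T = np.array([[1.8, 0.2],
              [0.4, 1.6]])        # toy transition matrix, rows not normalized
print(tm_lda_predict(x, T))       # approximately [0.69, 0.31]
```

Here $xT = (1.38, 0.62)$, and dividing by its L1 norm of 2 yields a valid distribution even though the toy rows of $T$ do not sum to 1.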

Error Function of TM-LDA


Iterative Minimization of the Error Function


Direct Minimization of the Error Function
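A minimal sketch of direct minimization, assuming the error is the linear least-squares objective $\|AT - B\|_F^2$, where the rows of $A$ are topic distributions of earlier posts and the rows of $B$ those of the posts that follow them (the same setting that the QR update $RT = Q^\top B$ implies). When $A$ has full column rank the minimizer is $T = (A^\top A)^{-1} A^\top B$; the sketch calls `numpy.linalg.lstsq` instead of forming the inverse. `fit_transition_direct` and the synthetic data are hypothetical.

```python
import numpy as np


def fit_transition_direct(A, B):
    """Least-squares T minimizing ||A T - B||_F^2 (assumed error form)."""
    T, *_ = np.linalg.lstsq(A, B, rcond=None)
    return T


rng = np.random.default_rng(1)
n = 4
T_true = rng.dirichlet(np.ones(n), size=n)   # hypothetical ground-truth transitions
A = rng.dirichlet(np.ones(n), size=200)      # topic distributions of "previous" posts
B = A @ T_true                               # noise-free "next" posts for the check
T_hat = fit_transition_direct(A, B)
print(np.allclose(T_hat, T_true))            # True
```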


TM-LDA for Twitter Stream

Let $A = D(1;m)$ and $B = D(2;m+1)$, that is, rows $1$ through $m$ and rows $2$ through $m+1$ of $D$.
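The row shift is easy to express with slicing. This sketch assumes $D$ is a NumPy array whose $i$-th row is the topic distribution of a user's $i$-th tweet in time order, so each row of $A$ is paired with the row of $B$ for the tweet that follows it. Stacking pairs from several users is an assumption about how a larger stream might be assembled; both function names are hypothetical.

```python
import numpy as np


def transition_pairs(D):
    """Split a time-ordered (m+1) x n matrix D into A = D(1;m) and B = D(2;m+1)."""
    return D[:-1], D[1:]


def stack_users(user_matrices):
    """Stack (A, B) pairs from several users; users with fewer than 2 tweets are skipped."""
    pairs = [transition_pairs(D) for D in user_matrices if D.shape[0] >= 2]
    A = np.vstack([a for a, _ in pairs])
    B = np.vstack([b for _, b in pairs])
    return A, B


D1 = np.array([[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]])  # one user's 3 tweets
D2 = np.array([[0.3, 0.7], [0.6, 0.4]])              # another user's 2 tweets
A, B = stack_users([D1, D2])
print(A.shape, B.shape)  # (3, 2) (3, 2)
```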

UPDATING TRANSITION PARAMETERS

Updating Transition Parameters with the Sherman-Morrison-Woodbury Formula
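A sketch of the incremental update, assuming the least-squares setting $T = (A^\top A)^{-1} A^\top B$. When $k$ new rows $U$ are appended to $A$ (and $V$ to $B$), $A^\top A$ grows by $U^\top U$ and $A^\top B$ by $U^\top V$, so the Sherman-Morrison-Woodbury identity refreshes $(A^\top A)^{-1}$ by solving only a $k \times k$ system instead of re-inverting an $n \times n$ matrix. `smw_update` is a hypothetical name; the check compares against a full refit.

```python
import numpy as np


def smw_update(P, AtB, U, V):
    """Append rows U to A and V to B, updating P = (A^T A)^{-1} and A^T B.

    Uses (M + U^T U)^{-1} = P - P U^T (I + U P U^T)^{-1} U P, with P = M^{-1}.
    """
    PU = P @ U.T                                 # n x k
    S = np.eye(U.shape[0]) + U @ PU              # k x k capacitance matrix
    P_new = P - PU @ np.linalg.solve(S, PU.T)    # PU.T equals U P since P is symmetric
    AtB_new = AtB + U.T @ V
    return P_new, AtB_new


rng = np.random.default_rng(2)
n = 3
A = rng.dirichlet(np.ones(n), size=50)
B = rng.dirichlet(np.ones(n), size=50)
P, AtB = np.linalg.inv(A.T @ A), A.T @ B
U = rng.dirichlet(np.ones(n), size=5)   # newly arrived "previous" tweets
V = rng.dirichlet(np.ones(n), size=5)   # and the tweets that follow them
P, AtB = smw_update(P, AtB, U, V)
T_updated = P @ AtB
T_full, *_ = np.linalg.lstsq(np.vstack([A, U]), np.vstack([B, V]), rcond=None)
print(np.allclose(T_updated, T_full))   # True
```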

Updating Transition Parameters with QR-factorization

Suppose the QR-factorization of matrix $A$ is $A = QR$, where $Q^\top Q = I$ and $R$ is an upper triangular matrix. Then $RT = Q^\top B$, which can be solved for $T$ by back substitution.
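A minimal sketch of the QR route, assuming the same least-squares problem $\min_T \|AT - B\|_F$: substituting $A = QR$ and multiplying by $Q^\top$ gives $RT = Q^\top B$. NumPy's `numpy.linalg.qr` returns the reduced factorization by default; the sketch solves with `numpy.linalg.solve`, although a dedicated triangular solver would exploit the structure of $R$. `fit_transition_qr` is hypothetical.

```python
import numpy as np


def fit_transition_qr(A, B):
    """Solve min ||A T - B||_F via A = QR and then R T = Q^T B."""
    Q, R = np.linalg.qr(A)              # reduced QR: Q is m x n, R is n x n upper triangular
    return np.linalg.solve(R, Q.T @ B)


rng = np.random.default_rng(3)
A = rng.dirichlet(np.ones(4), size=100)
B = rng.dirichlet(np.ones(4), size=100)
T_qr = fit_transition_qr(A, B)
T_ls, *_ = np.linalg.lstsq(A, B, rcond=None)
print(np.allclose(T_qr, T_ls))  # True
```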


EXPERIMENTS

Dataset

Using Perplexity as Evaluation Metric
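Perplexity is the exponential of the negative average per-token log-likelihood on held-out text, so lower is better. The sketch below computes it for a topic model, where $p(w \mid d) = \sum_k \theta_{d,k} \phi_{k,w}$. Which topic distribution plays $\theta_d$ (for example, a TM-LDA estimate or an LDA-inferred distribution) is left to the caller; the function names and toy values are hypothetical.

```python
import math


def log_prob_token(theta_d, phi, w):
    """log p(w | d) = log sum_k theta_d[k] * phi[k][w]."""
    return math.log(sum(t * row[w] for t, row in zip(theta_d, phi)))


def perplexity(docs, thetas, phi):
    """exp(-total log-likelihood / total number of held-out tokens)."""
    total_ll, n_tokens = 0.0, 0
    for words, theta_d in zip(docs, thetas):
        for w in words:
            total_ll += log_prob_token(theta_d, phi, w)
            n_tokens += 1
    return math.exp(-total_ll / n_tokens)


phi = [[0.25, 0.25, 0.25, 0.25],    # two toy topics over a 4-word vocabulary
       [0.25, 0.25, 0.25, 0.25]]
docs = [[0, 1, 2], [3, 3]]          # word ids of held-out tweets
thetas = [[0.5, 0.5], [0.9, 0.1]]   # their topic distributions
print(perplexity(docs, thetas, phi))  # close to 4, the vocabulary size
```

With uniform topics every token has probability 1/4, so the perplexity equals the vocabulary size; a model that predicts tweets better scores lower.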

Predicting Future Tweets

TM-LDA first trains LDA on 7 days of historical tweets and computes the transition parameter matrix accordingly. Then, for each new tweet generated on the 8th day, it predicts the topic distribution of the following tweet.

Estimated Topic Distributions of “Future” Tweets: the topic distribution of tweet b as predicted by TM-LDA.

LDA Topic Distributions of “Future” Tweets: the inferred topic distribution of tweet b.

LDA Topic Distributions of “Previous” Tweets: the inferred topic distribution of tweet a.


Efficiency of Updating Transition Parameters

Properties of Transition Parameters

T is a square matrix whose size is determined by the number of topics trained in LDA.

Each row of T sums to 1, which means that the overall weight emitted from a topic is 1.
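Why the row sums equal 1, assuming $T$ is the least-squares solution: if every row of $A$ and $B$ is a topic distribution, then $A\mathbf{1} = \mathbf{1}$ and $B\mathbf{1} = \mathbf{1}$. With $A$ of full column rank, $T\mathbf{1} = A^{+}B\mathbf{1} = A^{+}\mathbf{1} = A^{+}A\mathbf{1} = \mathbf{1}$, whatever $B$ is. The snippet checks this numerically on random, unrelated $A$ and $B$; note that the property does not force the entries of $T$ to be non-negative.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5                                    # number of topics
A = rng.dirichlet(np.ones(n), size=300)  # each row sums to 1
B = rng.dirichlet(np.ones(n), size=300)  # unrelated to A, but rows still sum to 1
T, *_ = np.linalg.lstsq(A, B, rcond=None)

print(T.shape)                           # (5, 5): square, one row and column per topic
print(np.allclose(T.sum(axis=1), 1.0))   # True
print(T.min())                           # least squares does not constrain entries to be >= 0
```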

APPLYING TM-LDA FOR TREND ANALYSIS AND SENSEMAKING


Changing Topic Transitions over Time


Various Topic Transition Patterns by Cities

CONCLUSIONS

A novel temporally-aware language model, TM-LDA, for efficiently modeling streams of social text, such as an author's Twitter stream.

An efficient model-updating algorithm for TM-LDA.