Data Science Popup Austin: Using lda and Structural Topic Modeling to Explore Trending Topics in a...

DATA SCIENCE POP UP

AUSTIN

Using LDA and Structural Topic Modeling to Explore Trending Topics in a Call Center

Jordana Heller

Data Scientist, Mattersight

@jheller

#datapopupaustin

April 13, 2016

Galvanize, Austin Campus

Lightning Talk: Using LDA and Structural Topic Modeling to Explore Trending Topics in a Call Center

Jordana Heller @jheller

Data Science Pop-up Austin, April 13, 2016

©2016 Mattersight Corporation. Mattersight Restricted Confidential Information.

What We Do

Our goal: Topic Trends

[Chart: topic trends over time, with dates from 3/31/2016 to 7/31/2016]

Identifying the contents and prevalence of multiword topics present in conversation, in an unsupervised way

Unexpected Prevalence

Critical Spikes

Escalating Frequency

Our goals, continued

Manageable number of topics

Track expected and unexpected topics

Go deep: Contextualize topic usage

Short text: Keywords, hashtags, ngrams

Long text: Could use predetermined topics

Image credit: IBM Watson Concept Insights

Long text: Or discover themes

Image credit: Blei, 2012, Communications of the ACM

Latent Dirichlet Allocation (LDA) (Blei et al., 2003)

Great! How about contextualizing trends?

• Where are topics trending?

• Structural Topic Modeling (Roberts et al., 2013)

  – Instead of relying on post-hoc comparisons, includes covariates in the LDA model

    • Specifies priors as GLMs

    • Word distribution determined by topic, covariates, and topic-covariate interaction

  – Authors' implementation: R package stm (available via CRAN; all code on GitHub!)
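
For reference, the model behind these bullets can be written roughly as follows. This is a sketch in the spirit of the notation used by Roberts et al., not something shown on the slide. The symbols are assumptions brought in for illustration: X_d holds the prevalence covariates of document d, g_d is its content-covariate group, m is the baseline log word frequency, and the kappa terms are deviations from that baseline.

```latex
\begin{align*}
\theta_d \mid X_d, \gamma, \Sigma &\sim \mathrm{LogisticNormal}(X_d \gamma,\ \Sigma) \\
z_{d,n} \mid \theta_d &\sim \mathrm{Multinomial}(\theta_d) \\
w_{d,n} \mid z_{d,n} = k &\sim \mathrm{Multinomial}(\beta_{d,k}) \\
\beta_{d,k} &\propto \exp\!\left(m + \kappa^{(t)}_{k} + \kappa^{(c)}_{g_d} + \kappa^{(i)}_{k,\,g_d}\right)
\end{align*}
```

The first line is the GLM-style prior on topic prevalence; the last line shows the word distribution built from a topic term, a covariate term, and their interaction.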

Ready to talk pipeline!

Data Collection and Preprocessing

Read Transcripts

Add Call-level Covariates

Preprocess text

• Collocations

• Remove stop words

• Stem/completion

• Remove low-frequency terms

Create Term-Document Matrix
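
The last preprocessing steps can be sketched in plain Python. This is a minimal illustration, not the pipeline used in the talk: the stop-word list, the toy transcripts, and the `min_doc_freq` cutoff are all hypothetical, and collocation detection and stemming are left out for brevity.

```python
from collections import Counter

# Hypothetical stop-word list and toy transcripts, for illustration only.
STOP_WORDS = {"the", "a", "i", "to", "my", "is", "for", "and", "you", "of"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    tokens = (tok.strip(".,!?;:\"'()") for tok in text.lower().split())
    return [tok for tok in tokens if tok and tok not in STOP_WORDS]


def build_term_document_matrix(transcripts, min_doc_freq=2):
    """Return (vocabulary, matrix); matrix[i][j] counts term i in document j."""
    docs = [Counter(tokenize(text)) for text in transcripts]
    # Document frequency: in how many transcripts does each term appear?
    doc_freq = Counter(term for doc in docs for term in doc)
    # Drop low-frequency terms (assumed cutoff: fewer than min_doc_freq documents).
    vocabulary = sorted(t for t, df in doc_freq.items() if df >= min_doc_freq)
    matrix = [[doc[term] for doc in docs] for term in vocabulary]
    return vocabulary, matrix


transcripts = [
    "I need to confirm my reservation for the ocean view room.",
    "Is the ocean view room available? I need a balcony.",
    "You charged my card and I need to cancel the reservation.",
]
vocabulary, matrix = build_term_document_matrix(transcripts)
for term, row in zip(vocabulary, matrix):
    print(term, row)
```

Rows are terms and columns are calls, matching the term-document orientation named on the slide.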

Topic Model Creation

Retrieve last topic model

• For comparison

Create current topic model

• Detect number of topics, or specify

Create topic labels

Topic Model Comparison

Inspect overall topic prevalence

Compare overall topic prevalence across periods

• Topics change! Measure the change in word probability distributions for each new topic with respect to each old topic

• Match each new topic to its closest previous topic, provided the change is below a threshold (otherwise treat it as a new topic)

• Evaluate trends!
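
The matching step above can be sketched as follows. The talk does not say which measure of change was used, so the Hellinger distance here is an assumption, as are the threshold value and the toy topic distributions.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two word distributions (dict: word -> prob)."""
    words = set(p) | set(q)
    total = sum(
        (math.sqrt(p.get(w, 0.0)) - math.sqrt(q.get(w, 0.0))) ** 2 for w in words
    )
    return math.sqrt(total / 2.0)


def match_topics(new_topics, old_topics, threshold=0.5):
    """Map each new topic to its closest old topic, or None if none is close enough."""
    matches = {}
    for new_name, new_dist in new_topics.items():
        best_name, best_distance = None, float("inf")
        for old_name, old_dist in old_topics.items():
            distance = hellinger(new_dist, old_dist)
            if distance < best_distance:
                best_name, best_distance = old_name, distance
        # Above the threshold, the topic is treated as new (no previous match).
        matches[new_name] = best_name if best_distance < threshold else None
    return matches


# Toy word distributions, invented for illustration.
old_topics = {
    "old_beach": {"beach": 0.5, "ocean": 0.3, "view": 0.2},
    "old_email": {"confirm": 0.6, "email": 0.4},
}
new_topics = {
    "new_beach": {"beach": 0.4, "ocean": 0.3, "balcony": 0.1, "view": 0.2},
    "new_convention": {"convention": 0.7, "center": 0.3},
}
matches = match_topics(new_topics, old_topics)
print(matches)
```

A bounded distance such as Hellinger (always between 0 and 1) makes a fixed threshold easier to reason about; other divergences would work with a differently scaled threshold.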

Estimate and inspect effects of covariates

Compare effects of covariates across periods

• Output can be interpreted similarly to regression

Example results: Hotel reservations

Covariates: booking, caller distress
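
To illustrate what a regression-style reading looks like, here is a minimal sketch that regresses a topic's per-call proportion on a binary covariate using ordinary least squares. It is a simplification for intuition only, with invented numbers; it ignores the uncertainty in the estimated topic proportions that a full structural topic model accounts for.

```python
def simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: 1 = call resulted in a booking, 0 = it did not.
booking = [0, 0, 1, 1]
# Hypothetical share of each call's words assigned to one topic.
topic_proportion = [0.10, 0.20, 0.30, 0.40]

intercept, slope = simple_ols(booking, topic_proportion)
print(f"baseline proportion: {intercept:.2f}, change when booking: {slope:+.2f}")
```

With a binary covariate the slope is simply the difference in mean topic proportion between the two groups of calls, which is why the output reads like a regression coefficient.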

Trend Contextualization: Booking

New: convention, center, mind, worry, philadelphia, inventory

Legend: New = new topic, ↓ = decreasing, ↑ = increasing

Hit: > 1% of words on call assigned to a given topic
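
The hit definition can be sketched in a few lines of Python. The per-call counts are invented, and reading the percentages on these slides as a relative change in hit rate between two periods is an assumption rather than something the slides state.

```python
def is_hit(topic_word_counts, topic, threshold=0.01):
    """True when more than `threshold` of a call's words are assigned to `topic`."""
    total = sum(topic_word_counts.values())
    if total == 0:
        return False
    return topic_word_counts.get(topic, 0) / total > threshold


def hit_rate(calls, topic):
    """Fraction of calls that count as a hit for `topic`."""
    return sum(is_hit(call, topic) for call in calls) / len(calls)


def percent_change(previous_rate, current_rate):
    """Relative change in percent (assumed reading of the slide's percentages)."""
    return 100.0 * (current_rate - previous_rate) / previous_rate


# Hypothetical counts of words assigned to each topic, one dict per call.
last_period = [
    {"beach": 5, "other": 95},
    {"beach": 0, "other": 100},
    {"beach": 1, "other": 99},  # exactly 1%, so not a hit (must exceed 1%)
    {"beach": 0, "other": 50},
]
this_period = [
    {"beach": 5, "other": 95},
    {"beach": 3, "other": 97},
    {"beach": 0, "other": 100},
    {"beach": 0, "other": 80},
]

previous = hit_rate(last_period, "beach")
current = hit_rate(this_period, "beach")
print(previous, current, percent_change(previous, current))
```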

Trend Contextualization: Booking

New: school, college, graduate, medical, clinic

Trend Contextualization: Booking

↑ 30%: beach, balcony, ocean, view

Trend Contextualization: Booking

↓ 10%: back, next, receive, listen, cash future

Trend Contextualization: Booking

New: back, minute, system, run, inconvenience

Trend Contextualization: Booking

↑ 42%: confirm, email, arrival, local

Trend Contextualization: Caller Distress

Legend: New = new topic, ↓ = decreasing, ↑ = increasing

Distress: > 30 seconds of linguistically-identified dissatisfaction or negative emotion

Trend Contextualization: Caller Distress

New: square, city, price, hotel, manhattan, central

Trend Contextualization: Caller Distress

↓ 12%: online, website, cancel, purchase, advance

Nice!

Our goals, revisited

Manageable number of topics

Track expected and unexpected topics

Go deep: Contextualize topic usage

Topic trends using structural topic models

Thank you!

@datapopup #datapopupaustin