Celi @Clic2014: Geometric and Statistical Analysis of Topic and Emotions in Corpora

Geometric and Statistical Analysis of Topics and Emotions in Corpora
Francesco Tarasconi - [email protected]
Vittorio Di Tomaso - [email protected]
Pisa, 9/12/2014

Our second presentation at CLIC 2014: "Geometric and Statistical Analysis of Topic and Emotions in Corpora", for which Francesco Tarasconi received the Distinguished Young Paper award, given to the 8 best papers of the conference with a young author.

Introduction: Analysis of Emotions

Francesco Tarasconi and Vittorio Di Tomaso 2

NLP:
Topic detection
Sentiment analysis
Emotion detection

Many, potentially correlated, variables

Role of data analysis: define, visualize and understand emotional similarities

Focus of the present work: background, methodology, examples

BACKGROUND

A Model of Emotions in Social Networks

Primary emotions according to Ekman (1972):

Anger

Disgust

Fear

Joy

Sadness

Surprise

Plus:

Love

Like

Dislike

© Paul Ekman. All rights reserved

Social TV, the “Second Screen”

Sharing of experiences (and emotions!) between viewers of the same program

Source: Blogmeter, www.blogmeter.it

Emotional profiles of audiences and, by extension, of whole shows / episodes

METHODOLOGY

Vector Space Model Representations

$\mathrm{DOC}_i = \{\text{topic } A, \text{topic } B, \ldots, \text{emotion } x, \text{emotion } y, \ldots\}$
Annotated documents as vectors in an $(n_{\text{topic}} + n_{\text{emotion}})$-dimensional space
Document-annotation indicator matrix $D$

$\mathrm{TOPIC}_i = [\,\text{frequency}_1, \text{frequency}_2, \ldots, \text{frequency}_{n_{\text{emotion}}}\,]$
Topics as vectors in an $n_{\text{emotion}}$-dimensional space
Topic-emotion frequency matrix $T$

$\mathrm{IMPRESSION}_i = \{\text{topic } A, \text{emotion } x\}$
Impressions as vectors in an $(n_{\text{topic}} + n_{\text{emotion}})$-dimensional space
Impression-annotation indicator matrix $J$
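
As a minimal sketch of these representations, assuming each impression pairs exactly one topic with one emotion, the NumPy snippet below builds the impression-annotation indicator matrix $J$ from a few made-up impressions, derives the topic-emotion frequency matrix $T$ by cross-tabulating the topic block of $J$ against its emotion block, and forms the Burt matrix $J^{\top}J$. The category lists `TOPICS` and `EMOTIONS`, the `impressions` data and the helper `indicator_matrix` are hypothetical illustrations, not the authors' code. The document matrix $D$ would be built the same way, except that a row may carry several topic and emotion ones.

```python
import numpy as np

# Hypothetical category lists (assumption): the real sets come from the annotation step.
TOPICS = ["judges", "contestants", "music"]
EMOTIONS = ["joy", "anger", "like"]

# Hypothetical impressions: each pairs exactly one topic with one emotion.
impressions = [
    ("judges", "anger"),
    ("judges", "like"),
    ("contestants", "joy"),
    ("music", "like"),
    ("music", "joy"),
    ("contestants", "joy"),
]


def indicator_matrix(pairs, topics, emotions):
    """Build J: one row per impression and n_topic + n_emotion columns,
    with a 1 in the impression's topic column and in its emotion column."""
    J = np.zeros((len(pairs), len(topics) + len(emotions)), dtype=int)
    for i, (topic, emotion) in enumerate(pairs):
        J[i, topics.index(topic)] = 1
        J[i, len(topics) + emotions.index(emotion)] = 1
    return J


J = indicator_matrix(impressions, TOPICS, EMOTIONS)

# Split J into its topic block and its emotion block.
J_topic = J[:, : len(TOPICS)]
J_emotion = J[:, len(TOPICS) :]

# Topic-emotion frequency matrix T: T[a, x] counts impressions with topic a and emotion x.
T = J_topic.T @ J_emotion

# Burt matrix J^T J: its off-diagonal (topic x emotion) block equals T.
B = J.T @ J
```

Because the off-diagonal block of $J^{\top}J$ equals $T$, the two-way table and the indicator coding carry the same topic-emotion co-occurrence information; they differ in how that information is analysed.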

Emotional Distances Between Topics

Key elements:

1) High variance in topic absolute frequencies

2) High variance in emotion absolute frequencies

3) A graphical representation is required

4) Why are two topics similar?

A graphical representation can be obtained by dimension reduction.
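
Points 1) and 2) suggest comparing topics on their emotional profiles rather than on raw counts. A minimal sketch, assuming the chi-square distance used by correspondence analysis: each row of $T$ is turned into a profile (row proportions), and each emotion is weighted by the inverse of its mass so that rare emotions still count. The function `chi2_profile_distances` and the toy counts are hypothetical.

```python
import numpy as np


def chi2_profile_distances(T):
    """Chi-square distances between the row profiles (topics) of a
    topic-emotion frequency matrix T.

    Rows are divided by their totals, so absolute topic volume is ignored;
    each emotion column is weighted by 1 / (its mass), so rare emotions are
    not swamped by frequent ones."""
    T = np.asarray(T, dtype=float)
    P = T / T.sum()                 # correspondence matrix
    r = P.sum(axis=1)               # topic masses
    c = P.sum(axis=0)               # emotion masses
    profiles = P / r[:, None]       # each row now sums to 1
    diff = profiles[:, None, :] - profiles[None, :, :]
    return np.sqrt((diff ** 2 / c).sum(axis=2))


# Hypothetical counts: topics 0 and 1 share the same emotional shape, 10x apart in volume.
T_demo = [[10, 20, 70], [100, 200, 700], [50, 30, 20]]
dist = chi2_profile_distances(T_demo)
```

Topics 0 and 1 end up at distance zero despite the tenfold difference in volume. The individual per-emotion terms inside the sum show which emotions drive a given distance, which is one way to approach point 4).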

Simple and Multiple Correspondence Analysis

Strong link with PCA: dimension reduction, eigenvalue methods

CA (Hirschfeld, 1935) of contingency table $T$

SVD of standardized residual matrix
Principal coordinates and symmetric map
Inertia and quality of the representation
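
The CA steps listed above can be sketched in NumPy as follows. This is the textbook formulation, not the authors' implementation: with correspondence matrix $P = T/n$, row masses $r$ and column masses $c$, the standardized residuals are $S = D_r^{-1/2}(P - rc^{\top})D_c^{-1/2}$; from the SVD $S = U\Sigma V^{\top}$ the row and column principal coordinates are $F = D_r^{-1/2}U\Sigma$ and $G = D_c^{-1/2}V\Sigma$, and the principal inertias are the squared singular values, which sum to $\chi^2/n$. The function name `correspondence_analysis` and the toy table are hypothetical.

```python
import numpy as np


def correspondence_analysis(T):
    """Simple CA of a contingency table T via SVD of the standardized residuals.

    Returns row principal coordinates F, column principal coordinates G and
    the principal inertias (squared singular values, in decreasing order)."""
    T = np.asarray(T, dtype=float)
    P = T / T.sum()                                  # correspondence matrix
    r = P.sum(axis=1)                                # row masses
    c = P.sum(axis=0)                                # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)           # D_r^-1/2 (P - r c^T) D_c^-1/2
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    F = (U * sv) / np.sqrt(r)[:, None]               # D_r^-1/2 U Sigma
    G = (Vt.T * sv) / np.sqrt(c)[:, None]            # D_c^-1/2 V Sigma
    return F, G, sv ** 2


# Hypothetical topic-emotion counts (4 topics x 4 emotions).
T_demo = np.array([[30, 10, 5, 8], [10, 25, 15, 3], [5, 10, 40, 12], [20, 20, 10, 30]])
F, G, inertias = correspondence_analysis(T_demo)
quality_2d = inertias[:2].sum() / inertias.sum()     # share of inertia on the first two axes
```

A symmetric map plots the first two columns of `F` (topics) and `G` (emotions) on the same axes, and `quality_2d` measures how much of the total inertia that plane retains. In full dimensionality, Euclidean distances between rows of `F` equal the chi-square distances between topic profiles.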

MCA of indicator matrix $J$ or Burt matrix $J^{\top}J$
Analysis of surveys (Benzécri, 1960s–1970s)
As a geometric method (Le Roux and Rouanet, 2004)
Adjustment of inertia (Greenacre, 2006)
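
MCA can be sketched as a simple CA of the indicator matrix $J$, with one block of columns per categorical variable (here $Q = 2$: topic and emotion). The raw principal inertias $\lambda_s$ of $J$ give an overly pessimistic picture of how much the leading axes explain; Greenacre's adjustment keeps only $\lambda_s > 1/Q$ and rescales them to $\left(\frac{Q}{Q-1}\right)^2\left(\lambda_s - \frac{1}{Q}\right)^2$. A minimal NumPy sketch, assuming every category occurs at least once; the function `mca_indicator`, its `n_vars` parameter and the toy impressions are hypothetical.

```python
import numpy as np


def mca_indicator(J, n_vars):
    """MCA as a simple CA of the indicator matrix J (assumption: one block of
    columns per variable, every category observed at least once).

    Returns the principal inertias of J and Greenacre's adjusted inertias."""
    J = np.asarray(J, dtype=float)
    P = J / J.sum()
    r = P.sum(axis=1)
    c = P.sum(axis=0)
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)
    lam = np.linalg.svd(S, compute_uv=False) ** 2    # principal inertias of J
    Q = n_vars
    # Greenacre's adjustment: keep inertias above 1/Q and rescale them.
    above = lam[lam > 1.0 / Q]
    adjusted = (Q / (Q - 1.0)) ** 2 * (above - 1.0 / Q) ** 2
    return lam, adjusted


# Hypothetical impressions as (topic index, emotion index), 3 topics x 3 emotions.
pairs = [(0, 0), (0, 1), (1, 0), (1, 0), (2, 1), (2, 2), (0, 2), (1, 1), (2, 2), (0, 0)]
J_demo = np.zeros((len(pairs), 6))
for i, (topic, emotion) in enumerate(pairs):
    J_demo[i, topic] = 1
    J_demo[i, 3 + emotion] = 1

lam, adjusted = mca_indicator(J_demo, n_vars=2)
```

With only two variables, the adjusted inertias coincide with the principal inertias of a simple CA of the topic-emotion table, so the adjustment reconciles the two analyses; with more variables per impression, $Q$ grows and the two no longer coincide.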

Why MCA

1) It accounts for different volumes in the original variables (masses), but focuses on the shape of data (residuals)

2) Graphical method

3) Symmetric treatment of topics and emotions

EXAMPLES

Social TV Emotional Landscape

X Factor’s Emotional Phases

MasterChef’s Quirks

X-Factor vs MasterChef

Conclusions and Further Research

We have shown how to represent and highlight important emotional relations between topics using carefully chosen multivariate techniques. In the future we would like to:

add information about the authors to our analysis;

study the clouds of impressions, documents and authors in greater detail.

We would like to thank: V. Cosenza and S. Monotti Graziadei for stimulating this research; the ISI-CRT foundation and CELI S.R.L. for the support provided through the Lagrange Project;

A. Bolioli for the essential help and supervision in the preparation of this paper.

Thank you for your attention!

Pisa, 9/12/2014