Developing Smart Cities Services through Semantic Analysis of Social Streams

Developing Smart Cities Services through Semantic Analysis of Social Streams. Cataldo Musto, Giovanni Semeraro, Marco de Gemmis, Pasquale Lops (Università degli Studi di Bari ‘Aldo Moro’, Italy - SWAP Research Group). WDS4SC 2015, WWW 2015 Workshop on Web Data Science and Smart Cities, Florence (Italy) - May 19, 2015

Transcript of Developing Smart Cities Services through Semantic Analysis of Social Streams

Developing Smart Cities Services through Semantic Analysis of Social Streams

Cataldo Musto, Giovanni Semeraro, Marco de Gemmis, Pasquale Lops
(Università degli Studi di Bari ‘Aldo Moro’, Italy - SWAP Research Group)

WDS4SC 2015, WWW 2015 Workshop on Web Data Science and Smart Cities
Florence (Italy) - May 19, 2015

Outline

• Background
  • Information Overload
  • Social Content Analytics
• CrowdPulse
  • Social Data Extraction
  • Semantic Tagging
  • Sentiment Analysis
  • Processing & Visualization
• Use Cases
  • L’Aquila Social Urban Network
  • The Italian Hate Map
• Conclusions

Cataldo Musto, Giovanni Semeraro, Marco de Gemmis, Pasquale Lops. Developing Smart Cities Services through Semantic Analysis of Social Streams. WDS4SC 2015 Workshop, Florence (Italy), 19.05.2015

Background

Information Overload
… in digital life
… in real life

Information Overload: Obstacle or Opportunity?

Social Networks
- can be considered as novel data silos
- information about preferences
- information about connections
- information about people’s feelings
- changed the rules for content analytics

Social Content Analytics: Successful Use Cases
- Online brand monitoring
- Social CRM
- Real-time polls

All these applications share a common insight

Social Content Analytics: Research Question

Is it possible to aggregate raw human-generated data to obtain complex people-based findings?

Our contribution: CrowdPulse

A framework for real-time Semantic Analysis of Social Streams


CrowdPulse: features
- Social Data Extraction
- Semantic Tagging
- Sentiment Analysis
- Processing & Visualization

CrowdPulse: workflow [figure]

CrowdPulse, Step 1: Social Data Extraction

Extraction is defined by a Source and a set of Heuristics.

Heuristics: Content, User, Geo, Content+Geo, Page, Group
Examples: #www2015, #democrats, #traffic, #earthquake; @barack_obama, @comunefi

We only extract public content.
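The extraction heuristics listed for Step 1 can be sketched as simple predicates over posts. This is a minimal illustration, not CrowdPulse's actual implementation: the `Post` record, the function names and the bounding-box representation of the Geo heuristic are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass
class Post:
    """Hypothetical public post record; field names are illustrative."""
    author: str
    text: str
    coords: Optional[Tuple[float, float]] = None  # (lat, lon) if geotagged


def _tokens(text: str) -> Set[str]:
    # Lowercased whitespace tokens with surrounding punctuation removed.
    return {tok.lower().strip(".,!?:;") for tok in text.split()}


def matches_content(post: Post, hashtags: Set[str]) -> bool:
    """Content heuristic: the post contains at least one tracked hashtag."""
    return bool(_tokens(post.text) & {h.lower() for h in hashtags})


def matches_user(post: Post, users: Set[str]) -> bool:
    """User heuristic: the post is written by, or mentions, a tracked account."""
    handles = {u.lower() for u in users}
    return "@" + post.author.lower() in handles or bool(_tokens(post.text) & handles)


def matches_geo(post: Post, bbox: Tuple[float, float, float, float]) -> bool:
    """Geo heuristic: the post is geotagged inside a (south, west, north, east) box."""
    if post.coords is None:
        return False
    lat, lon = post.coords
    south, west, north, east = bbox
    return south <= lat <= north and west <= lon <= east


def matches_content_geo(post: Post, hashtags: Set[str], bbox) -> bool:
    """Content+Geo heuristic: both conditions must hold."""
    return matches_content(post, hashtags) and matches_geo(post, bbox)


# Illustrative usage; the box roughly surrounding Florence is an assumption.
FLORENCE_BOX = (43.70, 11.10, 43.85, 11.40)
example = Post("someone", "Stuck in #traffic again!", (43.77, 11.25))
print(matches_content_geo(example, {"#traffic"}, FLORENCE_BOX))
```

Keeping each heuristic as an independent predicate makes it easy to combine them per project, which mirrors the Content+Geo option on the slide.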

CrowdPulse, Step 2: Semantic Tagging

Semantic Tagging: Motivations

Poor semantics: keyword-based representation introduces a lot of noise in the analysis.

"aquila" (Italian): the eagle, or the Italian city?

“Fate qualcosa per favore, l’Aquila sta morendo!”
- (Please, do something: L’Aquila is dying!)
- (Please, do something: the eagle is dying!)

Step 2: Semantic Tagging

Solution: semantic processing of extracted content

Entity Linking algorithms
• Input: textual content
• Output: identification and disambiguation of the entities mentioned in the text
• Algorithms: TagMe (1), DBpedia Spotlight (2)

(1) http://tagme.di.unipi.it
(2) http://spotlight.dbpedia.org

Non-trivial NLP tasks (stopword removal, n-gram identification, named entity recognition and disambiguation) are performed automatically.

IMPORTANT! Each entity is a reference to a Wikipedia page, e.g. http://it.wikipedia.org/wiki/Massimo_Cialente

We enriched the entity-based representation by exploiting the Wikipedia category tree.

Many interesting (new) features come into play! (e.g. Italian politics, L’Aquila mayors, Democrats politics)

The final representation of each piece of content is obtained by merging the entities identified in the text with the most relevant Wikipedia categories each entity is linked to.

Features = Entities + Wikipedia Categories
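The rule "Features = Entities + Wikipedia Categories" can be sketched as a merge of two lists. This is only an illustration under stated assumptions: the entity-to-category mapping is hard-coded here (a real pipeline would obtain it from an entity linker and Wikipedia), the categories are assumed to be already ranked by relevance, and `build_features` and `max_categories` are hypothetical names.

```python
# Hypothetical, hard-coded mapping from linked entities to ranked Wikipedia categories.
# The category labels are the examples mentioned on the slide.
ENTITY_CATEGORIES = {
    "Massimo_Cialente": ["Italian politics", "L'Aquila mayors", "Democrats politics"],
}


def build_features(entities, categories_of, max_categories=2):
    """Merge each entity with its top-ranked categories, without duplicates."""
    features = []
    for entity in entities:
        # Assumption: the category list is ordered from most to least relevant.
        top_categories = categories_of.get(entity, [])[:max_categories]
        for feature in [entity] + top_categories:
            if feature not in features:
                features.append(feature)
    return features


# An entity with no known categories simply contributes itself.
print(build_features(["Massimo_Cialente", "L'Aquila"], ENTITY_CATEGORIES))
```

Capping the number of categories per entity is one simple way to approximate "most relevant"; the slides do not say how relevance is actually computed.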

CrowdPulse, Step 3: Sentiment Analysis

Sentiment Analysis: Motivations

Is this content conveying any opinion?

This is a crucial issue if people-based findings have to be generated.

Sentiment Analysis: Definition

“It is the field of study that analyzes people’s opinions, sentiments, evaluations, appraisals, attitudes, and emotions towards entities such as products, services, organizations, individuals, issues, events, topics, and their attributes” (*)

(*) Pang, Bo, and Lillian Lee. "Opinion mining and sentiment analysis." Foundations and Trends in Information Retrieval, 2008.

We concentrated on the polarity detection task.

Sentiment Analysis

How to develop an (unsupervised) sentiment analysis algorithm?

Sentiment Analysis: Lexicon

External lexical resources associate a polarity score with each term.

joy: +++
frustration: - -

Sentiment Analysis: SenticNet (*)

- Inspired by the Hourglass of Emotions model
- Each term is represented on the basis of the intensity of four basic emotional dimensions (sensitivity, aptitude, attention, pleasantness)
- The activation level of each dimension defines 16 basic emotions
- According to the triggered emotions, each term is given an aggregated polarity score
- SenticNet also models a sentiment score for some bigrams and trigrams

(*) Cambria, Erik, Daniel Olsher, and Dheeraj Rajagopal. "SenticNet 3: a common and common-sense knowledge base for cognition-driven sentiment analysis." Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014.

Sentiment Analysis: Methodology

Insight: the polarity of a piece of textual content (e.g. a microblog post) depends on the polarity of the microphrases that compose it.

A microphrase is built whenever a splitting cue is found in the text. Conjunctions, adverbs and punctuation marks are used as splitting cues.

Example: “I don’t like this food, it’s terrible”
m1 = “I don’t like this food”, splitting cue = “,”, m2 = “it’s terrible”

For a content (e.g. a tweet) T = {m_1 … m_k} made of k microphrases:

$$\mathrm{pol}(T) = \sum_{i=1}^{k} \mathrm{pol}(m_i)$$

The polarity of a microphrase depends on the polarity of the terms that compose it. For a microphrase m_i = {t_1 … t_n} made of n terms:

$$\mathrm{pol}(m_i) = \sum_{j=1}^{n} \mathrm{score}(t_j)$$
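The two sums in the methodology (content polarity as a sum over microphrases, microphrase polarity as a sum over term scores) can be sketched as follows. This is a toy illustration, not the authors' code: the lexicon scores are made up rather than taken from SenticNet, and the list of splitting cues and all function names are assumptions.

```python
import re

# Toy lexicon with made-up scores; a real system would look terms up
# in an external resource such as SenticNet.
LEXICON = {"like": 0.5, "terrible": -0.8, "good": 0.6}

# Splitting cues: punctuation plus a few conjunctions and adverbs (illustrative list).
SPLIT_PATTERN = re.compile(r"[,.;:!?]|\b(?:and|but|or|however)\b", re.IGNORECASE)


def split_microphrases(text):
    """Cut the text at every splitting cue and drop empty fragments."""
    return [part.strip() for part in SPLIT_PATTERN.split(text) if part.strip()]


def pol_microphrase(microphrase, lexicon):
    """pol(m_i): sum of the lexicon scores of the terms in the microphrase."""
    terms = re.findall(r"[a-z']+", microphrase.lower())
    return sum(lexicon.get(term, 0.0) for term in terms)


def pol_content(text, lexicon):
    """pol(T): sum of the polarities of the microphrases of the content."""
    return sum(pol_microphrase(m, lexicon) for m in split_microphrases(text))


text = "I don't like this food, it's terrible"
print(split_microphrases(text))
print(pol_content(text, LEXICON))
```

With these toy scores the example comes out negative overall, but only because "terrible" outweighs "like": a plain sum ignores the negation in "don't like", so a fuller implementation would need some negation handling on top of what the slides' formulas show.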

CrowdPulse, Step 3: Sentiment Analysis

Overall sentiment: :-(

The process can be iterated over a larger set of content to obtain findings about the feelings of the population regarding a certain topic.

CrowdPulse, Step 4: Processing & Visualization

Step 4: Domain-specific processing
- Supervised learning: classification, regression tasks
- Unsupervised learning: clustering
- Linguistic analysis: building word spaces, similarity between concepts, analysis of term usage, etc.

CrowdPulse natively supports all these methodologies. The choice is typically scenario-dependent.

Step 4: Data Visualization
- An interactive analytics console is made available for each project
- Descriptive statistics can be built in real time and shown immediately

CrowdPulse: Recap

Use Cases
1. L’Aquila Social Urban Network
2. The Italian Hate Map

Use Cases: L’Aquila Social Urban Network

April 6, 2009
- 5.8 magnitude earthquake
- 20 billion in damages
- 70,000 people displaced
- 309 people died

2015: six years later
- 7 billion in funding still needed
- 22,000 people still displaced
- Diaspora

19 ‘new towns’ around L’Aquila; 15,200 people live there today.

What about the consequences?
- Loss of trust, sense of belonging, relationships
- Loss of social capital

Research Question: Is it possible to extract and process social media to monitor in real time people’s feelings, opinions and sentiments about the current state of the social capital of L’Aquila?

We can use CrowdPulse.

CROWDPULSE SETTINGS

Heuristics:
- Twitter users (local newspapers, mentions of politicians)
- Twitter content+geo (50 km around L’Aquila, with specific hashtags such as #laquila, #earthquake, etc.)
- Facebook groups (identified after a thorough analysis)
- Facebook pages (identified after a thorough analysis)
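The Twitter content+geo heuristic (tweets within 50 km of L’Aquila that carry specific hashtags) can be sketched with a great-circle distance check. This is a minimal sketch under assumptions: the city-centre coordinates are approximate, the haversine formula with a spherical Earth is a choice made here, and `keep_tweet` is a hypothetical helper, not part of CrowdPulse.

```python
import math

LAQUILA = (42.35, 13.40)   # approximate (lat, lon) of the city centre (assumption)
RADIUS_KM = 50.0           # radius stated in the settings
HASHTAGS = {"#laquila", "#earthquake"}
EARTH_RADIUS_KM = 6371.0   # mean Earth radius, spherical approximation


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def keep_tweet(text, coords):
    """Content+geo: geotagged within the radius AND carrying a tracked hashtag."""
    if coords is None:
        return False
    near = haversine_km(LAQUILA, coords) <= RADIUS_KM
    tagged = any(tok.lower().strip(".,!?") in HASHTAGS for tok in text.split())
    return near and tagged


print(keep_tweet("Rebuilding goes on #laquila", (42.40, 13.45)))
```

A circular radius matches the "50 km around" wording more directly than a bounding box would, at the cost of one trigonometric computation per tweet.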

Extracted content (examples):
- Tweets about the fear of new earthquakes
- Facebook posts about citizens’ proposals
- Tweets about people worried about the situation
- Tweets about new buildings in the city

Sentiment Analysis and Semantic Tagging of the content

How to map each piece of content to the social indicator it refers to?

Given a fixed set of social capital indicators, we built a classification model to associate each piece of content (along with its sentiment) with the social indicator it refers to.

Example: a tweet about new buildings in the city.
- Input: social indicators + semantic representation of the content
- Domain-specific processing: classification model
- Output: (multi-class) classification + sentiment

The score of a social indicator is the average sentiment of all the content referring to it.
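The scoring rule above (an indicator's score is the average sentiment of the content assigned to it) can be sketched as a grouped mean. The indicator labels and sentiment values below are invented for illustration, and `indicator_scores` is a hypothetical helper; the classification step is assumed to have already produced the (indicator, sentiment) pairs.

```python
from collections import defaultdict


def indicator_scores(classified):
    """Average sentiment per indicator.

    classified: iterable of (indicator, sentiment) pairs, i.e. the output of
    the classification model combined with the sentiment analysis step.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for indicator, sentiment in classified:
        totals[indicator] += sentiment
        counts[indicator] += 1
    return {indicator: totals[indicator] / counts[indicator] for indicator in totals}


# Made-up example data; indicator names are placeholders.
sample = [("trust", -1.0), ("trust", 0.5), ("relationships", 1.0)]
print(indicator_scores(sample))
```

Keeping running totals and counts means the scores can be updated incrementally as new content arrives, which suits the real-time monitoring described in the talk.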

CROWDPULSE OUTPUT

Overall score of the social indicators between March and August 2014

Real-world application of the output: the community promoter
- monitors the state of the social indicators
- defines some initiatives to empower the social capital

L’Aquila Social Urban Network: Conclusions

Crowdsourcing-based approach
1. Social content about L’Aquila is extracted and processed in real time
2. Machine Learning is exploited to build a classification model
3. Sentiment Analysis is used to provide each social indicator with a score
4. The Analytics Console is used to monitor the state of the social capital in real time

Almost 500,000 pieces of social content extracted and analyzed.

Use Cases: The Italian Hate Map

- Inspired by the Hate Map built by the Humboldt University (http://users.humboldt.edu/mstephens/hate/hate_map.html)
- Joint research with a team of psychologists from Rome University and a non-profit agency focused on human rights

Insight: aggregating raw people-based data in order to analyze complex phenomena.

Research Question: Is it possible to extract and process social media to detect intolerant content posted on social networks and identify the most at-risk areas of Italy?

We can use CrowdPulse.

CROWDPULSE SETTINGS

Heuristics: Twitter content
- 76 intolerant seed terms, defined by the team of psychologists
- 5 intolerance dimensions: violence (against women), racism, homophobia, disability, anti-semitism

Use Cases

Extracted content (seed term: nano/midget):
- Tweet about an Italian minister
- Tweet about iPod nano
- Tweet about an Italian football player


Many non-intolerant Tweets are extracted!


Non-intolerant Tweets are detected and filtered out.
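One way to picture this filtering step is a check on whether the seed term is explained by a known entity, as in "iPod nano". The sketch below is an assumption, not the method used in CrowdPulse: `BENIGN_ENTITY_MENTIONS` is a hypothetical hand-written table standing in for a real entity linker, and the matching is simple substring counting.

```python
# Hypothetical stand-in for an entity linker: surface forms that resolve to
# a non-person entity (e.g. a product) and therefore explain away the seed term.
BENIGN_ENTITY_MENTIONS = {
    "nano": ["ipod nano"],
}

def is_benign_use(text, seed_term, benign_mentions=BENIGN_ENTITY_MENTIONS):
    """True if every occurrence of the seed term sits inside a benign entity mention."""
    lowered = text.lower()
    term = seed_term.lower()
    total = lowered.count(term)
    if total == 0:
        return False  # the term does not occur, so there is nothing to explain away
    covered = sum(lowered.count(m) for m in benign_mentions.get(term, []))
    return covered >= total

def filter_candidates(tweets, seed_term):
    """Keep only tweets whose use of the seed term is not explained by an entity."""
    return [t for t in tweets if not is_benign_use(t, seed_term)]
```

For example, `filter_candidates(["Ho comprato un iPod nano", "Quel ministro sembra un nano"], "nano")` keeps only the second Tweet.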


Ironic Tweets are detected and filtered out.


We have to build a map, so we only need geotagged content


Definition of heuristics to increase the number of geotagged Tweets
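The slides do not say which heuristics were defined. As an illustration only, the sketch below assumes one plausible heuristic: when a Tweet has no explicit geotag, fall back to the city named in the author's profile. The record fields `coordinates` and `user_location`, the `GAZETTEER` table, and both function names are hypothetical simplifications, not the Twitter API schema or CrowdPulse code.

```python
from typing import Optional, Tuple

# Hypothetical gazetteer: lower-cased place name -> (latitude, longitude).
GAZETTEER = {
    "bari": (41.1171, 16.8719),
    "firenze": (43.7696, 11.2558),
}

def resolve_location(tweet: dict) -> Optional[Tuple[float, float]]:
    """Return (lat, lon) from the explicit geotag, else from the profile location."""
    coords = tweet.get("coordinates")
    if coords is not None:
        return coords  # an explicit geotag always wins
    profile = (tweet.get("user_location") or "").strip().lower()
    # Heuristic (assumption): use the first comma-separated token, e.g. "Bari, Italy".
    city = profile.split(",")[0].strip()
    return GAZETTEER.get(city)

def geotagged_only(tweets):
    """Pair each tweet with its resolved position, dropping unresolved ones."""
    resolved = ((t, resolve_location(t)) for t in tweets)
    return [(t, pos) for t, pos in resolved if pos is not None]
```

A fallback of this kind trades precision for coverage: a profile location says where the author claims to live, not where the Tweet was written.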


Dimension        #Tweets      #Geo      %Geo
Homophobia       110,774      8,501     7.66%
Racism           154,170      1,940     1.24%
Violence         1,102,494    28,886    2.62%
Disability       479,654      3,410     0.75%
Anti-Semitism    6,000        1,150     18.03%
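The share of geotagged content can be recomputed from the counts in the table. The sketch below uses only the #Tweets and #Geo figures listed there; the names `COUNTS` and `geo_share` are illustrative. Ratios recomputed this way may differ slightly from the %Geo column, which is reproduced as it appears on the slide.

```python
# (tweets, geotagged) per dimension, copied from the table above.
COUNTS = {
    "Homophobia": (110_774, 8_501),
    "Racism": (154_170, 1_940),
    "Violence": (1_102_494, 28_886),
    "Disability": (479_654, 3_410),
    "Anti-Semitism": (6_000, 1_150),
}

def geo_share(tweets: int, geo: int) -> float:
    """Percentage of tweets that carry a location."""
    return 100.0 * geo / tweets

total_tweets = sum(t for t, _ in COUNTS.values())
total_geo = sum(g for _, g in COUNTS.values())

# Per-dimension share, followed by the share over all five dimensions.
for name, (tweets, geo) in COUNTS.items():
    print(f"{name:<14} {geo_share(tweets, geo):6.2f}%")
print(f"{'Overall':<14} {geo_share(total_tweets, total_geo):6.2f}%")
```

Summing the #Tweets column gives 1,853,092 Tweets, of which 43,887 are geotagged.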


CROWDPULSE OUTPUT

[Figure: hate maps, one panel per dimension: Violence against women, Disability, Racism, Homophobia]


Given the maps and the output of the linguistic analysis of intolerant Tweets (co-occurrences between terms, timelapse, etc.), the team of psychologists defined guidelines to tackle and prevent intolerant behaviors.

These guidelines were freely distributed to public administrations in early 2015.

Conclusions


Crowdsourcing-based approach

1. Social content containing the seed terms is extracted and processed in real time
2. Semantic Processing is exploited to discard non-intolerant Tweets
3. Sentiment Analysis is used to filter out ironic Tweets
4. Analytics Console is used to build real-time hate maps

Almost 2,000,000 pieces of social content extracted and analyzed.

Lessons Learned


DEFINITION OF A FRAMEWORK FOR REAL-TIME SEMANTIC CONTENT ANALYSIS

1. Pipeline of state-of-the-art techniques: Entity Linking, Sentiment Analysis, Machine Learning, Data Visualization
2. Use Cases: L’Aquila Social Urban Network, The Italian Hate Map


The outcomes of both use cases showed that very complex phenomena can be analyzed in a totally new way, thanks to the huge availability of textual data.

Future Research


Integration of further machine learning techniques, and further data visualization formalisms

Evaluation of the real impact of the framework in real-world dynamics (e.g., do intolerant behaviors decrease thanks to the Hate Map?)


Improvement of the algorithms for semantic tagging, text classification and sentiment analysis

Questions?

Cataldo Musto, Ph.D. [email protected]

@cataldomusto