Tweet alert - semantic analysis in social networks for citizen opinion mining

TweetAlert: Semantic Analytics in Social Networks for Citizen Opinion Mining in the City of the Future
Julio Villena-Román (1,2), Adrián Luna-Cobos (1,3), José Carlos González-Cristóbal (3,1)
(1) DAEDALUS - Data, Decisions and Language, S.A.
(2) Universidad Carlos III de Madrid
(3) Universidad Politécnica de Madrid
PeGOV 2014 – 2nd Workshop on Personalization in eGovernment Services and Applications, 11 July 2014, Aalborg, Denmark

description

Description of a configurable, real-time system for automatically recording, analyzing and visualizing information from user interactions on Twitter. The system is designed to give public bodies (government agencies) a tool to quickly understand citizen behavior trends and citizens' opinions about city services, events, etc.; it may also serve as a primary alert system to improve the efficiency of emergency systems. The citizen is treated as a proactive city sensor capable of generating large amounts of rich, high-level, valuable data through social media platforms; once properly processed, summarized and annotated, this data allows city officers to better understand citizen needs. The architecture and component blocks are described, and key details of the design, implementation and application scenarios are discussed. Textalytics APIs are used for the semantic analysis of relevant tweets. Presentation by DAEDALUS, UPM and UC3M at PEGOV 2014, 2nd International Workshop on Personalization in eGovernment Services and Applications, Aalborg, Denmark, in conjunction with the 22nd Conference on User Modeling, Adaptation and Personalization - UMAP 2014.


Agenda
- Framework
- Citizen Sensor
- System
- Business cases
- Future work

Framework
- Ciudad 2020 aims to achieve significant improvements in energy efficiency, Internet of the Future, Internet of Things, human behaviour, environmental sustainability, and mobility and transport, in order to design the City of the Future: sustainable, efficient, smart.
- Spanish R&D project, INNPRONTA Programme, Center for Industrial Technological Development (CDTI), Ministry of Economy and Competitiveness
  - 2011-2014
  - 16.3 M€ budget
  - 5 multinational corporations, 4 SMEs, 8 PRIs
- Daedalus focuses on the automatic extraction of meaning from all types of multimedia content, using NLP technologies and data/text analytics to help our customers solve any challenge in these areas.

Citizen Sensor
Citizen 2020 = another city sensor
[Diagram labels:]
- mobility
- opinions in social media
- relationship with public administration
- collaborative sensing
- professional activities
- relationship with other people
- surveys
- leisure and free time

Citizen Sensor
- Innovative way to capture very descriptive, high-level, heterogeneous information, bringing high added value, especially when considering aggregations
- More complex and richer information than other sensors
  - “smells awful”, “there is a fire”, “I’m going to the sales”…
- Individual actions may show citizen trends
  - validate a bus ticket → route density
- Opinions/sentiments of the citizen about the city
  - follow social networks to assess the impact of new policies
- Collaborative sensing
  - using smartphones to get data (pollution, energy consumption) at low cost and with new possibilities

Our approach
What: Build a system able to capture, store and analyze user messages
Where: On Twitter
For whom: City administrators
What for: To help them quickly and easily understand citizen behaviour trends and learn citizens' opinions about city services, events, etc.
Why: To enable them to better understand citizen needs and generate hypotheses about urban behaviour models, in order to improve municipal management policies and bring them closer to the citizens' actual reality
How: Using NLP technologies
When: In real time

Architecture
[Architecture diagram]

Information Repository
- Stores the high volume of data and provides advanced search functionality to exploit the information
- Based on Elasticsearch
  - open source, distributed, real-time search and analytics engine
  - complex search capabilities
  - scalable high-performance solution
http://www.elasticsearch.org
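
As a rough illustration of the kind of search such a repository could serve, the sketch below builds an Elasticsearch query body as a plain Python dictionary. The field names (`text`, `tags.sentiment`, `location`), the helper name and the sample values are assumptions made for this example, not the system's actual index mapping. The clauses (`bool`, `match`, `term`, `geo_bounding_box`) follow the query DSL of recent Elasticsearch versions, which may differ from the syntax available in 2014, and the geo filter assumes `location` is mapped as a `geo_point`.

```python
import json


def build_tweet_query(keyword, sentiment, top_left, bottom_right):
    """Build a query body: full-text match plus exact-value and geo filters.

    Field names are hypothetical; adapt them to the real index mapping.
    """
    return {
        "query": {
            "bool": {
                # Scored full-text condition on the tweet text.
                "must": [{"match": {"text": keyword}}],
                # Non-scored conditions: exact tag value and map area.
                "filter": [
                    {"term": {"tags.sentiment": sentiment}},
                    {
                        "geo_bounding_box": {
                            "location": {
                                "top_left": top_left,
                                "bottom_right": bottom_right,
                            }
                        }
                    },
                ],
            }
        }
    }


query = build_tweet_query(
    keyword="atasco",
    sentiment="N",
    top_left={"lat": 40.45, "lon": -3.75},
    bottom_right={"lat": 40.38, "lon": -3.65},
)
print(json.dumps(query, indent=2))
```

The dictionary can be serialized and sent as the body of a search request; no client call is shown here because the deployment details are not part of the slides.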

Gatherer
- Set of concurrent processes that query the Twitter APIs to collect tweets
- Search or Streaming API
- Filter by a list of user identifiers, a list of keywords to track (terms, hashtags) and/or a set of geographical bounding boxes
- Returns tweet text, author, location, embedded media
https://dev.twitter.com/docs/api/1.1
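
The three filter types listed above correspond to the `follow`, `track` and `locations` parameters of the v1.1 streaming `statuses/filter` endpoint. The sketch below only assembles those parameters; the helper name, the user identifiers and the bounding box are made up for illustration, and authentication and the HTTP request itself are left out.

```python
def build_filter_params(user_ids=(), keywords=(), bounding_boxes=()):
    """Build request parameters for a streaming filter.

    bounding_boxes: iterable of (west_lon, south_lat, east_lon, north_lat),
    i.e. the south-west corner first, longitude before latitude.
    """
    params = {}
    if user_ids:
        params["follow"] = ",".join(str(uid) for uid in user_ids)
    if keywords:
        params["track"] = ",".join(keywords)
    if bounding_boxes:
        coords = []
        for west, south, east, north in bounding_boxes:
            if west > east or south > north:
                raise ValueError("box must be (west, south, east, north)")
            coords.extend([west, south, east, north])
        params["locations"] = ",".join(str(c) for c in coords)
    return params


params = build_filter_params(
    user_ids=[12345, 67890],            # placeholder account identifiers
    keywords=["#madrid", "atasco"],
    bounding_boxes=[(-3.75, 40.38, -3.65, 40.45)],
)
print(params)
```

Note that the streaming endpoint combines the three parameters with OR, so a gatherer that needs "keyword AND inside the box" has to apply the second condition itself after receiving the tweets.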

Inquirer
- Set of concurrent processes that annotate tweets using our Textalytics Core APIs
  - Entities
  - Concepts
  - Hashtags
  - Thematic area of the message (transport, economy, daily life…)
  - Citizen Sensor model
    - Alert situations (road accidents, fires, street violence…)
    - Specific location of the user (building, means of transport...)
    - Events to which the text refers (cultural events, sports...)
  - Sentiment polarity: P+, P, NEU, N, N+, NONE
  - Irony and subjectivity
  - User demographics: gender, age, type of tweet author
[Diagram labels: Topic Extraction API, Sentiment Analysis API, Text Classification API, User Demographics API]
http://textalytics.com
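
A minimal sketch of the concurrent annotation step follows. The slides mention concurrent processes; this example uses a thread pool instead, which is a common choice when workers mostly wait on remote calls. The `annotate` function is a hypothetical stand-in that only extracts hashtags locally, because the Textalytics request and response formats are not shown in the slides.

```python
from concurrent.futures import ThreadPoolExecutor


def annotate(tweet):
    """Hypothetical stand-in for a call to the annotation APIs.

    It only extracts hashtags locally; a real worker would send the text
    to the remote services and merge their responses into the tag list.
    """
    hashtags = [w for w in tweet["text"].split() if w.startswith("#")]
    tags = [{"type": "hashtag", "value": h} for h in hashtags]
    return {**tweet, "tag_list": tags}


def annotate_all(tweets, workers=4):
    """Annotate tweets concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(annotate, tweets))


tweets = [
    {"id": 1, "text": "Atasco enorme en la #GranVia"},
    {"id": 2, "text": "Concierto esta noche #Madrid #musica"},
]
annotated = annotate_all(tweets)
for t in annotated:
    print(t["id"], t["tag_list"])
```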

Entities, concepts, hashtags
Advanced NLP to obtain POS tags, syntactic trees and semantic analyses of the text, which are used to identify different types of significant elements

Text classification
State-of-the-art hybrid text classification model combining statistical classification with rule-based filtering

Social Media Citizen Sensor
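
The idea of combining a statistical score with rule-based filtering can be sketched with a toy example. Everything in it is an assumption for illustration: the categories, the term weights (which stand in for a trained statistical model), the rule format and the threshold. It is not the model used by the system.

```python
# Hypothetical per-category term weights standing in for a trained
# statistical model; the real model is not described in the slides.
TERM_WEIGHTS = {
    "traffic": {"atasco": 2.0, "coche": 1.0, "via": 0.5},
    "weather": {"viento": 2.0, "lluvia": 2.0, "rama": 0.5},
}

# Hypothetical rules: a category is kept only if at least one required
# term is present and no forbidden term appears.
RULES = {
    "traffic": {"required": {"atasco", "coche"}, "forbidden": {"videojuego"}},
    "weather": {"required": {"viento", "lluvia"}, "forbidden": set()},
}


def statistical_scores(tokens):
    """Sum term weights per category (the statistical step)."""
    return {
        cat: sum(weights.get(tok, 0.0) for tok in tokens)
        for cat, weights in TERM_WEIGHTS.items()
    }


def passes_rules(category, tokens):
    """Apply the rule-based filter to one candidate category."""
    rule = RULES[category]
    present = set(tokens)
    return bool(present & rule["required"]) and not (present & rule["forbidden"])


def classify(text, threshold=1.0):
    """Return categories scoring at or above threshold that pass the rules."""
    tokens = text.lower().split()
    scores = statistical_scores(tokens)
    kept = [
        cat for cat, score in scores.items()
        if score >= threshold and passes_rules(cat, tokens)
    ]
    return sorted(kept, key=lambda cat: -scores[cat])


print(classify("el viento ha roto una rama y hay un atasco"))
print(classify("atasco en el videojuego de coche"))
```

In the second call the statistical step alone would assign the traffic category, and the rule vetoes it because of the forbidden term. That veto is the role the filtering step plays in a hybrid design.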

Topics

Alerts

Locations, events

Sentiment analysis
State-of-the-art lexicon-based model for sentiment analysis, using POS tags and the syntactic tree to detect negation and control the scope of modifiers, plus subjectivity classification and irony detection
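
To make the lexicon-based approach concrete, here is a deliberately simplified sketch. It replaces the POS and syntactic-tree analysis described above with a fixed token window for negation and a one-token look-back for intensifiers, so it illustrates the general idea only. The lexicon entries, weights and score thresholds are invented, and the reading of the labels (P+ strong positive, P positive, NEU neutral, N negative, N+ strong negative, NONE no sentiment found) is an assumption.

```python
# Tiny illustrative lexicon; words and weights are made up for the example.
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "awful": -2.0}
NEGATORS = {"not", "no", "never"}
INTENSIFIERS = {"very": 1.5, "really": 1.5}


def polarity(text, negation_window=2):
    """Score a text with the lexicon and map the score to a polarity label."""
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    total, hits = 0.0, 0
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        score = LEXICON[tok]
        # An intensifier directly before the word scales its weight.
        if i > 0 and tokens[i - 1] in INTENSIFIERS:
            score *= INTENSIFIERS[tokens[i - 1]]
        # A negator within the preceding window flips the sign.
        window = tokens[max(0, i - negation_window):i]
        if any(w in NEGATORS for w in window):
            score = -score
        total += score
        hits += 1
    if hits == 0:
        return "NONE"  # no opinion words found
    if total >= 2.0:
        return "P+"
    if total > 0:
        return "P"
    if total == 0:
        return "NEU"
    if total > -2.0:
        return "N"
    return "N+"


for sample in ("the service is great", "the bus is not good", "there is a fire"):
    print(sample, "->", polarity(sample))
```

A window-based rule misjudges sentences where the negator is far from the word it modifies, which is the kind of case a syntactic tree is meant to handle.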

User Demographics
Text classification based on an n-gram model to guess the user's type, gender and age from their login, name and profile description
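
The sketch below shows one way n-gram features could be derived from the three profile fields. It assumes character trigrams (the slides do not say whether character or word n-grams are used) and swaps in a trivial nearest-profile comparison for the actual classifier. The two labelled profiles and the `ORGANIZATION` label are invented for the example.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Return counts of character n-grams of a lowercased, padded string."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def profile_features(login, name, description, n=3):
    """Combine n-grams from the three profile fields, prefixed by field."""
    features = Counter()
    for field, value in (("login", login), ("name", name), ("desc", description)):
        for gram, count in char_ngrams(value, n).items():
            features[f"{field}:{gram}"] += count
    return features


def overlap(a, b):
    """Count n-gram occurrences shared by two feature bags."""
    return sum((a & b).values())


# Hypothetical labelled profiles standing in for a training set.
TRAINING = [
    ("ORGANIZATION", profile_features("madrid_news", "Madrid News", "Official news account")),
    ("PERSON", profile_features("ana_lopez", "Ana López", "Student, music lover")),
]


def guess_user_type(login, name, description):
    """Pick the label of the most similar training profile."""
    feats = profile_features(login, name, description)
    return max(TRAINING, key=lambda item: overlap(feats, item[1]))[0]


print(guess_user_type("bcn_news", "Barcelona News", "Official account for city news"))
```

Prefixing each n-gram with its field keeps, for instance, a trigram in the login distinct from the same trigram in the description, so each field contributes its own evidence.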

Example

{
  "text": "el viento ha roto una rama y hay un atascazo increible en toda la gran vía...",
  "tag_list": [
    {"type": "sensor", "value": "011002 Ubicación - Exteriores - Vías públicas"},
    {"type": "sensor", "value": "070700 Alertas meteorológicas - Viento"},
    {"type": "sensor", "value": "080100 Incidencia - Congestión de tráfico"},
    {"type": "topic", "value": "06 medio ambiente, meteorología y energía"},
    {"type": "entity", "value": "Gran Vía"},
    {"type": "concept", "value": "viento"},
    {"type": "sentiment", "value": "N"},
    {"type": "subjectivity", "value": "OBJ"},
    {"type": "irony", "value": "NONIRONIC"},
    {"type": "user_type", "value": "PERSON"},
    {"type": "user_gender", "value": "FEMALE"},
    {"type": "user_age", "value": "25-35"}
  ]
}
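
A consumer of such annotated tweets would typically regroup the flat `tag_list` by tag type. The sketch below does that on an abridged copy of the example. The helper names are illustrative, and treating the leading digits of a `sensor` value as a category code is an assumption based on how the values look.

```python
import json
from collections import defaultdict

# Abridged copy of the annotated tweet from the example.
annotated_tweet = json.loads("""
{
  "text": "el viento ha roto una rama y hay un atascazo increible en toda la gran vía...",
  "tag_list": [
    {"type": "sensor", "value": "070700 Alertas meteorológicas - Viento"},
    {"type": "sensor", "value": "080100 Incidencia - Congestión de tráfico"},
    {"type": "entity", "value": "Gran Vía"},
    {"type": "sentiment", "value": "N"}
  ]
}
""")


def group_tags(tweet):
    """Group tag values by tag type, keeping their original order."""
    grouped = defaultdict(list)
    for tag in tweet["tag_list"]:
        grouped[tag["type"]].append(tag["value"])
    return dict(grouped)


def sensor_codes(tweet):
    """Extract the leading numeric code of each 'sensor' tag (assumed format)."""
    return [value.split(" ", 1)[0] for value in group_tags(tweet).get("sensor", [])]


tags = group_tags(annotated_tweet)
print(tags["sentiment"])
print(sensor_codes(annotated_tweet))
```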

Geolocation

Visualization
http://www.highcharts.com
http://openlayers.org
http://d3js.org

Ongoing business cases
- City console for a local administration to analyze in real time the behaviour and topics of interest of the citizens, with two components:
  - a private console, internal to the city services, for analytics
  - a public dashboard to engage citizens with their city, displaying attractive, summarized, non-confidential information at selected public locations (town hall, libraries, museums) or on an LED video wall in a busy downtown square
- Social alert detection system
  - For 112 emergency services, providing early detection of security-related issues

For short/mid term future
- Trending topics geolocation clustering
  - Analysis at neighbourhood level
  - [Map labels: health, traffic jam, air pollution, jellyfish, pollen]
- Analysis of city pace of life
- Mobility analysis
  - How, when, why people move through the city
  - Route identification (home → work → free time → home)
  - Route changes (due to weather)
- City reputation and brand personality
- Automated satisfaction surveys

This work has been supported by several Spanish R&D projects: Ciudad2020: Hacia un nuevo modelo de ciudad inteligente sostenible (INNPRONTA IPT-20111006), MA2VICMR: Improving the access, analysis and visibility of the multilingual and multimedia information in web for the Region of Madrid (S2009/TIC-1542) and MULTIMEDICA: Multilingual Information Extraction in Health domain and application to scientific and informative documents (TIN2010-20644-C03-01). The authors would like to thank all partners for their knowledge and support.