
“ALEXANDRU IOAN CUZA” UNIVERSITY OF IAŞI FACULTY OF COMPUTER SCIENCE

Natural Language

Engineering

Daniela GÎFU

http://profs.info.uaic.ro/~daniela.gifu/


Sentiment Analysis

Laboratory 7


What is Sentiment Analysis?


IMPACT OF TOPIC

Sentiment Analysis (SA) - one of the most active topics in NLP.

SA - offers the possibility to monitor, identify and understand, in real time, consumers' feelings and attitudes towards brands or topics in cyberspace, and to act accordingly.

SA - very popular in social media.

- Target: academia and industry.


08.05.2012


IMPACT IN SOCIAL MEDIA

Social media deals with personal and socially related opinions.

SA - plays a vital role in understanding the opinions expressed in conversations, posts, blogs, etc., and in deriving a sensible short summary of the most relevant opinions.

SA - helps to:
• take quick decisions
• change the strategy and tactics used
• understand the mood of the market
• keep up with changing trends
• improve one’s product


VALIDITY OF SA

- evaluated by comparing the sentiment scores of specific comments with their respective star ratings, which are common cues that individuals use to filter what they read when acquiring information.
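One way to make this comparison concrete is a rank correlation between sentiment scores and star ratings. The sketch below uses only the Python standard library. The scores and ratings are invented toy values, and Spearman's coefficient is an assumed choice of agreement measure, not necessarily the one used in the original evaluation.

```python
from statistics import mean

def ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def spearman(xs, ys):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Toy data (hypothetical): one sentiment score and one star rating per comment.
sentiment_scores = [-0.8, -0.3, 0.0, 0.4, 0.9]
star_ratings = [1, 2, 3, 4, 5]

print(round(spearman(sentiment_scores, star_ratings), 3))
```

A coefficient close to 1 would indicate that comments with higher sentiment scores also tend to carry higher star ratings.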


RESEARCH QUESTIONS...

• How comparable are the sentiment scores of reviews/comments to their respective star ratings?
• How do sentiment scores impact decision outcomes?


PURPOSE AND MOTIVATION

- to enhance the results of context-based SA.

- to clarify the descriptive behavior of the receiver, who is affected by the multitude of information on forums, etc.

- to improve the performance of SA classifiers based on two approaches (machine learning & lexicon).


SA similar to…

SA – terminology:

- subjectivity [Lyons 1981; Langacker 1985];
- evidentiality [Chafe and Nichols 1986];
- analysis of stance [Biber and Finegan 1988; Conrad and Biber 2000];
- affect [Batson, Shaw, and Oleson 1992];
- point of view [Wiebe 1994; Scheibman 2002];
- evaluation [Hunston and Thompson, 2001];
- appraisal [Martin and White 2005];
- opinion mining [Pang and Lee 2008];
- politeness [Gîfu and Topor, 2014].

SA - the process of detecting the contextual polarity of text.


Sentiment classification techniques

Fig. 1 Sentiment classification techniques


SA levels – document (1)

a) supervised approach

Fig. 2 Supervised learning for three classes: Positive, Negative, Neutral
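The supervised, three-class setting can be illustrated with a small multinomial Naive Bayes classifier. This is a minimal sketch in plain Python, written as a stand-in for a library classifier such as the NLTK one used in the lab. The training sentences are hypothetical toy data, and whitespace tokenization with add-one smoothing are simplifying assumptions.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (text, label). Returns the counts needed by Naive Bayes."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab

def classify(model, text):
    """Return the label with the highest log-probability (add-one smoothing)."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training set with the three classes.
training_data = [
    ("great phone love it", "positive"),
    ("excellent battery great screen", "positive"),
    ("terrible phone hate it", "negative"),
    ("awful battery terrible screen", "negative"),
    ("the phone has a screen", "neutral"),
    ("it has a battery", "neutral"),
]
model = train(training_data)
print(classify(model, "great screen"))
print(classify(model, "terrible battery"))
```

With real data, the same train/classify split applies, only with a proper tokenizer and a much larger labelled corpus.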


SA levels – document (2)

Fig. 2 Python NLTK Demos for Natural Language Text Processing

a) supervised approach

http://text-processing.com/demo/


SA levels – document (3)

b) unsupervised approach

Based on determining the semantic orientation (SO) of specific words/phrases.
1. Sentiment lexicon (words/expressions) – [Taboada et al., 2011]
2. Set of predefined POS patterns – [Turney, 2002]
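In Turney's approach, phrases matching the POS patterns are scored by comparing how often they co-occur with the reference words "excellent" and "poor". A minimal sketch of that score follows. All hit counts are hypothetical, and the small smoothing constant is included here as an assumption to avoid division by zero.

```python
import math

def semantic_orientation(near_excellent, near_poor, hits_excellent, hits_poor, smoothing=0.01):
    """SO(phrase) = log2( hits(phrase NEAR "excellent") * hits("poor")
                        / (hits(phrase NEAR "poor") * hits("excellent")) )."""
    numerator = (near_excellent + smoothing) * hits_poor
    denominator = (near_poor + smoothing) * hits_excellent
    return math.log2(numerator / denominator)

# Hypothetical corpus counts for the two reference words.
HITS_EXCELLENT = 1000
HITS_POOR = 1000

# Hypothetical co-occurrence counts: (near "excellent", near "poor").
phrases = {
    "low fees": (80, 10),
    "unethical practices": (5, 90),
}

for phrase, (near_exc, near_poor) in phrases.items():
    so = semantic_orientation(near_exc, near_poor, HITS_EXCELLENT, HITS_POOR)
    print(phrase, "positive" if so > 0 else "negative")
```

A document can then be labelled by the sign of the average SO of its extracted phrases.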


SA levels – clause/sentence

More complex: identifying whether a sentence is opinionated and establishing the nature of the opinion;
- using supervised methods:
1. classifying clauses into two classes [Yu and Hatzivassiloglou, 2003]
2. an approach based on minimum cuts [Pang and Lee, 2004]

The problem: How can we classify questions, sarcasm, metaphor, humor, etc.?


SA levels – features

- several entities for each analyzed text, or several attributes for each entity;
- extraction of the attributes of an object;

Becali a ajutat mult săracii 1/, [dar] nimeni nu a ştiut exact 2/ [cum] a făcut atâţia bani 3/. (En. - Becali helped the poor a lot 1/, [but] nobody knew exactly 2/ [how] he made so much money 3/.)

- extract and store all NPs;
- keep only NPs with a frequency above an experimentally learned threshold [Hu and Liu, 2004]
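The last two steps can be sketched as a simple frequency filter. The example below assumes the noun phrases have already been extracted per review (the lists are hypothetical), and the 0.5 threshold is an arbitrary illustrative value, not the one learned in the cited work.

```python
from collections import Counter

def frequent_noun_phrases(noun_phrases_per_review, min_support):
    """Keep NPs that occur in at least `min_support` (a fraction) of the reviews."""
    n_reviews = len(noun_phrases_per_review)
    doc_freq = Counter()
    for phrases in noun_phrases_per_review:
        doc_freq.update(set(phrases))  # count each NP at most once per review
    return sorted(p for p, count in doc_freq.items() if count / n_reviews >= min_support)

# Hypothetical NPs, one list per review.
reviews = [
    ["battery", "screen", "my brother"],
    ["battery", "price"],
    ["screen", "battery", "the shop"],
    ["price", "screen"],
]

print(frequent_noun_phrases(reviews, min_support=0.5))
```

Infrequent NPs such as "my brother" are dropped, on the assumption that genuine product attributes recur across many reviews.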


SA levels – comparative

- When a user doesn’t offer a direct opinion about a product. [Jindal and Liu, 2006]

Dacia Logan arată mult mai bine decât Dacia Solenza. (En. - Dacia Logan looks much better than Dacia Solenza.)

- adverbial adjectives: mai mult, mai puţin (En. - more, less)
- superlative adjectives and adverbs: mai, cel puţin (En. - more, at least)
- additional clauses: decât, împotriva (En. - rather than, against).

These cue words cover 98% of the comparative opinions.
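Spotting these cue words is enough to select candidate comparative sentences. The sketch below uses only the cues listed above. The matching strategy (whole-word, longest cue first) is an assumption, and a keyword hit only marks a candidate that a later classifier would still need to confirm.

```python
import re

# Cue words taken from the list above.
COMPARATIVE_CUES = ["mai mult", "mai puţin", "cel puţin", "mai", "decât", "împotriva"]

def find_comparative_cues(sentence):
    """Return the cues found as whole words, trying longer cues first."""
    text = sentence.lower()
    found = []
    for cue in sorted(COMPARATIVE_CUES, key=len, reverse=True):
        pattern = r"(?<!\w)" + re.escape(cue) + r"(?!\w)"
        if re.search(pattern, text):
            found.append(cue)
            # Blank out the match so "mai" is not counted again inside "mai mult".
            text = re.sub(pattern, " ", text)
    return found

def is_comparative_candidate(sentence):
    """True if the sentence contains at least one comparative cue."""
    return bool(find_comparative_cues(sentence))

print(find_comparative_cues("Dacia Logan arată mult mai bine decât Dacia Solenza."))
print(is_comparative_candidate("Dacia Logan este o maşină."))
```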


SA levels – sentiment lexicon (1)

a) manual approaches: WordNet [Fellbaum, 1998], EuroWordNet [Vossen, 1998], BalkaNet [Tufiş et al., 2004]

Our work: AnaDiP-2010 inspired by LIWC-2007 [Pennebaker et al., 2001]: 9 emotional classes.

<classes>
  <class name="emotional" id="1"/>
  <class name="positive" id="2" parent="1"/>
  <class name="negative" id="3" parent="1"/>
  <class name="anxiety" id="4" parent="3"/>
  <class name="anger" id="5" parent="3"/>
  <class name="sadness" id="6" parent="3"/>
  <class name="spectacular" id="7" parent="2"/>
  <class name="firmness" id="8" parent="2"/>
  <class name="moderation" id="9" parent="2"/>
</classes>
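The parent attribute encodes a small class hierarchy. As an illustration, the sketch below loads the class list with Python's standard XML parser and walks from a class up to the root. The function names are hypothetical helpers, not part of AnaDiP.

```python
import xml.etree.ElementTree as ET

CLASSES_XML = """
<classes>
  <class name="emotional" id="1"/>
  <class name="positive" id="2" parent="1"/>
  <class name="negative" id="3" parent="1"/>
  <class name="anxiety" id="4" parent="3"/>
  <class name="anger" id="5" parent="3"/>
  <class name="sadness" id="6" parent="3"/>
  <class name="spectacular" id="7" parent="2"/>
  <class name="firmness" id="8" parent="2"/>
  <class name="moderation" id="9" parent="2"/>
</classes>
"""

def load_classes(xml_text):
    """Return {class id: (name, parent id or None)}."""
    root = ET.fromstring(xml_text)
    return {c.get("id"): (c.get("name"), c.get("parent")) for c in root.findall("class")}

def ancestors(classes, class_id):
    """Return class names from the given class up to the root of the hierarchy."""
    path = []
    while class_id is not None:
        name, parent = classes[class_id]
        path.append(name)
        class_id = parent
    return path

classes = load_classes(CLASSES_XML)
print(ancestors(classes, "6"))
```

For class 6 the path is sadness, negative, emotional, which matches a lexicon entry tagged with classes "1,3,6".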


SA levels – sentiment lexicon (2)

Our software performs part-of-speech (POS) tagging and lemmatization of words. For example:

<lexic name="Politic" lang="ro">
  <word lemma="clevetitor" classes="1,3,6"/>
  <word lemma="genial" classes="1,2,7"/>
</lexic>
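Once a text is lemmatized, a lexicon in this format can be used to count how often each emotional class occurs. The sketch below assumes the lemmas are already available as a list; the sample lemma sequence is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

LEXICON_XML = """
<lexic name="Politic" lang="ro">
  <word lemma="clevetitor" classes="1,3,6"/>
  <word lemma="genial" classes="1,2,7"/>
</lexic>
"""

def load_lexicon(xml_text):
    """Return {lemma: list of class ids}."""
    root = ET.fromstring(xml_text)
    return {w.get("lemma"): w.get("classes").split(",") for w in root.findall("word")}

def class_frequencies(lemmas, lexicon):
    """Count class ids over all lemmas that appear in the lexicon."""
    counts = Counter()
    for lemma in lemmas:
        counts.update(lexicon.get(lemma, []))
    return counts

lexicon = load_lexicon(LEXICON_XML)

# Hypothetical lemmatized input text.
lemmas = ["un", "discurs", "genial", "şi", "un", "adversar", "clevetitor"]
freq = class_frequencies(lemmas, lexicon)
print(dict(freq))
```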


SA levels – sentiment lexicon (3)

b) corpus-based approaches – a set of words/phrases extracted from a relatively small corpus is extended by using a large corpus of documents from a single domain.

- a classical work [Hatzivassiloglou and McKeown, 1997], using a set of linguistic connectors: şi, sau, nici, fie (En. - and, or, neither/nor, either).

Examples:
bărbat puternic şi armonios / bărbat puternic şi armonios (En. - strong and harmonious man)
femeie senzuală sau inteligentă? / femeie sărmană sau înstărită? (En. - sensual or intelligent woman? / poor or wealthy woman?)
băiatul nu e nici prost, nici deștept... / băiatul nu e nici prost, nici urât... (En. - the boy is neither stupid nor smart... / the boy is neither stupid nor ugly...)


Applications – business and government (1)

“Why aren’t consumers buying our laptop?”, when the price is good and the weight is obviously in accord with consumers’ wishes. [Lee, 2004]

Two kinds of answers:
- subjective reasons about intangible qualities (e.g. the physical keyboard is tacky), or
- misperceptions (even though they are wrong).

Solution: by tracking consumers’ opinions, one could predict trends in sales, etc. [Mishne & Glance, 2006].


Applications – business and government (2)

Solution based on a dictionary + the semantic role of negations and pragmatic connectors:
- classification of emotionally charged words into two classes: positive and negative (plus a neutral class);
- more classes, associating each word with a value in the range -5 to +5;
- [Gîfu and Cristea, 2012a]: a scale over the interval -3 to +3;
- [Gîfu and Scutelnicu, 2013]: a scale of values from -1 to +1.
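A dictionary-based scorer with a simple treatment of negation could look like the sketch below. The lexicon values are hypothetical, and flipping the sign of the next scored word is only one assumed way to model negation; the cited works may handle negations and connectors differently. The rescale helper shows how a value on the -5 to +5 scale maps onto the -1 to +1 scale.

```python
# Hypothetical lexicon with values in the range -5 to +5.
LEXICON = {"bun": 3, "genial": 5, "odios": -4, "oribil": -5}
NEGATIONS = {"nu"}  # Romanian "not"

def score_tokens(tokens):
    """Sum lexicon values; a negation flips the sign of the next scored word."""
    total, negate = 0, False
    for token in tokens:
        if token in NEGATIONS:
            negate = True
        elif token in LEXICON:
            value = LEXICON[token]
            total += -value if negate else value
            negate = False
    return total

def rescale(value, old_max=5, new_max=1):
    """Map a value from [-old_max, +old_max] to [-new_max, +new_max]."""
    return value * new_max / old_max

plain = score_tokens(["filmul", "este", "genial"])      # "the film is brilliant"
negated = score_tokens(["filmul", "nu", "este", "bun"])  # "the film is not good"
print(plain, negated, rescale(negated))
```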


Process phases: POS-tagger & NER & Anaphora Resolution (1)

<DOCUMENT>
  <P ID="1">
    <S ID="1">
      <W EXTRA="NotInDict" ID="11.1" LEMMA="" MSD="Vmip3s" Mood="indicative" Number="singular" POS="VERB" Person="third" Tense="present" Type="predicative" offset="0"></W>
      <NP HEADID="11.2" ID="0" ref="0">
        <W Case="direct" Gender="masculine" ID="11.2" LEMMA="nimic" MSD="Pz3msr" Number="singular" POS="PRONOUN" Person="third" Type="negative" offset="1">Nimic</W>
        <W ID="11.3" LEMMA="mai" MSD="Rg" POS="ADVERB" offset="7">mai</W>
        <W Case="direct" Definiteness="no" Gender="masculine" ID="11.4" LEMMA="odios" MSD="Afpmsrn" Number="singular" POS="ADJECTIVE" offset="11">odios</W>
        <W ID="11.5" LEMMA="," MSD="COMMA" POS="COMMA" offset="16">,</W>
        <W ID="11.6" LEMMA="mai" MSD="Rg" POS="ADVERB" offset="18">mai</W>
        <W ID="11.7" LEMMA="oribil" MSD="Rg" POS="ADVERB" offset="22">oribil</W>
        <W Case="direct" Definiteness="no" EXTRA="NotInDict" Gender="masculine" ID="11.8" LEMMA="decât" MSD="Afpmsrn" Number="singular" POS="ADJECTIVE" offset="29">decât</W>
      </NP>
      <NP HEADID="11.9" ID="1" ref="1">
        <W Case="direct" Definiteness="yes" Gender="masculine" ID="11.9" LEMMA="pantof" MSD="Ncmpry" Number="plural" POS="NOUN" Type="common" offset="35">pantofii</W>
        <NP HEADID="11.10" ID="2" ref="2">
          <W Case="direct" Definiteness="no" Gender="masculine" ID="11.10" LEMMA="sport" MSD="Ncmsrn" Number="singular" POS="NOUN" Type="common" offset="44">sport</W>
          <W ID="11.11" LEMMA="cu" MSD="Sp" POS="ADPOSITION" offset="50">cu</W>
          <NP HEADID="11.12" ID="3" ref="3">
            <W Case="direct" Definiteness="yes" Gender="feminine" ID="11.12" LEMMA="platformă" MSD="Ncfsry" Number="singular" POS="NOUN" Type="common" offset="53">platformă</W>
          </NP>
        </NP>
      </NP>
    </S>
  </P>
</DOCUMENT>


Process phases: POS-tagger & NER & Anaphora Resolution (2)

Fig. 3 The interface of the EAT system


SA - Rules

- 46 rules for values.

<rule>
  <word attribute="LEMMA" value="cel"/>
  <word attribute="LEMMA" value="mai"/>
  <word attribute="POS" value="ADJECTIVE"/>
</rule>

Ex: cel mai bun (En. - the best)

<rule>
  <word attribute="LEMMA" value="cel"/>
  <word attribute="LEMMA" value="mai"/>
  <word attribute="LEMMA" value="bun"/>
</rule>
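A rule of this kind can be read as a sequence of attribute/value constraints applied to consecutive tokens. The sketch below shows one possible matcher. The token dictionaries and their tags are hypothetical, and what the system does with a match (for example, which value it assigns) is not specified in the slides, so the function only reports where the rule applies.

```python
# The first rule above, as (attribute, value) constraints on consecutive tokens.
RULE = [("LEMMA", "cel"), ("LEMMA", "mai"), ("POS", "ADJECTIVE")]

def match_rule(rule, tokens):
    """Return the start indices where consecutive tokens satisfy every constraint."""
    hits = []
    for start in range(len(tokens) - len(rule) + 1):
        window = tokens[start:start + len(rule)]
        if all(tok.get(attr) == value for (attr, value), tok in zip(rule, window)):
            hits.append(start)
    return hits

# Hypothetical tagger output for "este cel mai bun" ("is the best").
tokens = [
    {"form": "este", "LEMMA": "fi", "POS": "VERB"},
    {"form": "cel", "LEMMA": "cel", "POS": "ARTICLE"},
    {"form": "mai", "LEMMA": "mai", "POS": "ADVERB"},
    {"form": "bun", "LEMMA": "bun", "POS": "ADJECTIVE"},
]

print(match_rule(RULE, tokens))
```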


Applications – review sites

- to assess the reviews and ratings about your company or yourself;
- to summarize reviews.

Our work: the consumer’s behaviour, civic identity [Gîfu et al., 2013]

6 profiles: the-decent, the-porn-aggressive, the-incitator, the-affected, the-author-attacker and supporter.

- we established a number of features (lexical, syntactic, semantic): style, emotional classes, etc.


Applications – politics/sociology

Two dimensions in politics:
1. to know what voters think about the political candidates [Efron, 2004; Goldberg et al., 2007; Laver et al., 2003; Mullen and Malouf, 2008];
2. to clarify the politicians’ positions, in order to enhance the quality of the information that voters have access to [Bansal et al., 2008; Gîfu, 2013b].

In sociology:
- how ideas and innovations are propagated [Rosen, 1974]

Ex: the polls on different issues


CONCLUSIONS AND DISCUSSIONS

SA - a complex task;
SA - an emerging discipline with promising academic and, most importantly, industrial applications;
.... the sentiment classification problem - more challenging


Final project: SEMEVAL 2018/2019

Lab. 7 SA:
- NLTK (Naïve Bayes Classifier): https://www.nltk.org/_modules/nltk/classify/positivenaivebayes.html
- TextBlob (performs different NLP tasks: POS tagging, NP extraction, SA, etc.): https://textblob.readthedocs.io/en/dev/index.html

Methodology:
1. Categorize each text/document (e.g. tweets) into a specific class: positive, negative, neutral;
2. Add new instances <pos>…</pos>, <neg>…</neg>, etc., according to the classifier used, in your XML/JSON file for each common noun;
3. Add new statistics over the SEMEVAL corpus for the SA task.


Statistics over the SEMEVAL data set

Counted elements | Value
# sentences |
# tokens (punctuation included) |
# tokens (punctuation excluded) |
# entities |
  # person |
  # location |
  …. |
# sentiments |
  # positive (single class or using other subclasses) |
  # negative (single class or using other subclasses) |
  # neutral |
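Most rows of this table can be filled in with a few counters. The sketch below assumes a hypothetical in-memory format (one dictionary per sentence, with its tokens and its sentiment label); the real SEMEVAL files would first need to be parsed into that shape. Entity counts are left out because they depend on the output of a named entity recognizer.

```python
import string
from collections import Counter

def is_punctuation(token):
    """True if the token consists only of ASCII punctuation characters."""
    return all(ch in string.punctuation for ch in token)

def corpus_statistics(sentences):
    """sentences: list of dicts with 'tokens' (list of str) and 'sentiment' (str)."""
    all_tokens = [t for s in sentences for t in s["tokens"]]
    sentiments = Counter(s["sentiment"] for s in sentences)
    return {
        "sentences": len(sentences),
        "tokens (punctuation included)": len(all_tokens),
        "tokens (punctuation excluded)": sum(1 for t in all_tokens if not is_punctuation(t)),
        "positive": sentiments["positive"],
        "negative": sentiments["negative"],
        "neutral": sentiments["neutral"],
    }

# Hypothetical annotated sentences.
corpus = [
    {"tokens": ["I", "love", "this", "phone", "!"], "sentiment": "positive"},
    {"tokens": ["The", "battery", "is", "awful", "."], "sentiment": "negative"},
    {"tokens": ["It", "arrived", "today", "."], "sentiment": "neutral"},
]

stats = corpus_statistics(corpus)
for name, value in stats.items():
    print(name, value)
```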


Thank you for your attention!

?