UNIBA at EVALITA 2014-SENTIPOLC Task: Predicting tweet sentiment polarity combining micro-blogging, lexicon and semantic features
Pierpaolo Basile and Nicole Novielli
pierpaolo.basile@uniba.it, nicole.novielli@uniba.it
Department of Computer Science
University of Bari Aldo Moro (ITALY)
EVALITA 2014, Pisa, 11th December 2014
Basile and Novielli (UNIBA), UNIBA-SENTIPOLC, EVALITA 2014, 11 Dec '14
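Task Overview
EVALITA 2014 SENTIment POLarity Classification (SENTIPOLC)
Subtask A - Subjectivity Classification: classify the subjectivity/objectivity of the tweet's content
"La Yamaha è buona solo ad alzar polemiche. E la Honda a crearne" (Yamaha is only good at stirring up controversy. And Honda at creating it) → subj
"Domani sera a Palazzo Chigi incontro informale tra Mario Monti e parti sociali. I leader di Cgil, Cisl, Uil e Ug http://bit.ly/rNWoB0" (Tomorrow evening at Palazzo Chigi, informal meeting between Mario Monti and the social partners. The leaders of Cgil, Cisl, Uil and Ug) → obj
Subtask B - Polarity Classification: classify the tweet's polarity as positive, negative, neutral, or mixed (tweets expressing both positive and negative sentiment)
"Maroni: La politica del governo Monti è sbagliata" (Maroni: the Monti government's policy is wrong) → neg
"Coma iniziare un buon sabato? Guardandosi South Park il film" (How to start a good Saturday? By watching South Park, the film) → pos
Pilot Task - Irony Detection
"governo monti: più equità o più equitalia?" (Monti government: more fairness or more Equitalia?)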
Our Approach
Supervised approach relying on three kinds of features:
• Keywords and micro-blogging properties of tweets
• Content representation in a distributional semantic model
• Sentiment lexicon
Our contribution:
• Represent both the tweets and the polarity classes in the word space
• Automatically develop a sentiment lexicon for Italian, starting from SentiWordNet
Building a Sentiment Lexicon for Italian
Automatic development of a sentiment lexicon starting from SentiWordNet
Lexicon-based features:
• features based on the sentiment scores of tokens
• sentiment variation in the tweet
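A minimal sketch of how lexicon-based features of this kind might be computed. The toy lexicon, its scores, and the concrete feature set (sum, max, min, and the number of polarity sign changes as a reading of "sentiment variation") are assumptions made for illustration; the slides do not spell them out.

```python
from typing import Dict, List

# Hypothetical toy lexicon: token -> polarity score in [-1, 1].
# Real scores would come from the Italian lexicon derived from SentiWordNet.
TOY_LEXICON: Dict[str, float] = {
    "buon": 0.6,
    "sbagliata": -0.7,
    "polemiche": -0.4,
}


def lexicon_features(tokens: List[str], lexicon: Dict[str, float]) -> Dict[str, float]:
    """Compute simple score-based features and a sentiment-variation count."""
    # Keep only the tokens that the lexicon covers, in tweet order.
    scores = [lexicon[t] for t in tokens if t in lexicon]
    # Assumed definition of sentiment variation: number of sign flips
    # between consecutive scored tokens.
    variation = sum(1 for a, b in zip(scores, scores[1:]) if a * b < 0)
    return {
        "sum": sum(scores),
        "max": max(scores, default=0.0),
        "min": min(scores, default=0.0),
        "variation": float(variation),
    }
```

For a tweet with one positive and one negative lexicon entry, the sketch reports a single polarity flip; tokens missing from the lexicon are simply ignored.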
Building the Word Space from Unlabeled Tweets
Homogeneous representation of both the tweets and the polarity classes in the word space
Distributional features:
• the tweet vector $\vec{t}$, used as a semantic feature in training our classifiers
• the similarity between $\vec{t}$ and each prototype vector
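The two distributional features can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes the tweet vector is the sum of the vectors of its words, that a class prototype is the sum of the tweet vectors of that class, and that similarity is the cosine. The two-dimensional `space` used in the usage note is a made-up toy.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def add_vectors(vectors: Sequence[Vector], dim: int) -> Vector:
    """Component-wise sum of a collection of vectors (zero vector if empty)."""
    total = [0.0] * dim
    for v in vectors:
        for i, x in enumerate(v):
            total[i] += x
    return total


def tweet_vector(tokens: List[str], space: Dict[str, Vector], dim: int) -> Vector:
    """Assumed composition: sum the vectors of the tokens found in the word space."""
    return add_vectors([space[t] for t in tokens if t in space], dim)


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

With a toy space such as `{"buon": [1.0, 0.0], "sbagliata": [0.0, 1.0], "film": [1.0, 1.0]}`, the vector of the tweet `["buon", "film"]` is closer to a prototype built from `["buon"]` than to one built from `["sbagliata"]`.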
Tweets downloading
Idea: use four lexicons extracted from the training data, one for each class (pos, neg, subj, obj)
A probability is assigned to each token:
$$P(t \mid c_i) = \frac{t + 1}{tot_i + |V|} \tag{1}$$
Rank tokens in descending order according to the Kullback-Leibler divergence (KLD):
$$KLD = P(t \mid c_s) \cdot \log \frac{P(t \mid c_s)}{P(t \mid c_o)} \tag{2}$$
The top 50 terms in the rank for each class are used as seeds for downloading the same number of tweets for each lexicon
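The ranking step can be sketched as below. The reading of the formulas is an assumption: the numerator of (1) is taken to be the number of occurrences of token t in class c_i plus one (add-one smoothing), tot_i the total number of tokens in that class, and |V| the size of a vocabulary shared by the two classes being compared. Function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def smoothed_probs(tokens: Iterable[str], vocabulary: Set[str]) -> Dict[str, float]:
    """Add-one smoothed P(t | c) for every token in the shared vocabulary.

    Assumes every token of the class also belongs to the vocabulary,
    so that the probabilities sum to one.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    size = len(vocabulary)
    return {t: (counts[t] + 1) / (total + size) for t in vocabulary}


def rank_by_kld(
    p_s: Dict[str, float], p_o: Dict[str, float], top_k: int = 50
) -> List[Tuple[str, float]]:
    """Score each token by P(t|c_s) * log(P(t|c_s) / P(t|c_o)), highest first."""
    scores = {t: p_s[t] * math.log(p_s[t] / p_o[t]) for t in p_s}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```

Smoothing keeps every probability strictly positive, so the logarithm in the second function is always defined. Tokens that are frequent in one class and rare in the other end up at the top of the rank, which is what makes them usable as search seeds.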
Keywords and Micro-blogging features
Keywords
• exploit tokens occurring in the tweets (unigrams)
• replace user mentions, URLs and hashtags with three metatokens: “_USER_”, “_URL_” and “_TAG_”
Micro-blogging
• the use of upper case and character repetitions
• positive and negative emoticons
• informal expressions of laughter (i.e., sequences of “ah”)
• the presence of exclamation and question marks
• occurrences of adversative, disjunctive, conclusive and explicative words
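A rough sketch of the normalization and of some of the micro-blogging features listed above. The regular expressions, the emoticon lists, the underscore spelling of the metatokens and the feature names are all assumptions chosen for illustration; word-class features (adversative, disjunctive, and so on) are left out because they need word lists the slides do not give.

```python
import re
from typing import Dict

URL_RE = re.compile(r"https?://\S+")
USER_RE = re.compile(r"@\w+")
TAG_RE = re.compile(r"#\w+")
# Laughter: at least two repetitions of "ah" or "ha", e.g. "ahahah", "haha".
LAUGH_RE = re.compile(r"\b(?:ah|ha){2,}h?\b", re.IGNORECASE)
# Character repetition: the same word character three or more times in a row.
REPEAT_RE = re.compile(r"(\w)\1{2,}")
# Hypothetical, deliberately tiny emoticon lists.
POS_EMOTICONS = (":)", ":-)", ":D")
NEG_EMOTICONS = (":(", ":-(")


def normalize(tweet: str) -> str:
    """Replace URLs, user mentions and hashtags with metatokens."""
    tweet = URL_RE.sub("_URL_", tweet)  # URLs first: they may contain '#' or '@'
    tweet = USER_RE.sub("_USER_", tweet)
    return TAG_RE.sub("_TAG_", tweet)


def microblog_features(tweet: str) -> Dict[str, int]:
    """Count surface cues on the raw (not yet normalized) tweet."""
    return {
        "upper_words": sum(1 for w in tweet.split() if len(w) > 1 and w.isupper()),
        "char_repetitions": len(REPEAT_RE.findall(tweet)),
        "laughter": len(LAUGH_RE.findall(tweet)),
        "exclamations": tweet.count("!"),
        "questions": tweet.count("?"),
        "pos_emoticons": sum(tweet.count(e) for e in POS_EMOTICONS),
        "neg_emoticons": sum(tweet.count(e) for e in NEG_EMOTICONS),
    }
```

The counts are taken on the raw tweet on purpose: after normalization the metatokens themselves are upper case and would inflate the upper-case count.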
Learning step
• Constrained run: only the provided training data can be used, but lexicons are allowed. Extract the features from the training data and run the learning algorithm
• Unconstrained run: additional training data can be included. Investigate a co-training approach to automatically add new examples to the training set
System Setup overview
• Learning algorithm
  • SVM with the RBF kernel
  • C=4, selected after a 10-fold cross-validation on training data
  • total number of features: 12117
• Completely developed in Java
• Weka library is adopted for learning
• Tweets are tokenized using “Twitter NLP and Part-of-Speech Tagging”
• Word space: download 10 million tweets using the Twitter Streaming API and build the space using “word2vec”
  • continuous Bag-of-Words model (CBOW)
  • 200 vector dimensions
  • remove the terms with fewer than ten occurrences (about 200000 terms overall)
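The frequency cut-off used when building the word space can be illustrated with a few lines of plain Python. This is only a sketch of the pruning rule (keep terms occurring at least ten times); the system itself is described as written in Java with word2vec, and the function name here is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set


def build_vocabulary(tokenized_tweets: Iterable[List[str]], min_count: int = 10) -> Set[str]:
    """Return the terms that occur at least `min_count` times in the corpus."""
    counts = Counter(tok for tweet in tokenized_tweets for tok in tweet)
    return {tok for tok, c in counts.items() if c >= min_count}
```

Terms below the threshold get no vector, so at feature-extraction time they contribute nothing to the tweet representation.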
Evaluation
Systems are evaluated on their ability to:
• Task A: decide whether a given tweet is subjective or objective
• Task B: decide the tweet polarity with respect to four classes: positive, negative, neutral and mixed sentiment (both positive and negative)
Dataset
• Training: 4513 manually annotated tweets (495 tweets were not available for download and were removed)
• Test: 1935 manually annotated tweets (1748 available at the time of the evaluation)
• Metrics: systems are compared against the gold standard in terms of F measure
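For reference, the F measure of a single class can be computed as in the sketch below. This is the standard per-class F1 (harmonic mean of precision and recall); how the official SENTIPOLC score combines the per-class values is not stated in these slides, so that aggregation is deliberately left out.

```python
from typing import Sequence


def f_measure(gold: Sequence[str], predicted: Sequence[str], target: str) -> float:
    """F1 of `target`, comparing predictions against the gold standard."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == target and p == target)
    fp = sum(1 for g, p in pairs if g != target and p == target)
    fn = sum(1 for g, p in pairs if g == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With gold labels `["subj", "subj", "obj", "obj"]` and predictions `["subj", "obj", "obj", "obj"]`, the class `obj` has precision 2/3 and recall 1, giving F = 0.8.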
Results
Setting        Task    F       Rank  Imp (%)
baseline       Task A  0.4005  -     -
baseline       Task B  0.3718  -     -
constrained    Task A  0.7140  1     78
constrained    Task B  0.6771  1     82
unconstrained  Task A  0.6892  2     72
unconstrained  Task B  0.6638  1     79
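The Imp column is not defined on the slide; it appears to be the relative improvement of F over the baseline, in percent, rounded to the nearest integer. The short check below works under that assumption and uses only the F values from the results.

```python
def relative_improvement(score: float, baseline: float) -> float:
    """Relative improvement of `score` over `baseline`, in percent."""
    return 100.0 * (score - baseline) / baseline


BASELINE = {"A": 0.4005, "B": 0.3718}
RUNS = {
    ("constrained", "A"): 0.7140,
    ("constrained", "B"): 0.6771,
    ("unconstrained", "A"): 0.6892,
    ("unconstrained", "B"): 0.6638,
}

# Rounded improvement for every (setting, task) pair.
IMPROVEMENT = {
    key: round(relative_improvement(f, BASELINE[key[1]])) for key, f in RUNS.items()
}
```

For example, the constrained run on Task A gives (0.7140 - 0.4005) / 0.4005 ≈ 0.783, which rounds to the 78 reported in the column; the other three rows match in the same way.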
Results Discussion
• The system ranks first in every setting except the unconstrained run on Task A, where it ranks second
• The co-training approach seems to introduce noise: the unconstrained runs score lower than the constrained ones on both tasks
• Deep error analysis
  • the co-training system slightly improves the performance in classifying positive tweets
  • bias in our classifier due to the domain-specific lexicon about political topics (governo, Monti, crisi)
  • the classifier could be improved by enriching our lexicon with jargon and idiomatic expressions
  • ablation test: investigate the predictive power of the features in our model
Ablation Test
Figure: decrease of F when each feature group is removed, compared to the complete feature setting
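The procedure behind such a figure can be sketched generically: evaluate the full feature set once, then re-evaluate with each group left out and record the drop. The `evaluate` callback is a hypothetical stand-in for training and scoring the classifier, and the group names in the usage note are labels chosen for illustration.

```python
from typing import Callable, Dict, FrozenSet, Iterable


def ablation_test(
    groups: Iterable[str],
    evaluate: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Return, for each feature group, the decrease of F when it is removed."""
    all_groups = frozenset(groups)
    full_score = evaluate(all_groups)
    return {
        group: full_score - evaluate(all_groups - {group})
        for group in all_groups
    }
```

A call such as `ablation_test(["keyword", "lexicon", "semantic"], evaluate)` returns one value per group; the larger the value, the more the classifier depends on that group.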
Conclusions and Future Work
Conclusions
• The combination of keyword/micro-blogging, semantic and lexicon features results in the best performance
• Semantic features (distributional approach) have more predictive power than the other feature groups
Future Work
• Validate and generalize our findings on further data and languages
• Fix some issues in the co-training approach
Thank you
- Introduction
-
- The Task
-
- The System
-
- Our Approach
- Features
-
- Learning
- System Setup
- Evaluation
-
- Results
- Discussion
- Ablation test
-
- Conclusions and Future Work
-
Introduction The Task
Task Overview
EVALITA 2014 SENTIment POLarity Classification (SENTIPOLC)
Subtask A - Subjectivity Classification classify subjectivityobjectivity of the tweetrsquoscontent
La Yamaha e buona solo ad alzar polemiche E la Honda a crearne rarr subj
Domani sera a Palazzo Chigi incontro informale tra Mario Monti e parti sociali I leaderdi Cgil Cisl Uil e Ug http bit ly rNWoB0 rarr obj
Subtask B - Polarity Classification classify tweetrsquos polarity positive negative neutraland tweets expressing both positive and negative sentiment
Maroni La politica del governo Monti e sbagliata rarr neg
Coma iniziare un buon sabato Guardandosi South Park il film rarr pos
Pilot Task Irony Detection
governo monti piu equita o piu equitalia
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 2 21
The System Our Approach
Our Approach
Supervised approach relying on three kinds of features
bull Keywords and micro-blogging properties of tweets
bull Content representation in a distributional semantic model
bull Sentiment lexicon
Our contribution
bull Represent both the tweets and the polarity classes in the word space
bull Automatically develop a sentiment lexicon for the Italian starting formSentiWordNet
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 3 21
The System Features
Building a Sentiment Lexicon for Italian
Automatic development of a sentiment lexicon starting form SentiWordNet
Lexicon-based features
bull features based on sentiment scores of tokens
bull sentiment variation in the tweet
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 4 21
The System Features
Building the Word Space from Unlabeled Tweets
Homogenous representation of both tweets and polarity classes in the word space
Distributional features
bull tweet vectorminusrarrt as semantic feature in training our classifiers
bull similarity betweenminusrarrt and each prototype vector
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 5 21
The System Features
Building the Word Space from Unlabeled Tweets
Homogenous representation of both the tweets and the polarity classes in the word space
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 6 21
The System Features
Tweets downloading
Idea
Using four lexicons extracted from training data for each class (pos neg subj obj)
Probability assigned to each token
P(t|ci ) =t + 1
toti + |V | (1)
Rank tokens in descending order according to the Kullback-Leibler divergence (KLD)
KLD = P(t|cs) lowast logP(t|cs)P(t|co)
(2)
Top terms (50) in the rank for each class are used as seeds for downloading the samenumber of tweets for each lexicon
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 7 21
The System Features
Keywords and Micro-blogging features
Keywords
bull exploit tokens occurring in the tweets (unigrams)
bull replace the user mentions URLs and hashtags with three metatokens ldquo USER rdquoldquo URL rdquo and ldquo TAG rdquo
Micro-blogging
bull the use of upper case and character repetitions
bull positive and negative emoticons
bull informal expressions of laughters (ie sequences of ldquoahrdquo)
bull the presence of exclamation and interrogative marks
bull occurrences of adversative words disjunctive words conclusive words andexplicative words
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 8 21
Learning
Learning step
bull Constrained run only the provided training data can be used but lexicons areallowed extract the features from the training data and run the learning algorithm
bull Unconstrained run additional training data can be included investigate aco-training approach to automatically add new examples to the training set
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 9 21
Learning
Learning step
bull Constrained run only the provided training data can be used but lexicons areallowed extract the features from the training data and run the learning algorithm
bull Unconstrained run additional training data can be included investigate aco-training approach to automatically add new examples to the training set
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 9 21
System Setup
System Setup overview
bull Learning algorithm
bull SVM with the RBF kernelbull C=4 selected after a 10-folds validation on training databull total number of features 12117
bull Completely developed in JAVA
bull Weka library is adopted for learning
bull Tweets are tokenized using ldquoTwitter NLP and Part-of-Speech Taggingrdquo
bull Word space download 10 million tweets using the Twitter Streaming API and
build the space using ldquoword2vecrdquo
bull continuous Bag-of-Words Model (CBOW)bull 200 vector dimensionsbull remove the terms with less than ten occurrences (about 200000 terms
overall)
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 10 21
Evaluation
Evaluation
Evaluating systems on their ability in
bull Task A decide whether a given tweet is subjective or objective
bull Task B decide the tweet polarity with respect to four classes positive negativeneutral and mixed sentiment (both positive and negative)
Dataset
bull Training 4513 manually annotated tweets (495 tweets are not available for thedownload and are removed)
bull Test 1935 manually annotated tweets (1748 available at the time of theevaluation)
bull Metrics systems are compared against the gold standard in terms of F measure
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 11 21
Evaluation Results
Results
Setting Task F Rank Imp
baselineTask A 04005 - -Task B 03718 - -
constrainedTask A 07140 1 78Task B 06771 1 82
unconstrainedTask A 06892 2 72Task B 06638 1 79
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 12 21
Evaluation Discussion
Results Discussion
bull The system always obtains the best task performance in all settings
bull Co-training approach seems to introduce noise
bull Deep error analysis
bull the co-training system slightly improves the performance in classifyingpositive tweets
bull bias in our classifier due to the domain-specific lexicon about politicaltopics (governo Monti crisi)
bull classifier could be improved by enriching our lexicon with jargon andidiomatic expressions
bull ablation test investigate the predictive power of the features in ourmodel
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 13 21
Evaluation Ablation test
Ablation Test
Decrease of F by removing each feature group compared to the complete feature setting
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 14 21
Conclusions and Future Work
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 15 21
Conclusions and Future Work
Conclusions and Future Work
Conclusions
bull The combination of keywordmicro-blogging semantic and lexicon features resultsin best performance
bull Semantic features (distributional approach) have more predictive power
Future Work
bull Validate and generalize our findings further data and languages
bull Fix some issues in co-training approach
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 16 21
Thank you
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 17 21
For Further Reading
For Further Reading I
Barbosa L Feng J Robust sentiment detection on twitter from biased and noisydata
In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 36ndash44 Association for Computational Linguistics Stroudsburg PA USA(2010)
Basile V Bolioli A Nissim M Patti V Rosso P Overview of the Evalita2014 SENTIment POLarity Classification Task
In Proc of EVALITA 2014 Pisa Italy (2014)
Basile V Nissim M Sentiment analysis on italian tweets
In Proc of WASSA 2013 pp 100ndash107 (2013)
Davidov D Tsur O Rappoport A Enhanced sentiment learning using twitterhashtags and smileys
In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 241ndash249 Association for Computational Linguistics Stroudsburg PA USA(2010)
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 18 21
For Further Reading
For Further Reading II
Esuli A Sebastiani F Sentiwordnet A publicly available lexical resource foropinion mining
In Proc of LREC pp 417ndash422 (2006)
Go A Bhayani R Huang L Twitter sentiment classification using distantsupervision
Processing pp 1ndash6 (2009)
Jansen BJ Zhang M Sobel K Chowdury A Twitter power Tweets aselectronic word of mouth
J Am Soc Inf Sci Technol 60(11) 2169ndash2188 (2009)
Kouloumpis E Wilson T Moore JD Twitter sentiment analysis The goodthe bad and the omg
In Proc of ICWSM 2011 pp 538ndash541 (2011)
Mikolov T Chen K Corrado G Dean J Efficient estimation of wordrepresentations in vector space
In Proc of ICLR Work (2013)
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 19 21
For Further Reading
For Further Reading III
OrsquoConnor B Balasubramanyan R Routledge B Smith N From tweets topolls Linking text sentiment to public opinion time series
In Intl AAAI Conf on Weblogs and Social Media (ICWSM) vol 11 pp 122ndash129(2010)
Pak A Paroubek P Twitter as a corpus for sentiment analysis and opinionmining
In Proc of the Seventh Intl Conf on Language Resources and Evaluation(LRECrsquo10) (2010)
Pang B Lee L Opinion mining and sentiment analysis
Found Trends Inf Retr 2(1-2) 1ndash135 (2008)
Pianta E Bentivogli L Girardi C Multiwordnet developing an alignedmultilingual database
In Proc 1st Intl Conf on Global WordNet pp 293ndash302 (2002)
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 20 21
For Further Reading
For Further Reading IV
Smolensky P Tensor product variable binding and the representation of symbolicstructures in connectionist systems
Artificial Intelligence 46(1-2) 159ndash216 (1990)
Vanzo A Croce D Basili R A context-based model for sentiment analysis intwitter
In Proc of COLING 2014 (2014)
Wilson T Wiebe J Hoffmann P Recognizing contextual polarity Anexploration of features for phrase-level sentiment analysis
Comput Linguist 35(3) 399ndash433 (2009)
Zanchetta E Baroni M Morph-it a free corpus-based morphological resourcefor the italian language
Proc of the Corpus Linguistics Conf 2005 (2005)
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 21 21
- Introduction
-
- The Task
-
- The System
-
- Our Approach
- Features
-
- Learning
- System Setup
- Evaluation
-
- Results
- Discussion
- Ablation test
-
- Conclusions and Future Work
-
The System Our Approach
Our Approach
Supervised approach relying on three kinds of features
bull Keywords and micro-blogging properties of tweets
bull Content representation in a distributional semantic model
bull Sentiment lexicon
Our contribution
bull Represent both the tweets and the polarity classes in the word space
bull Automatically develop a sentiment lexicon for the Italian starting formSentiWordNet
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 3 21
The System Features
Building a Sentiment Lexicon for Italian
Automatic development of a sentiment lexicon starting form SentiWordNet
Lexicon-based features
bull features based on sentiment scores of tokens
bull sentiment variation in the tweet
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 4 21
The System Features
Building the Word Space from Unlabeled Tweets
Homogenous representation of both tweets and polarity classes in the word space
Distributional features
bull tweet vectorminusrarrt as semantic feature in training our classifiers
bull similarity betweenminusrarrt and each prototype vector
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 5 21
The System Features
Building the Word Space from Unlabeled Tweets
Homogenous representation of both the tweets and the polarity classes in the word space
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 6 21
The System Features
Tweets downloading
Idea
Using four lexicons extracted from training data for each class (pos neg subj obj)
Probability assigned to each token
P(t|ci ) =t + 1
toti + |V | (1)
Rank tokens in descending order according to the Kullback-Leibler divergence (KLD)
KLD = P(t|cs) lowast logP(t|cs)P(t|co)
(2)
Top terms (50) in the rank for each class are used as seeds for downloading the samenumber of tweets for each lexicon
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 7 21
The System Features
Keywords and Micro-blogging features
Keywords
bull exploit tokens occurring in the tweets (unigrams)
bull replace the user mentions URLs and hashtags with three metatokens ldquo USER rdquoldquo URL rdquo and ldquo TAG rdquo
Micro-blogging
bull the use of upper case and character repetitions
bull positive and negative emoticons
bull informal expressions of laughters (ie sequences of ldquoahrdquo)
bull the presence of exclamation and interrogative marks
bull occurrences of adversative words disjunctive words conclusive words andexplicative words
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 8 21
Learning
Learning step
bull Constrained run only the provided training data can be used but lexicons areallowed extract the features from the training data and run the learning algorithm
bull Unconstrained run additional training data can be included investigate aco-training approach to automatically add new examples to the training set
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 9 21
Learning
Learning step
bull Constrained run only the provided training data can be used but lexicons areallowed extract the features from the training data and run the learning algorithm
bull Unconstrained run additional training data can be included investigate aco-training approach to automatically add new examples to the training set
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 9 21
System Setup
System Setup overview
bull Learning algorithm
bull SVM with the RBF kernelbull C=4 selected after a 10-folds validation on training databull total number of features 12117
bull Completely developed in JAVA
bull Weka library is adopted for learning
bull Tweets are tokenized using ldquoTwitter NLP and Part-of-Speech Taggingrdquo
bull Word space download 10 million tweets using the Twitter Streaming API and
build the space using ldquoword2vecrdquo
bull continuous Bag-of-Words Model (CBOW)bull 200 vector dimensionsbull remove the terms with less than ten occurrences (about 200000 terms
overall)
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 10 21
Evaluation
Evaluation
Evaluating systems on their ability in
bull Task A decide whether a given tweet is subjective or objective
bull Task B decide the tweet polarity with respect to four classes positive negativeneutral and mixed sentiment (both positive and negative)
Dataset
bull Training 4513 manually annotated tweets (495 tweets are not available for thedownload and are removed)
bull Test 1935 manually annotated tweets (1748 available at the time of theevaluation)
bull Metrics systems are compared against the gold standard in terms of F measure
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 11 21
Evaluation Results
Results
Setting Task F Rank Imp
baselineTask A 04005 - -Task B 03718 - -
constrainedTask A 07140 1 78Task B 06771 1 82
unconstrainedTask A 06892 2 72Task B 06638 1 79
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 12 21
Evaluation Discussion
Results Discussion
bull The system always obtains the best task performance in all settings
bull Co-training approach seems to introduce noise
bull Deep error analysis
bull the co-training system slightly improves the performance in classifyingpositive tweets
bull bias in our classifier due to the domain-specific lexicon about politicaltopics (governo Monti crisi)
bull classifier could be improved by enriching our lexicon with jargon andidiomatic expressions
bull ablation test investigate the predictive power of the features in ourmodel
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 13 21
Evaluation Ablation test
Ablation Test
Decrease of F by removing each feature group compared to the complete feature setting
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 14 21
Conclusions and Future Work
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 15 21
Conclusions and Future Work
Conclusions and Future Work
Conclusions
bull The combination of keywordmicro-blogging semantic and lexicon features resultsin best performance
bull Semantic features (distributional approach) have more predictive power
Future Work
bull Validate and generalize our findings further data and languages
bull Fix some issues in co-training approach
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 16 21
Thank you
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 17 21
For Further Reading
For Further Reading I
Barbosa L Feng J Robust sentiment detection on twitter from biased and noisydata
In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 36ndash44 Association for Computational Linguistics Stroudsburg PA USA(2010)
Basile V Bolioli A Nissim M Patti V Rosso P Overview of the Evalita2014 SENTIment POLarity Classification Task
In Proc of EVALITA 2014 Pisa Italy (2014)
Basile V Nissim M Sentiment analysis on italian tweets
In Proc of WASSA 2013 pp 100ndash107 (2013)
Davidov D Tsur O Rappoport A Enhanced sentiment learning using twitterhashtags and smileys
In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 241ndash249 Association for Computational Linguistics Stroudsburg PA USA(2010)
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 18 21
For Further Reading
For Further Reading II
Esuli A Sebastiani F Sentiwordnet A publicly available lexical resource foropinion mining
In Proc of LREC pp 417ndash422 (2006)
Go A Bhayani R Huang L Twitter sentiment classification using distantsupervision
Processing pp 1ndash6 (2009)
Jansen BJ Zhang M Sobel K Chowdury A Twitter power Tweets aselectronic word of mouth
J Am Soc Inf Sci Technol 60(11) 2169ndash2188 (2009)
Kouloumpis E Wilson T Moore JD Twitter sentiment analysis The goodthe bad and the omg
In Proc of ICWSM 2011 pp 538ndash541 (2011)
Mikolov T Chen K Corrado G Dean J Efficient estimation of wordrepresentations in vector space
In Proc of ICLR Work (2013)
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 19 21
For Further Reading
For Further Reading III
OrsquoConnor B Balasubramanyan R Routledge B Smith N From tweets topolls Linking text sentiment to public opinion time series
In Intl AAAI Conf on Weblogs and Social Media (ICWSM) vol 11 pp 122ndash129(2010)
Pak A Paroubek P Twitter as a corpus for sentiment analysis and opinionmining
In Proc of the Seventh Intl Conf on Language Resources and Evaluation(LRECrsquo10) (2010)
Pang B Lee L Opinion mining and sentiment analysis
Found Trends Inf Retr 2(1-2) 1ndash135 (2008)
Pianta E Bentivogli L Girardi C Multiwordnet developing an alignedmultilingual database
In Proc 1st Intl Conf on Global WordNet pp 293ndash302 (2002)
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 20 21
For Further Reading
For Further Reading IV
Smolensky P Tensor product variable binding and the representation of symbolicstructures in connectionist systems
Artificial Intelligence 46(1-2) 159ndash216 (1990)
Vanzo A Croce D Basili R A context-based model for sentiment analysis intwitter
In Proc of COLING 2014 (2014)
Wilson T Wiebe J Hoffmann P Recognizing contextual polarity Anexploration of features for phrase-level sentiment analysis
Comput Linguist 35(3) 399ndash433 (2009)
Zanchetta E Baroni M Morph-it a free corpus-based morphological resourcefor the italian language
Proc of the Corpus Linguistics Conf 2005 (2005)
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 21 21
- Introduction
-
- The Task
-
- The System
-
- Our Approach
- Features
-
- Learning
- System Setup
- Evaluation
-
- Results
- Discussion
- Ablation test
-
- Conclusions and Future Work
-
The System Features
Building a Sentiment Lexicon for Italian
Automatic development of a sentiment lexicon starting form SentiWordNet
Lexicon-based features
bull features based on sentiment scores of tokens
bull sentiment variation in the tweet
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 4 21
The System Features
Building the Word Space from Unlabeled Tweets
Homogeneous representation of both tweets and polarity classes in the word space
Distributional features
• tweet vector $\vec{t}$ as a semantic feature in training our classifiers
• similarity between $\vec{t}$ and each prototype vector
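A rough sketch of how such distributional features could be computed follows. It assumes that a tweet vector is the sum of the vectors of its words, that a class prototype is the sum of the tweet vectors of that class, and that similarity is the cosine; the slide does not state the composition operator, so these choices, the toy two-dimensional word space and all function names are illustrative assumptions.

```python
import math


def add_vectors(vectors, dim):
    """Component-wise sum of a list of equal-length vectors."""
    total = [0.0] * dim
    for vec in vectors:
        for i, value in enumerate(vec):
            total[i] += value
    return total


def cosine(u, v):
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def tweet_vector(tokens, word_space, dim):
    """Sum the vectors of the tokens found in the word space (assumed composition)."""
    return add_vectors([word_space[t] for t in tokens if t in word_space], dim)


def prototype_vector(tweets, word_space, dim):
    """Sum the tweet vectors of all tweets labelled with one class."""
    return add_vectors([tweet_vector(t, word_space, dim) for t in tweets], dim)


# Toy 2-dimensional word space with made-up coordinates.
word_space = {
    "buono": [1.0, 0.0], "bello": [0.9, 0.1],
    "sbagliata": [0.0, 1.0], "crisi": [0.1, 0.9],
}
prototypes = {
    "pos": prototype_vector([["buono"], ["bello", "buono"]], word_space, 2),
    "neg": prototype_vector([["sbagliata"], ["crisi", "sbagliata"]], word_space, 2),
}
t = tweet_vector(["un", "bello", "film"], word_space, 2)
similarities = {label: cosine(t, proto) for label, proto in prototypes.items()}
```

Under these assumptions, both the components of `t` and the values in `similarities` would be appended to the feature vector of the tweet.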
The System Features
Tweets downloading
Idea
Using four lexicons extracted from the training data, one for each class (pos, neg, subj, obj)
Probability assigned to each token
$$P(t \mid c_i) = \frac{t + 1}{tot_i + |V|} \tag{1}$$
Rank tokens in descending order according to the Kullback-Leibler divergence (KLD)
$$KLD = P(t \mid c_s) \cdot \log \frac{P(t \mid c_s)}{P(t \mid c_o)} \tag{2}$$
The top 50 terms in the ranking for each class are used as seeds for downloading the same number of tweets for each lexicon
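The seed selection can be sketched as follows. The sketch reads the numerator of Eq. (1) as the number of occurrences of token t in class c_i, tot_i as the total number of tokens in that class and |V| as the vocabulary size (add-one smoothing), and it takes c_s to be the class under consideration and c_o the class it is contrasted with; these readings, the toy counts and the function names are assumptions, since the slide does not define the symbols.

```python
import math
from collections import Counter


def smoothed_prob(token, counts, vocab_size):
    """Add-one smoothed P(t|c), following the assumed reading of Eq. (1)."""
    total = sum(counts.values())
    return (counts[token] + 1) / (total + vocab_size)


def rank_by_kld(class_counts, other_counts, top_k=50):
    """Rank tokens by their KLD contribution, Eq. (2), in descending order."""
    vocab = set(class_counts) | set(other_counts)
    scores = {}
    for token in vocab:
        p_s = smoothed_prob(token, class_counts, len(vocab))
        p_o = smoothed_prob(token, other_counts, len(vocab))
        scores[token] = p_s * math.log(p_s / p_o)
    # Ties are broken alphabetically so that the ranking is deterministic.
    return sorted(scores, key=lambda tok: (-scores[tok], tok))[:top_k]


# Toy token counts for two contrasted classes.
pos_counts = Counter("buon buon sabato bello film".split())
neg_counts = Counter("politica sbagliata crisi crisi film".split())
pos_seeds = rank_by_kld(pos_counts, neg_counts, top_k=2)
```

Tokens that are equally frequent in both classes (here "film") score zero and fall behind class-specific tokens, which is why the top of the ranking yields useful search seeds.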
The System Features
Keywords and Micro-blogging features
Keywords
• exploit tokens occurring in the tweets (unigrams)
• replace user mentions, URLs and hashtags with three metatokens: “USER”, “URL” and “TAG”
Micro-blogging
• use of upper case and character repetitions
• positive and negative emoticons
• informal expressions of laughter (i.e., sequences of “ah”)
• presence of exclamation and question marks
• occurrences of adversative, disjunctive, conclusive and explicative words
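The keyword normalization and a few of the micro-blogging cues could be extracted roughly as shown below. The regular expressions, the emoticon lists, the three-word adversative list and the exact metatoken spelling are illustrative assumptions; the slide names the feature types but not how they were implemented.

```python
import re

URL_RE = re.compile(r"https?://\S+")
USER_RE = re.compile(r"@\w+")
TAG_RE = re.compile(r"#\w+")
REPEAT_RE = re.compile(r"(\w)\1{2,}")            # one character repeated 3+ times
LAUGH_RE = re.compile(r"\b(?:ah){2,}\b", re.I)   # "ahah", "ahahah", ...
POS_EMOTICONS = (":)", ":-)", ":D")
NEG_EMOTICONS = (":(", ":-(")
ADVERSATIVES = {"ma", "però", "tuttavia"}        # tiny illustrative list


def normalize(text):
    """Replace URLs, user mentions and hashtags with metatokens."""
    text = URL_RE.sub("URL", text)   # URLs first, since they may contain '#' or '@'
    text = USER_RE.sub("USER", text)
    return TAG_RE.sub("TAG", text)


def microblog_features(text):
    """Binary and count features for the micro-blogging cues listed above."""
    raw_words = text.split()
    words = [w.strip(".,;:!?").lower() for w in raw_words]
    return {
        "upper_case": any(w.isalpha() and w.isupper() and len(w) > 1 for w in raw_words),
        "char_repetition": REPEAT_RE.search(text) is not None,
        "pos_emoticon": any(e in text for e in POS_EMOTICONS),
        "neg_emoticon": any(e in text for e in NEG_EMOTICONS),
        "laughter": LAUGH_RE.search(text) is not None,
        "exclamation": "!" in text,
        "question": "?" in text,
        "adversatives": sum(1 for w in words if w in ADVERSATIVES),
    }


normalized = normalize("@mario guarda http://bit.ly/x #monti")
feats = microblog_features("NOOO ma che bellooo ahahah :) !!")
```

In a fuller pipeline the cue features would probably be computed after normalization, so that characters inside URLs (for example "www" or "?") do not trigger the repetition or question-mark features.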
Learning
Learning step
• Constrained run: only the provided training data can be used, but lexicons are allowed; we extract the features from the training data and run the learning algorithm
• Unconstrained run: additional training data can be included; we investigate a co-training approach to automatically add new examples to the training set
System Setup
System Setup overview
• Learning algorithm
  • SVM with the RBF kernel
  • C=4, selected after 10-fold cross-validation on the training data
  • total number of features: 12117
• Completely developed in Java
• The Weka library is adopted for learning
• Tweets are tokenized using “Twitter NLP and Part-of-Speech Tagging”
• Word space: download 10 million tweets using the Twitter Streaming API and build the space using “word2vec”
  • continuous Bag-of-Words model (CBOW)
  • 200 vector dimensions
  • remove terms with fewer than ten occurrences (about 200000 terms overall)
Evaluation
Evaluation
Systems are evaluated on their ability to:
• Task A: decide whether a given tweet is subjective or objective
• Task B: decide the tweet polarity with respect to four classes: positive, negative, neutral, and mixed sentiment (both positive and negative)
Dataset
• Training: 4513 manually annotated tweets (495 tweets were not available for download and were removed)
• Test: 1935 manually annotated tweets (1748 available at the time of the evaluation)
• Metrics: systems are compared against the gold standard in terms of F-measure
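As a reminder of what the F-measure captures, the sketch below computes the per-class F1 score (the harmonic mean of precision and recall) on a toy Task A labelling and averages it over the two classes. The averaging step and the toy labels are assumptions for illustration; the official task scorer may combine the per-class scores differently.

```python
def f_measure(gold, predicted, label):
    """F1 of one class: harmonic mean of its precision and recall."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy gold standard and system output for five tweets.
gold = ["subj", "subj", "obj", "obj", "subj"]
pred = ["subj", "obj", "obj", "subj", "subj"]
f_subj = f_measure(gold, pred, "subj")
f_obj = f_measure(gold, pred, "obj")
macro_f = (f_subj + f_obj) / 2   # simple average over classes (assumption)
```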
Evaluation Results
Results
Setting        Task    F       Rank  Imp.
baseline       Task A  0.4005  -     -
baseline       Task B  0.3718  -     -
constrained    Task A  0.7140  1     78%
constrained    Task B  0.6771  1     82%
unconstrained  Task A  0.6892  2     72%
unconstrained  Task B  0.6638  1     79%
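The Imp. column is consistent with the relative improvement of each run over the baseline of the same task, expressed as a percentage. The short check below recomputes it from the reported F values (read with the decimal point restored, e.g. 0.7140); the interpretation of the column is inferred from the numbers rather than stated on the slide.

```python
baseline = {"A": 0.4005, "B": 0.3718}
runs = {
    ("constrained", "A"): 0.7140,
    ("constrained", "B"): 0.6771,
    ("unconstrained", "A"): 0.6892,
    ("unconstrained", "B"): 0.6638,
}


def relative_improvement(f_score, f_baseline):
    """Improvement over the baseline, as a percentage of the baseline F."""
    return (f_score - f_baseline) / f_baseline * 100


improvements = {
    key: round(relative_improvement(f, baseline[key[1]]))
    for key, f in runs.items()
}
for (setting, task), imp in improvements.items():
    print(f"{setting:14s} Task {task}: +{imp}%")
```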
Evaluation Discussion
Results Discussion
• The system always obtains the best task performance in all settings
• The co-training approach seems to introduce noise
• Deep error analysis
  • the co-training system slightly improves the performance in classifying positive tweets
  • bias in our classifier due to the domain-specific lexicon about political topics (governo, Monti, crisi)
  • the classifier could be improved by enriching our lexicon with jargon and idiomatic expressions
  • ablation test: investigate the predictive power of the features in our model
Evaluation Ablation test
Ablation Test
Decrease in F when each feature group is removed, compared with the complete feature set
Conclusions and Future Work
Conclusions and Future Work
Conclusions
• The combination of keyword/micro-blogging, semantic and lexicon features results in the best performance
• Semantic features (distributional approach) have more predictive power than the other feature groups
Future Work
• Validate and generalize our findings on further data and languages
• Fix some issues in the co-training approach
Thank you
- Introduction
-
- The Task
-
- The System
-
- Our Approach
- Features
-
- Learning
- System Setup
- Evaluation
-
- Results
- Discussion
- Ablation test
-
- Conclusions and Future Work
-
The System Features
Tweets downloading
Idea
Using four lexicons extracted from training data for each class (pos neg subj obj)
Probability assigned to each token
P(t|ci ) =t + 1
toti + |V | (1)
Rank tokens in descending order according to the Kullback-Leibler divergence (KLD)
KLD = P(t|cs) lowast logP(t|cs)P(t|co)
(2)
Top terms (50) in the rank for each class are used as seeds for downloading the samenumber of tweets for each lexicon
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 7 21
The System Features
Keywords and Micro-blogging features
Keywords
bull exploit tokens occurring in the tweets (unigrams)
bull replace the user mentions URLs and hashtags with three metatokens ldquo USER rdquoldquo URL rdquo and ldquo TAG rdquo
Micro-blogging
bull the use of upper case and character repetitions
bull positive and negative emoticons
bull informal expressions of laughters (ie sequences of ldquoahrdquo)
bull the presence of exclamation and interrogative marks
bull occurrences of adversative words disjunctive words conclusive words andexplicative words
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 8 21
Learning
Learning step
bull Constrained run only the provided training data can be used but lexicons areallowed extract the features from the training data and run the learning algorithm
bull Unconstrained run additional training data can be included investigate aco-training approach to automatically add new examples to the training set
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 9 21
Learning
Learning step
bull Constrained run only the provided training data can be used but lexicons areallowed extract the features from the training data and run the learning algorithm
bull Unconstrained run additional training data can be included investigate aco-training approach to automatically add new examples to the training set
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 9 21
System Setup
System Setup overview
bull Learning algorithm
bull SVM with the RBF kernelbull C=4 selected after a 10-folds validation on training databull total number of features 12117
bull Completely developed in JAVA
bull Weka library is adopted for learning
bull Tweets are tokenized using ldquoTwitter NLP and Part-of-Speech Taggingrdquo
bull Word space download 10 million tweets using the Twitter Streaming API and
build the space using ldquoword2vecrdquo
bull continuous Bag-of-Words Model (CBOW)bull 200 vector dimensionsbull remove the terms with less than ten occurrences (about 200000 terms
overall)
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 10 21
Evaluation
Evaluation
Evaluating systems on their ability in
bull Task A decide whether a given tweet is subjective or objective
bull Task B decide the tweet polarity with respect to four classes positive negativeneutral and mixed sentiment (both positive and negative)
Dataset
bull Training 4513 manually annotated tweets (495 tweets are not available for thedownload and are removed)
bull Test 1935 manually annotated tweets (1748 available at the time of theevaluation)
bull Metrics systems are compared against the gold standard in terms of F measure
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 11 21
Evaluation Results
Results
Setting Task F Rank Imp
baselineTask A 04005 - -Task B 03718 - -
constrainedTask A 07140 1 78Task B 06771 1 82
unconstrainedTask A 06892 2 72Task B 06638 1 79
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 12 21
Evaluation Discussion
Results Discussion
bull The system always obtains the best task performance in all settings
bull Co-training approach seems to introduce noise
bull Deep error analysis
bull the co-training system slightly improves the performance in classifyingpositive tweets
bull bias in our classifier due to the domain-specific lexicon about politicaltopics (governo Monti crisi)
bull classifier could be improved by enriching our lexicon with jargon andidiomatic expressions
bull ablation test investigate the predictive power of the features in ourmodel
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 13 21
Evaluation Ablation test
Ablation Test
Decrease of F by removing each feature group compared to the complete feature setting
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 14 21
Conclusions and Future Work
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 15 21
Conclusions and Future Work
Conclusions and Future Work
Conclusions
bull The combination of keywordmicro-blogging semantic and lexicon features resultsin best performance
bull Semantic features (distributional approach) have more predictive power
Future Work
bull Validate and generalize our findings further data and languages
bull Fix some issues in co-training approach
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 16 21
Thank you
- Introduction
  - The Task
- The System
  - Our Approach
  - Features
  - Learning
  - System Setup
- Evaluation
  - Results
  - Discussion
  - Ablation test
- Conclusions and Future Work
Keywords and Micro-blogging features
Keywords
• exploit tokens occurring in the tweets (unigrams)
• replace user mentions, URLs, and hashtags with three metatokens: “USER”, “URL”, and “TAG”
Micro-blogging
• the use of upper case and character repetitions
• positive and negative emoticons
• informal expressions of laughter (i.e., sequences of “ah”)
• the presence of exclamation and question marks
• occurrences of adversative, disjunctive, conclusive, and explicative words
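The keyword normalization and some of the micro-blogging cues listed above can be sketched as follows. This is a Python illustration under stated assumptions, not the authors' Java implementation: the regular expressions, the emoticon tuples, and the function names `normalize` and `microblog_features` are all hypothetical choices. The word-list features (adversative, disjunctive, conclusive, explicative words) are left out because the slides do not give the lists.

```python
import re
from typing import Dict

# Illustrative emoticon lists (assumption: the slides do not enumerate them).
POSITIVE_EMOTICONS = (":)", ":-)", ":D", ";)")
NEGATIVE_EMOTICONS = (":(", ":-(", ":'(")


def normalize(tweet: str) -> str:
    """Replace URLs, user mentions, and hashtags with metatokens."""
    tweet = re.sub(r"https?://\S+", "URL", tweet)  # URLs first: they may contain # or @
    tweet = re.sub(r"@\w+", "USER", tweet)
    tweet = re.sub(r"#\w+", "TAG", tweet)
    return tweet


def microblog_features(tweet: str) -> Dict[str, bool]:
    """Binary micro-blogging cues computed on the raw (non-normalized) tweet."""
    return {
        "has_upper_word": any(w.isupper() and len(w) > 1 for w in tweet.split()),
        "has_char_repetition": re.search(r"(\w)\1{2,}", tweet) is not None,
        "has_pos_emoticon": any(e in tweet for e in POSITIVE_EMOTICONS),
        "has_neg_emoticon": any(e in tweet for e in NEGATIVE_EMOTICONS),
        "has_laughter": re.search(r"\b(?:ah){2,}h?\b", tweet, re.IGNORECASE) is not None,
        "has_exclamation": "!" in tweet,
        "has_question": "?" in tweet,
    }


print(normalize("@mario guarda http://bit.ly/x #governo"))
print(microblog_features("CHE bellooo ahahah :) !"))
```

The cues are computed on the raw tweet so that the metatokens themselves (all upper case) do not trigger the upper-case feature.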
Learning step
• Constrained run: only the provided training data can be used, but lexicons are allowed. Extract the features from the training data and run the learning algorithm
• Unconstrained run: additional training data can be included. Investigate a co-training approach to automatically add new examples to the training set
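As a rough illustration of how co-training can add new examples to a training set, the sketch below implements a generic two-classifier loop in Python. It is not the authors' procedure: the slides do not state which feature views, confidence threshold, or number of rounds were used, so `trainer_a`, `trainer_b`, `threshold`, and `rounds` are assumptions, and the two toy trainers exist only to make the example runnable.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]                       # (tweet text, label)
Predictor = Callable[[str], Tuple[str, float]]  # text -> (label, confidence in [0, 1])
Trainer = Callable[[List[Example]], Predictor]  # labelled examples -> predictor


def co_train(labelled: List[Example], unlabelled: List[str],
             trainer_a: Trainer, trainer_b: Trainer,
             threshold: float = 0.9, rounds: int = 3):
    """Each classifier labels confident examples for the other one."""
    train_a, train_b, pool = list(labelled), list(labelled), list(unlabelled)
    for _ in range(rounds):
        predict_a, predict_b = trainer_a(train_a), trainer_b(train_b)
        remaining = []
        for text in pool:
            label_a, conf_a = predict_a(text)
            label_b, conf_b = predict_b(text)
            if conf_a >= threshold and conf_a >= conf_b:
                train_b.append((text, label_a))   # A teaches B
            elif conf_b >= threshold:
                train_a.append((text, label_b))   # B teaches A
            else:
                remaining.append(text)            # nobody is confident yet
        if len(remaining) == len(pool):           # no progress: stop early
            break
        pool = remaining
    return train_a, train_b, pool


# Toy trainers (hypothetical): they ignore the training data and use fixed rules.
def emoticon_trainer(examples: List[Example]) -> Predictor:
    return lambda t: ("pos", 1.0) if ":)" in t else ("neg", 0.0)


def keyword_trainer(examples: List[Example]) -> Predictor:
    return lambda t: ("neg", 1.0) if "male" in t else ("pos", 0.0)


a, b, left = co_train([("bello", "pos")], ["ciao :)", "che male", "boh"],
                      emoticon_trainer, keyword_trainer)
print(a, b, left)
```

A stricter `threshold` admits fewer automatically labelled tweets, which trades coverage for a lower risk of adding wrongly labelled examples.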
System Setup overview
• Learning algorithm
  • SVM with the RBF kernel
  • C=4, selected after 10-fold cross-validation on the training data
  • total number of features: 12117
• Completely developed in Java
• The Weka library is adopted for learning
• Tweets are tokenized using “Twitter NLP and Part-of-Speech Tagging”
• Word space: download 10 million tweets using the Twitter Streaming API and build the space using “word2vec”
  • Continuous Bag-of-Words model (CBOW)
  • 200 vector dimensions
  • remove the terms with fewer than ten occurrences (about 200000 terms overall)
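The frequency cut-off in the last bullet can be illustrated with a short Python sketch. The function name `build_vocabulary` and the toy corpus are hypothetical; the word space itself was built with the word2vec tool, which is not reproduced here.

```python
from collections import Counter
from typing import Iterable, List, Set


def build_vocabulary(tokenized_tweets: Iterable[List[str]], min_count: int = 10) -> Set[str]:
    """Keep only the terms that occur at least `min_count` times in the corpus."""
    counts = Counter(token for tweet in tokenized_tweets for token in tweet)
    return {term for term, n in counts.items() if n >= min_count}


# Toy corpus: "governo" and "monti" occur 10 times each, "crisi" only 9 times.
corpus = [["governo", "monti"]] * 10 + [["crisi"]] * 9
print(sorted(build_vocabulary(corpus)))
```

With the default `min_count=10`, a term seen nine times is dropped, matching the "fewer than ten occurrences" rule stated on the slide.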
Evaluation
Systems are evaluated on their ability to:
• Task A: decide whether a given tweet is subjective or objective
• Task B: decide the tweet polarity with respect to four classes: positive, negative, neutral, and mixed sentiment (both positive and negative)
Dataset
• Training: 4513 manually annotated tweets (495 tweets were not available for download and were removed)
• Test: 1935 manually annotated tweets (1748 available at the time of the evaluation)
• Metrics: systems are compared against the gold standard in terms of F-measure
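For readers unfamiliar with the metric, the sketch below computes a per-class F-measure and its unweighted average in Python. It is a generic illustration only: the official SENTIPOLC scorer may combine classes differently, and the function names and the four-tweet toy example are assumptions made here.

```python
from typing import List


def f_measure(gold: List[str], predicted: List[str], label: str) -> float:
    """Harmonic mean of precision and recall for one class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        return 0.0  # no correct prediction for this class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f(gold: List[str], predicted: List[str]) -> float:
    """Unweighted average of the per-class F over the classes in the gold standard."""
    labels = sorted(set(gold))
    return sum(f_measure(gold, predicted, label) for label in labels) / len(labels)


# Toy example for Task A (labels invented for illustration).
gold = ["subj", "subj", "obj", "obj"]
pred = ["subj", "obj", "obj", "obj"]
print(f_measure(gold, pred, "subj"), f_measure(gold, pred, "obj"), macro_f(gold, pred))
```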
Results
Setting        Task    F       Rank  Imp. (%)
baseline       Task A  0.4005  -     -
baseline       Task B  0.3718  -     -
constrained    Task A  0.7140  1     78
constrained    Task B  0.6771  1     82
unconstrained  Task A  0.6892  2     72
unconstrained  Task B  0.6638  1     79
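The Imp column appears to be the relative improvement of F over the baseline, in percent. The short Python check below recomputes it from the F values in the table under that assumption; the function name `relative_improvement` is ours.

```python
def relative_improvement(score: float, baseline: float) -> float:
    """Relative improvement over the baseline, in percent."""
    return 100.0 * (score - baseline) / baseline


# F values copied from the results table.
baseline = {"Task A": 0.4005, "Task B": 0.3718}
runs = {
    ("constrained", "Task A"): 0.7140,
    ("constrained", "Task B"): 0.6771,
    ("unconstrained", "Task A"): 0.6892,
    ("unconstrained", "Task B"): 0.6638,
}

improvements = {
    key: round(relative_improvement(f, baseline[key[1]])) for key, f in runs.items()
}
for (setting, task), imp in improvements.items():
    print(f"{setting:13s} {task}: +{imp}%")
```

The rounded values (78, 82, 72, 79) agree with the Imp column, which supports this reading of the column.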
Evaluation
Evaluation
Evaluating systems on their ability in
bull Task A decide whether a given tweet is subjective or objective
bull Task B decide the tweet polarity with respect to four classes positive negativeneutral and mixed sentiment (both positive and negative)
Dataset
bull Training 4513 manually annotated tweets (495 tweets are not available for thedownload and are removed)
bull Test 1935 manually annotated tweets (1748 available at the time of theevaluation)
bull Metrics systems are compared against the gold standard in terms of F measure
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 11 21
Evaluation Results
Results
Setting Task F Rank Imp
baselineTask A 04005 - -Task B 03718 - -
constrainedTask A 07140 1 78Task B 06771 1 82
unconstrainedTask A 06892 2 72Task B 06638 1 79
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 12 21
Evaluation Discussion
Results Discussion
bull The system always obtains the best task performance in all settings
bull Co-training approach seems to introduce noise
bull Deep error analysis
bull the co-training system slightly improves the performance in classifyingpositive tweets
bull bias in our classifier due to the domain-specific lexicon about politicaltopics (governo Monti crisi)
bull classifier could be improved by enriching our lexicon with jargon andidiomatic expressions
bull ablation test investigate the predictive power of the features in ourmodel
Ablation Test
Decrease in F when each feature group is removed, compared with the complete feature set
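The ablation procedure can be sketched as a loop that drops one feature group at a time and records the loss in F relative to the full feature set. This is an illustration only: `ablation_drops` and `toy_evaluate` are hypothetical names, and the numbers in `TOY_GAIN` are invented placeholders, not results from the evaluation. A real run would retrain and score the classifier inside `evaluate`.

```python
from typing import Callable, Dict, FrozenSet, Iterable


def ablation_drops(
    groups: Iterable[str],
    evaluate: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Return, for each feature group, the decrease in score when it is removed."""
    full = frozenset(groups)
    full_score = evaluate(full)          # score with the complete feature set
    return {
        group: full_score - evaluate(full - {group})
        for group in sorted(full)
    }


# Placeholder contributions, made up for the demo (NOT reported results).
TOY_GAIN = {"keyword": 0.10, "semantic": 0.25, "lexicon": 0.05}


def toy_evaluate(active: FrozenSet[str]) -> float:
    """Stand-in for 'train on these feature groups and return F'."""
    return 0.30 + sum(TOY_GAIN[g] for g in sorted(active))


drops = ablation_drops(TOY_GAIN, toy_evaluate)
for group, drop in sorted(drops.items(), key=lambda item: -item[1]):
    print(f"without {group:8s}: F decreases by {drop:.2f}")
```

The group whose removal causes the largest drop is the one carrying the most predictive power in that setting.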
Conclusions and Future Work
Conclusions
• The combination of keyword/micro-blogging, semantic, and lexicon features results in the best performance
• Semantic features (distributional approach) have more predictive power
Future Work
• Validate and generalize our findings on further data and languages
• Fix some issues in the co-training approach
Thank you
- Introduction
-
- The Task
-
- The System
-
- Our Approach
- Features
-
- Learning
- System Setup
- Evaluation
-
- Results
- Discussion
- Ablation test
-
- Conclusions and Future Work
-
Evaluation Results
Results
Setting Task F Rank Imp
baselineTask A 04005 - -Task B 03718 - -
constrainedTask A 07140 1 78Task B 06771 1 82
unconstrainedTask A 06892 2 72Task B 06638 1 79
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 12 21
Evaluation Discussion
Results Discussion
bull The system always obtains the best task performance in all settings
bull Co-training approach seems to introduce noise
bull Deep error analysis
bull the co-training system slightly improves the performance in classifyingpositive tweets
bull bias in our classifier due to the domain-specific lexicon about politicaltopics (governo Monti crisi)
bull classifier could be improved by enriching our lexicon with jargon andidiomatic expressions
bull ablation test investigate the predictive power of the features in ourmodel
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 13 21
Evaluation Ablation test
Ablation Test
Decrease of F by removing each feature group compared to the complete feature setting
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 14 21
Conclusions and Future Work
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 15 21
Conclusions and Future Work
Conclusions and Future Work
Conclusions
bull The combination of keywordmicro-blogging semantic and lexicon features resultsin best performance
bull Semantic features (distributional approach) have more predictive power
Future Work
bull Validate and generalize our findings further data and languages
bull Fix some issues in co-training approach
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 16 21
Thank you
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 17 21
For Further Reading
For Further Reading I
Barbosa L Feng J Robust sentiment detection on twitter from biased and noisydata
In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 36ndash44 Association for Computational Linguistics Stroudsburg PA USA(2010)
Basile V Bolioli A Nissim M Patti V Rosso P Overview of the Evalita2014 SENTIment POLarity Classification Task
In Proc of EVALITA 2014 Pisa Italy (2014)
Basile V Nissim M Sentiment analysis on italian tweets
In Proc of WASSA 2013 pp 100ndash107 (2013)
Davidov D Tsur O Rappoport A Enhanced sentiment learning using twitterhashtags and smileys
In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 241ndash249 Association for Computational Linguistics Stroudsburg PA USA(2010)
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 18 21
For Further Reading
For Further Reading II
Esuli A Sebastiani F Sentiwordnet A publicly available lexical resource foropinion mining
In Proc of LREC pp 417ndash422 (2006)
Go A Bhayani R Huang L Twitter sentiment classification using distantsupervision
Processing pp 1ndash6 (2009)
Jansen BJ Zhang M Sobel K Chowdury A Twitter power Tweets aselectronic word of mouth
J Am Soc Inf Sci Technol 60(11) 2169ndash2188 (2009)
Kouloumpis E Wilson T Moore JD Twitter sentiment analysis The goodthe bad and the omg
In Proc of ICWSM 2011 pp 538ndash541 (2011)
Mikolov T Chen K Corrado G Dean J Efficient estimation of wordrepresentations in vector space
In Proc of ICLR Work (2013)
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 19 21
For Further Reading
For Further Reading III
OrsquoConnor B Balasubramanyan R Routledge B Smith N From tweets topolls Linking text sentiment to public opinion time series
In Intl AAAI Conf on Weblogs and Social Media (ICWSM) vol 11 pp 122ndash129(2010)
Pak A Paroubek P Twitter as a corpus for sentiment analysis and opinionmining
In Proc of the Seventh Intl Conf on Language Resources and Evaluation(LRECrsquo10) (2010)
Pang B Lee L Opinion mining and sentiment analysis
Found Trends Inf Retr 2(1-2) 1ndash135 (2008)
Pianta E Bentivogli L Girardi C Multiwordnet developing an alignedmultilingual database
In Proc 1st Intl Conf on Global WordNet pp 293ndash302 (2002)
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 20 21
For Further Reading
For Further Reading IV
Smolensky P Tensor product variable binding and the representation of symbolicstructures in connectionist systems
Artificial Intelligence 46(1-2) 159ndash216 (1990)
Vanzo A Croce D Basili R A context-based model for sentiment analysis intwitter
In Proc of COLING 2014 (2014)
Wilson T Wiebe J Hoffmann P Recognizing contextual polarity Anexploration of features for phrase-level sentiment analysis
Comput Linguist 35(3) 399ndash433 (2009)
Zanchetta E Baroni M Morph-it a free corpus-based morphological resourcefor the italian language
Proc of the Corpus Linguistics Conf 2005 (2005)
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 21 21
- Introduction
-
- The Task
-
- The System
-
- Our Approach
- Features
-
- Learning
- System Setup
- Evaluation
-
- Results
- Discussion
- Ablation test
-
- Conclusions and Future Work
-
Evaluation Discussion
Results Discussion
bull The system always obtains the best task performance in all settings
bull Co-training approach seems to introduce noise
bull Deep error analysis
bull the co-training system slightly improves the performance in classifyingpositive tweets
bull bias in our classifier due to the domain-specific lexicon about politicaltopics (governo Monti crisi)
bull classifier could be improved by enriching our lexicon with jargon andidiomatic expressions
bull ablation test investigate the predictive power of the features in ourmodel
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 13 21
Evaluation Ablation test
Ablation Test
Decrease of F by removing each feature group compared to the complete feature setting
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 14 21
Conclusions and Future Work
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 15 21
Conclusions and Future Work
Conclusions and Future Work
Conclusions
bull The combination of keywordmicro-blogging semantic and lexicon features resultsin best performance
bull Semantic features (distributional approach) have more predictive power
Future Work
bull Validate and generalize our findings further data and languages
bull Fix some issues in co-training approach
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 16 21
Thank you
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 17 21
For Further Reading
For Further Reading I
Barbosa L Feng J Robust sentiment detection on twitter from biased and noisydata
In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 36ndash44 Association for Computational Linguistics Stroudsburg PA USA(2010)
Basile V Bolioli A Nissim M Patti V Rosso P Overview of the Evalita2014 SENTIment POLarity Classification Task
In Proc of EVALITA 2014 Pisa Italy (2014)
Basile V Nissim M Sentiment analysis on italian tweets
In Proc of WASSA 2013 pp 100ndash107 (2013)
Davidov D Tsur O Rappoport A Enhanced sentiment learning using twitterhashtags and smileys
In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 241ndash249 Association for Computational Linguistics Stroudsburg PA USA(2010)
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 18 21
For Further Reading
For Further Reading II
Esuli A Sebastiani F Sentiwordnet A publicly available lexical resource foropinion mining
In Proc of LREC pp 417ndash422 (2006)
Go A Bhayani R Huang L Twitter sentiment classification using distantsupervision
Processing pp 1ndash6 (2009)
Jansen BJ Zhang M Sobel K Chowdury A Twitter power Tweets aselectronic word of mouth
J Am Soc Inf Sci Technol 60(11) 2169ndash2188 (2009)
Kouloumpis E Wilson T Moore JD Twitter sentiment analysis The goodthe bad and the omg
In Proc of ICWSM 2011 pp 538ndash541 (2011)
Mikolov T Chen K Corrado G Dean J Efficient estimation of wordrepresentations in vector space
In Proc of ICLR Work (2013)
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 19 21
For Further Reading
For Further Reading III
OrsquoConnor B Balasubramanyan R Routledge B Smith N From tweets topolls Linking text sentiment to public opinion time series
In Intl AAAI Conf on Weblogs and Social Media (ICWSM) vol 11 pp 122ndash129(2010)
Pak A Paroubek P Twitter as a corpus for sentiment analysis and opinionmining
In Proc of the Seventh Intl Conf on Language Resources and Evaluation(LRECrsquo10) (2010)
Pang B Lee L Opinion mining and sentiment analysis
Found Trends Inf Retr 2(1-2) 1ndash135 (2008)
Pianta E Bentivogli L Girardi C Multiwordnet developing an alignedmultilingual database
In Proc 1st Intl Conf on Global WordNet pp 293ndash302 (2002)
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 20 21
For Further Reading
For Further Reading IV
Smolensky P Tensor product variable binding and the representation of symbolicstructures in connectionist systems
Artificial Intelligence 46(1-2) 159ndash216 (1990)
Vanzo A Croce D Basili R A context-based model for sentiment analysis intwitter
In Proc of COLING 2014 (2014)
Wilson T Wiebe J Hoffmann P Recognizing contextual polarity Anexploration of features for phrase-level sentiment analysis
Comput Linguist 35(3) 399ndash433 (2009)
Zanchetta E Baroni M Morph-it a free corpus-based morphological resourcefor the italian language
Proc of the Corpus Linguistics Conf 2005 (2005)
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 21 21
- Introduction
-
- The Task
-
- The System
-
- Our Approach
- Features
-
- Learning
- System Setup
- Evaluation
-
- Results
- Discussion
- Ablation test
-
- Conclusions and Future Work
-
Evaluation Ablation test
Ablation Test
Decrease of F by removing each feature group compared to the complete feature setting
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 14 21
Conclusions and Future Work
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 15 21
Conclusions and Future Work
Conclusions and Future Work
Conclusions
bull The combination of keywordmicro-blogging semantic and lexicon features resultsin best performance
bull Semantic features (distributional approach) have more predictive power
Future Work
bull Validate and generalize our findings further data and languages
bull Fix some issues in co-training approach
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 16 21
Thank you
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 17 21
For Further Reading
For Further Reading I
Barbosa L Feng J Robust sentiment detection on twitter from biased and noisydata
In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 36ndash44 Association for Computational Linguistics Stroudsburg PA USA(2010)
Basile V Bolioli A Nissim M Patti V Rosso P Overview of the Evalita2014 SENTIment POLarity Classification Task
In Proc of EVALITA 2014 Pisa Italy (2014)
Basile V Nissim M Sentiment analysis on italian tweets
In Proc of WASSA 2013 pp 100ndash107 (2013)
Davidov D Tsur O Rappoport A Enhanced sentiment learning using twitterhashtags and smileys
In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 241ndash249 Association for Computational Linguistics Stroudsburg PA USA(2010)
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 18 21
For Further Reading
For Further Reading II
Esuli A Sebastiani F Sentiwordnet A publicly available lexical resource foropinion mining
In Proc of LREC pp 417ndash422 (2006)
Go A Bhayani R Huang L Twitter sentiment classification using distantsupervision
Processing pp 1ndash6 (2009)
Jansen BJ Zhang M Sobel K Chowdury A Twitter power Tweets aselectronic word of mouth
J Am Soc Inf Sci Technol 60(11) 2169ndash2188 (2009)
Kouloumpis E Wilson T Moore JD Twitter sentiment analysis The goodthe bad and the omg
In Proc of ICWSM 2011 pp 538ndash541 (2011)
Mikolov T Chen K Corrado G Dean J Efficient estimation of wordrepresentations in vector space
In Proc of ICLR Work (2013)
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 19 21
For Further Reading
For Further Reading III
OrsquoConnor B Balasubramanyan R Routledge B Smith N From tweets topolls Linking text sentiment to public opinion time series
In Intl AAAI Conf on Weblogs and Social Media (ICWSM) vol 11 pp 122ndash129(2010)
Pak A Paroubek P Twitter as a corpus for sentiment analysis and opinionmining
In Proc of the Seventh Intl Conf on Language Resources and Evaluation(LRECrsquo10) (2010)
Pang B Lee L Opinion mining and sentiment analysis
Found Trends Inf Retr 2(1-2) 1ndash135 (2008)
Pianta E Bentivogli L Girardi C Multiwordnet developing an alignedmultilingual database
In Proc 1st Intl Conf on Global WordNet pp 293ndash302 (2002)
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 20 21
For Further Reading
For Further Reading IV
Smolensky P Tensor product variable binding and the representation of symbolicstructures in connectionist systems
Artificial Intelligence 46(1-2) 159ndash216 (1990)
Vanzo A Croce D Basili R A context-based model for sentiment analysis intwitter
In Proc of COLING 2014 (2014)
Wilson T Wiebe J Hoffmann P Recognizing contextual polarity Anexploration of features for phrase-level sentiment analysis
Comput Linguist 35(3) 399ndash433 (2009)
Zanchetta E Baroni M Morph-it a free corpus-based morphological resourcefor the italian language
Proc of the Corpus Linguistics Conf 2005 (2005)
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 21 21
- Introduction
-
- The Task
-
- The System
-
- Our Approach
- Features
-
- Learning
- System Setup
- Evaluation
-
- Results
- Discussion
- Ablation test
-
- Conclusions and Future Work
-
Conclusions and Future Work
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 15 21
Conclusions and Future Work
Conclusions and Future Work
Conclusions
bull The combination of keywordmicro-blogging semantic and lexicon features resultsin best performance
bull Semantic features (distributional approach) have more predictive power
Future Work
bull Validate and generalize our findings further data and languages
bull Fix some issues in co-training approach
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 16 21
Thank you
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 17 21
For Further Reading
For Further Reading I
Barbosa L Feng J Robust sentiment detection on twitter from biased and noisydata
In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 36ndash44 Association for Computational Linguistics Stroudsburg PA USA(2010)
Basile V Bolioli A Nissim M Patti V Rosso P Overview of the Evalita2014 SENTIment POLarity Classification Task
In Proc of EVALITA 2014 Pisa Italy (2014)
Basile V Nissim M Sentiment analysis on italian tweets
In Proc of WASSA 2013 pp 100ndash107 (2013)
Davidov D Tsur O Rappoport A Enhanced sentiment learning using twitterhashtags and smileys
In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 241ndash249 Association for Computational Linguistics Stroudsburg PA USA(2010)
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 18 21
For Further Reading
For Further Reading II
Esuli A Sebastiani F Sentiwordnet A publicly available lexical resource foropinion mining
In Proc of LREC pp 417ndash422 (2006)
Go A Bhayani R Huang L Twitter sentiment classification using distantsupervision
Processing pp 1ndash6 (2009)
Jansen BJ Zhang M Sobel K Chowdury A Twitter power Tweets aselectronic word of mouth
J Am Soc Inf Sci Technol 60(11) 2169ndash2188 (2009)
Kouloumpis E Wilson T Moore JD Twitter sentiment analysis The goodthe bad and the omg
In Proc of ICWSM 2011 pp 538ndash541 (2011)
Mikolov T Chen K Corrado G Dean J Efficient estimation of wordrepresentations in vector space
In Proc of ICLR Work (2013)
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 19 21
For Further Reading
For Further Reading III
OrsquoConnor B Balasubramanyan R Routledge B Smith N From tweets topolls Linking text sentiment to public opinion time series
In Intl AAAI Conf on Weblogs and Social Media (ICWSM) vol 11 pp 122ndash129(2010)
Pak A Paroubek P Twitter as a corpus for sentiment analysis and opinionmining
In Proc of the Seventh Intl Conf on Language Resources and Evaluation(LRECrsquo10) (2010)
Pang B Lee L Opinion mining and sentiment analysis
Found Trends Inf Retr 2(1-2) 1ndash135 (2008)
Pianta E Bentivogli L Girardi C Multiwordnet developing an alignedmultilingual database
In Proc 1st Intl Conf on Global WordNet pp 293ndash302 (2002)
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 20 21
For Further Reading
For Further Reading IV
Smolensky P Tensor product variable binding and the representation of symbolicstructures in connectionist systems
Artificial Intelligence 46(1-2) 159ndash216 (1990)
Vanzo A Croce D Basili R A context-based model for sentiment analysis intwitter
In Proc of COLING 2014 (2014)
Wilson T Wiebe J Hoffmann P Recognizing contextual polarity Anexploration of features for phrase-level sentiment analysis
Comput Linguist 35(3) 399ndash433 (2009)
Zanchetta E Baroni M Morph-it a free corpus-based morphological resourcefor the italian language
Proc of the Corpus Linguistics Conf 2005 (2005)
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 21 21
- Introduction
-
- The Task
-
- The System
-
- Our Approach
- Features
-
- Learning
- System Setup
- Evaluation
-
- Results
- Discussion
- Ablation test
-
- Conclusions and Future Work
-
Conclusions and Future Work
Conclusions and Future Work
Conclusions
bull The combination of keywordmicro-blogging semantic and lexicon features resultsin best performance
bull Semantic features (distributional approach) have more predictive power
Future Work
bull Validate and generalize our findings further data and languages
bull Fix some issues in co-training approach
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 16 21
Thank you
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 17 21
For Further Reading
For Further Reading I
Barbosa L Feng J Robust sentiment detection on twitter from biased and noisydata
In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 36ndash44 Association for Computational Linguistics Stroudsburg PA USA(2010)
Basile V Bolioli A Nissim M Patti V Rosso P Overview of the Evalita2014 SENTIment POLarity Classification Task
In Proc of EVALITA 2014 Pisa Italy (2014)
Basile V Nissim M Sentiment analysis on italian tweets
In Proc of WASSA 2013 pp 100ndash107 (2013)
Davidov D Tsur O Rappoport A Enhanced sentiment learning using twitterhashtags and smileys
In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 241ndash249 Association for Computational Linguistics Stroudsburg PA USA(2010)
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 18 21
For Further Reading
For Further Reading II
Esuli A Sebastiani F Sentiwordnet A publicly available lexical resource foropinion mining
In Proc of LREC pp 417ndash422 (2006)
Go A Bhayani R Huang L Twitter sentiment classification using distantsupervision
Processing pp 1ndash6 (2009)
Jansen BJ Zhang M Sobel K Chowdury A Twitter power Tweets aselectronic word of mouth
J Am Soc Inf Sci Technol 60(11) 2169ndash2188 (2009)
Kouloumpis E Wilson T Moore JD Twitter sentiment analysis The goodthe bad and the omg
In Proc of ICWSM 2011 pp 538ndash541 (2011)
Mikolov T Chen K Corrado G Dean J Efficient estimation of wordrepresentations in vector space
In Proc of ICLR Work (2013)
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 19 21
For Further Reading
For Further Reading III
OrsquoConnor B Balasubramanyan R Routledge B Smith N From tweets topolls Linking text sentiment to public opinion time series
In Intl AAAI Conf on Weblogs and Social Media (ICWSM) vol 11 pp 122ndash129(2010)
Pak A Paroubek P Twitter as a corpus for sentiment analysis and opinionmining
In Proc of the Seventh Intl Conf on Language Resources and Evaluation(LRECrsquo10) (2010)
Pang B Lee L Opinion mining and sentiment analysis
Found Trends Inf Retr 2(1-2) 1ndash135 (2008)
Pianta E Bentivogli L Girardi C Multiwordnet developing an alignedmultilingual database
In Proc 1st Intl Conf on Global WordNet pp 293ndash302 (2002)
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 20 21
For Further Reading
For Further Reading IV
Smolensky P Tensor product variable binding and the representation of symbolicstructures in connectionist systems
Artificial Intelligence 46(1-2) 159ndash216 (1990)
Vanzo A Croce D Basili R A context-based model for sentiment analysis intwitter
In Proc of COLING 2014 (2014)
Wilson T Wiebe J Hoffmann P Recognizing contextual polarity Anexploration of features for phrase-level sentiment analysis
Comput Linguist 35(3) 399ndash433 (2009)
Zanchetta E Baroni M Morph-it a free corpus-based morphological resourcefor the italian language
Proc of the Corpus Linguistics Conf 2005 (2005)
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 21 21
- Introduction
-
- The Task
-
- The System
-
- Our Approach
- Features
-
- Learning
- System Setup
- Evaluation
-
- Results
- Discussion
- Ablation test
-
- Conclusions and Future Work
-
Thank you
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 17 21
For Further Reading
For Further Reading I
Barbosa L Feng J Robust sentiment detection on twitter from biased and noisydata
In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 36ndash44 Association for Computational Linguistics Stroudsburg PA USA(2010)
Basile V Bolioli A Nissim M Patti V Rosso P Overview of the Evalita2014 SENTIment POLarity Classification Task
In Proc of EVALITA 2014 Pisa Italy (2014)
Basile V Nissim M Sentiment analysis on italian tweets
In Proc of WASSA 2013 pp 100ndash107 (2013)
Davidov D Tsur O Rappoport A Enhanced sentiment learning using twitterhashtags and smileys
In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 241ndash249 Association for Computational Linguistics Stroudsburg PA USA(2010)
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 18 21
For Further Reading
For Further Reading II
Esuli A Sebastiani F Sentiwordnet A publicly available lexical resource foropinion mining
In Proc of LREC pp 417ndash422 (2006)
Go A Bhayani R Huang L Twitter sentiment classification using distantsupervision
Processing pp 1ndash6 (2009)
Jansen BJ Zhang M Sobel K Chowdury A Twitter power Tweets aselectronic word of mouth
J Am Soc Inf Sci Technol 60(11) 2169ndash2188 (2009)
Kouloumpis E Wilson T Moore JD Twitter sentiment analysis The goodthe bad and the omg
In Proc of ICWSM 2011 pp 538ndash541 (2011)
Mikolov T Chen K Corrado G Dean J Efficient estimation of wordrepresentations in vector space
In Proc of ICLR Work (2013)
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 19 21
For Further Reading
For Further Reading III
OrsquoConnor B Balasubramanyan R Routledge B Smith N From tweets topolls Linking text sentiment to public opinion time series
In Intl AAAI Conf on Weblogs and Social Media (ICWSM) vol 11 pp 122ndash129(2010)
Pak A Paroubek P Twitter as a corpus for sentiment analysis and opinionmining
In Proc of the Seventh Intl Conf on Language Resources and Evaluation(LRECrsquo10) (2010)
Pang B Lee L Opinion mining and sentiment analysis
Found Trends Inf Retr 2(1-2) 1ndash135 (2008)
Pianta E Bentivogli L Girardi C Multiwordnet developing an alignedmultilingual database
In Proc 1st Intl Conf on Global WordNet pp 293ndash302 (2002)
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 20 21
For Further Reading
For Further Reading IV
Smolensky P Tensor product variable binding and the representation of symbolicstructures in connectionist systems
Artificial Intelligence 46(1-2) 159ndash216 (1990)
Vanzo A Croce D Basili R A context-based model for sentiment analysis intwitter
In Proc of COLING 2014 (2014)
Wilson T Wiebe J Hoffmann P Recognizing contextual polarity Anexploration of features for phrase-level sentiment analysis
Comput Linguist 35(3) 399ndash433 (2009)
Zanchetta E Baroni M Morph-it a free corpus-based morphological resourcefor the italian language
Proc of the Corpus Linguistics Conf 2005 (2005)
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 21 21
- Introduction
-
- The Task
-
- The System
-
- Our Approach
- Features
-
- Learning
- System Setup
- Evaluation
-
- Results
- Discussion
- Ablation test
-
- Conclusions and Future Work
-
For Further Reading
For Further Reading I
Barbosa L Feng J Robust sentiment detection on twitter from biased and noisydata
In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 36ndash44 Association for Computational Linguistics Stroudsburg PA USA(2010)
Basile V Bolioli A Nissim M Patti V Rosso P Overview of the Evalita2014 SENTIment POLarity Classification Task
In Proc of EVALITA 2014 Pisa Italy (2014)
Basile V Nissim M Sentiment analysis on italian tweets
In Proc of WASSA 2013 pp 100ndash107 (2013)
Davidov D Tsur O Rappoport A Enhanced sentiment learning using twitterhashtags and smileys
In Proc of the 23rd Intl Conf on Computational Linguistics Posters COLINGrsquo10 pp 241ndash249 Association for Computational Linguistics Stroudsburg PA USA(2010)
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 18 21
For Further Reading
For Further Reading II
Esuli A Sebastiani F Sentiwordnet A publicly available lexical resource foropinion mining
In Proc of LREC pp 417ndash422 (2006)
Go A Bhayani R Huang L Twitter sentiment classification using distantsupervision
Processing pp 1ndash6 (2009)
Jansen BJ Zhang M Sobel K Chowdury A Twitter power Tweets aselectronic word of mouth
J Am Soc Inf Sci Technol 60(11) 2169ndash2188 (2009)
Kouloumpis E Wilson T Moore JD Twitter sentiment analysis The goodthe bad and the omg
In Proc of ICWSM 2011 pp 538ndash541 (2011)
Mikolov T Chen K Corrado G Dean J Efficient estimation of wordrepresentations in vector space
In Proc of ICLR Work (2013)
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 19 21
For Further Reading
For Further Reading III
OrsquoConnor B Balasubramanyan R Routledge B Smith N From tweets topolls Linking text sentiment to public opinion time series
In Intl AAAI Conf on Weblogs and Social Media (ICWSM) vol 11 pp 122ndash129(2010)
Pak A Paroubek P Twitter as a corpus for sentiment analysis and opinionmining
In Proc of the Seventh Intl Conf on Language Resources and Evaluation(LRECrsquo10) (2010)
Pang B Lee L Opinion mining and sentiment analysis
Found Trends Inf Retr 2(1-2) 1ndash135 (2008)
Pianta E Bentivogli L Girardi C Multiwordnet developing an alignedmultilingual database
In Proc 1st Intl Conf on Global WordNet pp 293ndash302 (2002)
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 20 21
For Further Reading
For Further Reading IV
Smolensky P Tensor product variable binding and the representation of symbolicstructures in connectionist systems
Artificial Intelligence 46(1-2) 159ndash216 (1990)
Vanzo A Croce D Basili R A context-based model for sentiment analysis intwitter
In Proc of COLING 2014 (2014)
Wilson T Wiebe J Hoffmann P Recognizing contextual polarity Anexploration of features for phrase-level sentiment analysis
Comput Linguist 35(3) 399ndash433 (2009)
Zanchetta E Baroni M Morph-it a free corpus-based morphological resourcefor the italian language
Proc of the Corpus Linguistics Conf 2005 (2005)
Basile and Novielli (UNIBA) UNIBA-SENTIPOLC EVALITA 2014 11 Dec rsquo14 21 21
- Introduction
- The Task
- The System
- Our Approach
- Features
- Learning
- System Setup
- Evaluation
- Results
- Discussion
- Ablation test
- Conclusions and Future Work