CS 6120/CS4120: Natural Language Processing · 2017-11-20 · •Opinion mining •Sentiment mining...
CS 6120/CS 4120: Natural Language Processing
Instructor: Prof. Lu Wang
College of Computer and Information Science, Northeastern University
Webpage: www.ccs.neu.edu/home/luwang
Presentation and report
• Problem Description (10 points)
  • What is the task?
  • System input and output
  • Examples will be helpful
• Reference/Related work (20 points)
  • Put your work in context: what has been done before? You need to have references!
  • What's new in your work?
• Methodology: What you have done (30 points)
  • Preprocessing of the data
  • What are your data?
  • Features used? Which are effective, and which are not?
  • What methods do you experiment with? And why do you think they're reasonable and suitable for the task?
• Experiments (40 points)
  • Dataset sizes, train/test/development splits
  • Evaluation metrics: which are used, and are they appropriate for measuring system performance?
  • Baselines: what are they?
  • Results, tables, figures, etc.
Sentiment Analysis
Positive or negative movie review?
• unbelievably disappointing
• Full of zany characters and richly applied satire, and some great plot twists
• this is the greatest screwball comedy ever filmed
• It was pathetic. The worst part about it was the boxing scenes.
Google Product Search
Bing Shopping
Twitter sentiment versus Gallup Poll of Consumer Confidence
Brendan O'Connor, Ramnath Balasubramanyan, Bryan R. Routledge, and Noah A. Smith. 2010. From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series. In ICWSM-2010
Twitter sentiment:
Johan Bollen, Huina Mao, Xiaojun Zeng. 2011. Twitter mood predicts the stock market. Journal of Computational Science 2:1, 1-8. 10.1016/j.jocs.2010.12.007.
Target Sentiment on Twitter
• Twitter Sentiment App
• Alec Go, Richa Bhayani, Lei Huang. 2009. Twitter Sentiment Classification using Distant Supervision
Sentiment analysis has many other names
• Opinion extraction
• Opinion mining
• Sentiment mining
• Subjectivity analysis
Why sentiment analysis?
• Movie: is this review positive or negative?
• Products: what do people think about the new iPhone?
• Public sentiment: how is consumer confidence? Is despair increasing?
• Politics: what do people think about this candidate or issue?
• Prediction: predict election outcomes or market trends from sentiment
Scherer Typology of Affective States
• Emotion: brief organically synchronized … evaluation of a major event
  • angry, sad, joyful, fearful, ashamed, proud, elated
• Mood: diffuse non-caused low-intensity long-duration change in subjective feeling
  • cheerful, gloomy, irritable, listless, depressed, buoyant
• Interpersonal stances: affective stance toward another person in a specific interaction
  • friendly, flirtatious, distant, cold, warm, supportive, contemptuous
• Attitudes: enduring, affectively colored beliefs, dispositions towards objects or persons
  • liking, loving, hating, valuing, desiring
• Personality traits: stable personality dispositions and typical behavior tendencies
  • nervous, anxious, reckless, morose, hostile, jealous
Sentiment Analysis
• Sentiment analysis is the detection of attitudes: “enduring, affectively colored beliefs, dispositions towards objects or persons”
1. Holder (source) of attitude
2. Target (aspect) of attitude
3. Type of attitude
  • From a set of types
    • Like, love, hate, value, desire, etc.
  • Or (more commonly) simple weighted polarity:
    • positive, negative, neutral, together with strength
4. Text containing the attitude
  • Sentence or entire document
Sentiment Analysis
• Simplest task:
  • Is the attitude of this text positive or negative?
• More complex:
  • Rank the attitude of this text from 1 to 5
• Advanced:
  • Detect the target, source, or complex attitude types
Sentiment Classification in Movie Reviews
• Polarity detection:
  • Is an IMDB movie review positive or negative?
• Data: Polarity Data 2.0:
  • http://www.cs.cornell.edu/people/pabo/movie-review-data
Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79-86.
Bo Pang and Lillian Lee. 2004. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. ACL, 271-278
IMDB data in the Pang and Lee database
when _star wars_ came out some twenty years ago, the image of traveling throughout the stars has become a commonplace image. […] when han solo goes light speed, the stars change to bright lines, going towards the viewer in lines that converge at an invisible point. cool. _october sky_ offers a much simpler image – that of a single white dot, traveling horizontally across the night sky. [...]
“snake eyes” is the most aggravating kind of movie: the kind that shows so much potential then becomes unbelievably disappointing. it’s not just because this is a brian depalma film, and since he’s a great director and one who’s films are always greeted with at least some fanfare. and it’s not even because this was a film starring nicolas cage and since he gives a brauvara performance, this film is hardly worth his talents.
✓ ✗
Baseline Algorithm (adapted from Pang and Lee)
• Tokenization
• Feature Extraction
• Classification using different classifiers
  • Naïve Bayes
  • MaxEnt
  • SVM
Sentiment Tokenization Issues
• Deal with HTML and XML markup
• Twitter mark-up (names, hashtags)
• Capitalization (preserve for words in all caps)
• Phone numbers, dates
• Emoticons
• Useful code:
  • Christopher Potts sentiment tokenizer
  • Brendan O’Connor twitter tokenizer
Potts emoticons
[<>]?                       # optional hat/brow
[:;=8]                      # eyes
[\-o\*\']?                  # optional nose
[\)\]\(\[dDpP/\:\}\{@\|\\]  # mouth
|                           #### reverse orientation
[\)\]\(\[dDpP/\:\}\{@\|\\]  # mouth
[\-o\*\']?                  # optional nose
[:;=8]                      # eyes
[<>]?                       # optional hat/brow
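The emoticon pattern carries inline comments, which suggests it is meant to be compiled in a verbose regex mode. A minimal Python sketch, assuming `re.VERBOSE`; the helper name `find_emoticons` and the sample sentence are hypothetical, not from the slides:

```python
import re

# Emoticon pattern from the slide, compiled in verbose mode so that layout
# whitespace and the trailing "# ..." comments are ignored by the regex engine.
EMOTICON_RE = re.compile(r"""
    [<>]?                       # optional hat/brow
    [:;=8]                      # eyes
    [\-o\*\']?                  # optional nose
    [\)\]\(\[dDpP/\:\}\{@\|\\]  # mouth
    |                           # reverse orientation
    [\)\]\(\[dDpP/\:\}\{@\|\\]  # mouth
    [\-o\*\']?                  # optional nose
    [:;=8]                      # eyes
    [<>]?                       # optional hat/brow
    """, re.VERBOSE)


def find_emoticons(text):
    """Return every emoticon-like substring, in order of appearance."""
    return EMOTICON_RE.findall(text)


print(find_emoticons("great movie :-) but the ending :( was sad"))
```

A tokenizer would typically try this pattern before splitting on punctuation, so that ":-)" survives as a single token.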
Extracting Features for Sentiment Classification
• How to handle negation
  • I didn’t like this movie
    vs
  • I really like this movie
• Which words to use?
  • Only adjectives
  • All words
    • All words turns out to work better, at least on this data
Negation
Add NOT_ to every word between negation and following punctuation:
didn’t like this movie , but I
didn’t NOT_like NOT_this NOT_movie but I
Das, Sanjiv and Mike Chen. 2001. Yahoo! for Amazon: Extracting market sentiment from stock message boards. In Proceedings of the Asia Pacific Finance Association Annual Conference (APFA).
Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79-86.
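The NOT_ rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the cited authors' code: the negation word list (no, not, never, and any token ending in n't) and the punctuation set are choices made here, and `mark_negation` is a hypothetical name. Unlike the slide's example, the sketch keeps the punctuation token in the output.

```python
import re

# Assumed negation cues: "no", "not", "never", and tokens ending in n't / n’t.
NEGATION_RE = re.compile(r"^(?:no|not|never|\w*n['’]t)$", re.IGNORECASE)
# Assumed clause-ending punctuation that switches negation marking off.
PUNCT_RE = re.compile(r"^[.,:;!?]$")


def mark_negation(tokens):
    """Prefix NOT_ to every token between a negation cue and the next punctuation."""
    marked = []
    negating = False
    for tok in tokens:
        if PUNCT_RE.match(tok):
            negating = False          # punctuation ends the negated span
            marked.append(tok)
        elif negating:
            marked.append("NOT_" + tok)
        else:
            marked.append(tok)
            if NEGATION_RE.match(tok):
                negating = True       # start marking from the next token
    return marked


print(mark_negation("didn't like this movie , but I".split()))
```

The marked tokens are then fed to the classifier as ordinary features, so "like" and "NOT_like" get separate weights.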
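Reminder: Naïve Bayes

$$\hat{P}(w \mid c) = \frac{\mathrm{count}(w, c) + 1}{\mathrm{count}(c) + |V|}$$

$$c_{NB} = \underset{c_j \in C}{\operatorname{argmax}} \; P(c_j) \prod_{i \in \mathrm{positions}} P(w_i \mid c_j)$$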
Binarized (Boolean feature) Multinomial Naïve Bayes
• Intuition:
  • For sentiment (and probably for other text classification domains)
  • Word occurrence may matter more than word frequency
    • The occurrence of the word fantastic tells us a lot
    • The fact that it occurs 5 times may not tell us much more.
• Boolean Multinomial Naïve Bayes
  • Clips all the word counts in each document at 1
Boolean Multinomial Naïve Bayes: Learning
• From training corpus, extract Vocabulary
• Calculate P(c_j) terms
  • For each c_j in C do
      docs_j ← all docs with class = c_j
      $$P(c_j) \leftarrow \frac{|docs_j|}{|\text{total \# documents}|}$$
• Calculate P(w_k | c_j) terms
  • Remove duplicates in each doc:
    • For each word type w in doc_j
    • Retain only a single instance of w
  • Text_j ← single doc containing all docs_j
  • For each word w_k in Vocabulary
      n_k ← # of occurrences of w_k in Text_j
      $$P(w_k \mid c_j) \leftarrow \frac{n_k + \alpha}{n + \alpha\,|\mathrm{Vocabulary}|}$$
Boolean Multinomial Naïve Bayes on a test document d
• First remove all duplicate words from d
• Then compute NB using the same equation:

$$c_{NB} = \underset{c_j \in C}{\operatorname{argmax}} \; P(c_j) \prod_{i \in \mathrm{positions}} P(w_i \mid c_j)$$
Normal vs. Boolean Multinomial NB

Normal     Doc   Words                                 Class
Training   1     Chinese Beijing Chinese               c
           2     Chinese Chinese Shanghai              c
           3     Chinese Macao                         c
           4     Tokyo Japan Chinese                   j
Test       5     Chinese Chinese Chinese Tokyo Japan   ?

Boolean    Doc   Words                                 Class
Training   1     Chinese Beijing                       c
           2     Chinese Shanghai                      c
           3     Chinese Macao                         c
           4     Tokyo Japan Chinese                   j
Test       5     Chinese Tokyo Japan                   ?
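The learning and test procedures can be tried on this five-document example. The sketch below is a minimal illustration, assuming add-α smoothing with α = 1, log-space scoring, and that n is the total token count of the concatenated class text; the function names `train_nb` and `predict_nb` are hypothetical.

```python
import math
from collections import Counter


def train_nb(docs, labels, binarize=True, alpha=1.0):
    """Estimate log P(c) and log P(w|c); with binarize=True each doc is deduplicated first."""
    vocab = {w for doc in docs for w in doc}
    log_prior, log_likelihood = {}, {}
    for c in set(labels):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter()
        for d in class_docs:
            counts.update(set(d) if binarize else d)  # clip per-document counts at 1
        n = sum(counts.values())                      # tokens in the concatenated class text
        denom = n + alpha * len(vocab)
        log_likelihood[c] = {w: math.log((counts[w] + alpha) / denom) for w in vocab}
    return vocab, log_prior, log_likelihood


def predict_nb(model, doc, binarize=True):
    """Return the class with the highest log posterior score for doc."""
    vocab, log_prior, log_likelihood = model
    words = set(doc) if binarize else doc
    scores = {
        c: log_prior[c] + sum(log_likelihood[c][w] for w in words if w in vocab)
        for c in log_prior
    }
    return max(scores, key=scores.get)


train_docs = [
    "Chinese Beijing Chinese".split(),
    "Chinese Chinese Shanghai".split(),
    "Chinese Macao".split(),
    "Tokyo Japan Chinese".split(),
]
train_labels = ["c", "c", "c", "j"]
test_doc = "Chinese Chinese Chinese Tokyo Japan".split()

for flag in (False, True):
    model = train_nb(train_docs, train_labels, binarize=flag)
    print("boolean" if flag else "normal", predict_nb(model, test_doc, binarize=flag))
```

Under these assumptions the two variants disagree on document 5: with full counts the three occurrences of Chinese pull the document toward class c, while with clipped counts Tokyo and Japan carry relatively more weight and the document is assigned to j.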
Binarized (Boolean feature) Multinomial Naïve Bayes
• Binary seems to work better than full word counts
• Other possibility: log(freq(w))
B. Pang, L. Lee, and S. Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79-86.
V. Metsis, I. Androutsopoulos, G. Paliouras. 2006. Spam Filtering with Naive Bayes – Which Naive Bayes? CEAS 2006 - Third Conference on Email and Anti-Spam.
K.-M. Schneider. 2004. On word frequency information and negative evidence in Naive Bayes text classification. ICANLP, 474-485.
JD Rennie, L Shih, J Teevan. 2003. Tackling the poor assumptions of naive bayes text classifiers. ICML 2003
Cross-Validation
• Break up data into 5 folds
  • (Equal positive and negative inside each fold?)
• For each fold
  • Choose the fold as a temporary test set
  • Train on 4 folds, compute performance on the test fold
• Report average performance of the 5 runs
[Figure: five-fold cross-validation. In each of iterations 1 to 5 a different fold serves as the test set and the remaining four folds are used for training.]
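The cross-validation loop can be sketched as follows. This is a minimal illustration: the interleaved fold assignment and the callback `train_and_score` (which trains on one list of pairs and returns a score on the other) are assumptions made here, and the sketch does not balance positive and negative examples within folds.

```python
def k_fold_indices(n_items, k=5):
    """Split indices 0..n_items-1 into k interleaved folds."""
    return [list(range(start, n_items, k)) for start in range(k)]


def cross_validate(items, labels, train_and_score, k=5):
    """Average the score of train_and_score over k train/test splits."""
    scores = []
    for test_idx in k_fold_indices(len(items), k):
        held_out = set(test_idx)
        train = [(items[i], labels[i]) for i in range(len(items)) if i not in held_out]
        test = [(items[i], labels[i]) for i in test_idx]
        scores.append(train_and_score(train, test))
    return sum(scores) / len(scores)


def majority_baseline(train, test):
    """Hypothetical scorer: accuracy of always predicting the most common training label."""
    train_labels = [y for _, y in train]
    majority = max(set(train_labels), key=train_labels.count)
    return sum(1 for _, y in test if y == majority) / len(test)


docs = [f"review {i}" for i in range(10)]
gold = ["pos"] * 6 + ["neg"] * 4
print(cross_validate(docs, gold, majority_baseline, k=5))
```

Any classifier can be plugged in through the callback; the majority baseline is used here only because it needs no features.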
Other issues in Classification
• MaxEnt and SVM tend to do better than Naïve Bayes
Problems: What makes reviews hard to classify?
• Subtlety:
  • Perfume review in Perfumes: the Guide:
    • “If you are reading this because it is your darling fragrance, please wear it at home exclusively, and tape the windows shut.”
  • Dorothy Parker on Katherine Hepburn
    • “She runs the gamut of emotions from A to B”
Thwarted Expectations and Ordering Effects
• “This film should be brilliant. It sounds like a great plot, the actors are first grade, and the supporting cast is good as well, and Stallone is attempting to deliver a good performance. However, it can’t hold up.”
• Well as usual Keanu Reeves is nothing special, but surprisingly, the very talented Laurence Fishbourne is not so good either, I was surprised.
Sentiment Lexicons
The General Inquirer
• Home page: http://www.wjh.harvard.edu/~inquirer
• List of Categories: http://www.wjh.harvard.edu/~inquirer/homecat.htm
• Spreadsheet: http://www.wjh.harvard.edu/~inquirer/inquirerbasic.xls
• Categories:
  • Positiv (1915 words) and Negativ (2291 words)
  • Strong vs Weak, Active vs Passive, Overstated versus Understated
  • Pleasure, Pain, Virtue, Vice, Motivation, Cognitive Orientation, etc
• Free for Research Use
Philip J. Stone, Dexter C Dunphy, Marshall S. Smith, Daniel M. Ogilvie. 1966. The General Inquirer: A Computer Approach to Content Analysis. MIT Press
LIWC (Linguistic Inquiry and Word Count)
Pennebaker, J.W., Booth, R.J., & Francis, M.E. (2007). Linguistic Inquiry and Word Count: LIWC 2007. Austin, TX
• Home page: http://www.liwc.net/
• 2300 words, >70 classes
• Affective Processes
  • negative emotion (bad, weird, hate, problem, tough)
  • positive emotion (love, nice, sweet)
• Cognitive Processes
  • Tentative (maybe, perhaps, guess), Inhibition (block, constraint)
• Pronouns, Negation (no, never), Quantifiers (few, many)
• Not free though!
MPQA Subjectivity Cues Lexicon
• Home page: http://www.cs.pitt.edu/mpqa/subj_lexicon.html
• 6885 words from 8221 lemmas
  • 2718 positive
  • 4912 negative
• Each word annotated for intensity (strong, weak)
• GNU GPL
Theresa Wilson, Janyce Wiebe, and Paul Hoffmann (2005). Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis. Proc. of HLT-EMNLP-2005.
Riloff and Wiebe (2003). Learning extraction patterns for subjective expressions. EMNLP-2003.
Bing Liu Opinion Lexicon
• Bing Liu's Page on Opinion Mining
• http://www.cs.uic.edu/~liub/FBS/opinion-lexicon-English.rar
• 6786 words
  • 2006 positive
  • 4783 negative
Minqing Hu and Bing Liu. Mining and Summarizing Customer Reviews. ACM SIGKDD-2004.
SentiWordNet
Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. 2010. SENTIWORDNET 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining. LREC-2010
• Home page: http://sentiwordnet.isti.cnr.it/
• All WordNet synsets automatically annotated for degrees of positivity, negativity, and neutrality/objectiveness
• [estimable(J,3)] “may be computed or estimated”
    Pos 0    Neg 0    Obj 1
• [estimable(J,1)] “deserving of respect or high regard”
    Pos .75  Neg 0    Obj .25
Disagreements between polarity lexicons

                   Opinion Lexicon   General Inquirer   SentiWordNet      LIWC
MPQA               33/5402 (0.6%)    49/2867 (2%)       1127/4214 (27%)   12/363 (3%)
Opinion Lexicon                      32/2411 (1%)       1004/3994 (25%)   9/403 (2%)
General Inquirer                                        520/2306 (23%)    1/204 (0.5%)
SentiWordNet                                                              174/694 (25%)
LIWC

Christopher Potts, Sentiment Tutorial, 2011
Analyzing the polarity of each word in IMDB
• How likely is each word to appear in each sentiment class?
  • Count(“bad”) in 1-star, 2-star, 3-star, etc.
• But can’t use raw counts:
  • Instead, likelihood:

$$P(w \mid c) = \frac{f(w, c)}{\sum_{w \in c} f(w, c)}$$

• Make them comparable between words
  • Scaled likelihood:

$$\frac{P(w \mid c)}{P(w)}$$

Potts, Christopher. 2011. On the negativity of negation. SALT 20, 636-659.
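The likelihood and scaled likelihood can be computed directly from a table of per-class word counts. A minimal sketch with made-up counts (the numbers and the function name `scaled_likelihood` are hypothetical, not IMDB data); P(w) is estimated here as the word's share of all tokens across classes, which is an assumption.

```python
def scaled_likelihood(counts):
    """counts[c][w] = f(w, c). Returns scaled[c][w] = P(w|c) / P(w)."""
    class_totals = {c: sum(ws.values()) for c, ws in counts.items()}
    grand_total = sum(class_totals.values())
    word_totals = {}
    for ws in counts.values():
        for w, f in ws.items():
            word_totals[w] = word_totals.get(w, 0) + f
    scaled = {}
    for c, ws in counts.items():
        scaled[c] = {
            # P(w|c) = f(w,c) / sum_w f(w,c);  P(w) = total f(w) / all tokens
            w: (f / class_totals[c]) / (word_totals[w] / grand_total)
            for w, f in ws.items()
        }
    return scaled


# Hypothetical token counts for 1-star and 10-star reviews.
toy_counts = {
    1: {"bad": 30, "good": 10, "the": 60},
    10: {"bad": 5, "good": 35, "the": 60},
}
for rating, row in scaled_likelihood(toy_counts).items():
    print(rating, {w: round(v, 2) for w, v in row.items()})
```

A value above 1 means the word is over-represented in that rating class relative to its overall frequency; a class-neutral word such as "the" stays at 1.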
Analyzing the polarity of each word in IMDB
[Figure: one panel per word, plotting scaled likelihood P(w|c)/P(w) (panel rows also labeled Pr(c|w)) against rating, 1 to 10. POS panels: good (883,417 tokens), amazing (103,509 tokens), great (648,110 tokens), awesome (47,142 tokens). NEG panels: good (20,447 tokens), depress(ed/ing) (18,498 tokens), bad (368,273 tokens), terrible (55,492 tokens).]
Potts, Christopher. 2011. On the negativity of negation. SALT 20, 636-659.
Other sentiment feature: Logical negation
• Is logical negation (no, not) associated with negative sentiment?
• Potts experiment:
  • Count negation (not, n’t, no, never) in online reviews
  • Regress against the review rating
Potts, Christopher. 2011. On the negativity of negation. SALT 20, 636-659.
Potts 2011 Results: More negation in negative sentiment
[Figure: scaled likelihood P(w|c)/P(w); plot not preserved in the transcript.]
Learning Sentiment Lexicons
Semi-supervised learning of lexicons
• Use a small amount of information
  • A few labeled examples
  • A few hand-built patterns
• To bootstrap a lexicon
Hatzivassiloglou and McKeown intuition for identifying word polarity
• Adjectives conjoined by “and” have same polarity
  • Fair and legitimate, corrupt and brutal
  • *fair and brutal, *corrupt and legitimate
• Adjectives conjoined by “but” do not
  • fair but brutal
Vasileios Hatzivassiloglou and Kathleen R. McKeown. 1997. Predicting the Semantic Orientation of Adjectives. ACL, 174–181
Hatzivassiloglou & McKeown 1997 Step 1
• Label seed set of 1336 adjectives (all >20 in 21 million word WSJ corpus)
  • 657 positive
    • adequate central clever famous intelligent remarkable reputed sensitive slender thriving …
  • 679 negative
    • contagious drunken ignorant lanky listless primitive strident troublesome unresolved unsuspecting …
Hatzivassiloglou & McKeown 1997 Step 2
• Expand seed set to conjoined adjectives
  • nice, helpful
  • nice, classy
Hatzivassiloglou & McKeown 1997 Step 3
• Supervised classifier assigns “polarity similarity” to each word pair, resulting in graph:
[Figure: graph over the adjectives classy, nice, helpful, fair, brutal, irrational, corrupt.]
Hatzivassiloglou & McKeown 1997 Step 4
• Clustering for partitioning the graph into two
[Figure: the same graph over classy, nice, helpful, fair, brutal, irrational, corrupt, partitioned into a + cluster and a - cluster.]
Output polarity lexicon
• Positive
  • bold decisive disturbing generous good honest important large mature patient peaceful positive proud sound stimulating straightforward strange talented vigorous witty …
• Negative
  • ambiguous cautious cynical evasive harmful hypocritical inefficient insecure irrational irresponsible minor outspoken pleasant reckless risky selfish tedious unsupported vulnerable wasteful …
Turney Algorithm
1. Extract a phrasal lexicon from reviews
2. Learn polarity of each phrase
3. Rate a review by the average polarity of its phrases
Turney (2002): Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews
Extract two-word phrases with adjectives

First Word         Second Word          Third Word (not extracted)
JJ                 NN or NNS            anything
RB, RBR, or RBS    JJ                   not NN nor NNS
JJ                 JJ                   not NN nor NNS
NN or NNS          JJ                   not NN nor NNS
RB, RBR, or RBS    VB, VBD, VBN, VBG    anything
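The five extraction patterns can be applied to a POS-tagged token sequence with a simple sliding window. A minimal sketch, assuming the input is already tagged with Penn Treebank tags as (word, tag) pairs; the function name `extract_phrases` and the sample sentence are hypothetical.

```python
ADVERBS = {"RB", "RBR", "RBS"}
NOUNS = {"NN", "NNS"}
VERBS = {"VB", "VBD", "VBN", "VBG"}


def extract_phrases(tagged):
    """Return two-word phrases whose tags match one of the five patterns in the table."""
    phrases = []
    for i in range(len(tagged) - 1):
        (w1, t1), (w2, t2) = tagged[i], tagged[i + 1]
        # Tag of the following word, or None at the end of the sequence.
        t3 = tagged[i + 2][1] if i + 2 < len(tagged) else None
        third_not_noun = t3 not in NOUNS
        if (
            (t1 == "JJ" and t2 in NOUNS)
            or (t1 in ADVERBS and t2 == "JJ" and third_not_noun)
            or (t1 == "JJ" and t2 == "JJ" and third_not_noun)
            or (t1 in NOUNS and t2 == "JJ" and third_not_noun)
            or (t1 in ADVERBS and t2 in VERBS)
        ):
            phrases.append(f"{w1} {w2}")
    return phrases


sentence = [("the", "DT"), ("online", "JJ"), ("service", "NN"), ("is", "VBZ"),
            ("very", "RB"), ("handy", "JJ"), (".", ".")]
print(extract_phrases(sentence))
```

The third-word check is what keeps "very handy" when it ends a clause but would reject it inside a longer noun phrase such as "very handy feature".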
How to measure polarity of a phrase?
• Positive phrases co-occur more with “excellent”
• Negative phrases co-occur more with “poor”
• But how to measure co-occurrence?
Pointwise Mutual Information
• Mutual information between 2 random variables X and Y

$$I(X, Y) = \sum_{x} \sum_{y} P(x, y) \log_2 \frac{P(x, y)}{P(x)\,P(y)}$$

• Pointwise mutual information:
  • How much more do events x and y co-occur than if they were independent?

$$\mathrm{PMI}(X, Y) = \log_2 \frac{P(x, y)}{P(x)\,P(y)}$$
Pointwise Mutual Information
• Pointwise mutual information:
  • How much more do events x and y co-occur than if they were independent?

$$\mathrm{PMI}(X, Y) = \log_2 \frac{P(x, y)}{P(x)\,P(y)}$$

• PMI between two words:
  • How much more do two words co-occur than if they were independent?

$$\mathrm{PMI}(word_1, word_2) = \log_2 \frac{P(word_1, word_2)}{P(word_1)\,P(word_2)}$$
How to Estimate Pointwise Mutual Information
• Query search engine (Altavista)
  • P(word) estimated by hits(word)/N
  • P(word1, word2) by hits(word1 NEAR word2)/N
  • (More correctly the bigram denominator should be kN, because there are a total of N consecutive bigrams (word1, word2), but kN bigrams that are k words apart, but we just use N on the rest of this slide and the next.)

$$\mathrm{PMI}(word_1, word_2) = \log_2 \frac{\frac{1}{N}\,\mathrm{hits}(word_1 \text{ NEAR } word_2)}{\frac{1}{N}\,\mathrm{hits}(word_1) \cdot \frac{1}{N}\,\mathrm{hits}(word_2)}$$
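Does phrase appear more with “poor” or “excellent”?

$$\mathrm{Polarity}(phrase) = \mathrm{PMI}(phrase, \text{"excellent"}) - \mathrm{PMI}(phrase, \text{"poor"})$$

$$= \log_2 \frac{\frac{1}{N}\,\mathrm{hits}(phrase \text{ NEAR "excellent"})}{\frac{1}{N}\,\mathrm{hits}(phrase) \cdot \frac{1}{N}\,\mathrm{hits}(\text{"excellent"})} - \log_2 \frac{\frac{1}{N}\,\mathrm{hits}(phrase \text{ NEAR "poor"})}{\frac{1}{N}\,\mathrm{hits}(phrase) \cdot \frac{1}{N}\,\mathrm{hits}(\text{"poor"})}$$

$$= \log_2 \left( \frac{\mathrm{hits}(phrase \text{ NEAR "excellent"})}{\mathrm{hits}(phrase)\,\mathrm{hits}(\text{"excellent"})} \cdot \frac{\mathrm{hits}(phrase)\,\mathrm{hits}(\text{"poor"})}{\mathrm{hits}(phrase \text{ NEAR "poor"})} \right)$$

$$= \log_2 \left( \frac{\mathrm{hits}(phrase \text{ NEAR "excellent"})\,\mathrm{hits}(\text{"poor"})}{\mathrm{hits}(phrase \text{ NEAR "poor"})\,\mathrm{hits}(\text{"excellent"})} \right)$$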
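The derivation says that N and hits(phrase) cancel, so the polarity needs only four hit counts. The sketch below checks this numerically; the hit counts are invented for illustration, the function names are hypothetical, and zero counts are not handled (no smoothing is applied).

```python
import math


def pmi_from_hits(hits_xy, hits_x, hits_y, n):
    """PMI estimated from search-engine hit counts, with N total documents."""
    return math.log2((hits_xy / n) / ((hits_x / n) * (hits_y / n)))


def polarity(hits_near_excellent, hits_near_poor, hits_excellent, hits_poor):
    """Simplified form: N and hits(phrase) cancel out of the PMI difference."""
    return math.log2(
        (hits_near_excellent * hits_poor) / (hits_near_poor * hits_excellent)
    )


# Hypothetical hit counts for one phrase.
N = 1_000_000
HITS_PHRASE, HITS_EXCELLENT, HITS_POOR = 200, 1000, 500
NEAR_EXCELLENT, NEAR_POOR = 20, 5

direct = polarity(NEAR_EXCELLENT, NEAR_POOR, HITS_EXCELLENT, HITS_POOR)
via_pmi = (pmi_from_hits(NEAR_EXCELLENT, HITS_PHRASE, HITS_EXCELLENT, N)
           - pmi_from_hits(NEAR_POOR, HITS_PHRASE, HITS_POOR, N))
print(direct, via_pmi)
```

With these counts both routes give a polarity of 1, i.e. the phrase is twice as strongly associated with "excellent" as with "poor".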
Phrases from a thumbs-up review

Phrase                   POS tags   Polarity
online service           JJ NN      2.8
online experience        JJ NN      2.3
direct deposit           JJ NN      1.3
local branch             JJ NN      0.42
…
low fees                 JJ NNS     0.33
true service             JJ NN      -0.73
other bank               JJ NN      -0.85
inconveniently located   JJ NN      -1.5
Average                             0.32
Phrases from a thumbs-down review

Phrase                   POS tags   Polarity
direct deposits          JJ NNS     5.8
online web               JJ NN      1.9
very handy               RB JJ      1.4
…
virtual monopoly         JJ NN      -2.0
lesser evil              RBR JJ     -2.3
other problems           JJ NNS     -2.8
low funds                JJ NNS     -6.8
unethical practices      JJ NNS     -8.5
Average                             -1.2
Results of Turney algorithm
• 410 reviews from Epinions
  • 170 (41%) negative
  • 240 (59%) positive
• Majority class baseline: 59%
• Turney algorithm: 74%
• Phrases rather than words
• Learns domain-specific information
Using WordNet to learn polarity
• WordNet: online thesaurus (covered in later lecture).
• Create positive (“good”) and negative seed-words (“terrible”)
• Find Synonyms and Antonyms
  • Positive Set: Add synonyms of positive words (“well”) and antonyms of negative words
  • Negative Set: Add synonyms of negative words (“awful”) and antonyms of positive words (“evil”)
• Repeat, following chains of synonyms
• Filter
S.M. Kim and E. Hovy. 2004. Determining the sentiment of opinions. COLING 2004
M. Hu and B. Liu. Mining and summarizing customer reviews. In Proceedings of KDD, 2004
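The synonym/antonym expansion can be sketched without WordNet itself by using a tiny hand-written thesaurus. Everything concrete below is an assumption for illustration: the toy `SYNONYMS` and `ANTONYMS` dictionaries stand in for WordNet lookups, `expand_lexicon` is a hypothetical name, and the "Filter" step is interpreted here simply as dropping words that end up in both sets.

```python
def expand_lexicon(pos_seeds, neg_seeds, synonyms, antonyms, rounds=2):
    """Grow positive and negative word sets by following synonym and antonym links."""
    pos, neg = set(pos_seeds), set(neg_seeds)
    for _ in range(rounds):
        new_pos, new_neg = set(), set()
        for w in pos:
            new_pos |= synonyms.get(w, set())   # synonyms keep polarity
            new_neg |= antonyms.get(w, set())   # antonyms flip polarity
        for w in neg:
            new_neg |= synonyms.get(w, set())
            new_pos |= antonyms.get(w, set())
        pos |= new_pos
        neg |= new_neg
    conflicts = pos & neg                       # assumed filter: drop ambiguous words
    return pos - conflicts, neg - conflicts


# Toy thesaurus standing in for WordNet (hypothetical entries).
SYNONYMS = {"good": {"well", "fine"}, "terrible": {"awful"}, "awful": {"dreadful"}}
ANTONYMS = {"good": {"evil"}, "terrible": {"wonderful"}}

positive, negative = expand_lexicon({"good"}, {"terrible"}, SYNONYMS, ANTONYMS)
print(sorted(positive), sorted(negative))
```

The second round is what reaches "dreadful", which is linked to the seed only through the chain terrible, awful, dreadful.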
Summary on Learning Lexicons
• Advantages:
  • Can be domain-specific
  • Can be more robust (more words)
• Intuition
  • Start with a seed set of words (‘good’, ‘poor’)
  • Find other words that have similar polarity:
    • Using “and” and “but”
    • Using words that occur nearby in the same document
    • Using WordNet synonyms and antonyms
• Use seeds and semi-supervised learning to induce lexicons
Other Sentiment Tasks
• Important for finding aspects or attributes
  • Target of sentiment
• The food was great but the service was awful
Finding aspect/attribute/target of sentiment
• Frequent phrases + rules
  • Find all highly frequent phrases across reviews (“fish tacos”)
  • Filter by rules like “occurs right after sentiment word”
    • “…great fish tacos” means fish tacos a likely aspect

Business type        Frequent aspects
Casino               casino, buffet, pool, resort, beds
Children’s Barber    haircut, job, experience, kids
Greek Restaurant     food, wine, service, appetizer, lamb
Department Store     selection, department, sales, shop, clothing

M. Hu and B. Liu. 2004. Mining and summarizing customer reviews. In Proceedings of KDD.
S. Blair-Goldensohn, K. Hannan, R. McDonald, T. Neylon, G. Reis, and J. Reynar. 2008. Building a Sentiment Summarizer for Local Service Reviews. WWW Workshop.
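The "frequent phrases + rules" idea can be sketched as counting the two-word phrase that directly follows a sentiment word and keeping the frequent ones. This is a simplification for illustration, not the cited systems: whitespace tokenization, the fixed two-word window, the threshold `min_count`, and the name `candidate_aspects` are all assumptions.

```python
from collections import Counter


def candidate_aspects(reviews, sentiment_words, min_count=2):
    """Return two-word phrases that occur right after a sentiment word at least min_count times."""
    counts = Counter()
    for review in reviews:
        tokens = review.lower().split()
        # Stop two tokens early so a full two-word phrase always follows.
        for i, tok in enumerate(tokens[:-2]):
            if tok in sentiment_words:
                counts[" ".join(tokens[i + 1:i + 3])] += 1
    return [phrase for phrase, c in counts.most_common() if c >= min_count]


reviews = ["great fish tacos here", "really great fish tacos", "awful parking lot"]
print(candidate_aspects(reviews, {"great", "awful"}))
```

Raising `min_count` trades recall for precision: "parking lot" appears only once here and is dropped.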
Finding aspect/attribute/target of sentiment
• The aspect name may not be in the sentence
• For restaurants/hotels, aspects are well-understood
• Supervised classification
  • Hand-label a small corpus of restaurant review sentences with aspect
    • food, décor, service, value, NONE
  • Train a classifier to assign an aspect to a sentence
    • “Given this sentence, is the aspect food, décor, service, value, or NONE”
Putting it all together: Finding sentiment for aspects
[Figure: pipeline. Reviews → Text Extractor → Sentences & Phrases → Sentiment Classifier → Sentences & Phrases → Aspect Extractor → Sentences & Phrases → Aggregator → Final Summary]
S. Blair-Goldensohn, K. Hannan, R. McDonald, T. Neylon, G. Reis, and J. Reynar. 2008. Building a Sentiment Summarizer for Local Service Reviews. WWW Workshop
Results of Blair-Goldensohn et al. method
Rooms (3/5 stars, 41 comments)
(+) The room was clean and everything worked fine – even the water pressure ...
(+) We went because of the free room and was pleasantly pleased ...
(-) … the worst hotel I had ever stayed at ...
Service (3/5 stars, 31 comments)
(+) Upon checking out another couple was checking early due to a problem ...
(+) Every single hotel staff member treated us great and answered every ...
(-) The food is cold and the service gives new meaning to SLOW.
Dining (3/5 stars, 18 comments)
(+) our favorite place to stay in biloxi. the food is great also the service ...
(+) Offer of free buffet for joining the Play
Summary on Sentiment
• Generally modeled as classification or regression task
  • predict a binary or ordinal label
• Features:
  • Negation is important
  • Using all words (in naïve Bayes) works well for some tasks
  • Finding subsets of words may help in other tasks
    • Hand-built polarity lexicons
    • Use seeds and semi-supervised learning to induce lexicons
Emotions
Scherer’s typology of affective states
Emotion: relatively brief episode of synchronized response of all or most organismic subsystems in response to the evaluation of an event as being of major significance
  angry, sad, joyful, fearful, ashamed, proud, desperate
Mood: diffuse affect state … change in subjective feeling, of low intensity but relatively long duration, often without apparent cause
  cheerful, gloomy, irritable, listless, depressed, buoyant
Interpersonal stance: affective stance taken toward another person in a specific interaction, coloring the interpersonal exchange
  distant, cold, warm, supportive, contemptuous
Attitudes: relatively enduring, affectively colored beliefs, preferences, predispositions towards objects or persons
  liking, loving, hating, valuing, desiring
Personality traits: emotionally laden, stable personality dispositions and behavior tendencies, typical for a person
  nervous, anxious, reckless, morose, hostile, envious, jealous
Two families of theories of emotion
• Atomic basic emotions
  • A finite list of 6 or 8, from which others are generated
• Dimensions of emotion
  • Valence (positive, negative)
  • Arousal (strong, weak)
  • Control
Ekman’s 6 basic emotions: Surprise, happiness, anger, fear, disgust, sadness
Valence/Arousal Dimensions

                Low pleasure (valence)   High pleasure (valence)
High arousal    anger                    excitement
Low arousal     sadness                  relaxation
Atomic units vs. Dimensions
Distinctive
• Emotions are units.
• Limited number of basic emotions.
• Basic emotions are innate and universal
Dimensional
• Emotions are dimensions.
• Limited # of labels but unlimited number of emotions.
• Emotions are culturally learned.
Adapted from Julia Braverman
One emotion lexicon from each paradigm!
1. 8 basic emotions:
  • NRC Word-Emotion Association Lexicon (Mohammad and Turney 2011)
2. Dimensions of valence/arousal/dominance
  • Warriner, A. B., Kuperman, V., and Brysbaert, M. (2013)
• Both built using Amazon Mechanical Turk
Plutchik’s wheel of emotion
• 8 basic emotions
• in four opposing pairs:
  • joy–sadness
  • anger–fear
  • trust–disgust
  • anticipation–surprise
NRC Word-Emotion Association Lexicon
Mohammad and Turney 2011
• 10,000 words chosen mainly from earlier lexicons
• Labeled by Amazon Mechanical Turk
• 5 Turkers per hit
• Give Turkers an idea of the relevant sense of the word
• Result:

Word        Category       Value
amazingly   anger          0
amazingly   anticipation   0
amazingly   disgust        0
amazingly   fear           0
amazingly   joy            1
amazingly   sadness        0
amazingly   surprise       1
amazingly   trust          0
amazingly   negative       0
amazingly   positive       1
The AMT Hit
…
Lexicon of valence, arousal, and dominance
• Warriner, A. B., Kuperman, V., and Brysbaert, M. (2013). Norms of valence, arousal, and dominance for 13,915 English lemmas. Behavior Research Methods 45, 1191-1207.
• Supplementary data: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.
• Ratings for 14,000 words for emotional dimensions:
  • valence (the pleasantness of the stimulus)
  • arousal (the intensity of emotion provoked by the stimulus)
  • dominance (the degree of control exerted by the stimulus)
Lexicon of valence, arousal, and dominance
• valence (the pleasantness of the stimulus)
    9: happy, pleased, satisfied, contented, hopeful
    1: unhappy, annoyed, unsatisfied, melancholic, despaired, or bored
• arousal (the intensity of emotion provoked by the stimulus)
    9: stimulated, excited, frenzied, jittery, wide-awake, or aroused
    1: relaxed, calm, sluggish, dull, sleepy, or unaroused
• dominance (the degree of control exerted by the stimulus)
    9: in control, influential, important, dominant, autonomous, or controlling
    1: controlled, influenced, cared-for, awed, submissive, or guided
• Again produced by AMT
Lexicon of valence, arousal, and dominance: Examples

Valence            Arousal            Dominance
vacation   8.53    rampage    7.56    self         7.74
happy      8.47    tornado    7.45    incredible   7.74
whistle    5.7     zucchini   4.18    skillet      5.33
conscious  5.53    dressy     4.15    concur       5.29
torture    1.4     dull       1.67    earthquake   2.14
Lexicons for detecting document affect: Simplest unsupervised method
• Sentiment:
  • Sum the weights of each positive word in the document
  • Sum the weights of each negative word in the document
  • Choose whichever value (positive or negative) has higher sum
• Emotion:
  • Do the same for each emotion lexicon
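The unsupervised method amounts to two weighted sums and a comparison. A minimal sketch, assuming each lexicon is a word-to-weight dictionary; the toy weights, the "tie" outcome for equal sums, and the name `lexicon_sentiment` are assumptions, since the slide does not say how ties are resolved.

```python
def lexicon_sentiment(tokens, pos_weights, neg_weights):
    """Label a document by comparing summed positive and negative lexicon weights."""
    pos_sum = sum(pos_weights.get(t, 0.0) for t in tokens)
    neg_sum = sum(neg_weights.get(t, 0.0) for t in tokens)
    if pos_sum > neg_sum:
        return "positive"
    if neg_sum > pos_sum:
        return "negative"
    return "tie"  # assumed behavior when neither side wins


# Hypothetical lexicon weights.
POS = {"great": 1.0, "love": 1.0, "nice": 0.5}
NEG = {"awful": 1.0, "bad": 1.0, "weird": 0.5}

doc = "the food was great but the service was awful and weird".split()
print(lexicon_sentiment(doc, POS, NEG))
```

For emotions, the same function would be called once per emotion lexicon and the category with the largest sum chosen.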
Lexicons for detecting document affect: Simplest supervised method
• Build a classifier
  • Predict sentiment (or emotion, or personality) given features
  • Use “counts of lexicon categories” as features
  • Sample features:
    • LIWC category “cognition” had count of 7
    • NRC Emotion category “anticipation” had count of 2
• Baseline
  • Instead use counts of all the words and bigrams in the training set
  • This is hard to beat
  • But only works if the training and test sets are very similar
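The "counts of lexicon categories" features can be computed with a single pass over the tokens. In this sketch the lexicon maps each word to the categories it belongs to; the entry for "amazingly" follows the NRC example shown earlier, while the entry for "maybe", the sample sentence, and the name `lexicon_category_counts` are hypothetical.

```python
from collections import Counter


def lexicon_category_counts(tokens, lexicon):
    """Count, per lexicon category, how many tokens in the document belong to it."""
    counts = Counter()
    for tok in tokens:
        for category in lexicon.get(tok, ()):
            counts[category] += 1
    return dict(counts)


TOY_LEXICON = {
    "amazingly": ["joy", "surprise", "positive"],  # as in the NRC example
    "maybe": ["tentative"],                        # hypothetical entry
}

features = lexicon_category_counts(
    "maybe it was amazingly good , amazingly so".split(), TOY_LEXICON
)
print(features)
```

The resulting dictionary can be passed to any standard classifier as a sparse feature vector, alongside or instead of the all-words baseline.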
Personality
Personality
• The internal structures and propensities that explain a person’s characteristic patterns of thought, emotion, and behavior.
• Personality captures what people are like.
McGraw-Hill/Irwin, Chapter 9
The Big Five Dimensions of Personality
Extraversion vs. Introversion
  sociable, assertive, playful vs. aloof, reserved, shy
Emotional stability vs. Neuroticism
  calm, unemotional vs. insecure, anxious
Agreeableness vs. Disagreeable
  friendly, cooperative vs. antagonistic, faultfinding
Conscientiousness vs. Unconscientious
  self-disciplined, organised vs. inefficient, careless
Openness to experience
  intellectual, insightful vs. shallow, unimaginative
Big Five Personality: Agreeableness
warm, kind, cooperative, sympathetic, helpful, and courteous.
• Strong desire to obtain acceptance in personal relationships as a means of expressing personality.
• Agreeable people focus on “getting along,” not necessarily “getting ahead.”
Big Five Personality: Extraversion
talkative, sociable, passionate, assertive, bold, and dominant
• Easiest to judge immediately on first meeting
• Prioritize desire to obtain power and influence within a social structure as a means of expressing personality.
• High in positive affectivity: a tendency to experience pleasant, engaging moods such as enthusiasm, excitement, and elation.
Big Five Personality: Neuroticism
• experience unpleasant moods: hostility, nervousness, and annoyance.
• more likely to appraise day-to-day situations as stressful.
• less likely to believe they can cope with the stressors that they experience.
• related to locus of control (attribute causes of events to themselves or to the external environment)
  • Neurotics: external locus of control: believe that the events that occur around them are driven by luck, chance, or fate.
  • less neurotic people hold internal locus of control: believe that their own behavior dictates events.
External and Internal Locus of Control
Big Five Personality: Openness to Experience
curious, imaginative, creative, complex, sophisticated
• Also called “Inquisitiveness” or “Intellectualness”
• high levels of creativity, the capacity to generate novel and useful ideas and solutions.
• Highly open individuals are more likely to migrate into artistic and scientific fields.
Changes in Big Five Dimensions Over the Life Span
Aside: Do Animals Have Personalities?
• Gosling (1998) studied spotted hyenas.
  • 4 human observers rated 44 personality traits of hyenas
  • Ran PCA on the ratings
  • Five dimensions: Assertiveness, Excitability, Human-Directed Agreeableness, Sociability, and Curiosity
• Related to 3 human dimensions: neuroticism (excitability), openness (curiosity), agreeableness (sociability + agree)
Various text corpora labeled for personality of author
Pennebaker, James W., and Laura A. King. 1999. "Linguistic styles: language use as an individual difference." Journal of personality and social psychology 77, no. 6.
• 2,479 essays from psychology students (1.9 million words), “write whatever comes into your mind” for 20 minutes
Mehl, Matthias R, SD Gosling, JW Pennebaker. 2006. Personality in its natural habitat: manifestations and implicit folk theories of personality in daily life. Journal of personality and social psychology 90(5), 862
• Speech from Electronically Activated Recorder (EAR)
• Random snippets of conversation recorded, transcribed
• 96 participants, total of 97,468 words and 15,269 utterances
Schwartz, H. Andrew, Johannes C. Eichstaedt, Margaret L. Kern, Lukasz Dziurzynski, Stephanie M. Ramones, Megha Agrawal, Achal Shah et al. 2013. "Personality, gender, and age in the language of social media: The open-vocabulary approach." PloS one 8, no. 9
• Facebook
• 75,000 volunteers
• 309 million words
• All took a personality test
Ears (speech) corpus (Mehl et al.)
Essays corpus (Pennebaker and King)
Classifiers
• Mairesse, François, Marilyn A. Walker, Matthias R. Mehl, and Roger K. Moore. "Using linguistic cues for the automatic recognition of personality in conversation and text." Journal of artificial intelligence research (2007): 457-500.
  • Various classifiers, lexicon-based and prosodic features
• Schwartz, H. Andrew, Johannes C. Eichstaedt, Margaret L. Kern, Lukasz Dziurzynski, Stephanie M. Ramones, Megha Agrawal, Achal Shah et al. 2013. "Personality, gender, and age in the language of social media: The open-vocabulary approach." PloS one 8, no.
  • regression and SVM, lexicon-based and all-words
Sample LIWC Features
LIWC (Linguistic Inquiry and Word Count)
Pennebaker, J.W., Booth, R.J., & Francis, M.E. (2007). Linguistic Inquiry and Word Count: LIWC 2007. Austin, TX
Facebook study, Learned words, Extraversion versus Introversion
Facebook study, Learned words, Neuroticism versus Emotional Stability