DOI: 10.4018/IJDWM.2019100102
International Journal of Data Warehousing and Mining
Volume 15 • Issue 4 • October-December 2019
Copyright © 2019, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
The Application of Sentiment Analysis and Text Analytics to Customer Experience Reviews to Understand What Customers Are Really Saying
Conor Gallagher, Letterkenny Institute of Technology, Donegal, Ireland
Eoghan Furey, Letterkenny Institute of Technology, Donegal, Ireland
Kevin Curran, Ulster University, Derry, UK
ABSTRACT
In a world of ever-growing customer data, businesses are required to have a clear line of sight into what their customers think about the business, its products, people and how it treats them. Insight into these critical areas for a business will aid in the development of a robust customer experience strategy and in turn drive loyalty and recommendations to others by their customers. It is key for business to access and mine their customer data to drive a modern customer experience. This article investigates the use of a text mining approach to aid sentiment analysis in the pursuit of understanding what customers are saying about products, services and interactions with a business. This is commonly known as Voice of the Customer (VOC) data and it is key to unlocking customer sentiment. The authors analyse the relationship between unstructured customer sentiment in the form of verbatim feedback and structured data in the form of user review ratings or satisfaction ratings to explore the question of whether customers say what they really think when given the opportunity to provide free text feedback as opposed to how they rate a product on a scale of one to five. Using various Sentiment Analysis approaches, the authors assign a sentiment score to a piece of verbatim feedback and then categorise it as positive, negative, or neutral. Using this normalised sentiment score, they compare it to the corresponding rating score and investigate the potential business insights. The results obtained indicate that a business cannot rely solely on a standalone single metric as a source of truth regarding customer experience. There is a significant difference between the customer ratings score and the sentiment of their corresponding review of the product. The authors propose that it is imperative that a business supplements their customer feedback scores with a robust sentiment analysis strategy.
KEYWORDS
Machine Learning, Sentiment Analysis, Text Analysis, Text Mining
1. INTRODUCTION
Increasingly, leading modern businesses are looking to gain more insights from the customer verbatim data they collect. Unfiltered customer feedback provides a tremendous opportunity to learn more about customer sentiment in relation to products and their end-to-end experience with a company. It also gives the business an opportunity to understand how they can ‘close the loop’ on any poor feedback they may receive and, conversely, how they can capitalise on positive feedback from customers. One of the drawbacks of online customer verbatim is the quality and quantity of the data (Maritz, 2018). How do businesses mine and analyse data to ensure it reveals key insights that can ultimately drive profits in the right direction? Once a business recognises the benefit of analysing its customer feedback data, the question becomes: how? Identifying customer verbatim as unstructured data is the first step. The second is using a ‘Big Data’ approach such as text mining, which allows the business to employ a more sophisticated and advanced modelling technique to uncover patterns in the data, employing sentiment analysis to identify key themes from the feedback. However, businesses will continue to use structured quantitative data gathering techniques such as asking customers to rate customer/client interactions, products and employees on various scales, e.g. 1-5 stars, 0–11-point scales and so on. These methods cannot be solely relied upon, and it is key for a business to critically analyse its unstructured data, whether that be on social media channels, ad-hoc emails to the business, letters or customer surveys.
All businesses need customers to prosper and grow. They are the main source of revenue for most businesses. It can be argued that the success of a business is directly proportional to its ability to acquire customers (Smith, 2016), keep customers happy, identify issues or irritants and consequently drive more selling opportunities. But for a business to achieve this it needs to identify the key indicators that will provide insights to facilitate this. There are many variables that a business needs to identify: the who, what, why, and how. Who are the potential customers in the marketplace? What do they want? Why do they want a product? How, as a business, can they retain customers and grow their market share? Examples of where a business can utilise their customer data are sales demographics, product and channel product preferences, social media or website sentiment and interactions, and transaction behaviours. Customer analytics is becoming one of the key enablers available to companies to facilitate the translation of raw data into useful insights about their customers. This research intends to highlight the importance of customer analytics in the delivery of customer insights back to the business to drive decision making (Fiedler, et al., 2016). 60% of companies said that organisational silos were a major obstacle to improving customer experience (Google, 2016). Companies are finding it difficult to understand what customer data they have and often are underutilising the customer data that they do have (Department of Industry, 2018). The most successful companies have found ways around this and are actively implementing Customer Experience strategies, i.e. building Customer Experience teams that are agile and can pivot based on business needs and goals (Hollyoake, 2009). While any company can use data to report financials or drive cost savings, the key differentiation comes from how this data can be used to drive business insights. While businesses are quite good at tracking structured feedback, i.e. numerical ratings such as 1-5 star ratings, they need to improve on understanding what customers are saying about their products and experiences with the company. This poses a serious challenge, and the difficulty increases as a business grows its customer base (Parasuraman, et al., 1991). With that in mind, it seems impossible for a business to contact every potential and current customer individually to understand their sentiment. There are many methods or channels through which customers can interact with a business, for example social media, email, customer satisfaction surveys, registering satisfaction via IVRs, and capturing feedback on calls via Speech Analytics. So, the challenge for the business is how to accurately capture, store, analyse and obtain insights from this data. This becomes more difficult as the customer base grows, and it seems very unlikely that a business will create a touchpoint with every single customer they have.
In today’s high-tech marketplace, customers have more options for products and services than ever before (International, 2017). The barriers of choice have come down as technology and the internet have advanced, and this leaves the business with a conundrum of how to accurately predict what their customers will do in the future before their competitors. To achieve this, they need to predict customer behaviour. This is where predictive analytics becomes the answer. To get ahead of the curve and ultimately their competitors, LaValle et al. (2011) argue that a business needs to proactively employ advanced data analytics techniques and drive a strategy that will provide insight into what their customers might do in the future. What products are customers buying? What is the likelihood that a customer will churn and move to a competitor? Are customers satisfied with the products and, if not, why? Are there irritants that customers find when dealing with the business? There are a multitude of questions the business will have, and predictive analytics can go some way in providing insights. Using advanced data analytics techniques, a business can use their customer data to build predictive models. These models have the potential to predict future customer behaviour. They help businesses to identify customers at risk of leaving and potential issues with products or services. With predictive analytics, businesses can gain a competitive advantage over rivals if used properly. There is a trend towards prescriptive analytics and a move away from descriptive analytics (S. Kaisler, 2013). This allows the business to focus on what could happen, based on predictive modelling methods, rather than what has already happened. There is obviously a requirement for descriptive analytics: businesses will always have to report what has happened at a certain point in time, hence the need to capture metrics and KPIs, but the notion of utilizing this data to predict what will happen shows a forward-thinking mindset (Steger, 2013). The opportunity for exploring predictive customer analytics today is better than ever before. Data sources such as customer satisfaction surveys, social media channels, chatbots and voice calls using speech analytics all provide a rich source of opportunity. Naturally, Big Data technology has kept pace with the exponential growth in data and is capable of handling the large-scale data processing, storage and integration needed to capitalize on these uncovered insights (EMC, n.d.). To begin to understand what the customers think and feel, a business needs to understand the type of data it has and build the technological infrastructure to be able to mine the data to build effective models (McKinsey, 2016). Only at this point can the business develop these models effectively and drive their analytics strategy forward.
We examine if it is possible for a business to rely solely on a single point Customer Satisfaction metric through the value of sentiment contained within customer feedback as an indicator of customer satisfaction and/or advocacy with a company or product. Essentially, we investigate if customer sentiment provides a truer reflection of the customer’s experience with a service or product.
2. RELATED WORK
Text analytics provides a set of techniques to make trends visible and dynamic, and to reveal relationships between data points that may have been difficult for a human to uncover due to the scale and quantity of the data (Miner, 2012). Numeric ratings remain the most generic form of data collection in customer surveys, and unstructured feedback such as customer verbatim continues to increase in importance. Rather than attempting to manually review reams of customer comments, tools like text analytics and speech analytics allow companies to turn this verbatim feedback into valuable insights they can analyse and measure more efficiently than manually reviewing the data (Verint, 2016). The end-to-end text mining process has many stages, all of which are key to the end goal of deriving insights from the textual data. Figure 1 gives an idea of how this would look as a process flow.
Figure 1. End to End Text Mining Process (Harris, 2018)
It is key that the data is in a processable format. Multiple data sources can be used: a large amount of textual data may be available via free text format on social media platforms and customer surveys, and speech analytics tools would also convert conversations into a raw format. Preliminary formatting is preparing the data by removing dirty or erroneous data. This improves the data and lessens the chance of errors, although it can be time consuming (Natarajan K., 2010). Other formatting procedures can be applied to the data as required. Textual data contains lots of small yet highly frequently-used words: “at,” “the,” “or,” “for” etc. When analysing the frequency of words in a text block, these types of words will always occur. Stop word removal is the process of filtering out these words to look for longer, more significant words. Python’s text-processing libraries, for example, come with built-in stop word dictionaries for different languages (Upadhyay, 2017). “Tokenization looks at the spaces between the words as word boundaries to ‘cut’ words from the text string at those points.” (Ott, 2018). Tokenization is a crucial step in the process of structuring text for further analysis, as otherwise text is treated as an unstructured “string.” The idea is to break the string into individual words, that is, to tokenize the sentence or piece of text. This allows the word frequency in the document to be determined and relationships between words to begin to be identified. The intention for a business, for example, would be to ensure that frequently-used words are tokenized properly with a view to further analysis. Most text-mining tools allow users to create customized tokenization rules. The Bag-of-words process, also known as a vector space model, breaks down a document into its constituent words, regardless of grammar and order. This is a crucial step as there is the likelihood of receiving additional textual data which needs to be broken down in a similar fashion to allow for further analysis (Robert P. Schumaker, 2012). Stemming (or the closely related “lemmatization”) is the process by which words in a document are reduced to their basic form. Taking the English language as an example, words have many forms rather than one single form based on their grammatical purpose. For instance, “was,” “are,” “is,” etc., are all forms of the verb “to be.” As with tokenization above, once a process like stemming has been performed the data is prepared for further analytics which can be applied to textual data to recognize patterns (Mejova, 2012).
One example of an advanced analytics technique associated with the text mining process is cluster analysis. In the context of the text mining process, cluster analysis involves classifying words into groups, such that words in each group are more related to one another than they are to words in other groups (Kochut, et al., 2017). This is done through the application of algorithms such as k-means and x-means clustering. In k-means clustering the variable ‘K’ represents the number of clusters you want. You could potentially segment data into four groups or eight groups; the algorithm then executes based on that ‘K’ number. For x-means the optimal number of clusters is unknown, and here you would specify a maximum and minimum number of clusters. The algorithm will then run an analysis and output the optimal number of clusters. Clustering is a powerful way to identify relationships and trends in copious amounts of textual data (Allahyari, 2017). It can also be used as part of the sentiment analysis process.
Data visualization plays a key role in allowing a business or researcher to identify key trends or insights from the data. There are multiple methods to visualise data; one could use the inbuilt features of a programming language, e.g. Python has matplotlib for creating charts and visuals from data. There is a vast range of data visualisation software available from a business intelligence point of view, e.g. Tableau is an industry leader in data visualisation (Ajenstat, 2017), and Microsoft’s Power BI is an up-and-coming challenger to Tableau (Molag, 2017). These business intelligence tools can link to multiple data sources, e.g. Excel, CSV, SQL Server and JSON, and interestingly they can now link directly to Python or R scripts, so that users can run scripts inside the tool and then create visuals and ultimately dashboards to aid in the insight exploration of the data. This is extremely beneficial in a business scenario where companies can have dedicated data visualization analysts who create and maintain these types of dashboards. As part of this research the authors plan to utilise and promote the benefits of visualising the data so that the reader can understand how this would play out in a business environment.
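The preprocessing stages described above (tokenization, stop word removal, stemming and bag-of-words) can be illustrated with a minimal, dependency-free sketch. The stop word list, the suffix list and all function names here are assumptions made for illustration; a real project would more likely use an established library's stop word lists and stemmer.

```python
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption, not a standard list).
STOP_WORDS = {"at", "the", "or", "for", "a", "an", "and", "is", "was", "it", "to", "of"}

# Crude suffix list for a toy stemmer; real stemmers use far richer rules.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and cut it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(tokens):
    """Filter out very frequent, low-information words."""
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def bag_of_words(text):
    """Count stemmed, non-stop-word tokens, ignoring grammar and order."""
    return Counter(stem(t) for t in remove_stop_words(tokenize(text)))


counts = bag_of_words("The screen cracked and the screens are cracking at the edges")
print(counts.most_common())
```

Note that suffix stripping leaves irregular forms such as “are” untouched; mapping “was,” “are” and “is” to “be” needs a dictionary-based lemmatizer rather than a stemmer.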
Text mining can be utilized to evaluate customer satisfaction from verbatim feedback rather than relying solely on a numerical indicator from a customer satisfaction survey. Hosseini, et al. (2010) proposed a data-mining model using recency, frequency and monetary (RFM) attributes and K-means clustering to evaluate customer loyalty and customer lifetime value (CLV). The research investigated the degree to which customer loyalty influenced bottom line profits for an organization. The model developed showed a clear improvement in the accuracy of the customer loyalty measurement. This study had limitations as the model relied too heavily on combining data-mining techniques alone and did not consider the impact of textual data such as verbatim comments from customers, which are now generated at many touchpoints in the customer’s journey. Pang & Lee (2002) employed text mining methods to extract insights from customer data using sentiment analysis to try to improve customer loyalty measurement. Another example is Tirunillai & Tellis (2014), who used text analytics to understand the dimensions of product alignment and gain insight into brand positioning. Ordenes, et al. (2014) found that text analytics could be used to analyse customer feedback to capture insights of the customer experience. Their model captured customer sentiment from the data and tagged it as either complaints, compliments or suggestions/feedback. This allowed them to propose insights to suggest efficiencies for the company and its resources. Their research, however, did not combine the qualitative data and quantitative data to assess and predict customer loyalty. Where the research proved useful was in relation to the computation methods applied, which could produce findings that can be expressed quantitatively. This, for example, aids the desired outcome of conducting statistical modelling, data visualization, and machine learning to derive further insights.
Businesses have two kinds of customer feedback data that they store, analyse and measure: structured and unstructured data. Structured data is information that is clearly defined and easy to report on. It is the kind of data that is generally found in a survey and can be organized in a spreadsheet: name, location, age, and rating (3 out of 5 stars, for example, or a 10 for “most satisfied” versus a 1 for “least satisfied”) (Taylor, 2018). Unstructured data as it exists today in the context of customer experience data is, basically, text or free-form customer verbatim, although it can also include other media such as audio, photos, or videos. Unstructured data can be captured in an email, the “additional comments” section of a survey, voice recordings of customer interactions (e.g. via Speech Analytics tools), a post on a customer review site (e.g. Amazon), in social media, and in a multitude of other places (Taylor, 2018). Analysing all this data correctly is critical, because it reveals everything from buying trends to product flaws and provides a significant business advantage (Aswani, 2017). Organizations often struggle to do this analysis, however, because unstructured data is significantly harder to categorize and report on than structured data. It can be hard to parse due to grammatical errors or slang, it frequently contains multiple unrelated ideas, and it can represent various levels of sentiment related to each idea (e.g. “I couldn’t complete the registration process online, I didn’t understand the website, but the rep was very patient and talked me through the process.”). Unstructured data is, by its nature, harder to navigate than structured data. Anecdotally it is thought that 95% of customer feedback data is unstructured (Kemper, 2008) and it contains a wealth of information to aid in the understanding of customer sentiment and opinion. Daunting as it may seem to businesses, this type of feedback is predicted to continue to grow, as millions of people increase their use of online channels (Clarabridge, 2015). Customer satisfaction is one of the most critical issues concerning business organizations of all types, which is justified by the customer-oriented philosophy and the principles of continuous improvement in modern enterprise (Arokiasamy, 2013); see Figure 2.
Net Promoter Score (NPS) is a measure of customer loyalty and is often held as the leading customer experience metric. NPS is still a popular customer loyalty measurement despite recent studies arguing that customer loyalty is multidimensional. Therefore, firms require new data-driven methods that combine behavioural and attitudinal data sources. NPS is measured on a 0-10 scale and is broken down as follows: a score of 0–6 is classified as Detractors, unhappy customers who are very unlikely to recommend the business to friends or family; a score of 7–8 is classified as Passives, those customers who are happy enough with the service but not enough to actively encourage others; and a score of 9–10 is classified as Promoters, those customers who are loyal and would recommend the business or product to friends and family (Qualtrics, 2018).
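The NPS bands above translate directly into code. The sketch below classifies each 0-10 response and then computes the score in the conventional way, as the percentage of Promoters minus the percentage of Detractors; that formula is the standard NPS definition rather than something stated in this section, and the function names and sample responses are illustrative assumptions.

```python
def nps_category(score):
    """Map a 0-10 survey response to its NPS band."""
    if not 0 <= score <= 10:
        raise ValueError("NPS responses must be between 0 and 10")
    if score <= 6:
        return "Detractor"
    if score <= 8:
        return "Passive"
    return "Promoter"


def net_promoter_score(scores):
    """Percentage of Promoters minus percentage of Detractors."""
    categories = [nps_category(s) for s in scores]
    promoters = categories.count("Promoter")
    detractors = categories.count("Detractor")
    return 100.0 * (promoters - detractors) / len(categories)


# Hypothetical survey responses, for illustration only.
responses = [10, 9, 9, 8, 7, 6, 3, 10, 10, 2]
print(net_promoter_score(responses))
```

Passives count towards the total number of respondents but do not move the score in either direction, which is one reason a single NPS figure can hide a large group of lukewarm customers.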
Sentiment analysis plays a key role in the development of predictive and machine learning models. It has developed into a wider all-encompassing process that involves analysing textual data and applying a ‘sentiment score’ to that text (Mejova, 2012). One of the most widely used approaches is using machine learning strategies to develop and build an algorithm using a training dataset before applying a test or ‘real’ dataset. Machine learning techniques first train the algorithm with some inputs with known outputs so that later it can work with new unknown data (Devika, et al., 2016). Support Vector Machines (SVMs) are supervised machine learning algorithms normally used for regression and classification. An SVM is a non-probabilistic classifier that requires a large training set. It classifies points by identifying a (d-1)-dimensional hyperplane in the d-dimensional feature space; the SVM looks for the hyperplane that best segregates the classes (Devika, et al., 2016). Support Vector Machines make use of the concept of decision planes that define decision boundaries. A decision plane is one that best differentiates between a set of objects having different class membership. Support Vector Machines are considered a powerful classification algorithm. When used in conjunction with other techniques such as random forest and other machine learning tools, they give a very different aspect to ensemble models (Srivasta, 2014). There is an argument that SVMs do not perform well with larger datasets due to the high training time required; they do, however, perform well when there is a clear margin of separation between objects (Brownlee, 2016).
In the context of sentiment analysis or text mining, an n-gram is a contiguous sequence of n items from a given sequence of text or speech. The items can be letters, words or pairs of words, for example, and are typically collected from a text or speech corpus. Figure 3 shows how a sentence can be broken up into various formats, i.e. words, pairs of words, etc. N-gram sentiment analysis considers the sentence as a whole (Ghiassi et al., 2013).
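As a minimal sketch of the n-gram idea, the function below slides a window of length n over a list of word tokens. The function name and the example sentence are illustrative assumptions; splitting on whitespace stands in for a proper tokenizer.

```python
def ngrams(tokens, n):
    """Return every contiguous run of n tokens, in order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "the battery life is great".split()
print(ngrams(tokens, 1))  # unigrams: single words
print(ngrams(tokens, 2))  # bigrams: pairs of neighbouring words
print(ngrams(tokens, 3))  # trigrams
```

Bigrams and trigrams keep some local word order, so a phrase such as “not great” can be treated as one feature instead of two unrelated words.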
Figure 2. Basic American Customer Satisfaction Index model (ACSI, 2018)
In machine learning, Naïve Bayes is a family of “probabilistic classifiers” based on the use of Bayes’ Theorem. Naïve Bayes classifiers are highly scalable and are mainly used when the size of the training set is small. The conditional probability of a class c given the observed evidence x is determined by Bayes’ rule, shown in Equation (1). Unlike Naïve Bayes, the Maximum Entropy classifier does not assume that the features are conditionally independent of each other. Uncertainty is maximum for a uniform distribution, and the measure of uncertainty is known as entropy. The Maximum Entropy classifier can be used to solve a variety of text mining scenarios, particularly sentiment analysis (Vryniotis, 2013). The Maximum Entropy classifier uses a parameterized set of weights which, when combined, join the features that are generated from a set of features by an encoding. This encoding then maps each pair of feature set and label to a vector (McCallum & Nigam, 1998). ME classifiers are also known as exponential or log-linear classifiers because they work by extracting sets of features from the input data, combining them in a linear fashion and then using this sum as the exponent (Daume III & Marcu, 2006). Pang & Lee (2002) used the approach that each document is represented with an array of 1’s and 0’s that indicate if a word exists in the text of the document. This plays on the BoW (Bag of Words) theory commonly used in NLP and text mining.
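$$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)}$$
where P(c | x) is the posterior probability of class c given predictor x, P(x | c) is the likelihood, P(c) is the class prior probability and P(x) is the predictor prior probability. Under the naïve assumption that the predictors x_1, ..., x_n are conditionally independent given the class, and dropping the denominator, which is the same for every class:
$$P(c \mid X) \propto P(x_1 \mid c) \times P(x_2 \mid c) \times \cdots \times P(x_n \mid c) \times P(c) \tag{1}$$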
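A small worked example may make Equation (1) concrete. The sketch below multiplies a class prior by per-word likelihoods and then normalises over the classes so that the posteriors sum to one. All probabilities are invented for illustration and do not come from the study's data; the function name is likewise an assumption.

```python
from math import prod


def naive_bayes_posteriors(priors, likelihoods, features):
    """Apply Equation (1): P(c) times the product of P(x_i | c), then normalise."""
    scores = {
        c: priors[c] * prod(likelihoods[c][f] for f in features)
        for c in priors
    }
    evidence = sum(scores.values())  # plays the role of P(x)
    return {c: s / evidence for c, s in scores.items()}


# Hypothetical probabilities for two sentiment classes.
priors = {"positive": 0.6, "negative": 0.4}
likelihoods = {
    "positive": {"great": 0.30, "battery": 0.10},
    "negative": {"great": 0.05, "battery": 0.20},
}

posteriors = naive_bayes_posteriors(priors, likelihoods, ["great", "battery"])
print(posteriors)
```

With these made-up numbers the unnormalised scores are 0.6 × 0.30 × 0.10 = 0.018 for positive and 0.4 × 0.05 × 0.20 = 0.004 for negative, so the positive class wins with a posterior of 0.018 / 0.022.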
The K-Nearest Neighbour (KNN) technique is based on the notion that the classification of an instance will be in some way like those nearby it in the vector space, hence the term nearest neighbour. Srivastav & Singh (2014) researched weighted k-Nearest Neighbour, where they assigned weights to the objects in the training set and then used these weights for their calculation of the sentiment of text in a word-by-word fashion. In their weighted k-NN analysis (Srivastav & Singh, 2014) the sentences of the captured tweets were tokenised and then stop words were removed. The algorithm assigned a positive score to each review after the first parse. For the second parsing process a neutral review is input. This score is then modified if required, allowing for a better determination of positivity and creating an output file that consists of the review ID and its associated positive score. Weighted combinations of feature sets can be quite effective in the task of sentiment classification, since the weights of the ensemble represent the relevance of the different feature sets (e.g. n-grams, POS, etc.) to sentiment classification, instead of assigning relevance to each feature individually (Xia et al., 2011). Balahur & Montoyo (2008) investigated the notion of assigning product classes to textual datasets, extracting general features from the products and then in turn assigning sentiment to the features.
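To illustrate the general nearest-neighbour idea, the following sketch classifies a review by a similarity-weighted vote among its k most similar training reviews, using cosine similarity over word counts. This is a generic weighted k-NN, not a reconstruction of the scheme in Srivastav & Singh (2014); the training sentences, labels and function names are all invented for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt


def bag(text):
    """Word-count vector for a whitespace-tokenized, lowercased text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train, text, k=3):
    """Similarity-weighted vote among the k most similar training texts."""
    query = bag(text)
    scored = sorted(
        ((cosine(query, bag(doc)), label) for doc, label in train),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = defaultdict(float)
    for similarity, label in scored[:k]:
        votes[label] += similarity  # closer neighbours count for more
    return max(votes, key=votes.get)


# Hypothetical labelled reviews.
train = [
    ("great phone love it", "positive"),
    ("love the great screen", "positive"),
    ("terrible battery hate it", "negative"),
    ("hate this terrible phone", "negative"),
]
print(knn_predict(train, "love this great phone"))
```

Weighting votes by similarity means one very close neighbour can outvote two distant ones, which is the main difference from plain majority-vote k-NN.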
Figure 3. N-Gram example (Maiolo, 2015)
A rule-based approach (Liu, et al., 2005) extracts opinion from a textual dataset by defining various rules: each sentence in every document of the dataset is tokenized and tested for the presence of each token or word. If the presence of the word is confirmed and it has a positive sentiment, then a “+1” rating is applied to it. Each piece of text starts with a neutral score of zero and was considered positive. In (Devika, et al., 2016), for example, if the input sentence contains any word which is not present in the database then it will be added to the database. This is a supervised learning approach in which the system is trained to learn whenever any new input is given. Rule-based approaches tend to use heuristics to determine sentiments and linguistics research to analyse sentiments. They tend to have very poor generalization, but within a narrow domain they tend to perform well (Medhat et al., 2014). In machine learning this is known as overfitting and the aim is to avoid this problem; conversely, with rules-based systems it is considered largely unavoidable. Devika et al. (2016) employed a machine learning, data-driven approach which used a labelled corpus of texts with their respective sentiments for prediction.
A lexicon-based approach uses dictionaries of words tagged with their semantic polarity and sentiment strength (Yadav & Elchuri, 2013). These scores are then used to calculate the polarity and/or sentiment of the textual data. It is said that this technique allows for high precision but low recall. Lexicon-based techniques work on the inference that the aggregated semantic polarity of a sentence or set of tokens is the sum of the polarities of the individual phrases or words. At the ROMIP 2012 conference the lexicon-based method proposed in (Chetviorkin & Loukachevitch, 2013) was used. The authors employed an emotional research technique for sentiment analysis dictionaries for each of the domains. Using appraised words from the appropriate training set, the domain dictionaries were reloaded with the highest weighted words.
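The additive assumption behind lexicon-based scoring can be sketched in a few lines: look each token up in a polarity dictionary and sum the values. The lexicon below is a tiny invented example, not one of the published sentiment dictionaries, and the function names are assumptions.

```python
import re

# Hypothetical polarity lexicon: word -> polarity strength.
LEXICON = {
    "great": 1.0,
    "love": 1.0,
    "good": 0.5,
    "slow": -0.5,
    "terrible": -1.0,
    "broken": -1.0,
}


def lexicon_score(text, lexicon=LEXICON):
    """Sum the polarities of the known words; unknown words contribute zero."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(t, 0.0) for t in tokens)


def lexicon_label(score):
    """Turn an aggregate score into a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


review = "Great screen but terrible battery and slow charging"
score = lexicon_score(review)
print(score, lexicon_label(score))
```

Words missing from the dictionary are silently ignored, which is one way to see why such methods can be precise on the words they know yet miss sentiment expressed in other vocabulary.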
3. METHODOLOGY
We built a classification model using predictive variables. The steps were:
1. Apply a linguistic text mining approach (Ordenes et al., 2014) to determine the category of each customer’s review by mining the review verbatim, thus dividing customer reviews into three groups: positive, neutral or negative. This sentiment analysis will facilitate the process of modelling the sentiment score generated against the customer rating score, hence attempting to predict the likelihood that the customer has rated the product 1 through 5.
2. Apply a Naïve Bayes machine learning algorithm to predict the rating of the reviews, as this is found to work well with text data.
3. Prepare the textual data for vectorization by removing punctuation, converting to lowercase and removing all stop words from the sentences.
4. Convert the ‘Review’ text data to tokens (with punctuation and stop words removed). Using Scikit-learn’s CountVectorizer functionality the text is then converted into a matrix of token counts. It is effectively a 2D matrix where each row is a review and each column is a unique word.
5. Develop the prediction model using a Naïve Bayes algorithm to predict customer satisfaction scores accurately.
6. Evaluate the model by testing the predicted values against the actual ratings in the data using a confusion matrix and classification report from the Scikit-learn toolkit. The data is run through the model and the efficiency of the model is noted and commented upon.
7. Test the output from the model to confirm that predictions are accurate and line up with the expected results. This involves testing random reviews, noting the predicted outcome from the model and comparing it with the actual rating in the data.
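Steps 3 to 7 can be sketched end to end without external dependencies. The code below is a simplified stand-in, not the authors' implementation: it replaces Scikit-learn's CountVectorizer and Naïve Bayes classes with a small hand-written multinomial Naïve Bayes using Laplace smoothing, and every review, rating, stop word and name in it is invented for illustration.

```python
import math
import string
from collections import Counter, defaultdict

# Tiny illustrative stop word list (an assumption).
STOP_WORDS = {"the", "a", "an", "is", "it", "and", "this", "i", "to", "of"}


def preprocess(text):
    """Step 3: strip punctuation, lowercase, drop stop words."""
    cleaned = text.translate(str.maketrans("", "", string.punctuation)).lower()
    return [t for t in cleaned.split() if t not in STOP_WORDS]


class TinyMultinomialNB:
    """Steps 4-5: token counts per rating plus Laplace-smoothed Naive Bayes."""

    def fit(self, docs, labels, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(preprocess(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        tokens = [t for t in preprocess(doc) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            logp = math.log(n_docs / total_docs)  # log prior
            for t in tokens:  # add smoothed log likelihoods
                count = self.word_counts[label][t] + self.alpha
                logp += math.log(count / (total_words + self.alpha * len(self.vocab)))
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


def confusion_matrix(actual, predicted):
    """Step 6: count (actual, predicted) rating pairs."""
    return Counter(zip(actual, predicted))


# Hypothetical training and held-out reviews with star ratings.
train_docs = ["Great phone, love the screen", "Excellent battery and great camera",
              "Terrible phone, screen broke", "Awful battery, total waste"]
train_ratings = [5, 5, 1, 1]
test_docs = ["love the great camera", "awful screen broke"]
test_ratings = [5, 1]

model = TinyMultinomialNB().fit(train_docs, train_ratings)
predictions = [model.predict(d) for d in test_docs]  # step 7: spot-check reviews
matrix = confusion_matrix(test_ratings, predictions)
accuracy = sum(n for (a, p), n in matrix.items() if a == p) / len(test_ratings)
print(predictions, dict(matrix), accuracy)
```

Working in log probabilities avoids numerical underflow when a review contains many words, and the smoothing constant keeps a single unseen word from forcing a class probability to zero.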
3.1. Data Transformation
The dataset was sourced from Kaggle¹. It comprised mobile phone reviews from the Amazon website (n = 413,840), with only 62 not having a verbatim comment in the ‘Review’ column. The key component we looked for was the rating (star rating or 1-5 rating, for example) and a corresponding review from the customer. We wanted a wide-ranging variance in the types of reviews left by the customers so that there is some degree of subjectivity, as differing views are important to aid the design and development of the sentiment model. Therefore, there is an expectation to see different results from the sentiment analysis of this dataset compared to that of datasets about technology products. The variables were Product Title, Brand, Price, Rating, Review text and the number of people who found the review helpful. Python was used as it has a considerable variety of text mining and sentiment analysis packages (Bird, et al., 2009), packages that allow the visualisation of the data, and a solid range of predictive analytics packages.
Transforming customer satisfaction data into customer insights requires a formal system by which measures are included as part of a data collection and analytics process. The aim was to investigate the correlation between the sentiment analysis score and the customer rating score, and to predict with a degree of certainty the likelihood of the rating based on the sentiment. Therefore, rather than relying on the survey-based CSAT measurement, we applied big data techniques to the mobile phone customer data to evaluate the link between satisfaction and sentiment. The basis of this analysis is to study the customer feedback from the ‘Review’ column in the data containing the textual feedback, to understand the sentiment behind their commentary and to identify key themes and words within this data. To do this there are several stages required, and these follow the Cross-Industry Standard Process for Data Mining (CRISP-DM) (Chapman, et al., 2000). The CRISP-DM process is the “de facto standard for developing data mining and knowledge discovery projects” (Marbán et al., 2009). This multi-layered research methodology has 5 key components, of which this process utilizes some but not all the elements of the framework (Marbán et al., 2009).
Customers who purchased a phone from Amazon were invited to rate the phone (i.e. the Customer Satisfaction score, or ‘Rating’ as per the data) and to provide a review of the product via a free-form text box (i.e. ‘Reviews’). As per the Amazon website, other users could react to the review and ‘up-vote’ the feedback given by the customer on the product. The customer experience strategy of capturing data like this has been touched upon previously, but it is relevant to understand how it is applicable in a business context. As a business there must be some understanding of how customers react to products, and in this dataset there are two key variables that provide insight into this, i.e. Rating and Reviews. We note that this is an Amazon dataset and not company-specific data, i.e. Apple do not own this data, for example, and whilst they could carry out analysis on the data they do not have sole ownership of it.
The dataset can be considered longitudinal customer data covering attitudinal data elicited from the customer survey, which includes satisfaction ratings (‘Rating’) and qualitative customer verbatim comments (‘Reviews’), behavioural data (transactional data, i.e. price) and product data across multiple products for all major mobile phone companies. The data source did not have any date/time references associated with the records. Therefore, the opportunity to conduct time-series analysis is unavailable; this is an area where the authors deem future work would be recommended. There was a total of 413,840 product review records.
The key predictive variables in the dataset that were considered for analysis were Rating and Review. It should be noted that descriptive analysis will be performed on all the other variables within the dataset to look for additional insights from the data. The intention is to allow the business to accurately predict the likelihood that a customer will rate a product the way they have. The process will involve identifying those customers who rated the products 1 through 5 and then, using a predictive modelling technique, outputting the accuracy using a classification report.
3.2. Analysis
Pearson and Spearman correlation plots ascertain where there are correlations, both positive and negative, in the data, and this makes it easier to decide which variables to look at as part of the descriptive analysis in later sections. Data correlations help to identify related variables and can be used to investigate the relationship between two quantitative, continuous variables. Pearson’s correlation coefficient (r) is a measure of the strength of the linear association between the two variables and ranges from -1 to +1. We use Spearman’s correlation, which is the nonparametric version of the Pearson correlation. Spearman’s correlation coefficient (ρ, also signified by rs) measures the strength and direction of association between two ranked variables (Hauke & Kossowski, 2011).
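The relationship between the two coefficients is easy to see in code: Spearman's coefficient is Pearson's coefficient computed on ranks. The sketch below is a plain-Python illustration with invented data and function names; tied values share the mean of their ranks, which is the usual convention.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's r: covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: Pearson's r applied to the ranks of x and y."""
    return pearson(ranks(x), ranks(y))


# Hypothetical monotonic but non-linear relationship.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(pearson(x, y), spearman(x, y))
```

Because it works on ranks, Spearman's coefficient reports a perfect association for any strictly increasing relationship, while Pearson's coefficient falls below one when the relationship is not a straight line. That property suits ordinal variables such as a 1 to 5 star rating.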
The first part of the analysis was done using the matplotlib library, and a comparison was run between the Review and Review Length variables to understand whether there was any significant correlation between the length of a review given by the customer and the subsequent rating. Histograms were generated for each rating category (1 to 5), showing the distribution of review lengths for each rating. A comparison was carried out to analyse how the rating and price affected each other. Using the Seaborn library, a scatter graph with a regression line was created for each rating in relation to the corresponding price. For the correlation analysis, the heatmap functionality from the Seaborn library was used, and this produced a heatmap graphic that grouped the numerical variables in the data (Price, Review Votes and Review Length) and assigned a colour and correlation score to each relationship indicating the strength of the correlation. This is useful in the sense that it allows the reader to spot any important positive or negative correlations between the data points that may not have been obvious when eyeballing the raw data. Based on the results one could then decide on how best to continue analysing the data.
A scatter chart from the Seaborn library was used to look at the distribution of review lengths across each rating. The data is plotted as a scatter plot in order to identify any outliers across the 5 ratings. To investigate the distribution of the Ratings, a KDE (Kernel Density Estimation) plot was used. In statistics, KDE is a non-parametric way to estimate the probability density function of a random variable, in this case the Rating given by customers. KDE smooths the data, making inferences about the population based on a finite data sample, as shown in Equation (2).
$$\hat{f}_h(x) = \frac{1}{n}\sum_{i=1}^{n} K_h(x - x_i) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right) \qquad (2)$$
where K is the kernel, a non-negative function that integrates to one, and h > 0 is a smoothing parameter called the bandwidth. In Equation (2), the kernel K with subscript h is called the scaled kernel. Ideally one would select h as small as is feasible for the data; however, there is always a trade-off between the bias of the estimator and its variance. The bandwidth (h) controls how tightly the estimate is fitted to the data, much like the bin size in a histogram. The KDE plot is very similar to a histogram: it estimates the probability density of a continuous variable, in this case the probability of a rating from 1 to 5. Here a univariate kernel density estimate is plotted. A comparison between Apple and Samsung was plotted in two separate graphs, allowing analysis of the distribution of ratings for the two brands. A side-by-side comparison can be valuable in determining any obvious differences between the brands with future analysis in mind. Staying with KDE plots and using Samsung again as an example, a plot was created with the Seaborn library to visualise the distribution by Price of the Samsung products.
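To make the role of the bandwidth concrete, Equation (2) can be evaluated directly. The sketch below assumes a Gaussian kernel, which is a common default but is not stated in the text, and uses hypothetical ratings; the function names are illustrative.

```python
from math import exp, pi, sqrt

def gaussian_kernel(u):
    """Standard normal density: non-negative and integrates to one."""
    return exp(-0.5 * u * u) / sqrt(2 * pi)

def kde(x, samples, h):
    """Evaluate the estimate of Equation (2) at x with bandwidth h > 0."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    n = len(samples)
    return sum(gaussian_kernel((x - xi) / h) for xi in samples) / (n * h)

# Hypothetical ratings, dominated by 5s with a smaller cluster of 1s.
ratings = [5, 5, 5, 5, 4, 4, 3, 1, 1, 5]

for h in (0.2, 1.0):
    curve = [round(kde(x, ratings, h), 3) for x in (1, 2, 3, 4, 5)]
    print(f"h = {h}: {curve}")
```

A small h produces sharp peaks at the observed ratings (low bias, high variance), while a larger h spreads the mass between them, which is the trade-off described above.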
3.3. Sentiment Analysis
Using the TextBlob library, we conducted sentiment analysis on each comment in each dataset and assigned each ‘Review’ a polarity score between -1 and 1. The ranges for Positive, Negative and Neutral were: Positive Sentiment Score, >0.1 to 1; Neutral Sentiment Score, between -0.1 and 0.1; and Negative Sentiment Score, -1 to <-0.1. These ranges represent the categories of sentiment polarity. A data frame named ‘df_polarity_desc’ was used to output a table with the columns Review, Sentiment and Polarity. This enables the reader to view the output from the sentiment analysis and puts some context around the actual Sentiment Polarity scores for each individual review. Two separate word clouds were created to visualise both ‘Positive Words’ and ‘Negative Words’. ‘Positive Reviews’, ‘Neutral Reviews’ and ‘Negative Reviews’ were defined based on the numerical ranges discussed, which allowed the authors to categorise the Reviews into the three ‘buckets’ described.
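The three ranges above translate into a simple threshold rule. The following sketch shows that rule in Python; the function name and the example scores are hypothetical. In TextBlob the polarity of a text is available as `TextBlob(text).sentiment.polarity`, but the scores below are hard-coded so that the example runs without the library.

```python
def categorise(polarity):
    """Map a polarity score in [-1, 1] to the three buckets used in the text."""
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must lie between -1 and 1")
    if polarity > 0.1:
        return "Positive"
    if polarity < -0.1:
        return "Negative"
    return "Neutral"  # covers -0.1 to 0.1 inclusive

# Hypothetical polarity scores standing in for library output.
scored_reviews = [
    ("Very pleased", 0.65),
    ("It is a phone", 0.0),
    ("Had problems from day one", -0.3),
]
for text, polarity in scored_reviews:
    print(f"{text!r}: polarity {polarity:+.2f} -> {categorise(polarity)}")
```

Note how narrow the neutral band is: any score above 0.1 already counts as positive, which should be kept in mind when reading the category volumes.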
Using the WordCloud function, two word clouds were produced for the Positive and Negative reviews, i.e. the most used words in each category. A word cloud with a black background was produced for the Positive Review words and a word cloud with a white background for the Negative Review words. The Sentiment Polarity score was generated on a -1 to 1 scale based on the corresponding ‘Review’. The intention is to test whether the sentiment polarity score is related to the rating provided for the corresponding text comment from the customer. To achieve this, the Sentiment score was normalised from a -1 to 1 scale to a 1 to 5 scale to mimic the ratings scale available to the customer. To normalise the data, Equation (3) was applied to the ‘Polarity’ score and the normalised data was named ‘Polarity_Norm’.
$$x_{new} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \qquad (3)$$
The Polarity scores were normalised as per the above to scores between 1 and 5 to allow for direct comparison with the corresponding rating score. Once the Polarity score has been normalised, the mean and standard deviation of ‘Polarity_Norm’ are used to create several charts analysing the probability distribution for each Rating category. The two main rating categories identified throughout the research are Rating 1 and Rating 5. A plot of the polarity distribution is created to explore the sentiment of the reviews in terms of the actual rating, and this provides a key insight into the data.
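Equation (3) on its own maps values onto the interval 0 to 1; reaching the 1 to 5 rating scale requires a further linear stretch. The sketch below shows one way this could be done. It assumes that the theoretical polarity bounds of -1 and 1 are used as the minimum and maximum (the study may instead have used the observed minimum and maximum), and the function names are hypothetical.

```python
def min_max(x, x_min, x_max):
    """Equation (3): rescale x from [x_min, x_max] to [0, 1]."""
    return (x - x_min) / (x_max - x_min)

def polarity_to_rating_scale(polarity, low=1.0, high=5.0):
    """Stretch the min-max result onto [low, high]; an assumed extra step."""
    unit = min_max(polarity, -1.0, 1.0)
    return low + unit * (high - low)

for polarity in (-1.0, -0.3, 0.0, 0.5, 1.0):
    print(f"{polarity:+.1f} -> {polarity_to_rating_scale(polarity):.1f}")
```

Under this mapping a perfectly neutral review (polarity 0) lands on 3, the midpoint of the rating scale, and a review needs a polarity of 1 to reach a normalised score of 5.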
Word frequencies are gathered from the Reviews textual dataset into a document-term matrix, allowing features to be extracted from the Reviews sub-dataset. We employ a feature extraction technique that transforms arbitrary data, in this case the Review text, into numerical features usable for machine learning. The vectorization process turns the collection of text documents into numerical feature vectors. This technique employs tokenization, counting and normalization of the data and is commonly referred to as the Bag of Words or “Bag of n-grams” representation. Using ‘bow_transformer.vocabulary’, a classification technique is carried out on the data. Since the vectorization process provides discrete features, i.e. the word counts from the Review data, the next step is to employ Multinomial Naïve Bayes with the feature counts (Huang, 2017). This helps us calculate the probability of two ratings categories, Rating 1 and Rating 5, using Bayes’ theorem. Using Reviews as the second variable in the testing, the probability and accuracy of predicting a Rating of 1 and 5 based on the reviews are obtained. To do this the data is split into a training set and a testing set, with 30% of the data used as the testing set. Using a confusion matrix, we summarise the performance of the classification algorithm: the numbers of correct and incorrect predictions are summarised with count values and broken down by class. We run the confusion matrix for Ratings 1 and 5 and then for all Ratings, i.e. 1 to 5. Because the classification task becomes harder as classes are added, it is expected that the accuracy will drop when all 5 classes are modelled.
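The pipeline described above (bag-of-words counts, Multinomial Naïve Bayes, a 70/30 split) can be sketched without any external library. The code below is a simplified stand-in rather than the study’s implementation: the class name, the tokenizer, the Laplace smoothing constant and the ten labelled reviews are all hypothetical, and the rows are assumed to be already shuffled so that a fixed split can be used.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

class SimpleMultinomialNB:
    """Multinomial Naive Bayes over word counts with Laplace smoothing."""

    def fit(self, documents, labels, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # bag of words per class
        for text, label in zip(documents, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def log_posterior(self, text, label):
        """log P(label) plus the sum of log P(word | label) for known words."""
        total_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total_docs)
        counts = self.word_counts[label]
        denom = sum(counts.values()) + self.alpha * len(self.vocabulary)
        for word in tokenize(text):
            if word in self.vocabulary:  # unseen words are ignored
                score += math.log((counts[word] + self.alpha) / denom)
        return score

    def predict(self, text):
        return max(self.class_counts, key=lambda c: self.log_posterior(text, c))

# Hypothetical labelled reviews (Rating 1 or Rating 5), assumed pre-shuffled.
data = [
    ("great phone love it", 5), ("broken screen very disappointed", 1),
    ("excellent battery great screen", 5), ("bad battery stopped working", 1),
    ("love the camera works great", 5), ("terrible phone arrived broken", 1),
    ("good value very happy", 5), ("disappointed bad quality", 1),
    ("great camera good battery", 5), ("bad screen broken charger", 1),
]
split = int(len(data) * 0.7)  # first 70% for training, last 30% for testing
train, test = data[:split], data[split:]

model = SimpleMultinomialNB().fit([t for t, _ in train], [y for _, y in train])
accuracy = sum(model.predict(t) == y for t, y in test) / len(test)
print(f"held-out accuracy on {len(test)} reviews: {accuracy:.2f}")
```

With real data the split would normally be randomised, and libraries such as scikit-learn provide `CountVectorizer` and `MultinomialNB` for the same two steps.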
4. EVALUATION
The Pearson Correlation showed a slight correlation between Rating and Review Votes (see Figure 4). This would suggest that as the Rating increases, the price increases and the number of Review votes given to the product increases also. This would make sense, but since the correlation is not strong it was not considered worth exploring with further analysis. Using the Spearman correlation in Figure 5, one can observe very similar correlations, with a slight negative correlation between Rating and Review votes which, as with the Pearson test, does not warrant deeper analysis.
The correlation between variables showed a negative correlation between Price and Polarity, which would suggest that the sentiment of the review tends to be more negative as the price goes up. This could indicate that customers are less happy with their more expensive phones and expect better from the product. Polarity and Review votes have a positive correlation with each other, indicating that the more a Review is ‘up-voted’ the more positive the sentiment seems to be, which would make sense from a logical standpoint (see Figure 6).
We created histograms to compare review lengths against each rating, which helps us note the number of reviews and which buckets the review lengths belong to for each rating. This reveals that there is an overwhelming number of reviews for Rating 5 and a considerable number of reviews for Rating 1. This will be useful going forward in determining which Ratings should be looked into more closely. Figure 7 is quite interesting as it displays the distribution of price points across each Rating value. As expected, many price points exist below the $1000 range, with a small number of outliers upwards of $2000.
Using a heat map to look at correlations between the variables, it is noticeable that there is a negative correlation of -0.91 between Price and review length, which would indicate that the greater the price the shorter the review and vice versa. This could be attributed to the fact that there are more reviews at the lower price level, as noted previously in Figure 8, and therefore this finding is not at all surprising. As with the other charts, it was felt that a closer look should be taken at Review length v Rating (see Figure 9). What is observed here is the distribution of reviews by Length v Rating, and it shows that there is a slight tendency for customers to comment more about phones that they are happy with, hence the distribution in Figure 10.
Figure 4. Pearson Correlation
To get a sense for the multivariate density of the ratings, the chart in Figure 11 shows the high density of 5 ratings across the reviews. There is a smaller but significant population of 1 Star reviews and this will supplement the previous charts for further investigation. Comparing Figure 11 with Figure 12, there is no distinct difference between the two distributions, indicating that there is no significant difference between brands.
4.1. Word Frequency
The word phone is by far the most common. Other interesting outputs are the occurrences of the words Great and Good, which point to a possible conclusion that there may be an overwhelming number of positive reviews in the data. Figure 13 shows the top 7 words.
As another visual alternative, Figure 14 shows a word cloud of word frequencies. This gives an impactful visual of the occurrences of the top 100 words in the review feedback.
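A top-k word count of this kind can be reproduced with the Python standard library. The sketch below is illustrative only: the stop-word list, the function name and the three sample reviews are hypothetical, and whether the study removed stop words before counting is an assumption.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; real pipelines use a fuller one.
STOP_WORDS = {"the", "a", "is", "it", "and", "i", "this", "to", "for", "my"}

def top_words(reviews, k=7):
    """Return the k most frequent non-stop-words across all reviews."""
    counts = Counter()
    for text in reviews:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(k)

# Hypothetical review texts.
reviews = [
    "Great phone, great price",
    "The phone is good and the battery is great",
    "Good phone for my kids",
]
for word, count in top_words(reviews, k=3):
    print(word, count)
```

The same counts, passed to a word cloud generator, would determine the relative size of each word in the cloud.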
Figure 5. Spearman Correlation
Figure 6. Review v Review Length by Ratings
Using the TextBlob library, the intention is to give each ‘Review’ a sentiment score between -1 and 1. These scores define the various categories of sentiment. There is an overwhelming number of positive reviews, which follows from the exploratory analysis above showing the high frequency of 5-star ratings. This does not tell the whole story, so further analysis of the sentiment is required.
Figure 7. Regression Plot of Price v Rating
Figure 8. Heat Map
The results in Figure 15 show that the sentiment analysis results mostly fall into the Positive Review category based on the range used. Neutral Reviews have a reasonable volume given the narrow range taken into consideration, and there are fewer Negative reviews than in the other categories. This would suggest that most customers were reasonably happy with their purchases. Interestingly, a Rating of 1 had the second highest volume, but as per the above there are significantly fewer Negative reviews. On the face of it, this suggests that for those who gave a Rating of 1 the corresponding sentiment score may be higher and not directly related to the rating; this is for now a question to be explored further. The observable trend for Rating scores of 5 and Positive reviews seems fairly aligned, so further analysis of this is needed.
Figure 9. Rating v Review Length Regression plot
Figure 10. Kernel Density Estimate
Figure 16 shows a small cohort of reviews with the corresponding Sentiment and Polarity scores. This gives the reader a very quick taste of the output from the code, showing how each piece of text is given a score. A very brief scan of the results shows that it ‘seems’ accurate: record [2], ‘Very Pleased’, is scored as a positive review and, conversely, record [5], which mentions the word problems, has a sentiment of -0.3, which seems to fit. This is obviously not indicative of the total population of reviews but gives some insight into the results. Figure 17 shows the word cloud for Positive Reviews, where it is noticeable that keywords such as ‘Great’, ‘Love’ and ‘Good’ are quite prevalent in the data, as would be expected. In Figure 18, the word cloud for negative reviews, there are many occurrences of the words ‘Disappointed’, ‘Broken’ and ‘Bad’, which tend to be on the negative side and again are what one would expect to see in the analysis.
Figure 11. Samsung Kernel Density Estimate
Figure 12. Apple Kernel Density Estimate
The multinomial Naive Bayes classifier is suitable for classification with discrete features (e.g., word counts for text classification). The multinomial distribution normally requires integer feature counts; however, in practice, fractional counts such as tf-idf also work. The Multinomial Naïve Bayes model had a 93% accuracy rate when predicting a Rating of 1 and a 96% accuracy rate when predicting a Rating of 5, with an overall model accuracy of 96%. The f1-scores of 90% and 97%, with an overall score of 96%, combine the precision and recall scores (as their harmonic mean) and indicate that the model is accurate when attempting to predict whether a customer will score the review a 1 or a 5.
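For readers unfamiliar with these metrics, the sketch below derives accuracy, precision, recall and F1 from the cells of a confusion matrix. The labels are hypothetical and do not reproduce the figures reported above; the function names are illustrative.

```python
from collections import Counter

def confusion_counts(y_true, y_pred):
    """Count (actual, predicted) pairs: the cells of a confusion matrix."""
    return Counter(zip(y_true, y_pred))

def precision_recall_f1(y_true, y_pred, label):
    """Per-class precision, recall and F1 (the harmonic mean of the two)."""
    cells = confusion_counts(y_true, y_pred)
    tp = cells[(label, label)]
    fp = sum(n for (actual, pred), n in cells.items()
             if pred == label and actual != label)
    fn = sum(n for (actual, pred), n in cells.items()
             if actual == label and pred != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical true and predicted ratings for ten reviews rated 1 or 5.
y_true = [1, 1, 1, 5, 5, 5, 5, 5, 5, 5]
y_pred = [1, 1, 5, 5, 5, 5, 5, 5, 5, 1]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy = {accuracy:.2f}")
for label in (1, 5):
    p, r, f = precision_recall_f1(y_true, y_pred, label)
    print(f"rating {label}: precision = {p:.2f}, recall = {r:.2f}, f1 = {f:.2f}")
```

Because the majority class dominates the overall accuracy, the per-class scores for the smaller class are usually the more informative ones to read.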
Figure 13. Word Frequency Chart
Figure 14. Word Cloud
Figure 15. Sentiment Frequency Chart
Figure 16. Reviews, Sentiment and Polarity Table
The model was then trained on the entire dataset including all 5 ratings. Figure 19 shows the precision and recall output from the Multinomial Naïve Bayes modelling and Figure 20 shows the accuracy and averages. It is noticeable that when the model is run with the entire population of the data the accuracy drops quite significantly, to an overall score of 74%. This indicates that the data for Ratings 2, 3 & 4 has a significant impact on the model’s accuracy. It is noteworthy that the scores for Ratings 1 and 5 remain quite high, and this is potentially due to the large volumes of Ratings in these categories compared with the other categories.
4.2. Normalisation of Polarity
Using the mean and standard deviation of Polarity_Norm, each rating was matched with its corresponding normalised polarity score. Figure 21 is interesting for several reasons. First, it can be observed that a customer’s rating of 5 has approximately a 60% probability that the sentiment score will be 4. Secondly, there is a noticeable distribution between 3 and 4, which is in and around the neutral sentiment category. This slight shift away from a polarity score of 5 suggests that while most of the sentiment may be positive, there is a significant cohort of reviews with a slightly negative or neutral tone, which would indicate an area for further exploration from a business perspective.
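One plausible way to turn a per-rating mean and standard deviation into a probability curve is to fit a normal distribution to each rating group. The sketch below does this with the standard library; the normal-curve assumption, the function name, the band edges and the sample values are all hypothetical and are not taken from the study.

```python
from collections import defaultdict
from statistics import NormalDist, mean, stdev

def polarity_distributions(rows):
    """Group normalised polarity by rating and fit a normal curve to each."""
    by_rating = defaultdict(list)
    for rating, polarity_norm in rows:
        by_rating[rating].append(polarity_norm)
    return {
        rating: NormalDist(mean(values), stdev(values))
        for rating, values in by_rating.items()
        if len(values) >= 2  # stdev needs at least two observations
    }

# Hypothetical (Rating, Polarity_Norm) pairs on the 1 to 5 scale.
rows = [
    (5, 4.2), (5, 3.8), (5, 4.6), (5, 3.4), (5, 4.0),
    (1, 2.6), (1, 3.0), (1, 2.2), (1, 3.4), (1, 2.8),
]

dists = polarity_distributions(rows)
for rating, dist in sorted(dists.items()):
    share = dist.cdf(4.5) - dist.cdf(3.5)  # probability mass in the band around 4
    print(f"rating {rating}: mean = {dist.mean:.2f}, sd = {dist.stdev:.2f}, "
          f"P(3.5 <= score < 4.5) = {share:.2f}")
```

Read this way, a statement such as "a rating of 5 has a given probability of a sentiment score of 4" corresponds to the area under the fitted curve in the band around 4.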
Figure 17. Positive Review Word Cloud
Figure 18. Negative Review Word Cloud
The same method was applied when plotting the graph in Figure 22. Considering that these are the probability distributions for Ratings of 1, one could be forgiven for thinking these were the results for a Rating of 3. It is significant that the polarity, or sentiment, of 1 Ratings tends towards slightly negative and neutral. This could be down to several reasons: the model itself may be unable to identify negative words, or customers may tend not to be overly critical when commenting.
Figure 23 shows the distributions for Ratings 1, 3, 4 & 5 to give an idea of how the sentiment has been measured for these ratings. Due to space constraints Rating 2 was left out; it also had the lowest volume of all the ratings. The output graphs for Ratings 3 and 4 seem to be consistent and in line with expectations in terms of the distribution of polarity. For that reason, the suggestion is that no further analysis is needed for these scores and that, from a business perspective, the most valuable insights come from Ratings 1 and 5.
Figure 19. Confusion Matrix (additional variables)
Figure 20. Accuracy and Weighted Average Table
Figure 21. Normalised Polarity Distribution for Rating 5
Figure 22. Normalised Polarity Distribution for Rating 1
The model was intuitive enough to establish that sentiment analysis is a key indicator of the customer experience and for that reason should be seriously considered as part of any business’s metrics strategy. The key analysis, and ultimately the findings of this research, centre on the normalisation of the polarity score from the sentiment analysis to align with the corresponding rating given by the customer for that product. It was noticeable that a large share of ratings fell into the ‘1’ and ‘5’ buckets. For that reason, it was prudent to examine the polarity associated with these scores more closely than the other rating scores. The analysis, supplemented with the visualisations, showed that for a Rating of 5 the sentiment distribution was skewed towards a score of 4, with a slight emphasis on 3 also. This showed that while customers were rating the products 5/5, their actual sentiment regarding the product was not so overwhelmingly positive. Conversely, the same can be said of customers giving the products a rating of ‘1’. The results showed that while customers rated some products poorly, the overall sentiment fell into the somewhat negative and neutral bands.
What we show is that there can be an over-reliance on the Ratings score as a single point of truth for a business; when big data analytics techniques are employed it is possible to drill down into the real feelings of customers and get a clearer picture of products and services. The predictive analytics model used Multinomial Naïve Bayes to predict customer ratings from the review text. The results of the research highlight the shortcomings of the ratings score as a single customer experience metric, thereby supporting the notion that a business cannot rely solely on this metric. The result of the prediction phase is a model that is capable of correctly predicting 74 percent of customers’ ratings based on their reviews. This is not a remarkably high percentage in terms of accuracy and backs up the claim above that the ratings cannot be used alone as an accurate predictor of sentiment. On the basis of these results, this research recommends that businesses avoid using CSAT or ratings scores as a single point of truth for customer experience. Although these metrics are important, they do not provide an insight into the opinion of customers regarding the experience they have had with a business or a product. The approach employed here demonstrates that a data-driven approach to exploring customer sentiment via text mining helps form a clearer picture for the business, and it is important that firms capitalise on this type of data.
5. CONCLUSION
Quantitative survey questions such as Customer Satisfaction will always remain widely popular with businesses when attempting to take the pulse of their customers on products and customer experience (Hague & Hague, 2018). Relying too heavily on single customer metrics such as customer experience or net promoter score is a risky strategy, and the suggestion is that businesses should focus on a more nuanced, multidimensional approach to predicting customer behaviour (Zaki et al., 2016).
Figure 23. Rating 1,3,4 & 5 Polarity Distribution
We explored the notion of adopting a more scientific approach to measuring customer sentiment by mining verbatim feedback and applying a sentiment analysis methodology to determine how customers feel about the products, as opposed to the score they gave the product on the constrictive 1 to 5 rating scale. We provided a framework for understanding how a customer experience process can be developed to provide insights into the Voice of the Customer.
Key recommendations are as follows:
• Establishing and maintaining a positive customer experience is imperative for any company’s long-term goals and ultimately its success as a business.
• There needs to be buy-in at senior management level to embrace a customer-centric metrics strategy which includes development of a robust process to measure customer sentiment.
• Text mining and sentiment analysis play a key role in understanding how customers feel about the products they purchase or the experience they had with a company (Fan, 2014).
• While survey-based metrics gathering will always remain an important process in measuring leading indicators of customers’ behaviour, this should be supplemented with formal customer experience programs that monitor performance and guide improvement efforts (Homburg et al., 2015).
• Organizations should begin to tap into the rich vein of customer feedback data at their disposal in the form of free verbatim text.
The central emphasis of the outcomes is that it would be folly for a business to ignore customer feedback from surveys or social media channels and rely solely upon quantitative metrics such as customer satisfaction (CSAT). There is no doubt that CSAT is an extremely important metric and should continue to be measured; however, businesses need to capitalise on customer data points such as sentiment to really understand how customers are feeling.
Descriptive analytics plays a significant role for a business in understanding what is happening with its products and services. It gives a snapshot, usually after the fact, of performance and experience, and this research showed that visualising these data points and reporting on them can be valuable to a business. The results give a clear insight into customer opinion and the nuances of what customers think as opposed to how they rate a product on a limited 5-point scale. The findings point towards deeper learning around sentiment and how it can be utilised in other areas of the business and with additional data points not present in this research. This strategy could be employed in a business scenario, and the results and findings would prove valuable in helping build a customer experience strategy for a business. The key finding is that it is unwise to rely solely on CSAT or NPS scores without examining customer feedback.
REFERENCES
ACSI. (2018). American Customer Satisfaction Index. Retrieved from http://www.theacsi.org/about-acsi/the-science-of-customer-satisfaction
Ajenstat, F. (2017). Tableau five years a leader in Gartner’s Magic Quadrant for Analytics. Tableau. Retrieved from https://www.tableau.com/about/blog/2017/2/tableau-five-years-leader-gartners-magic-quadrant-analytics-66133
Allahyari, M., Pouriyeh, S., Assefi, M., Safaei, S., Trippe, E. D., Gutierrez, J. B., & Kochut, K. (2017). A brief survey of text mining: Classification, clustering and extraction techniques. arXiv:1707.02919
Arokiasamy, A. (2013). The impact of customer satisfaction on customer loyalty. Journal of Commerce, 5(1), 14–21.
Aswani, S. (2017). Analyzing Customer Feedback Data: Manual Analysis vs NLP. Clarabridge. Retrieved from https://www.clarabridge.com/blog/analyzing-customer-feedback-data-manual-analysis-vs-nlp/
Baars, H., & Kemper, H. G. (2008). Management support with structured and unstructured data: an integrated business intelligence framework. Information Systems Management, 25(2), 132–148.
Balahur, A., & Montoyo, A. (2008). A feature dependent method for opinion mining and classification.
LaValle, S., Lesser, E., Shockley, R., Hopkins, M. S., & Kruschwitz, N. (2011). Big Data, Analytics and the Path From Insights to Value. MIT Sloan Management Review, 52(2), 21–32.
Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python: analyzing text with the natural language toolkit. In Natural language processing with Python: analyzing text with the natural language toolkit (pp. 1–34). O’Reilly Media Inc.
Broan, S. E., & Steger, A. J. (2013). Key performance indicators are not just about profit. CFMA. Retrieved from http://www.cfma.org/content.cfm?ItemNumber=1899
Brownlee, J. (2016). Support Vector Machines for Machine Learning. Retrieved from https://machinelearningmastery.com/support-vector-machines-for-machine-learning/
Cappelli, A. (2017). Natural Language Processing with Stanford CoreNLP. Retrieved from https://cloudacademy.com/blog/natural-language-processing-stanford-corenlp-2/
Chapman, P., Clinton, J., Kerber, R., Khabaza, T., Reinartz, T., Sherer, C., & Wirth, R. (2000). Cross industry standard process for data mining (CRISP-DM) 1.0.
Chen, H., Chiang, R., & Storey, V. (2012). Business intelligence and analytics: From big data to big impact. Management Information Systems Quarterly, 1(1), 1165–1188. doi:10.2307/41703503
Chetviorkin, I., & Loukachevitch, N. (2013). Evaluating sentiment analysis systems in Russian.
Chowdhury, G. (2005). Natural language processing. Annual Review of Information Science & Technology, 37(1), 51–89. doi:10.1002/aris.1440370103
Clarabridge. (2015). Retrieved from www.clarabridge.com
Cran Project. (2018). CRAN Task View: Natural Language Processing. Retrieved from https://cran.r-project.org/web/views/NaturalLanguageProcessing.html
Daume, H. III, & Marcu, D. (2006). Domain adaptation for statistical classifiers. Journal of Artificial Intelligence Research, 26, 101–126. doi:10.1613/jair.1872
Department of Industry, Australian Government. (2018). Understand your customers. Retrieved from https://www.business.gov.au/info/plan-and-start/start-your-business/what-is-customer-service/understand-your-customers
Devika, M. D. C, S., & Ganesha, A. (2016). Sentiment Analysis: A Comparative Study On Different Approaches. Chennai, Tamil Nadu, India, Department of CSE, Vidya Academy of Science and Technology.
EMC. (n.d.). Big Data storage [white paper]. Retrieved from https://www.emc.com/collateral/white-papers/idg-bigdata-storage-wp.pdf
Schumaker, R. P., Zhang, Y., Huang, C. N., & Chen, H. (2012). Evaluating sentiment in financial news articles. Decision Support Systems, 53(3), 458–464. doi:10.1016/j.dss.2012.03.001
Fiedler, L., Großmaß, T., Roth, M., & Vetvik, O. J. (2016). Why customer analytics matter. McKinsey. Retrieved from https://www.mckinsey.com/business-functions/marketing-and-sales/our-insights/why-customer-analytics-matter
Foundation, P. S. (2018). Get it. Python. Retrieved from http://www.python.org/getit/
Ghiassi, M., Skinner, J., & Zimbra, D. (2013). Twitter brand sentiment analysis: A hybrid system using n-gram analysis and dynamic artificial neural network. Expert Systems with Applications, 40(16), 6266–6282. doi:10.1016/j.eswa.2013.05.057
Think with Google. (2016). Why customer analytics are the key to creating value. Retrieved from https://www.thinkwithgoogle.com/intl/en-gb/marketing-resources/data-measurement/why-customer-analytics-are-the-key-to-creating-value/
Harris, D. (2018). What Is Text Analytics? We Analyze the Jargon. Software Advice. Retrieved from https://www.softwareadvice.com/resources/what-is-text-analytics/
Hauke, J., & Kossowski, T. (2011). Comparison of values of Pearson’s and Spearman’s correlation coefficients on the same sets of data. Quaestiones Geographicae, 30(2), 87–93. doi:10.2478/v10117-011-0021-1
Hollyoake, M. (2009). The four pillars: Developing a ‘bonded’ business-to-business customer experience. Journal of Database Marketing & Customer Strategy Management, 16(2), 132–158. doi:10.1057/dbm.2009.14
Hosseini, S., Maleki, A., & Gholamian, M. (2010). Cluster analysis using data mining approach to develop CRM methodology to assess the customer loyalty. Expert Systems with Applications, 37(7), 5259–5264. doi:10.1016/j.eswa.2009.12.070
Huang, O. (2017). Applying Multinomial Naive Bayes to NLP Problems: A Practical Explanation. Medium. Retrieved from https://medium.com/@Synced/applying-multinomial-naive-bayes-to-nlp-problems-a-practical-explanation-4f5271768ebf
Telus International. (2017). Customer First Magazine. Retrieved from https://www.telusinternational.com/media/TI-CustomersFirstMagazine-I04.pdf
Kaisler, S. (2013). Big Data: Issues and Challenges Moving Forward. In Proceedings of the 46th Hawaii International Conference on System Sciences (pp. 995-1004). doi:10.1109/HICSS.2013.645
Kim, E. (2017). Everything You Wanted to Know about the Kernel Trick. Retrieved from http://www.eric-kim.net/eric-kim-net/posts/1/kernel_trick.html
Liu, B., Hu, M., & Cheng, J. (2005, May). Opinion observer: analyzing and comparing opinions on the web. In Proceedings of the 14th international conference on World Wide Web (pp. 342-351). ACM.
Loria, S. (2018). TextBlob. Retrieved from http://textblob.readthedocs.io/en/dev/
Maiolo, A. (2015). Comparing n-gram models. Recognize Speech. Retrieved from http://recognize-speech.com/language-model/n-gram-model/comparison
Marbán, Ó., Mariscal, G., & Segovia, J. (2009). A data mining & knowledge discovery process model. In Data mining and knowledge discovery in real life applications. IntechOpen.
Maritz. (2018). Maximise verbatims with text analytics [white paper]. Retrieved from https://www.maritzcx.com/blog/wp-content/uploads/2014/11/Maritz-White-Paper-Maximise-verbatims-with-text-analytics.pdf
McCallum, A. K., & Nigam, K. (1998, July). Employing EM and pool-based active learning for text classification. In Proc. International Conference on Machine Learning (ICML) (pp. 359-367).
Medhat, W., Hassan, A., & Korashy, H. (2014). Sentiment analysis algorithms and applications: A survey. Ain Shams Engineering Journal, 5(4), 1093–1113. doi:10.1016/j.asej.2014.04.011
Mejova, Y. A. (2012). Sentiment analysis within and across social media. University of Iowa. doi:10.17077/etd.zze4b252
Miner, G. (2012). Practical text mining and statistical analysis for non-structured text data applications. Oxford: Academic Press.
Molag, T. (2017). Power BI vs Tableau. Encore Business. Retrieved from https://www.encorebusiness.com/blog/power-bi-vs-tableau/
Natarajan, K., Li, J., & Koronios, A. (2010). Data mining techniques for data cleaning. In Engineering Asset Lifecycle Management (pp. 796–804). Springer London.
NLTK.org. (2017). Retrieved from http://www.nltk.org/
Ordenes, F., Theodoulidis, B., Burton, J., Gruber, T., & Zaki, M. (2014). Analyzing customer experience feedback using text mining: A linguistics-based approach. Journal of Service Research, 17(3), 278–295. doi:10.1177/1094670514524625
Ott, T. (2018). What is text analytics. Software Advice. Retrieved from https://www.softwareadvice.com/resources/what-is-text-analytics/
Pang, B., & Lee, L. (2002). Thumbs up? Sentiment Classification using. Machine Learning.
Parasuraman, A., Berry, L. L., & Zeithaml, V. A. (1991). Understanding Customer Expectations of Service. Sloan Management Review, 32(3), 39.
Project, C. (n.d.). Cran r-project. Retrieved from https://cran.r-project.org/
Qualtrics. (2018). What is the Net Promoter Score (NPS). Retrieved from https://www.qualtrics.com/experience-management/customer/net-promoter-score/
Rapidminer. (2018). Rapidminer. Retrieved from https://rapidminer.com/
Řehůřek, R. (2018). gensim. Retrieved from https://radimrehurek.com/gensim/about.html
SAS. (2017). Natural Language Processing. Retrieved from https://www.sas.com/en_us/insights/analytics/what-is-natural-language-processing-nlp.html
Smith, E. (2016). Everything you need to know about customer success. Cobloom. Retrieved from https://www.cobloom.com/blog/everything-you-need-to-know-about-customer-success
Spacy.io. (n.d.). Industrial-Strength NLP. Retrieved from https://spacy.io/
Srivastava, A., & Singh, D. M. (2014). Supervised SA of product reviews using Weighted k-NN Algorithm. In 2014 11th International Conference on Information Technology.
Srivastava, T. (2014). Support vector machine simplified. Retrieved from https://www.analyticsvidhya.com/blog/2014/10/support-vector-machine-simplified/
Tamilselvi, A., & ParveenTaj, M. (2013). Sentiment Analysis of Microblogs using Opinion Mining Classification Algorithm. International Journal of Science and Research, 2(10), 2319–7064.
Taylor, C. (2018). Structured vs unstructured data. Datamation. Retrieved from https://www.datamation.com/big-data/structured-vs-unstructured-data.html
Tirunillai, S., & Tellis, G. (2014). Mining marketing meaning from online chatter: Strategic brand analysis of big data using latent dirichlet allocation. JMR, Journal of Marketing Research, 51(4), 463–479. doi:10.1509/jmr.12.0106
Upadhyay, P. (2017). Removing stop words NLTK python. Geeksforgeeks.com. Retrieved from https://www.geeksforgeeks.org/removing-stop-words-nltk-python/
Verint. (2016). Verint Text Analytics. Retrieved from https://www.verint.com/Assets/resources/resource-types/datasheets/text-analytics-datasheet.pdf
Vryniotis, V. (2013). Machine Learning Tutorial: The Max Entropy Text Classifier. Datumbox. Retrieved from http://blog.datumbox.com/machine-learning-tutorial-the-max-entropy-text-classifier/
Kevin Curran is a Professor of Cyber Security, Executive Co-Director of the Legal innovation Centre and group leader of the Ambient Intelligence & Virtual Worlds Research Group at Ulster University. He is also a senior member of the IEEE. Prof. Curran is perhaps most well-known for his work on location positioning within indoor environments, blockchain and Internet security evidenced by over 800 published works. His expertise has been acknowledged by invitations to present his work at international conferences, overseas universities and research laboratories. He is a regular contributor on TV and radio and in trade and consumer IT magazines.
Wichmann, M. (2017). The Python Wiki. Retrieved from https://wiki.python.org/moin/
Xia, R., Zong, C., & Li, S. (2011). Ensemble of feature sets and classification algorithms for sentiment classification. Information Sciences, 181(6), 1138–1152. doi:10.1016/j.ins.2010.11.023
Yadav, V., & Elchuri, H. (2013). Serendio: Simple and Practical lexicon based approach to Sentiment Analysis. In Proceedings of the Second Joint Conference on Lexical and Computational Semantics (pp. 543-548).
ENDNOTE
1 https://www.kaggle.com