The Application of Sentiment Analysis and Text Analytics ... ·...

DOI: 10.4018/IJDWM.2019100102

International Journal of Data Warehousing and MiningVolume 15 • Issue 4 • October-December 2019

Copyright©2019,IGIGlobal.CopyingordistributinginprintorelectronicformswithoutwrittenpermissionofIGIGlobalisprohibited.

21

The Application of Sentiment Analysis and Text Analytics to Customer Experience Reviews to Understand What Customers Are Really SayingConor Gallagher, Letterkenny Institute of Technology, Donegal, Ireland

Eoghan Furey, Letterkenny Institute of Technology, Donegal, Ireland

Kevin Curran, Ulster University, Derry, UK

ABSTRACT

In a world of ever-growing customer data, businesses are required to have a clear line ofsight into what their customers think about the business, its products, people and how ittreats them.Insight into thesecriticalareasforabusinesswillaid in thedevelopmentofarobustcustomerexperiencestrategyandinturndriveloyaltyandrecommendationstoothersbytheircustomers.It iskeyforbusiness toaccessandminetheircustomerdata todriveamoderncustomerexperience.Thisarticleinvestigatestheuseofatextminingapproachtoaidsentimentanalysisinthepursuitofunderstandingwhatcustomersaresayingaboutproducts,servicesandinteractionswithabusiness.ThisiscommonlyknownasVoiceoftheCustomer(VOC)dataanditiskeytounlockingcustomersentiment.Theauthorsanalysetherelationshipbetweenunstructuredcustomersentimentintheformofverbatimfeedbackandstructureddataintheformofuserreviewratingsorsatisfactionratingstoexplorethequestionofwhethercustomerssaywhattheyreallythinkwhengiventheopportunitytoprovidefreetextfeedbackas opposed to how they rate a product on a scale of one to five. Using various SentimentAnalysisapproaches,theauthorsassignasentimentscoretoapieceofverbatimfeedbackandthencategoriseitaspositive,negative,orneutral.Usingthisnormalisedsentimentscore,theycompareittothecorrespondingratingscoreandinvestigatethepotentialbusinessinsights.Theresultsobtainedindicatethatabusinesscannotrelysolelyonastandalonesinglemetricasasourceoftruthregardingcustomerexperience.Thereisasignificantdifferencebetweenthecustomer ratingsscoreand thesentimentof theircorrespondingreviewof theproduct.Theauthorsproposethatitisimperativethatabusinesssupplementstheircustomerfeedbackscoreswitharobustsentimentanalysisstrategy.

KEyWoRDSMachine Learning, Sentiment Analysis, Text Analysis, Text Mining


22

1. INTRoDUCTIoN

Increasingly,leadingmodernbusinessesarelookingtogainmoreinsightsfromcustomerverbatimdatatheycollect.Unfilteredcustomerfeedbackprovidesatremendousopportunitytolearnmoreaboutcustomersentimentinrelationtoproductsandtheirend-to-endexperiencewithacompany.Italsogivesthebusinessanopportunitytounderstandhowtheycan‘close-the-loop’onanypoorfeedbacktheymayreceiveandconverselyhowtheycancapitaliseonpositivefeedbackfromcustomers.Oneofthedrawbacksofonlinecustomerverbatimisitsqualityandquantityofdata(Maritz,2018).Howdobusinessesmineandanalysedatatoensureitrevealskeyinsightsthatcanultimatelydriveprofitsintherightdirection.Onceabusinessrecognisesthebenefitofanalysingitscustomerfeedbackdata,thequestionbecomes,how?Identifyingcustomerverbatimasunstructureddataisthefirststepandsecondlyusinga‘BigData’approachsuchastextminingwillallowthebusinesstoemployamoresophisticatedandadvancedmodellingtechniquetouncoverpatternsinthedata,employingsentimentanalysistoidentifykeythemesfromthefeedback.However,businesseswillcontinuetousestructuredquantitativedatagatheringtechniquessuchasaskingcustomerstoratecustomer/clientinteractions,productsandemployeesonvariousscalesi.e.1-5Stars,0–11-pointscalesandsoon.Thesemethodscannotbesolelyreplieduponanditiskeyforabusinesstocriticallyanalyseitsunstructureddatawhetherthanbeonsocialmediachannels,ad-hocemailstothebusiness,lettersandcustomersurveys.

Allbusinessesneedcustomerstoprosperandgrow.Theyarethemainsourceofrevenueformostbusinesses.Itcanbearguedthatthesuccessofabusinessisdirectlyproportionaltoitsabilitytoacquirecustomers(Smith,2016),keepcustomershappy,identifyissuesorirritantsandconsequentlydrivemoresellingopportunities.Butforabusinesstoachievethisitneedstoidentifythekeyindicatorsthatwillprovideinsightstofacilitatethis.Therearemanyvariablesthatabusinessneedstoidentify,thewho,what,why,andhow.Whoarethepotentialcustomersinthemarketplace?Whatdotheywant?Whydotheywantaproduct?Howasabusinesscantheyretaincustomersandgrowtheirmarketshare.Examplesofwhereabusinesscanutilisetheircustomerdataaresalesdemographics,productandchannelproductpreferences,socialmediaorwebsitesentimentandinteractionsandtransactionbehaviours. Customer analytics is becoming one of the key enablers available to companies tofacilitatethetranslationofrawdataintousefulinsightsabouttheircustomers.Thisresearchintendstohighlighttheimportanceofcustomeranalyticsinthedeliveryofcustomerinsightsbacktothebusinesstodrivedecisionmaking(Fiedler,etal.,2016).60%ofcompaniessaidthatorganisationalsiloswereamajorobstacletoimprovingcustomerexperience(Google,2016).Companiesarefindingitdifficulttounderstandwhatcustomerdatatheyhaveandoftenareunderutilisingthecustomerdatathattheydohave(DepartmentofIndustry,2018).ThemostsuccessfulcompanieshavefoundwaysaroundthisandareactivelyimplementingCustomerExperiencestrategiesi.e.buildingCustomerExperienceteamsthatareagileandcanpivotbasedonbusinessneedsandgoals(Hollyoake,2009).Whileanycompanycanusedatatoreportfinancialsordrivecostsavings,thekeydifferentiationcomesfromhowthisdatacanbeusedtodrivebusinessinsights.Whilebusinessesarequitegoodattrackingstructedfeedbacki.e.numericalratingssuchas1-5-starratingstheyneedtoimproveonunderstandingwhatcustomersaresayingabouttheirproductsandexperienceswiththecompany.Thisposesaseriouschallengeanddifficultyincreasesasabusinessgrowsitscustomerbase(Parasuraman,etal.,1991).Withthatinminditseemsimpossibleforabusinesstocontacteverypotentialandcurrentcustomerindividuallytounderstandtheirsentiment.Therearemanymethodsorchannelsinwhichcustomerscaninteractwithabusiness,forexample,socialmedia,email,customersatisfactionsurveys, registeringsatisfactionvia IVRs,capturing feedbackoncallsviaSpeechAnalytics.So,thechallengeforthebusinessishowtoaccuratelycapture,store,analyseandobtaininsightsfromthisdata.Thisbecomesmoredifficultasthecustomerbasegrows,anditseemsveryunlikelythatabusinesswillcreateatouchpointwitheverysinglecustomertheyhave.

Intoday’shigh-techmarketplace,customershavemoreoptionsforproductsandservicesthaneverbefore(International,2017).Thebarriersofchoicehavecomedownastechnologyandtheinternet


23

hasadvancedandthisleavesthebusinesswithaconundrumofhowtoaccuratelypredictwhattheircustomerswilldointhefuturebeforetheircompetitors.Toachievethis,theyneedtopredictcustomerbehaviour.Thisiswherepredictiveanalyticsbecomestheanswer.Togetaheadofthecurveandultimatelytheircompetitors(LaValleetal.,2011)arguethatabusinessneedstoproactivelyemployadvanceddata analytics techniquesanddrivea strategy thatwill provide insight intowhat theircustomersmightdointhefuture.Whatproductsarecustomersbuying?Whatisthelikelihoodthatacustomerwillchurnandmovetoacompetitor?Arecustomerssatisfiedwiththeproducts,ifnotwhy?Arethereirritantsthatcustomersfindwhendealingwiththebusiness?Thereareamultitudeofquestionsthebusinesswillhave,andpredictiveanalyticscangosomewayinprovidinginsights.Usingadvanceddataanalyticstechniques,abusinesscanusetheircustomerdatatobuildpredictivemodels.Thesemodelshavethepotential topredictfuturecustomerbehaviour.Itaidsbusinessestoidentifycustomersatriskofleaving,potentialissueswithproductsorservices.Withpredictiveanalytics,businessescangainacompetitiveadvantageoverrivalsifusedproperly.Thereisatrendtowardsprescriptiveanalyticsandamoveawayfromdescriptiveanalytics(S.Kaisler,2013).Thisallowsthebusinesstofocusonwhatcouldhappen,basedonpredictivemodellingmethodsratherthanwhathasalreadyhappened.Thereisobviouslyarequirementfordescriptiveanalytics,businesseswillalwayshavetoreportwhathashappenedatacertainpointoftimehencetheneedtocapturemetricsandKPIsbutthenotionofutilizingthisdatatopredictwhatwillhappenshowsaforward-thinkingmindset(Steger,2013).Theopportunityforexploringpredictivecustomeranalyticstodayisbetterthaneverbefore.Datasourcessuchascustomersatisfactionsurveys,socialmediachannels,chatbots,voicecallsusingspeechanalyticsallprovidearichsourceofopportunity.AndnaturallyBigDatatechnologyhaskeptpacewiththeexponentialgrowthindataandiscapableofhandlinglarge-scaledataprocessing,storageandintegrationneededtocapitalizeontheseuncoveredinsights(EMC,n.d.)Tobegintounderstandwhatthecustomersthinkandfeelabusinessneedstounderstandthetypeofdataithasandbuildthetechnologicalinfrastructuretobeabletominethedatatobuildeffectivemodels(McKinsey,2016).Onlyatthispointcanthebusinessdevelopthesemodelseffectivelyanddrivetheiranalyticsstrategyforward

WeexamineifitispossibleforabusinesstorelysolelyonasinglepointCustomerSatisfactionmetricthroughthevalueofsentimentcontainedwithincustomerfeedbackasanindicatorofcustomersatisfactionand/oradvocacywithacompanyorproduct.Essentially,weinvestigateifacustomersentimentprovidesatruerreflectionofthecustomersexperiencewithaserviceorproduct.

2. RELATED WoRK

Textanalyticsprovidesasetoftechniquestomaketrendsvisible,dynamic,andtorevealrelationshipsbetweendatapointsthatmayhavebeendifficulttouncoverbyahumanduetothescaleandquantityofthedata(Miner,2012).Numericratingsremainthemostgenericformofdatacollectionincustomersurveysandunstructuredfeedbacksuchascustomerverbatimcontinuestoincreaseinimportance.Ratherthanattemptingtomanuallyreviewreamsofcustomercomments,toolsliketextanalyticsandspeechanalyticsallowcompaniestoturnthisverbatimfeedbackintovaluableinsightstheycananalyseandmeasuremoreefficientlythanmanuallyreviewingthedata(Verint,2016).TheendtoendTextminingprocesshasmanystagesallofwhicharekeytotheendgoalofderivinginsightsfromthetextualdata.Figure1givesanideaofhowthiswouldlookasaprocessflow.

Itiskeythatthedataneedstobeinaprocessableformat.Multipledatasourcescanbeused,alargeamountoftextualdatamaybeavailableviafreetextformatonsocialmediaplatforms,customersurveysandspeechanalyticstoolswouldalsoconvertconversationsintoarawformat.Preliminaryformattingispreparingthedatabyremovingdirtyorerroneousdata.Thisimprovesthedataandlessenthechanceoferrorsalthoughcanbetimeconsuming(NatarajanK.,2010).Otherformattingprocedurescanbeapplied to thedataas required.Textualdatacontains lotsof smallyethighlyfrequently-usedwords:“at,”“the,”“or,”“for”etc.Toanalysethefrequencyofwordsinatextblock,


24

thesetypesofwordswillalwaysoccur.Stopwordremovalistheprocessoffilteringoutthesewordstolookforlonger,moresignificantwords.Pythonforexamplecomeswithbuilt-instopworddictionariesfordifferentlanguages(Upadhyay,2017).“Tokenizationlooksatthespacesbetweenthewordsaswordboundariesto‘cut’wordsfromthetextstringatthosepoints.”(Ott,2018).Tokenizationisacrucialstepintheprocessofstructuringtextforfurtheranalysis,asotherwisetextistreatedasanunstructured“string.”Theideaistobreakthestringintoindividualwordsortokenize,thesentenceorpieceoftext.Thisallowsforthedeterminationofwordfrequencyinthedocumentandbeginidentifyingrelationshipsbetweenwords.Theintentionforabusinessforexamplewouldbetoensurethatfrequently-usedwordsaretokenizedproperlywithaviewtofurtheranalysis.Mostoftext-miningtoolsallowuserstocreatecustomizedtokenizationrules.TheBag-of-wordsprocess,alsoknownasavectorspacemodel,breaksdownadocumentintoitsconstituentwords,regardlessofgrammarandorder.Thisisacrucialstepasthereisthelikelihoodofreceivingadditionaltextualdatawhichrequirebebrokendowninsimilarfashiontoallowforfurtheranalysis(RobertP.Schumaker,2012).Stemming(a.k.a.“lemmatization”)istheprocessbywhichwordsarereducedinadocumenttotheirbasicform.TakingtheEnglishlanguageasanexample,wordshavemanyformsratherthanonesingleformbasedontheirgrammaticalpurpose.Forinstance,“was,”“are,”“is,”etc.,areallformsoftheverb“tobe.”AswithTokenizationabove,onceyouperformaprocesslikestemmingthenthedataispreparedforfurtheranalyticswhichcanbeappliedtotextualdatatorecognizepatterns(Mejova,2012).OneexampleofanadvancedanalyticstechniqueassociatedwiththetextminingprocessisClusteranalysis.Inthecontextofthetextminingprocess,clusteranalysisinvolvesclassifyingwordsintogroups,suchthatwordsineachgrouparemorerelatedtooneanotherthantheyaretowordsinothergroups.(Kochut,etal.,2017)Thisisdonethroughtheapplicationofalgorithmssuchask-meansandx-meansclustering.UsingK-meansclusteringthevariable‘K’representsthenumberofclustersyouwant.Youcouldpotentiallysegmentdataintofourgroups,eightgroups—thenthealgorithmunderstandsthatandexecutesbasedoffthat‘K’number.Forx-meansthenumberofoptimalclustersisunknownandhereyouwouldspecifyamaxandminnumberofclusters.Thealgorithmwillthenrunananalysisandoutputtheoptimalnumberofclusters.Clusteringisapowerfulwaytoidentifyrelationshipsandtrendsincopiousamountsoftextualdata(Allahyari,2017).Itcanalsobeusedaspartofthesentimentanalysisprocess.DataVisualizationplaysakeyroleinallowingabusinessorresearchertoidentifykeytrendsorinsightsfromthedata.Therearemultiplemethodstovisualisedata,onecouldusetheinbuiltfeaturesofaprogramminglanguagee.g.Pythonhasmatplotlibforcreatingchartsadvisualsfromdata.ThereareavastrangeofDataVisualisationsoftwareavailablefromaBusinessintelligencepointofview,i.e.TableauisanindustryleaderinDataVisualisation(Ajenstat,2017),Microsoft’sofferingofPowerBIisanupandcomingchallengertoTableau(Molag,2017).Thesebusinessintelligencetoolscanlinktomultipledatasourcesi.e.excel,csv,SQLserver,jsonandinterestinglytheynowcanlinkdirectlytoPythonorRscripts,sothatuserscanrunscripts

Figure 1. End to End Text Mining Process (Harris, 2018)


25

insidethetoolandthencreatevisualsandultimatelydashboardstoaidintheinsightexplorationofthedata.ThisisextremelybeneficialinabusinessscenariowherecompaniescanhavededicatedDataVisualizationanalystswhocreateandmaintainthesetypesofdashboards.Aspartofthisresearchtheauthorplanstoutiliseandpromotethebenefitsofvisualisingthedatasothatthereadercanunderstandhowthiswouldplayoutinabusinessenvironment.

Textminingcanbeutilizedtoevaluatecustomersatisfactionfromverbatimfeedbackratherthanrelyingsolelyonanumericalindicatorfromacustomersatisfactionsurvey-based(Hosseini,etal.,2010)proposedadata-miningmodelusingmonetary(RFM)attributesandK-meansclusteringtoevaluatecustomerloyaltyandcustomerlifetimevalue(CLV).Theresearchinvestigatedthedegreetowhichcustomerloyaltyinfluencedbottomlineprofitsforanorganization.Themodeldevelopedshowedaclearimprovementintheaccuracyofthecustomerloyaltymeasurement.Thisstudyhadlimitationsasthemodelreliedtooheavilyoncombiningdata-miningtechniquesaloneanddidnotconsider the impactof textualdata suchasverbatimcomments fromcustomers,which arenowgeneratedatmanytouchpointsinthecustomer’sjourney.(Pang&Lee,2002)employedtextminingmethodstoextractinsightsfromcustomerdatausingsentimentanalysistotrytoimprovecustomerloyaltymeasurement.Anotherexampleofusingtextanalyticstounderstandthedimensionsofproductalignmenttogaininsightintobrandpositioningwasusedby(Tirunillai&Tellis,2014).(Ordenes,etal.,2014)foundthattextanalyticscouldbeusedtoanalysecustomerfeedbacktocaptureinsightsofthecustomerexperience.Thismodelcapturedcustomersentimentfromthedataandtaggeditaseithercomplaints,complimentsorsuggestions/feedback.Thisallowedthemtoproposeinsightstosuggestefficienciesforthecompanyanditsresources.Theirresearch,howeverdidnotcombinethequalitativedataandquantitativedatatoassessandpredictcustomerloyalty.Wheretheresearchprovedusefulwasinrelationtothecomputationmethodsappliedthatcouldproducefindingsthatcanbeexpressedquantitatively,andthisforexampleaidsthedesiredoutcometoconductstatisticalmodelling,datavisualization,andmachinelearningalgorithmstoderivefurtherinsights.

Businesseshave twokindsof customer feedbackdata that they store, analyse andmeasure:structuredandunstructureddata.Structureddataisinformationthatisclearlydefinedandeasytoreporton.Itisthekindofdatathatisgenerallyfoundinasurveyandcanbeorganizedinaspreadsheet:name,location,age,andrating(3outof5stars,forexample,ora10for“mostsatisfied”versusa1for“leastsatisfied”)(Taylor,2018).UnstructureddataasitexiststodayinthecontextofCustomerexperiencedatais,basically,textorfreeformcustomerverbatim,althoughitcanalsoincludeothermediasuchasaudio,photos,orvideos.Unstructureddatacanbecapturedinanemail,the“additionalcomments”sectionofasurvey,voicerecordingsofcustomerinteractionse.g.SpeechAnalyticstools,apostonacustomerreviewsitee.g.Amazon,insocialmedia,andinamultitudeofotherplaces.(Taylor,2018).Analysingallthisdatacorrectlyiscritical,becauseitrevealseverythingfrombuyingtrendstoproductflawsandprovidesasignificantbusinessadvantage.(Aswani,2017)Organizationsoftenstruggletodothisanalysis,however,becauseunstructureddataissignificantlyhardertocategorizeandreportonthanstructureddata.Itcanbehardtoparseduetogrammaticalerrorsorslang, itfrequentlycontainsmultipleunrelatedideas,anditcanrepresentvariouslevelsofsentimentrelatedtoeachidea(e.g.“Icouldn’tcompletetheregistrationprocessonline,Ididn’tunderstandthewebsite,buttherepwasverypatientandtalkedmethroughtheprocess.”).Unstructureddatais,byitsnaturalhardertonavigatethanstructureddata.Anecdotallyitisthoughtthat95%ofcustomerfeedbackdataisunstructured(Kemper,2008)anditcontainsawealthofinformationtoaidintheunderstandingofcustomersentimentandopinion.Dauntingasitmayseemtobusinesses,thistypeoffeedbackispredictedtocontinuetogrow,asmillionsofpeopleincreasetheiruseofonline(Clarabridge,2015).Customersatisfactionisoneofthemostcriticalissuesconcerningbusinessorganizationofalltypes,whichisjustifiedbythecustomer-orientedphilosophyandtheprinciplesofcontinuesimprovementinmodernenterprise(Arokiasamy,2013)seeFigure2.

NetPromoterScore(NPS)isameasureofcustomerloyaltyandisoftenheldastheleadingcustomerexperiencemetric.NPS is still apopularcustomer loyaltymeasurementdespite recent


26

studiesarguingthatcustomerloyaltyismultidimensional.Therefore,firmsrequirenewdata-drivenmethodsthatcombinebehaviouralandattitudinaldatasources.Measuredona0-10scale,itisbrokendownwhereascoreof0–6isclassifiedasDetractors,unhappycustomershowareveryunlikelytorecommendthebusinesstofriendsorfamily.Ascoreof7–8isclassifiedasPassives,thosecustomerswhoarehappyenoughwiththeservicebutnotenoughtoactivelyencourageothersandascoreof9–10isclassifiedasPromoters,thosecustomerswhoareloyalandwouldrecommendthebusinessorproducttofriendsandfamily(Qualtrics,2018).

Sentiment analysis plays a key role in the development of predictive and machine learningmodels.Ithasdevelopedintoawiderall-encompassingprocessthatinvolvesanalysingtextualdataandapplyinga‘sentimentscore’tothattext(Mejova,2012)Oneofthemostwidelyusedapproachesisusingmachinelearningstrategiestodevelopandbuildanalgorithmusingatrainingdatasetbeforeapplyingatestor‘real’dataset.Machinelearningtechniquesfirsttrainsthealgorithmwithsomeinputswithknownoutputssothatlateritcanworkwithnewunknowndata(Devika,etal.,2016).SVMorSupportVectorMachinesaredefinedasasupervisedmachinelearningalgorithmnormallyusedforregressionandclassificationprocesses.SVMusesanon-probabilisticclassifierinwhichalargeamountoftrainingsetisrequired.Itisdonebyclassifyingpointsandidentifyinga(d-1)-dimensionalhyperplane.SVMlooksforthehyperplanethatbestsegregatestheclasses(Devika,etal.,2016).SupportVectorMachinesmakeuseoftheconceptofdecisionplanesthatdefinedecisionboundaries.Adecisionplaneisonethatbestdifferentiatesbetweenasetofobjectshavingdifferentclassmembership.SupportVectorMachines are considered apowerful classification algorithm.Whenusedinconjunctionwithothertechniquessuchasrandomforestandothermachinelearningtools,theygiveaverydifferentaspecttoensemblemodels(Srivasta,2014).ThereisanargumentthatSVMdonotperformwellwithlargerdatasetsduetothehightrainingtimerequired,theydohoweverperformwellwithclearmarginofseparationsbetweenobjects(Brownlee,2016).

Inthecontextofsentimentanalysisortextmining,ann-gramisaneighbouringsequenceofnitemsfromagivensequenceoftextorspeech.NItemscanbeletters,words,pairsofwordsforexampleandaretypicallycollectedfromasequenceoftextorspeechcorpus.Figure3showshowasentencecanbebrokenupintovariousformatsi.e.words,pairsofwords,etc.N-gramsentimentanalysisconsidersthesentenceasawhole(Ghiassietal.,2013).

InMachinelearning,NaïveBayesistermedasafamilyof“probabilisticclassifiers”basedontheuseofBayesTheorem.NaïveBayesclassifiersarehighlyscalableandaremainlyusedwhenthesizeofthetrainingsetisless.TheconditionalprobabilitythataneventXoccursgiventheevidenceYisdeterminedbyBayesruleshowninEquation(1).UnlikeNaïveBayes,MaximumEntropyClassifierdoesnotassumethatthefeaturesareconditionallyindependentofeachother.Uncertaintyismaximumforauniformdistributionandthemeasureofuncertaintyisknownasentropy.MaximumEntropyClassifiercanbeused tosolveavarietyof textminingscenariosparticularlysentimentanalysis

Figure 2. Basic American Customer Satisfaction Index model (ACSI, 2018)


27

(Vryniotis,2013).MaximumEntropyClassifierusesaparameterizedsetofweightswhichwhencombined,jointhefeaturesthataregeneratedfromasetoffeaturesbyanencoding.Thisencodingthenmapseachpairoffeaturesetandlabelsthemtoavector(McCallumzy&Nigamy,1998).MEclassifiersareknownastheexponentialorlog-linearclassifiersandbecausetheyworkbyextractingsetsoffeaturesfromtheinputdata,combiningtheminalinearfashionandthenusingthissumastheexponent(DaumeIII&Marcu,2006).(Pang&Lee,2002)usedtheapproachthateachdocumentisrepresentedwithanarrayof1’sand0’sthatindicateifawordexistsinthetextofthedocument.ThisplaysontheBoW(BagofWords)theorycommonlyusedinNLPandtextmining.

P c xP x c P c

Posterior obability

Likelihood Class i

||

( ) = ( ) ( )↑

↓ ↓

Pr

Pr oor obability

edictor ior obability

P x

Pr

Pr Pr Pr↑

( )

P c x P x c P x c P x c P cn

| | | ... |( ) = ( )× ( )× × ( )× ( )1 2 (1)

TheK-NearestNeighbour(KNN)techniqueisbasedonthenotionthattheclassificationofaninstancewillbeinsomewaylikethosenearbyitinthevectorspacehencethetermnearestneighbour.(Srivastav&Singh,2014)researchedweightedk-NearestNeighbourwheretheyassumedweightagetothoseobjectsinthetrainingsetandthenusedtheseweightsfortheircalculationofsentimentoftextinwordbywordfashion.Inaweightedk-NNanalysis(Srivastav&Singh,2014)thesentencesofthetweetscapturedweretokenisedandthenstopwordswereremoved.Thealgorithmdevelopedapositivescoretoeachreviewafterthefirstparse.Forthesecondparsingprocessaneutralreviewisinput.ThisscoreisthenmodifiedifrequiredallowingforbetterpositivisedeterminationandcreatinganoutputfilethatconsistsofthereviewIDanditsassociatedpositivescore.Weightedcombinationsoffeaturesetscanbequiteeffectiveinthetaskofsentimentclassification,sincetheweightsoftheensemblerepresenttherelevanceofthedifferentfeaturesets(e.g.n-grams,POS,etc.)tosentimentclassification,insteadofassigningrelevancetoeachfeatureindividually(Xiaetal.,2011).(Balahur&Montoyo,2008) investigated thenotionof assigningproducts classes to textualdata sets andextractinggeneralfeaturesfromtheproductsandtheninturnassigningsentimenttothefeatures.

ARulebasedapproach(Liu,etal.,2005)toextractopinionfromatextualdatasetbydefiningvariousrulestokenizedeachsentenceineverydocumentofthedatasetandtestedforthepresenceofeachtokenorword.Ifthepresenceofthewordisconfirmedandhasapositivesentiment,thena

Figure 3. N-Gram example (Maiolo, 2015)


28

“+1”ratingisappliedtoit.Eachpieceoftextstartswithaneutralscoreofzeroandwasconsideredpositive.In(Devika,etal.,2016)forexample,iftheinputsentencecontainsanywordwhichisnotpresentinthedatabasethenitwillbeaddedtothedatabase.Thisisasupervisedlearningapproachinwhich thesystemis trainedto learn ifanynewinput isgiven.Rule-basedapproaches tendtouseheuristics todetermine sentiments and linguistics research to analyse sentiments.They tendtohaveverypoorgeneralization,butwithinanarrowdomaintheytendtoperformwell(Medhatetal.,2014).Inmachinelearningthisisknownasoverfittingandtheaimistoavoidthisproblemandconverselywithrules-basedsystemsit’sconsideredlargelyunavoidable.(Devikaetal.,2016)employedamachinelearningdatadrivenapproachwhichusedalabelledcorpusoftextswiththeirrespectivesentimentsforprediction.

ALexiconbasedapproachusesdictionariesofwordstaggedwiththeirsemanticpolarityandsentimentstrength(Yadav&Elchuri,2013)Thesescoresarethenusedtocalculatethepolarityand/orsentimentofthetextualdata.Itissaidthatthistechniqueallowsforhighprecisionbutlowrecall.LexiconBasedtechniquesworkonaninferencethattheaggregatedsemanticpolarityofasentenceortokensisthesumofpolaritiesoftheindividualphrasesorwords.AttheROMIP2012conferencethelexicon-basedmethodproposedin(Chetviorkin&Loukachevitch,2013)wasused.Theauthorsemployedanemotionalresearchtechniqueforsentimentanalysisdictionariesforeachofthedomains.Usingappraisedwordsfromtheappropriatetrainingset,thedomaindictionarieswerereloadedwiththehighestweightedwords

3. METHoDoLoGy

Webuiltaclassificationmodelusingpredictivevariables.Thestepswere:

1. Applyalinguistictextminingapproach(Ordenesetal.,2014)todeterminethecategoryofeachcustomersreviewbyminingthereviewverbatimthusdividingcustomerreviewsintothreegroupsofpositive,neutralornegative.Thissentimentanalysiswillfacilitatetheprocessofmodellingthesentimentscoregeneratedagainstthecustomerratingscore,henceattemptingtopredictthelikelihoodthatthecustomerhasratedtheproduct1thru5.

2. ApplyaNaiveBayesMachineLearningAlgorithmtopredicttheratingofthereviewsasthisisfoundtoworkwellwiththetextdata.

3. Preparethetextualdataforvectorizationbyremovingthepunctuationsandthenconvertingintolowercaseandallstopwordsremovedfromthesentences.

4. ConvertReview’ textdata toa token(whichhaspunctuationsandstopwordsremoved).UsingScikit-learn’sCount-Vectorizerfunctionalitythetextisthenconvertedintoamatrixoftokencounts.Iteffectivelyisa2Dmatrixwhereeachrowisauniqueword,andeachcolumnisareview.

5. DevelopthepredictionmodelusingaNaïveBayesalgorithmtopredictcustomersatisfactionscoresaccuracy.

6. EvaluatethemodelbytestingthepredictedvaluesagainsttheactualratingsinthedatausingaconfusionmatrixandclassificationreportfromtheScikit-learntoolkit.Thedataisrunthroughthemodelandtheefficiencyofthemodelisnotedandcommentedupon.

7. Testtheoutputfromthemodeltoconfirmthatpredictionsareaccurateandlineupwiththeexpectedresults.Thisinvolvestestingrandomreviewsandnotingthepredictedoutcomefromthemodelandcomparingwiththeactualratinginthedata.

3.1. Data TransformationThedatasetwassourcedfromKaggle1.ItcomprisedmobilephonereviewsfromtheAmazonwebsite(n=413,840)withonly62nothavingaverbatimcommentinthe‘Review’column.Thekeycomponent


29

welookedforwastherating(startratingor1-5ratingforexample)andacorrespondingreviewfromthecustomer.Wewantedawide-rangingvarianceinthetypesofreviewsleftbythecustomerssothatthereissomedegreeofsubjectivityasdifferingviewsareimportanttoaidthedesignanddevelopmentofthesentimentmodel.Therefore,thereisanexpectationtoseedifferentresultsfromthesentimentanalysisofthisdatasetcomparedtothatofdatasetsabouttechnologyproducts.ThevariableswereProductTitle,Brand,Price,Rating&Reviewtextandnumberofpeoplewhofoundthereviewhelpful.Pythonwasusedasithasaconsiderablevarietyoftextminingandsentimentanalysispackages(Bird,etal.,2009)andpackagesthatallowthevisualisationofthedataandithasasolidrangeofpredictiveanalyticspackages.

Transformingcustomersatisfactiondataintocustomerinsightsrequiresaformalsystembywhichmeasuresareincludedaspartofadatacollectionandanalyticsprocess.Theaimwastoinvestigatethecorrelationbetweenthesentimentanalysisscoreandthatofthecustomerratingscoreandpredictwithadegreeofcertaintythelikelihoodoftheratingbasedofthesentiment?Therefore,ratherthanrelyingonthesurvey-basedCSATmeasurement,weappliedbigdatatechniquestothemobilephonecustomerdatatoevaluatethelinkbetweensatisfactionandsentiment.Thebasisofthisanalysisistostudythecustomerfeedbackfromthe‘Review’columninthedatacontainingthetextualfeedback,tounderstandthesentimentbehindtheircommentaryandtoidentifykeythemesandwordswithinthisdata.TodothisthereareseveralstagesrequiredandthisfollowstheCross-IndustryProcessforDataMining(CRISP-DM)(Chapman,etal.,2000).TheCRISP-DMprocessisthe“defactostandardfordevelopingdataminingandknowledgediscoveryprojects”(Marbánetal.,2009).Thismulti-layeredresearchmethodologyhas5keycomponentsofwhichthisprocessutilizessomebutnotalltheelementsoftheframework:(Marbánetal.,2009).

CustomerswhopurchasedaphonefromAmazonwereinvitedtoratethephonei.e.CustomersSatisfactionscoreor‘Rating’asperthedata,provideareviewoftheproductviaafreeformtextboxi.e.‘Reviews’andaspertheamazonwebsiteotheruserscouldreacttothereviewand‘up-vote’thefeedbackgivenbythecustomerontheproduct.Thecustomerexperiencestrategyofcapturingdatalikethishasbeentoucheduponpreviouslybutitisrelevanttounderstandhowitisapplicableinabusinesscontext.Asabusinesstheremustbesomeunderstatingofhowcustomerreacttoproductsandinthisdatasettherearetwokeyvariablesthatprovideinsightintothisi.e.RatingandReviews.Wenotethatthisisanamazondatasetandnotcompanyspecificdatai.e.Appledonotownthisdataforexampleandwhilsttheycouldcarryoutanalysisonthedatadonothavesoleownershipofit.

Thedatasetcanbeconsideredlongitudinalcustomerdatacoveringattitudinaldataelicitedfromthecustomersurvey,whichincludesSatisfactionratings(‘Rating’)andqualitativecustomerverbatimcomments(‘Reviews’),behaviouraldata(transactionaldatai.e.price)andproductdataacrossmultipleproductsforallmajormobilephonecompanies.Thedatasourcedidnothaveanydate/timereferencesassociatedwiththerecords.Therefore,theopportunitytoconducttime-seriesanalysisisunavailable,thiswouldbeanareawheretheauthordeemsfutureworkwouldberecommended.Therewasatotalof413,840productreviewrecords.

Thekeypredictivevariablesinthedatasetsthatareconsideredforanalysiswere,RatingandReview. It shouldbenoted that therewillbedescriptiveanalysisperformedon theall theothervariableswithinthedatasettolookforadditionalinsightsfromthedata.Theintentionistoallowthebusinesstoaccuratelypredictthelikelihoodthatacustomerwillrateaproductthewaytheyhave.Theprocesswillinvolveidentifyingthosecustomerswhoratedtheproducts1thru5andthenusingapredictivemodellingtechniqueoutputtheaccuracyusingaclassificationreport.

3.2. AnalysisPearsonandSpearmancorrelationplotsascertainwhere therearecorrelationsbothpositiveandnegativeinthedataandthiswillthenhelpdecisionmakingeasierwhendecidingwhichvariablestolookataspartofthedescriptiveanalysisinlatersections.Datacorrelationshelptoidentifyrelatedvariablesandcanbeusedby investigating the relationshipbetween twoquantitative,continuous


30

variables.Pearson’scorrelationcoefficient(r)isameasureofthestrengthoftheassociationbetweenthetwovariablesandtheintervallevelrangesbetween-1to+1.WeuseSpearman’scorrelationwhichisthenonparametricversionofthePearsoncorrelation.Spearman’scorrelationcoefficient,(ρ,alsosignifiedbyrs)measures thestrengthanddirectionofassociationbetween tworankedvariables(Hauke&Kossowski,2011).

ThefirstpartoftheanalysiswasdoneusingthematplotliblibraryandacomparisonwasrunbetweentheReviewandReviewlengthvariablestogetanunderstandingiftherewasanysignificantcorrelationbetweenthelengthofareviewgivenbythecustomerandthesubsequentrating.Histogramsweregeneratedrepresentingeachreviewcategory i.e.1 ->5showing thedistributionof reviewlengthsforeachrating.Acomparisonwascarriedouttoanalysehowtheratingandpriceaffectedeachother.UsingtheSeabornlibrary,ascattergraphwitharegressionlinewascreatedforeachratinginrelationtothecorrespondingprice.Forthisanalysistheheatmapfunctionalityfromtheseabornlibrarywasusedandthisproducedaheatmapgraphicthatgroupedthenumericalvariablesinthedata(Price,ReviewVotesandReviewLength)andassignedacolourandcorrelationscoretoeachrelationshipindicatingthestrengthofthecorrelation.Thisisusefulinthesensethatitallowsthereadertospotanyimportantpositiveornegativecorrelationsbetweenthedatapointsthatmaynothavebeenobviouswheneyeballingtherawdata.Basedontheresultsonecouldthendecideonhowbesttocontinueanalysingthedata.

Ascatterchartfromtheseabornlibrarywasusedinthisanalysistolookatthedistributionofreviewlengthsacrosseachrating.Inthisscenariothedataisplottedusingascatterplotandthereasonforthisistoidentifyifthereareanyoutliersinthedataacrossthe5ratings.ToinvestigatetheRatingsdistributionofthedataaKDE(KernelDensityEstimation)plotwasusedtovisualisetheratings.Instatistics,KDEisanon-parametricwaytoestimatetheprobabilitydensityfunctionofarandomvariable,inthiscasetheRatinggivenbycustomers.KDEattemptstosmooththedatawhereassumptionsaboutthepopulationaremadebasedonafinitedatasetasshownbelowinEquation(2).

f̂ xn

K x xnh

Kx x

hh h ii

ni

i

n

( ) = −( ) = −

= =

∑ ∑1 1

1 1

(2)

whereKisthekernel,anon-negativefunctionthatintegratestooneandh>0isasmoothingparametercalledthebandwidthInEquation(2),thekernelKwithsubscripthiscalledthescaledkernel.Ideallyonewouldlooktoselectthehvalueassmallasisfeasibleforthedata;however,thereisalwaysatrade-offbetweenthebiasoftheestimatoranditsvariance.Thebandwidth(h)controlshowtightlytheestimationisfittedtothedata,verymuchlikethebinsizeinahistogram.TheKDEplotisverysimilartoahistogram,itestimatestheprobabilitydensityofacontinuousvariable,inthiscasetheprobabilityofaratingfrom1to5.Inthiscaseaunivariateorkerneldensityestimateisplotted.AcomparisonbetweenAppleandSamsungwasplottedintwoseparategraphsallowingfortheanalysisofthedistributionofratingsbetweenthetwobrands.Asidebysidecomparisoncanbevaluableindetermininganyobviousdifferencesbetweenthebrandswithfutureanalysisinmind.StayingwithKDEplotsandusingSamsungagainasanexample,aplotwascreatedtovisualisethedistributionbyPriceoftheSamsungproductsusingtheSeabornlibraryasintheprevioustwosubsections.

3.3. Sentiment AnalysisUsing theTextBlob library,weconducted sentiment analysisoneachcomment for eachdatasetusingTextBlobandassignedeach‘Review’apolarityscoreofbetween-1 ->1.Theranges forPositive,NegativeandNeutralwerePositiveSentimentScore:>0.1to1,NeutralSentimentScore:between-0.1and0.1andNegativeSentimentScore:-1to<-0.1.Theserangesrepresentthevariouscategoriesofsentimentpolarity.Adataframenamed‘df_polarity_desc’wasusedtooutputatable


31

withthecolumnsReview,SentimentandPolarity.Thisenabledthereadertoviewtheoutputfromthe sentiment analysis and to put some context around the actual Sentiment Polarity scores foreachindividualreview.Twoseparatewordcloudswerecreatedtovisualboth‘PositiveWords’and‘NegativeWords’.Both‘PositiveReviews’,‘NeutralReviews’and‘NegativeReviews’weredefinedbasedonthenumericalrangesdiscussed.ThisallowedtheauthortocategorisetheReviewsintothethree‘buckets’described.

UsingtheWordCloudfunction,twowordscloudswerevisualisedforbothPositiveandNegativereviewsi.e.themostusedwordsineachcategorisation.AwordcloudwithablackbackgroundwasproducedforthePositiveReviewwordsandaworldcloudwithawhitebackgroundforthenegativereviewwords.TheSentimentPolarityscorewasgeneratedona-1to1scalebasedonthecorresponding‘Review’.Theintentionistotestwhetherthesentimentpolarityscoreisrelatedtotheratingprovidedforthecorrespondingtextcommentfromthecustomer.ToachievethistheSentimentscorehasbeennormalisedfroma-1to1scaletoa1to5scaletomimictheratingsscaleavailabletothecustomer.Tonormalizethedata,Equation(3)wasusedandappliedtothe‘Polarity’scoreandthenormaliseddatawasnamed‘Polarity_Norm’.

xx x

x xnew=

−−min

max min

(3)

ThePolarityscoreswerenormalizedaspertheabovetoscoresbetween1and5toallowfordirectcomparisontothecorrespondingratingscoreOncethePolarityscorehasbeennormalised,themeanandstandarddeviationof‘Polarity_Norm’isusedtocreateseveralchartsanalysingtheprobabilitydistribution foreachRatingcategory.The twomain ratingcategories thathavebeenidentifiedthroughouttheresearchhavebeenRating1andRating5.Aplotofthepolaritydistributioniscreatedtoexplorethesentimentfromthereviewsintermsoftheactualratingandthisprovidesakeyinsightintothedata.

WordfrequenciesaregatheredfromtheReviewstextualdatasetintoadocument-termmatrixthusallowingtheextractionoffeaturesfromtheReviewssubdataset.Wecanemployafeatureextractiontechniquethatallowsfortransformingarbitrarydata,inthiscasetheReviewtext,intonumericalfeaturesusableformachinelearning.Thevectorizationprocessturnsthecollectionoftextdocumentsintonumerical featurevectors.This techniqueemploys tokenization,countingandnormalizationofthedataandiscommonlyreferredtoBagofWordsor“Bagofn-grams”representation.Using‘bow_transformer.vocabulary’ a classification technique is carried out on the data. Since thevectorizationprocessprovideddiscretefeaturesi.e.thewordcountsfromtheReviewdatathenextstepistoemployMultinomialNaïveBayeswiththefeaturecounts(Huang,2017).Thishelpsuscalculatetheprobabilityoftworatingscategories,Rating1andRating5,usingBayestheorem.UsingReviewsasthesecondvariableinthetesting,theprobabilityandaccuracyofpredictingaRatingof1and5basedonthereviewswillbeobtained.Todothisthedataissplitintoatestingandtrainingset.Wetake30%ofthedataandusethatasthetestingset.Usingaconfusionmatrix,wesummarisetheperformanceoftheclassificationalgorithm.Thenumberofcorrectandincorrectpredictionsaresummarizedwithcountvaluesandbrokendownbyeachclass.Weruntheconfusionmatrixfor1and5RatingsandthenallRatingsi.e.1thru5.Duetothenatureoftheconfusionmatrixitisexpectedthattheaccuracywilldropwhenthe5classesaremodelled.

4. EVALUATIoN

ThePearsonCorrelationshowedaslightcorrelationbetweenRatingandReviewVotes(seeFigure4).ThiswouldsuggestthatastheRatingincreasesthepriceincreasesandthenumberofReviewvotes


32

giventotheproductincreasesalso.Whichwouldmakesensebutsincethereisnotastrongcorrelationthe thought is that thiswouldbenotworthexploringwithfurtheranalysis.Using theSpearmancorrelationinFigure5,onecanobserveverysimilarcorrelationswithaslightnegativecorrelationbetweenRatingandReviewvoteswhich,aswiththePearsontest,doesnotwarrantdeeperanalysis.

ThecorrelationbetweenvariablesshowedanegativecorrelationbetweenPriceandPolaritywhichwouldsuggestthatthesentimentofthereviewtendstobemorenegativeasthepricegoesup.Thispotentiallycouldindicatethatcustomersarelesshappywiththeirmoreexpensivephonesandexpectbetterfromtheproduct.PolarityandReviewvoteshaveapositivecorrelationwitheachother,indicatingthatthemoreaReviewis‘up-voted’thehigherthemorepositivethesentimentseemstobe,whichwouldmakesenseformalogicalstandpoint(seeFigure6).

Wecreatedhistogramstocomparereviewlengthsagainsteachratingtohelpusnotethenumberofreviewsandwhichbucketsthereviewlengthbelongtoforeachrating.ThisisrevealsthatthereisanoverwhelmingnumberofreviewsforRating5andaconsiderablenumberofreviewsforRating1.ThiswillbeusefulgoingforwardindeterminingwhichRatingsshouldbemorecloselylookedinto.Figure7isquiteinterestingasisdisplaysthedistributionofpricepointsacrosseachRatingvalue.Asexpected,manypricepointsexistbelowthe$1000rangwithasmallnumberofoutliersupwardsof$2000.

Usingaheatmaptolookatcorrelationsbetweenthevariablesit isnoticeablethattherearenegativecorrelationsof-0.91betweenPriceandreviewlengthwhichwouldindicatethatthegreaterthepricetheshorterthereviewandviceversa.ThiscouldbeattributedtothefactthattherearemorereviewsatthelowerpricelevelasnotedpreviouslyinFigure8andthereforethisfindingitnotatallsurprising.Likeothercharts,itwasfeltthatacloserlookshouldbetakenatReviewlengthvRating.(seeFigure9).WhatisobservedhereisthedistributionofreviewsbyLengthvRatinganditshowsthatthereisaslighttendencyforcustomerstocommentmoreaboutphonesthattheyarehappywithhencethedistributioninFigure10.

Figure 4. Pearson Correlation


33

TogetasenseforthemultivariatedensityoftheratingsthechartinFigure11showsthehighdensityof%ratingsacrossthereviews.Thereisasmallerbutsignificantpopulationof1Starreviewsandthiswillsupplementthepreviouschartsforfurtherinvestigation.ComparingFigure11withFigure12there isnodistinctdifferencebetweenthe twodistributions indicatingthat there isnosignificantdifferencebetweenbrands.

4.1. Word FrequencyThe word phone is by far the most common. Some other interesting outputs from this are theoccurrencesofthewordsGreatandGoodwhichwouldleadtoapossibleconclusionthattheremaybeanoverwhelmingnumberofpositivereviewsinthedata.Figure13showstheTop7-words.

AsanothervisualalternativeFigure14showsawordcloudofwordfrequencies.Thisgivesanimpactfulvisualoftheoccurrencesofthetop100wordsinthereviewfeedback.

UsingtheTextBloblibrarytheintentionistogiveeach‘Review’asentimentscoreofbetween-1->1.Thesedefinethevariouscategoriesofsentiment.Thereisanoverwhelmingnumberofpositive

Figure 5. Spearman Correlation

Figure 6. Review v Review Length by Ratings


34

reviewsandthisfollowsfromtheexploratoryanalysisaboveshowingthehighfrequencyof5-starratings.Thisdoesnottellthewholestorysofurtheranalysisonthesentimentisrequired.

TheresultsinFigure15showthatsentimentanalysisresultsmostlyfallintothePositiveReviewcategorybasedon therangeused.NeutralReviewsseems tohavea reasonablevolumewith thenarrowrangetakenintoconsiderationandthereareasmallernumberofNegativereviewscompared

Figure 8. Heat Map

Figure 7. Regression Plot of Price v Rating


35

totheothercategories.Thiswouldsuggestthatmostcustomerswerereasonablyhappywiththeirpurchases.Interestingly,aRatingof1hadthesecondhighestvolumebutaspertheabovetherearesignificantlylessNegativereviewswhichonthefaceofitsuggeststhatthosewhogaveaRatingof1theircorrespondingsentimentscoremaybehigherandnotdirectlyrelatedtotherating,thisisfornowaquestiontobeexploredfurther.InterestinglytheobservabletrendforRatingscoresof5andPositivereviewsseemfairlyalignedsofurtheranalysisonthisisneeded.

Figure16showsasmallcohortofreviewswiththecorrespondingSentimentandPolarityscores.Thisgivesthereaderaveryquicktasteofoutputfromthecodeshowinghoweachpieceoftextisgivenascore.Averybriefscanoftheresultsshowsthatit‘seems’accurate,takingrecord[2]asapositivereviewi.e.‘VeryPleased’andconverselylookingatrecord[5]withthewordproblemsmentionedthesentimentis-0.3whichseemstofit.Thisisobviouslynotindicativeofthetotalpopulationof

Figure 9. Rating v Review Length Regression plot

Figure 10. Kernel Density Estimate


36

reviewsbutgivessomeinsightintotheresults.Figure17showsthewordcloudforPositiveReviewsandhereisitnoticeablethatkeywordssuchas‘Great’,‘Love’and‘Good’arequiteprevalentinthedataandthisissomethingthatwouldbeexpected.ReviewingFigure18,thewordcloudfornegativereviewsitisnoticeablethattherearemanyoccurrencesofthewords‘Disappointed’,‘Broken’and‘Bad’whichwouldtendtobeonthenegativesideandagainsomethingthatwouldbeexpectedtoseeintheanalysis.

ThemultinomialNaiveBayesclassifierissuitableforclassificationwithdiscretefeatures(e.g.,wordcountsfortextclassification).Themultinomialdistributionnormallyrequiresintegerfeaturecounts.However, inpractice, fractional counts suchas tf-idf alsowork.TheMultinomialNaïveBayesmodelhada93%accuracyratewhenpredictingaRatingof1anda96%accuracyratewhenpredictingaRatingof5,withanoverallmodelaccuracyof96%Thef1-scoreof90%and97%,with

Figure 12. Apple Kernel Density Estimate

Figure 11. Samsung Kernel Density Estimate


37

anoverallcoreof96%representtheweightedaverageoftheprecisionandrecallscoresindicatingthatthemodelisaccuratewhenattemptingtopredictifacustomerwillscorethereviewa1or5.

Themodelwastrainedontheentiredatasetincludingall5ratings.Figure19showstheprecisionandrecalloutputfromtheMultinomialNaïveBayesmodellingandFigure20showstheaccuracyandaverages.Itisnoticeablethatwhenthemodelisrunwiththeentirepopulationofthedatatheaccuracydropsquitesignificantlytoanoverallscoreof74%.ThisindicatesthatthedataforRatings

Figure 13. Word Frequency Chart

Figure 14. Word Cloud


38

Figure 16. Reviews, Sentiment and Polarity Table

Figure 15. Sentiment Frequency Chart


39

2,3&4hasasignificantimpactonthemodels’accuracy.ItisnoteworthythatRatings1and5remainquitehighandthisispotentiallyduetothelargevolumesofRatingsinthesecategoriescomparedwiththeothercategories.

4.2. Normalisation of PolarityUsing themeanandstandarddeviationof thePolarity_Norm,each ratingwasmatchedwithitscorrespondingnormalisedpolarityscore.Figure21isinterestingformanyreasonsasitcanbeobservedthatacustomer’sratingof5willhaveapproximatelya60%probability that thesentimentscorewillbe4.Secondlythereisanoticeabledistributionbetween3and4whichisinandaroundtheneutralsentimentcategory.Thisslightshiftawayfromapolarityscoreof5suggeststhatwhilemostofsentimentmaybepositivethereisasignificantcohortofreviewsthathaveaslightnegativeorneutraltonewhichwouldindicateanareaforfurtherexplorationfromabusinessperspective.

Figure 17. Positive Review Word Cloud

Figure 18. Negative Review Word Cloud


40

ThesamemethodwasappliedwhenplottingthegraphinFigure22.ConsideringthatthesearetheprobabilitydistributionsforRatingsof1,onecouldbeforgivenforthinkingtheseweretheresultsforaRatingof3.Itissignificanttoobservethatthepolarityorsentimentof1Ratingsaretendingtowardsslightlynegativeandnatural,thiscouldbedowntomanyreasons.Themodelitselfmaybeunabletoidentifynegativewords,orcustomerstendtonotbeoverlycriticalwhencommenting.

Figure23showsthedistributionsofRatings1,3,4&5togiveanideaofthehowthesentimenthasbeenmeasuredfortheseratings.DuetospaceconstraintsitwasdecidedthatRating2wouldbeleftoutanditalsohadthelowestvolumeofalltheratings.Rating3and4outputgraphsseemtobeconsistentandinlinewiththeexpectationsintermsofthedistributionofpolarity.Forthatreason,the suggestionwouldbe thatno further analysis isneeded for these scoresand fromabusinessperspectivethemostvaluableinsightscomefromtheRatings1and5.

Themodelwasintuitiveenoughtoestablishthatsentimentanalysisisakeyindicatorofthecustomerexperienceandforthatreasonshouldbeseriouslyconsideredaspartofanybusinessesmetrics strategy.Thekeyanalysis andultimately the findingsof this researchcentre around thenormalisationofthepolarityscorefromthesentimentanalysistoalignwiththecorrespondingratinggivenbythecustomerforthatproduct.Itwasnoticeablethatalargedistributionofratingsfellinto

Figure 20. Accuracy and Weighted Average Table

Figure 19. Confusion Matrix (additional variables)


41

the‘1’and‘5’buckets.Forthatreason,itwasprudenttoexaminetheassociatedpolaritywiththesescoresmorecloselythantheotherratingscores.Theanalysis,supplementedwiththevisualisationsshowedthatforaRatingof5,thesentimentdistributionwasskewedtowardsascoreof4withaslightemphasison3also.Thisshowedthatwhilecustomerswereratingtheproducts5/5theiractualsentimentregardingtheproductwasnotsooverwhelminglypositive.Converselythesamecanbe

Figure 21. Normalised Polarity Distribution for Rating 5

Figure 22. Normalised Polarity Distribution for Rating 1


42

saidofcustomersgivingtheproductsaratingof‘1’.Theresultsshowedthatwhilecustomersratedsomeproductspoorlytheoverallsentimentfellintothesomewhatnegativeandneutralbands.WhatweshowisthattherecanbeanoverrelianceonusingtheRatingsscoreasasinglepointoftruthforabusiness,whenbigdataanalyticstechniquesareemployeditispossibletodrilldownintotherealfeelingsofcustomersandgetaclearerpictureofproductsandservices.ThepredictiveanalyticsmodelusedmultinomialBayesiannetworkstopredictcustomerratingsusingthesentimentreviewscore.Theresultsoftheresearchhighlighttheshort-comingsoftheratingsscoreasasinglecustomerexperiencemetric,therebysupportingthenotionthatabusinesscannotrelysolelyonthismetric.Theresultfromthepredictionphaseisamodelthatiscapableofcorrectlypredicting74percentofcustomersratingsbasedontheirreviews.Thisisnotremarkablyhighpercentageintermsofaccuracyandbacksuptheclaimabovethattheratingscannotbeusedasanaccuratepredictorofsentimentalone.Onthisbasisoftheresults,thisresearchrecommendsthatbusinessesarediscouragedfromusingtheCSATorratingsscoresasasinglepointoftruthforcustomerexperience.Althoughthesemetricsareimportant,theydonotprovideaninsightintotheopinionofcustomersregardingtheexperiencetheyhavehadwithabusinessoraproduct.Theapproachemployedheredemonstratesthatadata-drivenapproachtoexploringcustomersentimentviatextmininghelpsformaclearerpictureforthebusinessanditisimportantthatfirmscapitalizeonthistypeofdata.

5. CoNCLUSIoN

QuantitativesurveyquestionssuchasCustomerSatisfactionwillalwaysremainwidelypopularwithbusinesseswhenattemptingtotakethepulsefromtheircustomersonproductsandcustomerexperience(Hague&Hague,2018).Tooverlyrelyonsinglecustomermetricssuchascustomerexperienceornetpromoterscoreisariskystrategyandthesuggestionisthatbusinessshouldfocusuponamorenuancedmultidimensionalapproachtopredictingcustomerbehaviour(Zakietal.,2016).

Figure 23. Rating 1,3,4 & 5 Polarity Distribution


43

Weexploredthenotionoffocussingandadoptingamorescientificapproachtomeasurecustomersentimentbyminingverbatimfeedbackandapplyingasentimentanalysismethodologytodeterminehowcustomersfeelabouttheproductsasopposedtothescoretheygavetheproductbasedontheconstrictive1to5rating.WeprovidedaframeworktounderstandhowacustomerexperienceprocesscanbedevelopedtoprovideinsightsintotheVoiceoftheCustomer.

Keyrecommendationsareasfollows:

• Establishingandmaintainingapositivecustomerexperienceforcustomersisimperativeforanycompany’slong-termgoalsandultimatelytheirsuccessasabusiness.

• Thereneedstobebuy-inataseniormanagementleveltoembraceacustomer-centricmetricsstrategywhichincludesdevelopmentofarobustprocesstomeasurecustomersentiment.

• Textminingandsentimentanalysisplayakeyroleinunderstandinghowcustomersfeelabouttheproductstheypurchaseortheexperiencetheyhadwithacompany(Fan,2014).

• Whilesurvey-basedmetricsgatheringwillalwaysremainanimportantprocessinmeasuringleading indicators of customers’ behaviour, this should be supplemented with a formalcustomerexperienceprograms thatmonitorperformanceandguide improvement efforts(Homburgetal.,2015).

• Organizationsshouldbegintotapintotherichveinofcustomerfeedbackdataattheirdisposalintheformoffreeverbatimtext.

ThecentralemphasisoftheoutcomessuggeststhatitwouldbefollyforabusinesstoignorecustomerfeedbackfromsurveysorsocialmediachannelsandsolelyrelyuponquantitativesuchascustomersatisfactionCSAT.ThereisnodoubtthatCSATisanextremelyimportantmetricandshouldcontinuetobemeasuredhoweverbusinessesneedtocapitaliseoncustomerdatapointssuchassentimenttoreallygetanunderstandingofhowcustomersarefeeling.

Descriptiveanalyticsplaysasignificantroleforabusinessintermsofunderstandingwhatishappeningwiththeirproductsandservices.Itgivesasnapshot,usuallyafterthefactofperformanceandexperienceandthisresearchshowedthatbyvisualisingthesedatapointsandreportingoutonthemcanbevaluabletoabusiness.Theresultsgiveaclearinsightintocustomeropinionandthenuancesaroundwhichcustomersthinkasopposedtohowtheyrateaproductonalimited5-pointscale.Thefindingspointtowardsdeeperlearningaroundsentimentandhowitcanbeutilisedinotherareasofthebusinessandwithadditionaldatapointsnotpresentinthisresearch.Thisstrategycouldbeemployedinabusinessscenarioandtheresultsandfindingswouldprovevaluableinhelpingbuildacustomerexperiencestrategyforabusiness.ThekeyfindingsarethatitisunwisetorelysolelyonCSATorNPSscoreswithoutexaminingcustomerfeedback.


44

REFERENCES

ACSI.(2018).American Customer Satisfaction Index.Retrievedfromhttp://www.theacsi.org/about-acsi/the-science-of-customer-satisfaction

Ajenstat,F.(2017).TableaufiveyearsaleaderinGartner’sMagicQuadrantforAnalytics.Tableau.Retrievedfromhttps://www.tableau.com/about/blog/2017/2/tableau-five-years-leader-gartners-magic-quadrant-analytics-66133

Allahyari,M.,Pouriyeh,S.,Assefi,M.,Safaei,S.,Trippe,E.D.,Gutierrez,J.B.,&Kochut,K.(2017).Abriefsurveyoftextmining:Classification,clusteringandextractiontechniques.arXiv:1707.02919

Arokiasamy,A.(2013).Theimpactofcustomersatisfactiononcustomerloyalty.Journal of Commerce,5(1),14–21.

Aswani,S.(2017).AnalyzingCustomerFeedbackData:ManualAnalysisvsNLP.Clarabridge.Retrievedfromhttps://www.clarabridge.com/blog/analyzing-customer-feedback-data-manual-analysis-vs-nlp/

Baars,H.,&Kemper,H.G.(2008).Managementsupportwithstructuredandunstructureddata—anintegratedbusinessintelligenceframework.Information Systems Management,25(2),132–148.

Balahur,A.&Montoyo,A.(2008).Afeaturedependentmethodforopinionminingandclassification.

LaValle,S.,Lesser,E.,Shockley,R.,Hopkins,M.S.,&Kruschwitz,N.(2011).BigData,AnalyticsandthePathFromInsightstoValue.MIT Sloan Management Review,52(2),21–32.

Bird,S.,Klein,E.,&Loper,E.(2009).NaturallanguageprocessingwithPython:analyzingtextwiththenaturallanguagetoolkit.InNatural language processing with Python: analyzing text with the natural language toolkit(pp.1–34).O’ReillyMediaInc.

Broan,S.E.&Steger,A.J.(2013).Keyperformanceindicatorsarenotjustaboutprofit.CFMA.Retrievedfromhttp://www.cfma.org/content.cfm?ItemNumber=1899

Brownlee, J. (2016). Support Vector Machines for Machine Learning. Retrieved from https://machinelearningmastery.com/support-vector-machines-for-machine-learning/

Cappelli,A.(2017).Natural Language Processing with Stanford CoreNLP.Retrievedfromhttps://cloudacademy.com/blog/natural-language-processing-stanford-corenlp-2/

Chapman,P.,Clinton,J.,Kerber,R.,Khabaza,T.,Reinartz,T.,Sherer,C.,&Wirth,R.(2000).Crossindustrystandardprocessfordatamining(CRISP-DM)1.0.

Chen,H.,Chiang,R.,&Storey,V.(2012).Businessintelligenceandanalytics:Frombigdatatobigimpact.Management Information Systems Quarterly,1(1),1165–1188.doi:10.2307/41703503

Chetviorkin,I.&Loukachevitch,N.(2013).EvaluatingsentimentanalysissystemsinRussian.

Chowdhury,G.(2005).Natural languageprocessing.Annual Review of Information Science & Technology,37(1),51–89.doi:10.1002/aris.1440370103

Clarabridge.(2015).Retrievedfromwww.clarabridge.com

CranProject.(2018).CRAN Task View: Natural Language Processing.Retrievedfromhttps://cran.r-project.org/web/views/NaturalLanguageProcessing.html

Daume,H.III,&Marcu,D.(2006).Domainadaptationforstatisticalclassifiers.Journal of Artificial Intelligence Research,26,101–126.doi:10.1613/jair.1872

DepartmentofIndustry,AustralianGovernment.(2018).Understandyourcustomers.[REMOVEDHYPERLINKFIELD]Retrievedfromhttps://www.business.gov.au/info/plan-and-start/start-your-business/what-is-customer-service/understand-your-customers

Devika,M.D.C,S.&Ganesha,A.,2016.SentimentAnalysis:AComparativeStudyOnDifferentApproaches.Chennai,TamilNadu,India,DepartmentofCSE,VidyaAcademyofScienceandTechnology.

EMC.(n.d.).[REMOVEDHYPERLINKFIELD]BigDatastorage[whitepaper].Retrievedfromhttps://www.emc.com/collateral/white-papers/idg-bigdata-storage-wp.pdf

http://www.theacsi.org/about-acsi/the-science-of-customer-satisfaction

http://www.theacsi.org/about-acsi/the-science-of-customer-satisfaction

https://www.tableau.com/about/blog/2017/2/tableau-five-years-leader-gartners-magic-quadrant-analytics-66133

https://www.clarabridge.com/blog/analyzing-customer-feedback-data-manual-analysis-vs-nlp/

http://www.cfma.org/content.cfm?ItemNumber=1899

https://machinelearningmastery.com/support-vector-machines-for-machine-learning/

https://machinelearningmastery.com/support-vector-machines-for-machine-learning/

https://cloudacademy.com/blog/natural-language-processing-stanford-corenlp-2/

https://cloudacademy.com/blog/natural-language-processing-stanford-corenlp-2/

http://dx.doi.org/10.2307/41703503

http://dx.doi.org/10.1002/aris.1440370103

https://cran.r-project.org/web/views/NaturalLanguageProcessing.html

https://cran.r-project.org/web/views/NaturalLanguageProcessing.html

http://dx.doi.org/10.1613/jair.1872

https://www.business.gov.au/info/plan-and-start/start-your-business/what-is-customer-service/understand-your-customers

https://www.business.gov.au/info/plan-and-start/start-your-business/what-is-customer-service/understand-your-customers

https://www.emc.com/collateral/white-papers/idg-bigdata-storage-wp.pdf

https://www.emc.com/collateral/white-papers/idg-bigdata-storage-wp.pdf


45

Schumaker,R.P.,Zhang,Y.,Huang,C.N.,&Chen,H.(2012).Evaluatingsentimentinfinancialnewsarticles.Decision Support Systems,53(3),458–464.doi:10.1016/j.dss.2012.03.001

Fiedler,L.,Großmaß,T.,Roth,M.,&Vetvik,O.J.(2016).[REMOVEDHYPERLINKFIELD]Whycustomeranalyticsmatter.McKinsey.Retrievedfromhttps://www.mckinsey.com/business-functions/marketing-and-sales/our-insights/why-customer-analytics-matter

FoundationP.S.(2018).Getit.Python.Retrievedfromhttp://www.python.org/getit/

Ghiassi, M., Skinner, J., & Zimbra, D. (2013). Twitter brand sentiment analysis: A hybrid system usingn-gramanalysisanddynamicartificialneuralnetwork.Expert Systems with Applications,40(16),6266–6282.doi:10.1016/j.eswa.2013.05.057

ThinkwithGoogle.(2016).Whycutomeranalyticsarethekeytocreatingvalue.Retrievedfromhttps://www.thinkwithgoogle.com/intl/en-gb/marketing-resources/data-measurement/why-customer-analytics-are-the-key-to-creating-value/

Harris,D.(2018).WhatIsTextAnalytics?WeAnalyzetheJargon.Software advice.Retrievedfromhttps://www.softwareadvice.com/resources/what-is-text-analytics/

Hauke,J.,&Kossowski,T.(2011).ComparisonofvaluesofPearson’sandSpearman’scorrelationcoefficientsonthesamesetsofdata.Quaestiones Geographicae,30(2),87–93.doi:10.2478/v10117-011-0021-1

Hollyoake, M. (2009). The four pillars: Developing a ‘bonded’ business-to-business customer experience.Journal of Database Marketing & Customer Strategy Management,16(2),132–158.doi:10.1057/dbm.2009.14

Hosseini,S.,Maleki,A.,&Gholamian,M.(2010).ClusteranalysisusingdataminingapproachtodevelopCRMmethodologytoassessthecustomerloyalty.Expert Systems with Applications,37(7),5259–5264.doi:10.1016/j.eswa.2009.12.070

Huang,O.(2017).Applying Multinomial Naive Bayes to NLP Problems: A Practical Explanation.Medium.Retrievedfromhttps://medium.com/@Synced/applying-multinomial-naive-bayes-to-nlp-problems-a-practical-explanation-4f5271768ebf

TelusInternational.(2017).CustomerFirstMagazine.Retrievedfromhttps://www.telusinternational.com/media/TI-CustomersFirstMagazine-I04.pdf

Kaisler, S. (2013). Big Data: Issues and Challenges Moving Forward. In Proceedings of the 46th Hawaii International Conference on System Sciences(pp.995-1004).doi:10.1109/HICSS.2013.645

Kim,E.(2017).Everything You Wanted to Know about the Kernel Trick.Retrievedfromhttp://www.eric-kim.net/eric-kim-net/posts/1/kernel_trick.html

Liu,B.,Hu,M.,&Cheng,J.(2005,May).Opinionobserver:analyzingandcomparingopinionsontheweb.InProceedings of the 14th international conference on World Wide Web(pp.342-351).ACM.

Loria,S.(2018).TextBlob.Retrievedfromhttp://textblob.readthedocs.io/en/dev/

Maiolo,A.(2015).Comparingn-grammodels.Recognize Speech.Retrievedfromhttp://recognize-speech.com/language-model/n-gram-model/comparison

Marbán,Ó.,Mariscal,G.,&Segovia,J.(2009).Adatamining&knowledgediscoveryprocessmodel.InData mining and knowledge discovery in real life applications.IntechOpen.

Maritz.(2018).Maximiseverbatimswithtextanalytics[whitepaper].Retrievedfromhttps://www.maritzcx.com/blog/wp-content/uploads/2014/11/Maritz-White-Paper-Maximise-verbatims-with-text-analytics.pdf

McCallumzy, A. K., & Nigamy, K. (1998, July). Employing EM and pool-based active learning for textclassification.InProc. International Conference on Machine Learning (ICML)(pp.359-367).

Medhat,W.,Hassan,A.,&Korashy,H.(2014).Sentimentanalysisalgorithmsandapplications:Asurvey.Ain Shams Engineering Journal,5(4),1093–1113.doi:10.1016/j.asej.2014.04.011

Mejova,Y.A.(2012).Sentiment analysis within and across social media.UniversityofIowa.doi:10.17077/etd.zze4b252

http://dx.doi.org/10.1016/j.dss.2012.03.001

https://www.mckinsey.com/business-functions/marketing-and-sales/our-insights/why-customer-analytics-matter

https://www.mckinsey.com/business-functions/marketing-and-sales/our-insights/why-customer-analytics-matter

http://www.python.org/getit/

http://dx.doi.org/10.1016/j.eswa.2013.05.057

https://www.thinkwithgoogle.com/intl/en-gb/marketing-resources/data-measurement/why-customer-analytics-are-the-key-to-creating-value/



https://www.softwareadvice.com/resources/what-is-text-analytics/


http://dx.doi.org/10.2478/v10117-011-0021-1

http://dx.doi.org/10.1057/dbm.2009.14



https://medium.com/@Synced/applying-multinomial-naive-bayes-to-nlp-problems-a-practical-explanation-4f5271768ebf

https://medium.com/@Synced/applying-multinomial-naive-bayes-to-nlp-problems-a-practical-explanation-4f5271768ebf

https://www.telusinternational.com/media/TI-CustomersFirstMagazine-I04.pdf

https://www.telusinternational.com/media/TI-CustomersFirstMagazine-I04.pdf

http://dx.doi.org/10.1109/HICSS.2013.645

http://www.eric-kim.net/eric-kim-net/posts/1/kernel_trick.html

http://www.eric-kim.net/eric-kim-net/posts/1/kernel_trick.html

http://textblob.readthedocs.io/en/dev/

http://recognize-speech.com/language-model/n-gram-model/comparison

http://recognize-speech.com/language-model/n-gram-model/comparison

https://www.maritzcx.com/blog/wp-content/uploads/2014/11/Maritz-White-Paper-Maximise-verbatims-with-text-analytics.pdf

https://www.maritzcx.com/blog/wp-content/uploads/2014/11/Maritz-White-Paper-Maximise-verbatims-with-text-analytics.pdf

http://dx.doi.org/10.1016/j.asej.2014.04.011

http://dx.doi.org/10.17077/etd.zze4b252

http://dx.doi.org/10.17077/etd.zze4b252


46

Miner,G.(2012).Practical text mining and statistical analysis for non-structured text data applications.Oxford:AcademicPress.

Molag,T.(2017).PowerBIvsTableau.Encore Business.Retrievedfromhttps://www.encorebusiness.com/blog/power-bi-vs-tableau/

Natarajan,K.,Li,J.,&Koronios,A.(2010).Dataminingtechniquesfordatacleaning.InEngineering Asset Lifecycle Management(pp.796–804).SpringerLondon.

NTLK.org.(2017).Retrievedfromhttp://www.nltk.org/

Ordenes, F., Theodoulidis, B., Burton, J., Gruber, T., & Zaki, M. (2014). Analyzing customer experiencefeedback using text mining: A linguistics-based approach. Journal of Service Research, 17(3), 278–295.doi:10.1177/1094670514524625

Ott, T. (2018). What is text analytics. Software Advice. Retrieved from https://www.softwareadvice.com/resources/what-is-text-analytics/

Pang,B.,&Lee,L.(2002).Thumbsup?SentimentClassificationusing.Machine Learning.

Parasuraman,A.,Berry,L.L.,&Zeithaml,V.A.(1991).UnderstandingCustomerExpectationsofService.Sloan Management Review,32(3),39.

Project,C.(n.d.).Cran r-project.Retrievedfromhttps://cran.r-project.org/

Qualtrics.(2018).What is the Net Promoter Score (NPS).Retrievedfromhttps://www.qualtrics.com/experience-management/customer/net-promoter-score/

Rapidminer.(2018).Rapidminer.Retrievedfromhttps://rapidminer.com/

ŘehůřekR.(2018).gensim.Retrievedfromhttps://radimrehurek.com/gensim/about.html

SAS. (2017). Natural Language Processing. Retrieved from https://www.sas.com/en_us/insights/analytics/what-is-natural-language-processing-nlp.html

Smith,E.(2016).Everythingyouneedtoknowaboutcustomersuccess.Cobloom.Retrievedfromhttps://www.cobloom.com/blog/everything-you-need-to-know-about-customer-success

Spacy.io.(n.d.).Industrial-Strength NLP.Retrievedfromhttps://spacy.io/

Srivastava,A.,&Singh,D.M.(2014).SupervisedSAofproductreviewsusingWeightedk-NNAlgorithm.In2014 11th International Conference on Information Technology.

Srivastava,T. (2014).Supportvectormachinesimplified.Retrievedfromhttps://www.analyticsvidhya.com/blog/2014/10/support-vector-machine-simplified/

Tamilselvi,A.&ParveenTaj,M.(2013).SentimentAnalysisofMicroblogsusingOpinionMiningClassificationAlgorithm.International Journal of Science and Research,2(10),2319–7064.

Taylor,C.(2018).[REMOVEDHYPERLINKFIELD]Structuredvsunstructureddata.Datamation.Retrievedfromhttps://www.datamation.com/big-data/structured-vs-unstructured-data.html

Tirunillai,S.,&Tellis,G.(2014).Miningmarketingmeaningfromonlinechatter:Strategicbrandanalysisofbigdatausinglatentdirichletallocation.JMR, Journal of Marketing Research,51(4),463–479.doi:10.1509/jmr.12.0106

Upadhyay,P.(2017).RemovingstopwordsNLTKpython.Geeksforgeeks.com.Retrievedfromhttps://www.geeksforgeeks.org/removing-stop-words-nltk-python/

Verint.(2016).Verint Text Analytics.Retrievedfromhttps://www.verint.com/Assets/resources/resource-types/datasheets/text-analytics-datasheet.pdf

Vryniotis,V.(2013).MachineLearningTutorial:TheMaxEntropyTextClassifier.Datumbox.Retrievedfromhttp://blog.datumbox.com/machine-learning-tutorial-the-max-entropy-text-classifier/

https://www.encorebusiness.com/blog/power-bi-vs-tableau/

https://www.encorebusiness.com/blog/power-bi-vs-tableau/

http://www.nltk.org/

http://dx.doi.org/10.1177/1094670514524625



https://cran.r-project.org/

https://www.qualtrics.com/experience-management/customer/net-promoter-score/

https://www.qualtrics.com/experience-management/customer/net-promoter-score/

https://rapidminer.com/

https://radimrehurek.com/gensim/about.html

https://www.sas.com/en_us/insights/analytics/what-is-natural-language-processing-nlp.html

https://www.sas.com/en_us/insights/analytics/what-is-natural-language-processing-nlp.html

https://www.cobloom.com/blog/everything-you-need-to-know-about-customer-success

https://www.cobloom.com/blog/everything-you-need-to-know-about-customer-success

https://spacy.io/

https://www.analyticsvidhya.com/blog/2014/10/support-vector-machine-simplified/

https://www.analyticsvidhya.com/blog/2014/10/support-vector-machine-simplified/

https://www.datamation.com/big-data/structured-vs-unstructured-data.html

http://dx.doi.org/10.1509/jmr.12.0106

http://dx.doi.org/10.1509/jmr.12.0106

https://www.geeksforgeeks.org/removing-stop-words-nltk-python/

https://www.geeksforgeeks.org/removing-stop-words-nltk-python/

https://www.verint.com/Assets/resources/resource-types/datasheets/text-analytics-datasheet.pdf

https://www.verint.com/Assets/resources/resource-types/datasheets/text-analytics-datasheet.pdf

http://blog.datumbox.com/machine-learning-tutorial-the-max-entropy-text-classifier/


47

Kevin Curran is a Professor of Cyber Security, Executive Co-Director of the Legal innovation Centre and group leader of the Ambient Intelligence & Virtual Worlds Research Group at Ulster University. He is also a senior member of the IEEE. Prof. Curran is perhaps most well-known for his work on location positioning within indoor environments, blockchain and Internet security evidenced by over 800 published works. His expertise has been acknowledged by invitations to present his work at international conferences, overseas universities and research laboratories. He is a regular contributor on TV and radio and in trade and consumer IT magazines.

Wichmann,M.(2017).The Python Wiki.Retrievedfromhttps://wiki.python.org/moin/

Xia, R., Zong, C., & Li, S. (2011). Ensemble of feature sets and classification algorithms for sentimentclassification.Information Sciences,181(6),1138–1152.doi:10.1016/j.ins.2010.11.023

Yadav,V.,&Elchuri,H.(2013).Serendio:SimpleandPracticallexiconbasedapproachtoSentimentAnalysis.InProceedings of theSecond Joint Conference on Lexical and Computational Semantics(pp.543-548).

ENDNoTE

1 https://www.kaggle.com

https://wiki.python.org/moin/

http://dx.doi.org/10.1016/j.ins.2010.11.023

The Application of Sentiment Analysis and Text Analytics ... ·...

Documents

Transcript of The Application of Sentiment Analysis and Text Analytics ... ·...