The Application of Sentiment Analysis and Text Analytics ... ·...

27
DOI: 10.4018/IJDWM.2019100102 International Journal of Data Warehousing and Mining Volume 15 • Issue 4 • October-December 2019 Copyright©2019,IGIGlobal.CopyingordistributinginprintorelectronicformswithoutwrittenpermissionofIGIGlobalisprohibited. 21 The Application of Sentiment Analysis and Text Analytics to Customer Experience Reviews to Understand What Customers Are Really Saying Conor Gallagher, Letterkenny Institute of Technology, Donegal, Ireland Eoghan Furey, Letterkenny Institute of Technology, Donegal, Ireland Kevin Curran, Ulster University, Derry, UK ABSTRACT In a world of ever-growing customer data, businesses are required to have a clear line of sight into what their customers think about the business, its products, people and how it treatsthem.Insightintothesecriticalareasforabusinesswillaidinthedevelopmentofa robustcustomerexperiencestrategyandinturndriveloyaltyandrecommendationstoothers bytheircustomers.Itiskeyforbusinesstoaccessandminetheircustomerdatatodrivea moderncustomerexperience.Thisarticleinvestigatestheuseofatextminingapproachtoaid sentimentanalysisinthepursuitofunderstandingwhatcustomersaresayingaboutproducts, servicesandinteractionswithabusiness.ThisiscommonlyknownasVoiceoftheCustomer (VOC)dataanditiskeytounlockingcustomersentiment.Theauthorsanalysetherelationship betweenunstructuredcustomersentimentintheformofverbatimfeedbackandstructureddata intheformofuserreviewratingsorsatisfactionratingstoexplorethequestionofwhether customerssaywhattheyreallythinkwhengiventheopportunitytoprovidefreetextfeedback asopposedtohowtheyrateaproductonascaleofonetofive.UsingvariousSentiment Analysisapproaches,theauthorsassignasentimentscoretoapieceofverbatimfeedbackand thencategoriseitaspositive,negative,orneutral.Usingthisnormalisedsentimentscore,they compareittothecorrespondingratingscoreandinvestigatethepotentialbusinessinsights. Theresultsobtainedindicatethatabusinesscannotrelysolelyonastandalonesinglemetric asasourceoftruthregardingcustomerexperience.Thereisasignificantdifferencebetween thecustomerratingsscoreandthesentimentoftheircorrespondingreviewoftheproduct. Theauthorsproposethatitisimperativethatabusinesssupplementstheircustomerfeedback scoreswitharobustsentimentanalysisstrategy. KEyWoRDS Machine Learning, Sentiment Analysis, Text Analysis, Text Mining

Transcript of The Application of Sentiment Analysis and Text Analytics ... ·...

Page 1: The Application of Sentiment Analysis and Text Analytics ... · sentimentprovidesatruerreflectionofthecustomersexperiencewithaserviceorproduct. 2. RELATED WoRK Textanalyticsprovidesasetoftechniquestomaketrendsvisible,dynamic,andtorevealrelationships

DOI: 10.4018/IJDWM.2019100102

International Journal of Data Warehousing and MiningVolume 15 • Issue 4 • October-December 2019

Copyright©2019,IGIGlobal.CopyingordistributinginprintorelectronicformswithoutwrittenpermissionofIGIGlobalisprohibited.

21

The Application of Sentiment Analysis and Text Analytics to Customer Experience Reviews to Understand What Customers Are Really SayingConor Gallagher, Letterkenny Institute of Technology, Donegal, Ireland

Eoghan Furey, Letterkenny Institute of Technology, Donegal, Ireland

Kevin Curran, Ulster University, Derry, UK

ABSTRACT

In a world of ever-growing customer data, businesses are required to have a clear line ofsight into what their customers think about the business, its products, people and how ittreats them.Insight into thesecriticalareasforabusinesswillaid in thedevelopmentofarobustcustomerexperiencestrategyandinturndriveloyaltyandrecommendationstoothersbytheircustomers.It iskeyforbusiness toaccessandminetheircustomerdata todriveamoderncustomerexperience.Thisarticleinvestigatestheuseofatextminingapproachtoaidsentimentanalysisinthepursuitofunderstandingwhatcustomersaresayingaboutproducts,servicesandinteractionswithabusiness.ThisiscommonlyknownasVoiceoftheCustomer(VOC)dataanditiskeytounlockingcustomersentiment.Theauthorsanalysetherelationshipbetweenunstructuredcustomersentimentintheformofverbatimfeedbackandstructureddataintheformofuserreviewratingsorsatisfactionratingstoexplorethequestionofwhethercustomerssaywhattheyreallythinkwhengiventheopportunitytoprovidefreetextfeedbackas opposed to how they rate a product on a scale of one to five. Using various SentimentAnalysisapproaches,theauthorsassignasentimentscoretoapieceofverbatimfeedbackandthencategoriseitaspositive,negative,orneutral.Usingthisnormalisedsentimentscore,theycompareittothecorrespondingratingscoreandinvestigatethepotentialbusinessinsights.Theresultsobtainedindicatethatabusinesscannotrelysolelyonastandalonesinglemetricasasourceoftruthregardingcustomerexperience.Thereisasignificantdifferencebetweenthecustomer ratingsscoreand thesentimentof theircorrespondingreviewof theproduct.Theauthorsproposethatitisimperativethatabusinesssupplementstheircustomerfeedbackscoreswitharobustsentimentanalysisstrategy.

KEyWoRDSMachine Learning, Sentiment Analysis, Text Analysis, Text Mining

Page 2: The Application of Sentiment Analysis and Text Analytics ... · sentimentprovidesatruerreflectionofthecustomersexperiencewithaserviceorproduct. 2. RELATED WoRK Textanalyticsprovidesasetoftechniquestomaketrendsvisible,dynamic,andtorevealrelationships

International Journal of Data Warehousing and MiningVolume 15 • Issue 4 • October-December 2019

22

1. INTRoDUCTIoN

Increasingly,leadingmodernbusinessesarelookingtogainmoreinsightsfromcustomerverbatimdatatheycollect.Unfilteredcustomerfeedbackprovidesatremendousopportunitytolearnmoreaboutcustomersentimentinrelationtoproductsandtheirend-to-endexperiencewithacompany.Italsogivesthebusinessanopportunitytounderstandhowtheycan‘close-the-loop’onanypoorfeedbacktheymayreceiveandconverselyhowtheycancapitaliseonpositivefeedbackfromcustomers.Oneofthedrawbacksofonlinecustomerverbatimisitsqualityandquantityofdata(Maritz,2018).Howdobusinessesmineandanalysedatatoensureitrevealskeyinsightsthatcanultimatelydriveprofitsintherightdirection.Onceabusinessrecognisesthebenefitofanalysingitscustomerfeedbackdata,thequestionbecomes,how?Identifyingcustomerverbatimasunstructureddataisthefirststepandsecondlyusinga‘BigData’approachsuchastextminingwillallowthebusinesstoemployamoresophisticatedandadvancedmodellingtechniquetouncoverpatternsinthedata,employingsentimentanalysistoidentifykeythemesfromthefeedback.However,businesseswillcontinuetousestructuredquantitativedatagatheringtechniquessuchasaskingcustomerstoratecustomer/clientinteractions,productsandemployeesonvariousscalesi.e.1-5Stars,0–11-pointscalesandsoon.Thesemethodscannotbesolelyreplieduponanditiskeyforabusinesstocriticallyanalyseitsunstructureddatawhetherthanbeonsocialmediachannels,ad-hocemailstothebusiness,lettersandcustomersurveys.

Allbusinessesneedcustomerstoprosperandgrow.Theyarethemainsourceofrevenueformostbusinesses.Itcanbearguedthatthesuccessofabusinessisdirectlyproportionaltoitsabilitytoacquirecustomers(Smith,2016),keepcustomershappy,identifyissuesorirritantsandconsequentlydrivemoresellingopportunities.Butforabusinesstoachievethisitneedstoidentifythekeyindicatorsthatwillprovideinsightstofacilitatethis.Therearemanyvariablesthatabusinessneedstoidentify,thewho,what,why,andhow.Whoarethepotentialcustomersinthemarketplace?Whatdotheywant?Whydotheywantaproduct?Howasabusinesscantheyretaincustomersandgrowtheirmarketshare.Examplesofwhereabusinesscanutilisetheircustomerdataaresalesdemographics,productandchannelproductpreferences,socialmediaorwebsitesentimentandinteractionsandtransactionbehaviours. Customer analytics is becoming one of the key enablers available to companies tofacilitatethetranslationofrawdataintousefulinsightsabouttheircustomers.Thisresearchintendstohighlighttheimportanceofcustomeranalyticsinthedeliveryofcustomerinsightsbacktothebusinesstodrivedecisionmaking(Fiedler,etal.,2016).60%ofcompaniessaidthatorganisationalsiloswereamajorobstacletoimprovingcustomerexperience(Google,2016).Companiesarefindingitdifficulttounderstandwhatcustomerdatatheyhaveandoftenareunderutilisingthecustomerdatathattheydohave(DepartmentofIndustry,2018).ThemostsuccessfulcompanieshavefoundwaysaroundthisandareactivelyimplementingCustomerExperiencestrategiesi.e.buildingCustomerExperienceteamsthatareagileandcanpivotbasedonbusinessneedsandgoals(Hollyoake,2009).Whileanycompanycanusedatatoreportfinancialsordrivecostsavings,thekeydifferentiationcomesfromhowthisdatacanbeusedtodrivebusinessinsights.Whilebusinessesarequitegoodattrackingstructedfeedbacki.e.numericalratingssuchas1-5-starratingstheyneedtoimproveonunderstandingwhatcustomersaresayingabouttheirproductsandexperienceswiththecompany.Thisposesaseriouschallengeanddifficultyincreasesasabusinessgrowsitscustomerbase(Parasuraman,etal.,1991).Withthatinminditseemsimpossibleforabusinesstocontacteverypotentialandcurrentcustomerindividuallytounderstandtheirsentiment.Therearemanymethodsorchannelsinwhichcustomerscaninteractwithabusiness,forexample,socialmedia,email,customersatisfactionsurveys, registeringsatisfactionvia IVRs,capturing feedbackoncallsviaSpeechAnalytics.So,thechallengeforthebusinessishowtoaccuratelycapture,store,analyseandobtaininsightsfromthisdata.Thisbecomesmoredifficultasthecustomerbasegrows,anditseemsveryunlikelythatabusinesswillcreateatouchpointwitheverysinglecustomertheyhave.

Intoday’shigh-techmarketplace,customershavemoreoptionsforproductsandservicesthaneverbefore(International,2017).Thebarriersofchoicehavecomedownastechnologyandtheinternet

Page 3: The Application of Sentiment Analysis and Text Analytics ... · sentimentprovidesatruerreflectionofthecustomersexperiencewithaserviceorproduct. 2. RELATED WoRK Textanalyticsprovidesasetoftechniquestomaketrendsvisible,dynamic,andtorevealrelationships

International Journal of Data Warehousing and MiningVolume 15 • Issue 4 • October-December 2019

23

hasadvancedandthisleavesthebusinesswithaconundrumofhowtoaccuratelypredictwhattheircustomerswilldointhefuturebeforetheircompetitors.Toachievethis,theyneedtopredictcustomerbehaviour.Thisiswherepredictiveanalyticsbecomestheanswer.Togetaheadofthecurveandultimatelytheircompetitors(LaValleetal.,2011)arguethatabusinessneedstoproactivelyemployadvanceddata analytics techniquesanddrivea strategy thatwill provide insight intowhat theircustomersmightdointhefuture.Whatproductsarecustomersbuying?Whatisthelikelihoodthatacustomerwillchurnandmovetoacompetitor?Arecustomerssatisfiedwiththeproducts,ifnotwhy?Arethereirritantsthatcustomersfindwhendealingwiththebusiness?Thereareamultitudeofquestionsthebusinesswillhave,andpredictiveanalyticscangosomewayinprovidinginsights.Usingadvanceddataanalyticstechniques,abusinesscanusetheircustomerdatatobuildpredictivemodels.Thesemodelshavethepotential topredictfuturecustomerbehaviour.Itaidsbusinessestoidentifycustomersatriskofleaving,potentialissueswithproductsorservices.Withpredictiveanalytics,businessescangainacompetitiveadvantageoverrivalsifusedproperly.Thereisatrendtowardsprescriptiveanalyticsandamoveawayfromdescriptiveanalytics(S.Kaisler,2013).Thisallowsthebusinesstofocusonwhatcouldhappen,basedonpredictivemodellingmethodsratherthanwhathasalreadyhappened.Thereisobviouslyarequirementfordescriptiveanalytics,businesseswillalwayshavetoreportwhathashappenedatacertainpointoftimehencetheneedtocapturemetricsandKPIsbutthenotionofutilizingthisdatatopredictwhatwillhappenshowsaforward-thinkingmindset(Steger,2013).Theopportunityforexploringpredictivecustomeranalyticstodayisbetterthaneverbefore.Datasourcessuchascustomersatisfactionsurveys,socialmediachannels,chatbots,voicecallsusingspeechanalyticsallprovidearichsourceofopportunity.AndnaturallyBigDatatechnologyhaskeptpacewiththeexponentialgrowthindataandiscapableofhandlinglarge-scaledataprocessing,storageandintegrationneededtocapitalizeontheseuncoveredinsights(EMC,n.d.)Tobegintounderstandwhatthecustomersthinkandfeelabusinessneedstounderstandthetypeofdataithasandbuildthetechnologicalinfrastructuretobeabletominethedatatobuildeffectivemodels(McKinsey,2016).Onlyatthispointcanthebusinessdevelopthesemodelseffectivelyanddrivetheiranalyticsstrategyforward

WeexamineifitispossibleforabusinesstorelysolelyonasinglepointCustomerSatisfactionmetricthroughthevalueofsentimentcontainedwithincustomerfeedbackasanindicatorofcustomersatisfactionand/oradvocacywithacompanyorproduct.Essentially,weinvestigateifacustomersentimentprovidesatruerreflectionofthecustomersexperiencewithaserviceorproduct.

2. RELATED WoRK

Textanalyticsprovidesasetoftechniquestomaketrendsvisible,dynamic,andtorevealrelationshipsbetweendatapointsthatmayhavebeendifficulttouncoverbyahumanduetothescaleandquantityofthedata(Miner,2012).Numericratingsremainthemostgenericformofdatacollectionincustomersurveysandunstructuredfeedbacksuchascustomerverbatimcontinuestoincreaseinimportance.Ratherthanattemptingtomanuallyreviewreamsofcustomercomments,toolsliketextanalyticsandspeechanalyticsallowcompaniestoturnthisverbatimfeedbackintovaluableinsightstheycananalyseandmeasuremoreefficientlythanmanuallyreviewingthedata(Verint,2016).TheendtoendTextminingprocesshasmanystagesallofwhicharekeytotheendgoalofderivinginsightsfromthetextualdata.Figure1givesanideaofhowthiswouldlookasaprocessflow.

Itiskeythatthedataneedstobeinaprocessableformat.Multipledatasourcescanbeused,alargeamountoftextualdatamaybeavailableviafreetextformatonsocialmediaplatforms,customersurveysandspeechanalyticstoolswouldalsoconvertconversationsintoarawformat.Preliminaryformattingispreparingthedatabyremovingdirtyorerroneousdata.Thisimprovesthedataandlessenthechanceoferrorsalthoughcanbetimeconsuming(NatarajanK.,2010).Otherformattingprocedurescanbeapplied to thedataas required.Textualdatacontains lotsof smallyethighlyfrequently-usedwords:“at,”“the,”“or,”“for”etc.Toanalysethefrequencyofwordsinatextblock,

Page 4: The Application of Sentiment Analysis and Text Analytics ... · sentimentprovidesatruerreflectionofthecustomersexperiencewithaserviceorproduct. 2. RELATED WoRK Textanalyticsprovidesasetoftechniquestomaketrendsvisible,dynamic,andtorevealrelationships

International Journal of Data Warehousing and MiningVolume 15 • Issue 4 • October-December 2019

24

thesetypesofwordswillalwaysoccur.Stopwordremovalistheprocessoffilteringoutthesewordstolookforlonger,moresignificantwords.Pythonforexamplecomeswithbuilt-instopworddictionariesfordifferentlanguages(Upadhyay,2017).“Tokenizationlooksatthespacesbetweenthewordsaswordboundariesto‘cut’wordsfromthetextstringatthosepoints.”(Ott,2018).Tokenizationisacrucialstepintheprocessofstructuringtextforfurtheranalysis,asotherwisetextistreatedasanunstructured“string.”Theideaistobreakthestringintoindividualwordsortokenize,thesentenceorpieceoftext.Thisallowsforthedeterminationofwordfrequencyinthedocumentandbeginidentifyingrelationshipsbetweenwords.Theintentionforabusinessforexamplewouldbetoensurethatfrequently-usedwordsaretokenizedproperlywithaviewtofurtheranalysis.Mostoftext-miningtoolsallowuserstocreatecustomizedtokenizationrules.TheBag-of-wordsprocess,alsoknownasavectorspacemodel,breaksdownadocumentintoitsconstituentwords,regardlessofgrammarandorder.Thisisacrucialstepasthereisthelikelihoodofreceivingadditionaltextualdatawhichrequirebebrokendowninsimilarfashiontoallowforfurtheranalysis(RobertP.Schumaker,2012).Stemming(a.k.a.“lemmatization”)istheprocessbywhichwordsarereducedinadocumenttotheirbasicform.TakingtheEnglishlanguageasanexample,wordshavemanyformsratherthanonesingleformbasedontheirgrammaticalpurpose.Forinstance,“was,”“are,”“is,”etc.,areallformsoftheverb“tobe.”AswithTokenizationabove,onceyouperformaprocesslikestemmingthenthedataispreparedforfurtheranalyticswhichcanbeappliedtotextualdatatorecognizepatterns(Mejova,2012).OneexampleofanadvancedanalyticstechniqueassociatedwiththetextminingprocessisClusteranalysis.Inthecontextofthetextminingprocess,clusteranalysisinvolvesclassifyingwordsintogroups,suchthatwordsineachgrouparemorerelatedtooneanotherthantheyaretowordsinothergroups.(Kochut,etal.,2017)Thisisdonethroughtheapplicationofalgorithmssuchask-meansandx-meansclustering.UsingK-meansclusteringthevariable‘K’representsthenumberofclustersyouwant.Youcouldpotentiallysegmentdataintofourgroups,eightgroups—thenthealgorithmunderstandsthatandexecutesbasedoffthat‘K’number.Forx-meansthenumberofoptimalclustersisunknownandhereyouwouldspecifyamaxandminnumberofclusters.Thealgorithmwillthenrunananalysisandoutputtheoptimalnumberofclusters.Clusteringisapowerfulwaytoidentifyrelationshipsandtrendsincopiousamountsoftextualdata(Allahyari,2017).Itcanalsobeusedaspartofthesentimentanalysisprocess.DataVisualizationplaysakeyroleinallowingabusinessorresearchertoidentifykeytrendsorinsightsfromthedata.Therearemultiplemethodstovisualisedata,onecouldusetheinbuiltfeaturesofaprogramminglanguagee.g.Pythonhasmatplotlibforcreatingchartsadvisualsfromdata.ThereareavastrangeofDataVisualisationsoftwareavailablefromaBusinessintelligencepointofview,i.e.TableauisanindustryleaderinDataVisualisation(Ajenstat,2017),Microsoft’sofferingofPowerBIisanupandcomingchallengertoTableau(Molag,2017).Thesebusinessintelligencetoolscanlinktomultipledatasourcesi.e.excel,csv,SQLserver,jsonandinterestinglytheynowcanlinkdirectlytoPythonorRscripts,sothatuserscanrunscripts

Figure 1. End to End Text Mining Process (Harris, 2018)

Page 5: The Application of Sentiment Analysis and Text Analytics ... · sentimentprovidesatruerreflectionofthecustomersexperiencewithaserviceorproduct. 2. RELATED WoRK Textanalyticsprovidesasetoftechniquestomaketrendsvisible,dynamic,andtorevealrelationships

International Journal of Data Warehousing and MiningVolume 15 • Issue 4 • October-December 2019

25

insidethetoolandthencreatevisualsandultimatelydashboardstoaidintheinsightexplorationofthedata.ThisisextremelybeneficialinabusinessscenariowherecompaniescanhavededicatedDataVisualizationanalystswhocreateandmaintainthesetypesofdashboards.Aspartofthisresearchtheauthorplanstoutiliseandpromotethebenefitsofvisualisingthedatasothatthereadercanunderstandhowthiswouldplayoutinabusinessenvironment.

Textminingcanbeutilizedtoevaluatecustomersatisfactionfromverbatimfeedbackratherthanrelyingsolelyonanumericalindicatorfromacustomersatisfactionsurvey-based(Hosseini,etal.,2010)proposedadata-miningmodelusingmonetary(RFM)attributesandK-meansclusteringtoevaluatecustomerloyaltyandcustomerlifetimevalue(CLV).Theresearchinvestigatedthedegreetowhichcustomerloyaltyinfluencedbottomlineprofitsforanorganization.Themodeldevelopedshowedaclearimprovementintheaccuracyofthecustomerloyaltymeasurement.Thisstudyhadlimitationsasthemodelreliedtooheavilyoncombiningdata-miningtechniquesaloneanddidnotconsider the impactof textualdata suchasverbatimcomments fromcustomers,which arenowgeneratedatmanytouchpointsinthecustomer’sjourney.(Pang&Lee,2002)employedtextminingmethodstoextractinsightsfromcustomerdatausingsentimentanalysistotrytoimprovecustomerloyaltymeasurement.Anotherexampleofusingtextanalyticstounderstandthedimensionsofproductalignmenttogaininsightintobrandpositioningwasusedby(Tirunillai&Tellis,2014).(Ordenes,etal.,2014)foundthattextanalyticscouldbeusedtoanalysecustomerfeedbacktocaptureinsightsofthecustomerexperience.Thismodelcapturedcustomersentimentfromthedataandtaggeditaseithercomplaints,complimentsorsuggestions/feedback.Thisallowedthemtoproposeinsightstosuggestefficienciesforthecompanyanditsresources.Theirresearch,howeverdidnotcombinethequalitativedataandquantitativedatatoassessandpredictcustomerloyalty.Wheretheresearchprovedusefulwasinrelationtothecomputationmethodsappliedthatcouldproducefindingsthatcanbeexpressedquantitatively,andthisforexampleaidsthedesiredoutcometoconductstatisticalmodelling,datavisualization,andmachinelearningalgorithmstoderivefurtherinsights.

Businesseshave twokindsof customer feedbackdata that they store, analyse andmeasure:structuredandunstructureddata.Structureddataisinformationthatisclearlydefinedandeasytoreporton.Itisthekindofdatathatisgenerallyfoundinasurveyandcanbeorganizedinaspreadsheet:name,location,age,andrating(3outof5stars,forexample,ora10for“mostsatisfied”versusa1for“leastsatisfied”)(Taylor,2018).UnstructureddataasitexiststodayinthecontextofCustomerexperiencedatais,basically,textorfreeformcustomerverbatim,althoughitcanalsoincludeothermediasuchasaudio,photos,orvideos.Unstructureddatacanbecapturedinanemail,the“additionalcomments”sectionofasurvey,voicerecordingsofcustomerinteractionse.g.SpeechAnalyticstools,apostonacustomerreviewsitee.g.Amazon,insocialmedia,andinamultitudeofotherplaces.(Taylor,2018).Analysingallthisdatacorrectlyiscritical,becauseitrevealseverythingfrombuyingtrendstoproductflawsandprovidesasignificantbusinessadvantage.(Aswani,2017)Organizationsoftenstruggletodothisanalysis,however,becauseunstructureddataissignificantlyhardertocategorizeandreportonthanstructureddata.Itcanbehardtoparseduetogrammaticalerrorsorslang, itfrequentlycontainsmultipleunrelatedideas,anditcanrepresentvariouslevelsofsentimentrelatedtoeachidea(e.g.“Icouldn’tcompletetheregistrationprocessonline,Ididn’tunderstandthewebsite,buttherepwasverypatientandtalkedmethroughtheprocess.”).Unstructureddatais,byitsnaturalhardertonavigatethanstructureddata.Anecdotallyitisthoughtthat95%ofcustomerfeedbackdataisunstructured(Kemper,2008)anditcontainsawealthofinformationtoaidintheunderstandingofcustomersentimentandopinion.Dauntingasitmayseemtobusinesses,thistypeoffeedbackispredictedtocontinuetogrow,asmillionsofpeopleincreasetheiruseofonline(Clarabridge,2015).Customersatisfactionisoneofthemostcriticalissuesconcerningbusinessorganizationofalltypes,whichisjustifiedbythecustomer-orientedphilosophyandtheprinciplesofcontinuesimprovementinmodernenterprise(Arokiasamy,2013)seeFigure2.

NetPromoterScore(NPS)isameasureofcustomerloyaltyandisoftenheldastheleadingcustomerexperiencemetric.NPS is still apopularcustomer loyaltymeasurementdespite recent

Page 6: The Application of Sentiment Analysis and Text Analytics ... · sentimentprovidesatruerreflectionofthecustomersexperiencewithaserviceorproduct. 2. RELATED WoRK Textanalyticsprovidesasetoftechniquestomaketrendsvisible,dynamic,andtorevealrelationships

International Journal of Data Warehousing and MiningVolume 15 • Issue 4 • October-December 2019

26

studiesarguingthatcustomerloyaltyismultidimensional.Therefore,firmsrequirenewdata-drivenmethodsthatcombinebehaviouralandattitudinaldatasources.Measuredona0-10scale,itisbrokendownwhereascoreof0–6isclassifiedasDetractors,unhappycustomershowareveryunlikelytorecommendthebusinesstofriendsorfamily.Ascoreof7–8isclassifiedasPassives,thosecustomerswhoarehappyenoughwiththeservicebutnotenoughtoactivelyencourageothersandascoreof9–10isclassifiedasPromoters,thosecustomerswhoareloyalandwouldrecommendthebusinessorproducttofriendsandfamily(Qualtrics,2018).

Sentiment analysis plays a key role in the development of predictive and machine learningmodels.Ithasdevelopedintoawiderall-encompassingprocessthatinvolvesanalysingtextualdataandapplyinga‘sentimentscore’tothattext(Mejova,2012)Oneofthemostwidelyusedapproachesisusingmachinelearningstrategiestodevelopandbuildanalgorithmusingatrainingdatasetbeforeapplyingatestor‘real’dataset.Machinelearningtechniquesfirsttrainsthealgorithmwithsomeinputswithknownoutputssothatlateritcanworkwithnewunknowndata(Devika,etal.,2016).SVMorSupportVectorMachinesaredefinedasasupervisedmachinelearningalgorithmnormallyusedforregressionandclassificationprocesses.SVMusesanon-probabilisticclassifierinwhichalargeamountoftrainingsetisrequired.Itisdonebyclassifyingpointsandidentifyinga(d-1)-dimensionalhyperplane.SVMlooksforthehyperplanethatbestsegregatestheclasses(Devika,etal.,2016).SupportVectorMachinesmakeuseoftheconceptofdecisionplanesthatdefinedecisionboundaries.Adecisionplaneisonethatbestdifferentiatesbetweenasetofobjectshavingdifferentclassmembership.SupportVectorMachines are considered apowerful classification algorithm.Whenusedinconjunctionwithothertechniquessuchasrandomforestandothermachinelearningtools,theygiveaverydifferentaspecttoensemblemodels(Srivasta,2014).ThereisanargumentthatSVMdonotperformwellwithlargerdatasetsduetothehightrainingtimerequired,theydohoweverperformwellwithclearmarginofseparationsbetweenobjects(Brownlee,2016).

Inthecontextofsentimentanalysisortextmining,ann-gramisaneighbouringsequenceofnitemsfromagivensequenceoftextorspeech.NItemscanbeletters,words,pairsofwordsforexampleandaretypicallycollectedfromasequenceoftextorspeechcorpus.Figure3showshowasentencecanbebrokenupintovariousformatsi.e.words,pairsofwords,etc.N-gramsentimentanalysisconsidersthesentenceasawhole(Ghiassietal.,2013).

InMachinelearning,NaïveBayesistermedasafamilyof“probabilisticclassifiers”basedontheuseofBayesTheorem.NaïveBayesclassifiersarehighlyscalableandaremainlyusedwhenthesizeofthetrainingsetisless.TheconditionalprobabilitythataneventXoccursgiventheevidenceYisdeterminedbyBayesruleshowninEquation(1).UnlikeNaïveBayes,MaximumEntropyClassifierdoesnotassumethatthefeaturesareconditionallyindependentofeachother.Uncertaintyismaximumforauniformdistributionandthemeasureofuncertaintyisknownasentropy.MaximumEntropyClassifiercanbeused tosolveavarietyof textminingscenariosparticularlysentimentanalysis

Figure 2. Basic American Customer Satisfaction Index model (ACSI, 2018)

Page 7: The Application of Sentiment Analysis and Text Analytics ... · sentimentprovidesatruerreflectionofthecustomersexperiencewithaserviceorproduct. 2. RELATED WoRK Textanalyticsprovidesasetoftechniquestomaketrendsvisible,dynamic,andtorevealrelationships

International Journal of Data Warehousing and MiningVolume 15 • Issue 4 • October-December 2019

27

(Vryniotis,2013).MaximumEntropyClassifierusesaparameterizedsetofweightswhichwhencombined,jointhefeaturesthataregeneratedfromasetoffeaturesbyanencoding.Thisencodingthenmapseachpairoffeaturesetandlabelsthemtoavector(McCallumzy&Nigamy,1998).MEclassifiersareknownastheexponentialorlog-linearclassifiersandbecausetheyworkbyextractingsetsoffeaturesfromtheinputdata,combiningtheminalinearfashionandthenusingthissumastheexponent(DaumeIII&Marcu,2006).(Pang&Lee,2002)usedtheapproachthateachdocumentisrepresentedwithanarrayof1’sand0’sthatindicateifawordexistsinthetextofthedocument.ThisplaysontheBoW(BagofWords)theorycommonlyusedinNLPandtextmining.

P c xP x c P c

Posterior obability

Likelihood Class i

||

( ) = ( ) ( )↑

↓ ↓

Pr

Pr oor obability

edictor ior obability

P x

Pr

Pr Pr Pr↑

( )

P c x P x c P x c P x c P cn

| | | ... |( ) = ( )× ( )× × ( )× ( )1 2 (1)

TheK-NearestNeighbour(KNN)techniqueisbasedonthenotionthattheclassificationofaninstancewillbeinsomewaylikethosenearbyitinthevectorspacehencethetermnearestneighbour.(Srivastav&Singh,2014)researchedweightedk-NearestNeighbourwheretheyassumedweightagetothoseobjectsinthetrainingsetandthenusedtheseweightsfortheircalculationofsentimentoftextinwordbywordfashion.Inaweightedk-NNanalysis(Srivastav&Singh,2014)thesentencesofthetweetscapturedweretokenisedandthenstopwordswereremoved.Thealgorithmdevelopedapositivescoretoeachreviewafterthefirstparse.Forthesecondparsingprocessaneutralreviewisinput.ThisscoreisthenmodifiedifrequiredallowingforbetterpositivisedeterminationandcreatinganoutputfilethatconsistsofthereviewIDanditsassociatedpositivescore.Weightedcombinationsoffeaturesetscanbequiteeffectiveinthetaskofsentimentclassification,sincetheweightsoftheensemblerepresenttherelevanceofthedifferentfeaturesets(e.g.n-grams,POS,etc.)tosentimentclassification,insteadofassigningrelevancetoeachfeatureindividually(Xiaetal.,2011).(Balahur&Montoyo,2008) investigated thenotionof assigningproducts classes to textualdata sets andextractinggeneralfeaturesfromtheproductsandtheninturnassigningsentimenttothefeatures.

ARulebasedapproach(Liu,etal.,2005)toextractopinionfromatextualdatasetbydefiningvariousrulestokenizedeachsentenceineverydocumentofthedatasetandtestedforthepresenceofeachtokenorword.Ifthepresenceofthewordisconfirmedandhasapositivesentiment,thena

Figure 3. N-Gram example (Maiolo, 2015)

Page 8: The Application of Sentiment Analysis and Text Analytics ... · sentimentprovidesatruerreflectionofthecustomersexperiencewithaserviceorproduct. 2. RELATED WoRK Textanalyticsprovidesasetoftechniquestomaketrendsvisible,dynamic,andtorevealrelationships

International Journal of Data Warehousing and MiningVolume 15 • Issue 4 • October-December 2019

28

“+1”ratingisappliedtoit.Eachpieceoftextstartswithaneutralscoreofzeroandwasconsideredpositive.In(Devika,etal.,2016)forexample,iftheinputsentencecontainsanywordwhichisnotpresentinthedatabasethenitwillbeaddedtothedatabase.Thisisasupervisedlearningapproachinwhich thesystemis trainedto learn ifanynewinput isgiven.Rule-basedapproaches tendtouseheuristics todetermine sentiments and linguistics research to analyse sentiments.They tendtohaveverypoorgeneralization,butwithinanarrowdomaintheytendtoperformwell(Medhatetal.,2014).Inmachinelearningthisisknownasoverfittingandtheaimistoavoidthisproblemandconverselywithrules-basedsystemsit’sconsideredlargelyunavoidable.(Devikaetal.,2016)employedamachinelearningdatadrivenapproachwhichusedalabelledcorpusoftextswiththeirrespectivesentimentsforprediction.

ALexiconbasedapproachusesdictionariesofwordstaggedwiththeirsemanticpolarityandsentimentstrength(Yadav&Elchuri,2013)Thesescoresarethenusedtocalculatethepolarityand/orsentimentofthetextualdata.Itissaidthatthistechniqueallowsforhighprecisionbutlowrecall.LexiconBasedtechniquesworkonaninferencethattheaggregatedsemanticpolarityofasentenceortokensisthesumofpolaritiesoftheindividualphrasesorwords.AttheROMIP2012conferencethelexicon-basedmethodproposedin(Chetviorkin&Loukachevitch,2013)wasused.Theauthorsemployedanemotionalresearchtechniqueforsentimentanalysisdictionariesforeachofthedomains.Usingappraisedwordsfromtheappropriatetrainingset,thedomaindictionarieswerereloadedwiththehighestweightedwords

3. METHoDoLoGy

Webuiltaclassificationmodelusingpredictivevariables.Thestepswere:

1. Applyalinguistictextminingapproach(Ordenesetal.,2014)todeterminethecategoryofeachcustomersreviewbyminingthereviewverbatimthusdividingcustomerreviewsintothreegroupsofpositive,neutralornegative.Thissentimentanalysiswillfacilitatetheprocessofmodellingthesentimentscoregeneratedagainstthecustomerratingscore,henceattemptingtopredictthelikelihoodthatthecustomerhasratedtheproduct1thru5.

2. ApplyaNaiveBayesMachineLearningAlgorithmtopredicttheratingofthereviewsasthisisfoundtoworkwellwiththetextdata.

3. Preparethetextualdataforvectorizationbyremovingthepunctuationsandthenconvertingintolowercaseandallstopwordsremovedfromthesentences.

4. ConvertReview’ textdata toa token(whichhaspunctuationsandstopwordsremoved).UsingScikit-learn’sCount-Vectorizerfunctionalitythetextisthenconvertedintoamatrixoftokencounts.Iteffectivelyisa2Dmatrixwhereeachrowisauniqueword,andeachcolumnisareview.

5. DevelopthepredictionmodelusingaNaïveBayesalgorithmtopredictcustomersatisfactionscoresaccuracy.

6. EvaluatethemodelbytestingthepredictedvaluesagainsttheactualratingsinthedatausingaconfusionmatrixandclassificationreportfromtheScikit-learntoolkit.Thedataisrunthroughthemodelandtheefficiencyofthemodelisnotedandcommentedupon.

7. Testtheoutputfromthemodeltoconfirmthatpredictionsareaccurateandlineupwiththeexpectedresults.Thisinvolvestestingrandomreviewsandnotingthepredictedoutcomefromthemodelandcomparingwiththeactualratinginthedata.

3.1. Data TransformationThedatasetwassourcedfromKaggle1.ItcomprisedmobilephonereviewsfromtheAmazonwebsite(n=413,840)withonly62nothavingaverbatimcommentinthe‘Review’column.Thekeycomponent

Page 9: The Application of Sentiment Analysis and Text Analytics ... · sentimentprovidesatruerreflectionofthecustomersexperiencewithaserviceorproduct. 2. RELATED WoRK Textanalyticsprovidesasetoftechniquestomaketrendsvisible,dynamic,andtorevealrelationships

International Journal of Data Warehousing and MiningVolume 15 • Issue 4 • October-December 2019

29

welookedforwastherating(startratingor1-5ratingforexample)andacorrespondingreviewfromthecustomer.Wewantedawide-rangingvarianceinthetypesofreviewsleftbythecustomerssothatthereissomedegreeofsubjectivityasdifferingviewsareimportanttoaidthedesignanddevelopmentofthesentimentmodel.Therefore,thereisanexpectationtoseedifferentresultsfromthesentimentanalysisofthisdatasetcomparedtothatofdatasetsabouttechnologyproducts.ThevariableswereProductTitle,Brand,Price,Rating&Reviewtextandnumberofpeoplewhofoundthereviewhelpful.Pythonwasusedasithasaconsiderablevarietyoftextminingandsentimentanalysispackages(Bird,etal.,2009)andpackagesthatallowthevisualisationofthedataandithasasolidrangeofpredictiveanalyticspackages.

Transformingcustomersatisfactiondataintocustomerinsightsrequiresaformalsystembywhichmeasuresareincludedaspartofadatacollectionandanalyticsprocess.Theaimwastoinvestigatethecorrelationbetweenthesentimentanalysisscoreandthatofthecustomerratingscoreandpredictwithadegreeofcertaintythelikelihoodoftheratingbasedofthesentiment?Therefore,ratherthanrelyingonthesurvey-basedCSATmeasurement,weappliedbigdatatechniquestothemobilephonecustomerdatatoevaluatethelinkbetweensatisfactionandsentiment.Thebasisofthisanalysisistostudythecustomerfeedbackfromthe‘Review’columninthedatacontainingthetextualfeedback,tounderstandthesentimentbehindtheircommentaryandtoidentifykeythemesandwordswithinthisdata.TodothisthereareseveralstagesrequiredandthisfollowstheCross-IndustryProcessforDataMining(CRISP-DM)(Chapman,etal.,2000).TheCRISP-DMprocessisthe“defactostandardfordevelopingdataminingandknowledgediscoveryprojects”(Marbánetal.,2009).Thismulti-layeredresearchmethodologyhas5keycomponentsofwhichthisprocessutilizessomebutnotalltheelementsoftheframework:(Marbánetal.,2009).

CustomerswhopurchasedaphonefromAmazonwereinvitedtoratethephonei.e.CustomersSatisfactionscoreor‘Rating’asperthedata,provideareviewoftheproductviaafreeformtextboxi.e.‘Reviews’andaspertheamazonwebsiteotheruserscouldreacttothereviewand‘up-vote’thefeedbackgivenbythecustomerontheproduct.Thecustomerexperiencestrategyofcapturingdatalikethishasbeentoucheduponpreviouslybutitisrelevanttounderstandhowitisapplicableinabusinesscontext.Asabusinesstheremustbesomeunderstatingofhowcustomerreacttoproductsandinthisdatasettherearetwokeyvariablesthatprovideinsightintothisi.e.RatingandReviews.Wenotethatthisisanamazondatasetandnotcompanyspecificdatai.e.Appledonotownthisdataforexampleandwhilsttheycouldcarryoutanalysisonthedatadonothavesoleownershipofit.

Thedatasetcanbeconsideredlongitudinalcustomerdatacoveringattitudinaldataelicitedfromthecustomersurvey,whichincludesSatisfactionratings(‘Rating’)andqualitativecustomerverbatimcomments(‘Reviews’),behaviouraldata(transactionaldatai.e.price)andproductdataacrossmultipleproductsforallmajormobilephonecompanies.Thedatasourcedidnothaveanydate/timereferencesassociatedwiththerecords.Therefore,theopportunitytoconducttime-seriesanalysisisunavailable,thiswouldbeanareawheretheauthordeemsfutureworkwouldberecommended.Therewasatotalof413,840productreviewrecords.

Thekeypredictivevariablesinthedatasetsthatareconsideredforanalysiswere,RatingandReview. It shouldbenoted that therewillbedescriptiveanalysisperformedon theall theothervariableswithinthedatasettolookforadditionalinsightsfromthedata.Theintentionistoallowthebusinesstoaccuratelypredictthelikelihoodthatacustomerwillrateaproductthewaytheyhave.Theprocesswillinvolveidentifyingthosecustomerswhoratedtheproducts1thru5andthenusingapredictivemodellingtechniqueoutputtheaccuracyusingaclassificationreport.

3.2. AnalysisPearsonandSpearmancorrelationplotsascertainwhere therearecorrelationsbothpositiveandnegativeinthedataandthiswillthenhelpdecisionmakingeasierwhendecidingwhichvariablestolookataspartofthedescriptiveanalysisinlatersections.Datacorrelationshelptoidentifyrelatedvariablesandcanbeusedby investigating the relationshipbetween twoquantitative,continuous

Page 10: The Application of Sentiment Analysis and Text Analytics ... · sentimentprovidesatruerreflectionofthecustomersexperiencewithaserviceorproduct. 2. RELATED WoRK Textanalyticsprovidesasetoftechniquestomaketrendsvisible,dynamic,andtorevealrelationships

International Journal of Data Warehousing and MiningVolume 15 • Issue 4 • October-December 2019

30

variables.Pearson’scorrelationcoefficient(r)isameasureofthestrengthoftheassociationbetweenthetwovariablesandtheintervallevelrangesbetween-1to+1.WeuseSpearman’scorrelationwhichisthenonparametricversionofthePearsoncorrelation.Spearman’scorrelationcoefficient,(ρ,alsosignifiedbyrs)measures thestrengthanddirectionofassociationbetween tworankedvariables(Hauke&Kossowski,2011).

ThefirstpartoftheanalysiswasdoneusingthematplotliblibraryandacomparisonwasrunbetweentheReviewandReviewlengthvariablestogetanunderstandingiftherewasanysignificantcorrelationbetweenthelengthofareviewgivenbythecustomerandthesubsequentrating.Histogramsweregeneratedrepresentingeachreviewcategory i.e.1 ->5showing thedistributionof reviewlengthsforeachrating.Acomparisonwascarriedouttoanalysehowtheratingandpriceaffectedeachother.UsingtheSeabornlibrary,ascattergraphwitharegressionlinewascreatedforeachratinginrelationtothecorrespondingprice.Forthisanalysistheheatmapfunctionalityfromtheseabornlibrarywasusedandthisproducedaheatmapgraphicthatgroupedthenumericalvariablesinthedata(Price,ReviewVotesandReviewLength)andassignedacolourandcorrelationscoretoeachrelationshipindicatingthestrengthofthecorrelation.Thisisusefulinthesensethatitallowsthereadertospotanyimportantpositiveornegativecorrelationsbetweenthedatapointsthatmaynothavebeenobviouswheneyeballingtherawdata.Basedontheresultsonecouldthendecideonhowbesttocontinueanalysingthedata.

Ascatterchartfromtheseabornlibrarywasusedinthisanalysistolookatthedistributionofreviewlengthsacrosseachrating.Inthisscenariothedataisplottedusingascatterplotandthereasonforthisistoidentifyifthereareanyoutliersinthedataacrossthe5ratings.ToinvestigatetheRatingsdistributionofthedataaKDE(KernelDensityEstimation)plotwasusedtovisualisetheratings.Instatistics,KDEisanon-parametricwaytoestimatetheprobabilitydensityfunctionofarandomvariable,inthiscasetheRatinggivenbycustomers.KDEattemptstosmooththedatawhereassumptionsaboutthepopulationaremadebasedonafinitedatasetasshownbelowinEquation(2).

f̂ xn

K x xnh

Kx x

hh h ii

ni

i

n

( ) = −( ) = −

= =

∑ ∑1 1

1 1

(2)

whereKisthekernel,anon-negativefunctionthatintegratestooneandh>0isasmoothingparametercalledthebandwidthInEquation(2),thekernelKwithsubscripthiscalledthescaledkernel.Ideallyonewouldlooktoselectthehvalueassmallasisfeasibleforthedata;however,thereisalwaysatrade-offbetweenthebiasoftheestimatoranditsvariance.Thebandwidth(h)controlshowtightlytheestimationisfittedtothedata,verymuchlikethebinsizeinahistogram.TheKDEplotisverysimilartoahistogram,itestimatestheprobabilitydensityofacontinuousvariable,inthiscasetheprobabilityofaratingfrom1to5.Inthiscaseaunivariateorkerneldensityestimateisplotted.AcomparisonbetweenAppleandSamsungwasplottedintwoseparategraphsallowingfortheanalysisofthedistributionofratingsbetweenthetwobrands.Asidebysidecomparisoncanbevaluableindetermininganyobviousdifferencesbetweenthebrandswithfutureanalysisinmind.StayingwithKDEplotsandusingSamsungagainasanexample,aplotwascreatedtovisualisethedistributionbyPriceoftheSamsungproductsusingtheSeabornlibraryasintheprevioustwosubsections.

3.3. Sentiment AnalysisUsing theTextBlob library,weconducted sentiment analysisoneachcomment for eachdatasetusingTextBlobandassignedeach‘Review’apolarityscoreofbetween-1 ->1.Theranges forPositive,NegativeandNeutralwerePositiveSentimentScore:>0.1to1,NeutralSentimentScore:between-0.1and0.1andNegativeSentimentScore:-1to<-0.1.Theserangesrepresentthevariouscategoriesofsentimentpolarity.Adataframenamed‘df_polarity_desc’wasusedtooutputatable

Page 11: The Application of Sentiment Analysis and Text Analytics ... · sentimentprovidesatruerreflectionofthecustomersexperiencewithaserviceorproduct. 2. RELATED WoRK Textanalyticsprovidesasetoftechniquestomaketrendsvisible,dynamic,andtorevealrelationships

International Journal of Data Warehousing and MiningVolume 15 • Issue 4 • October-December 2019

31

withthecolumnsReview,SentimentandPolarity.Thisenabledthereadertoviewtheoutputfromthe sentiment analysis and to put some context around the actual Sentiment Polarity scores foreachindividualreview.Twoseparatewordcloudswerecreatedtovisualboth‘PositiveWords’and‘NegativeWords’.Both‘PositiveReviews’,‘NeutralReviews’and‘NegativeReviews’weredefinedbasedonthenumericalrangesdiscussed.ThisallowedtheauthortocategorisetheReviewsintothethree‘buckets’described.

UsingtheWordCloudfunction,twowordscloudswerevisualisedforbothPositiveandNegativereviewsi.e.themostusedwordsineachcategorisation.AwordcloudwithablackbackgroundwasproducedforthePositiveReviewwordsandaworldcloudwithawhitebackgroundforthenegativereviewwords.TheSentimentPolarityscorewasgeneratedona-1to1scalebasedonthecorresponding‘Review’.Theintentionistotestwhetherthesentimentpolarityscoreisrelatedtotheratingprovidedforthecorrespondingtextcommentfromthecustomer.ToachievethistheSentimentscorehasbeennormalisedfroma-1to1scaletoa1to5scaletomimictheratingsscaleavailabletothecustomer.Tonormalizethedata,Equation(3)wasusedandappliedtothe‘Polarity’scoreandthenormaliseddatawasnamed‘Polarity_Norm’.

xx x

x xnew=

−−min

max min

(3)

ThePolarityscoreswerenormalizedaspertheabovetoscoresbetween1and5toallowfordirectcomparisontothecorrespondingratingscoreOncethePolarityscorehasbeennormalised,themeanandstandarddeviationof‘Polarity_Norm’isusedtocreateseveralchartsanalysingtheprobabilitydistribution foreachRatingcategory.The twomain ratingcategories thathavebeenidentifiedthroughouttheresearchhavebeenRating1andRating5.Aplotofthepolaritydistributioniscreatedtoexplorethesentimentfromthereviewsintermsoftheactualratingandthisprovidesakeyinsightintothedata.

WordfrequenciesaregatheredfromtheReviewstextualdatasetintoadocument-termmatrixthusallowingtheextractionoffeaturesfromtheReviewssubdataset.Wecanemployafeatureextractiontechniquethatallowsfortransformingarbitrarydata,inthiscasetheReviewtext,intonumericalfeaturesusableformachinelearning.Thevectorizationprocessturnsthecollectionoftextdocumentsintonumerical featurevectors.This techniqueemploys tokenization,countingandnormalizationofthedataandiscommonlyreferredtoBagofWordsor“Bagofn-grams”representation.Using‘bow_transformer.vocabulary’ a classification technique is carried out on the data. Since thevectorizationprocessprovideddiscretefeaturesi.e.thewordcountsfromtheReviewdatathenextstepistoemployMultinomialNaïveBayeswiththefeaturecounts(Huang,2017).Thishelpsuscalculatetheprobabilityoftworatingscategories,Rating1andRating5,usingBayestheorem.UsingReviewsasthesecondvariableinthetesting,theprobabilityandaccuracyofpredictingaRatingof1and5basedonthereviewswillbeobtained.Todothisthedataissplitintoatestingandtrainingset.Wetake30%ofthedataandusethatasthetestingset.Usingaconfusionmatrix,wesummarisetheperformanceoftheclassificationalgorithm.Thenumberofcorrectandincorrectpredictionsaresummarizedwithcountvaluesandbrokendownbyeachclass.Weruntheconfusionmatrixfor1and5RatingsandthenallRatingsi.e.1thru5.Duetothenatureoftheconfusionmatrixitisexpectedthattheaccuracywilldropwhenthe5classesaremodelled.

4. EVALUATIoN

ThePearsonCorrelationshowedaslightcorrelationbetweenRatingandReviewVotes(seeFigure4).ThiswouldsuggestthatastheRatingincreasesthepriceincreasesandthenumberofReviewvotes

Page 12: The Application of Sentiment Analysis and Text Analytics ... · sentimentprovidesatruerreflectionofthecustomersexperiencewithaserviceorproduct. 2. RELATED WoRK Textanalyticsprovidesasetoftechniquestomaketrendsvisible,dynamic,andtorevealrelationships

International Journal of Data Warehousing and MiningVolume 15 • Issue 4 • October-December 2019

32

giventotheproductincreasesalso.Whichwouldmakesensebutsincethereisnotastrongcorrelationthe thought is that thiswouldbenotworthexploringwithfurtheranalysis.Using theSpearmancorrelationinFigure5,onecanobserveverysimilarcorrelationswithaslightnegativecorrelationbetweenRatingandReviewvoteswhich,aswiththePearsontest,doesnotwarrantdeeperanalysis.

ThecorrelationbetweenvariablesshowedanegativecorrelationbetweenPriceandPolaritywhichwouldsuggestthatthesentimentofthereviewtendstobemorenegativeasthepricegoesup.Thispotentiallycouldindicatethatcustomersarelesshappywiththeirmoreexpensivephonesandexpectbetterfromtheproduct.PolarityandReviewvoteshaveapositivecorrelationwitheachother,indicatingthatthemoreaReviewis‘up-voted’thehigherthemorepositivethesentimentseemstobe,whichwouldmakesenseformalogicalstandpoint(seeFigure6).

Wecreatedhistogramstocomparereviewlengthsagainsteachratingtohelpusnotethenumberofreviewsandwhichbucketsthereviewlengthbelongtoforeachrating.ThisisrevealsthatthereisanoverwhelmingnumberofreviewsforRating5andaconsiderablenumberofreviewsforRating1.ThiswillbeusefulgoingforwardindeterminingwhichRatingsshouldbemorecloselylookedinto.Figure7isquiteinterestingasisdisplaysthedistributionofpricepointsacrosseachRatingvalue.Asexpected,manypricepointsexistbelowthe$1000rangwithasmallnumberofoutliersupwardsof$2000.

Usingaheatmaptolookatcorrelationsbetweenthevariablesit isnoticeablethattherearenegativecorrelationsof-0.91betweenPriceandreviewlengthwhichwouldindicatethatthegreaterthepricetheshorterthereviewandviceversa.ThiscouldbeattributedtothefactthattherearemorereviewsatthelowerpricelevelasnotedpreviouslyinFigure8andthereforethisfindingitnotatallsurprising.Likeothercharts,itwasfeltthatacloserlookshouldbetakenatReviewlengthvRating.(seeFigure9).WhatisobservedhereisthedistributionofreviewsbyLengthvRatinganditshowsthatthereisaslighttendencyforcustomerstocommentmoreaboutphonesthattheyarehappywithhencethedistributioninFigure10.

Figure 4. Pearson Correlation

Page 13: The Application of Sentiment Analysis and Text Analytics ... · sentimentprovidesatruerreflectionofthecustomersexperiencewithaserviceorproduct. 2. RELATED WoRK Textanalyticsprovidesasetoftechniquestomaketrendsvisible,dynamic,andtorevealrelationships

International Journal of Data Warehousing and MiningVolume 15 • Issue 4 • October-December 2019

33

TogetasenseforthemultivariatedensityoftheratingsthechartinFigure11showsthehighdensityof%ratingsacrossthereviews.Thereisasmallerbutsignificantpopulationof1Starreviewsandthiswillsupplementthepreviouschartsforfurtherinvestigation.ComparingFigure11withFigure12there isnodistinctdifferencebetweenthe twodistributions indicatingthat there isnosignificantdifferencebetweenbrands.

4.1. Word FrequencyThe word phone is by far the most common. Some other interesting outputs from this are theoccurrencesofthewordsGreatandGoodwhichwouldleadtoapossibleconclusionthattheremaybeanoverwhelmingnumberofpositivereviewsinthedata.Figure13showstheTop7-words.

AsanothervisualalternativeFigure14showsawordcloudofwordfrequencies.Thisgivesanimpactfulvisualoftheoccurrencesofthetop100wordsinthereviewfeedback.

UsingtheTextBloblibrarytheintentionistogiveeach‘Review’asentimentscoreofbetween-1->1.Thesedefinethevariouscategoriesofsentiment.Thereisanoverwhelmingnumberofpositive

Figure 5. Spearman Correlation

Figure 6. Review v Review Length by Ratings

Page 14: The Application of Sentiment Analysis and Text Analytics ... · sentimentprovidesatruerreflectionofthecustomersexperiencewithaserviceorproduct. 2. RELATED WoRK Textanalyticsprovidesasetoftechniquestomaketrendsvisible,dynamic,andtorevealrelationships

International Journal of Data Warehousing and MiningVolume 15 • Issue 4 • October-December 2019

34

reviewsandthisfollowsfromtheexploratoryanalysisaboveshowingthehighfrequencyof5-starratings.Thisdoesnottellthewholestorysofurtheranalysisonthesentimentisrequired.

TheresultsinFigure15showthatsentimentanalysisresultsmostlyfallintothePositiveReviewcategorybasedon therangeused.NeutralReviewsseems tohavea reasonablevolumewith thenarrowrangetakenintoconsiderationandthereareasmallernumberofNegativereviewscompared

Figure 8. Heat Map

Figure 7. Regression Plot of Price v Rating

Page 15: The Application of Sentiment Analysis and Text Analytics ... · sentimentprovidesatruerreflectionofthecustomersexperiencewithaserviceorproduct. 2. RELATED WoRK Textanalyticsprovidesasetoftechniquestomaketrendsvisible,dynamic,andtorevealrelationships

International Journal of Data Warehousing and MiningVolume 15 • Issue 4 • October-December 2019

35

totheothercategories.Thiswouldsuggestthatmostcustomerswerereasonablyhappywiththeirpurchases.Interestingly,aRatingof1hadthesecondhighestvolumebutaspertheabovetherearesignificantlylessNegativereviewswhichonthefaceofitsuggeststhatthosewhogaveaRatingof1theircorrespondingsentimentscoremaybehigherandnotdirectlyrelatedtotherating,thisisfornowaquestiontobeexploredfurther.InterestinglytheobservabletrendforRatingscoresof5andPositivereviewsseemfairlyalignedsofurtheranalysisonthisisneeded.

Figure16showsasmallcohortofreviewswiththecorrespondingSentimentandPolarityscores.Thisgivesthereaderaveryquicktasteofoutputfromthecodeshowinghoweachpieceoftextisgivenascore.Averybriefscanoftheresultsshowsthatit‘seems’accurate,takingrecord[2]asapositivereviewi.e.‘VeryPleased’andconverselylookingatrecord[5]withthewordproblemsmentionedthesentimentis-0.3whichseemstofit.Thisisobviouslynotindicativeofthetotalpopulationof

Figure 9. Rating v Review Length Regression plot

Figure 10. Kernel Density Estimate

Page 16: The Application of Sentiment Analysis and Text Analytics ... · sentimentprovidesatruerreflectionofthecustomersexperiencewithaserviceorproduct. 2. RELATED WoRK Textanalyticsprovidesasetoftechniquestomaketrendsvisible,dynamic,andtorevealrelationships

International Journal of Data Warehousing and MiningVolume 15 • Issue 4 • October-December 2019

36

reviewsbutgivessomeinsightintotheresults.Figure17showsthewordcloudforPositiveReviewsandhereisitnoticeablethatkeywordssuchas‘Great’,‘Love’and‘Good’arequiteprevalentinthedataandthisissomethingthatwouldbeexpected.ReviewingFigure18,thewordcloudfornegativereviewsitisnoticeablethattherearemanyoccurrencesofthewords‘Disappointed’,‘Broken’and‘Bad’whichwouldtendtobeonthenegativesideandagainsomethingthatwouldbeexpectedtoseeintheanalysis.

ThemultinomialNaiveBayesclassifierissuitableforclassificationwithdiscretefeatures(e.g.,wordcountsfortextclassification).Themultinomialdistributionnormallyrequiresintegerfeaturecounts.However, inpractice, fractional counts suchas tf-idf alsowork.TheMultinomialNaïveBayesmodelhada93%accuracyratewhenpredictingaRatingof1anda96%accuracyratewhenpredictingaRatingof5,withanoverallmodelaccuracyof96%Thef1-scoreof90%and97%,with

Figure 12. Apple Kernel Density Estimate

Figure 11. Samsung Kernel Density Estimate

Page 17: The Application of Sentiment Analysis and Text Analytics ... · sentimentprovidesatruerreflectionofthecustomersexperiencewithaserviceorproduct. 2. RELATED WoRK Textanalyticsprovidesasetoftechniquestomaketrendsvisible,dynamic,andtorevealrelationships

International Journal of Data Warehousing and MiningVolume 15 • Issue 4 • October-December 2019

37

anoverallcoreof96%representtheweightedaverageoftheprecisionandrecallscoresindicatingthatthemodelisaccuratewhenattemptingtopredictifacustomerwillscorethereviewa1or5.

Themodelwastrainedontheentiredatasetincludingall5ratings.Figure19showstheprecisionandrecalloutputfromtheMultinomialNaïveBayesmodellingandFigure20showstheaccuracyandaverages.Itisnoticeablethatwhenthemodelisrunwiththeentirepopulationofthedatatheaccuracydropsquitesignificantlytoanoverallscoreof74%.ThisindicatesthatthedataforRatings

Figure 13. Word Frequency Chart

Figure 14. Word Cloud

Page 18: The Application of Sentiment Analysis and Text Analytics ... · sentimentprovidesatruerreflectionofthecustomersexperiencewithaserviceorproduct. 2. RELATED WoRK Textanalyticsprovidesasetoftechniquestomaketrendsvisible,dynamic,andtorevealrelationships

International Journal of Data Warehousing and MiningVolume 15 • Issue 4 • October-December 2019

38

Figure 16. Reviews, Sentiment and Polarity Table

Figure 15. Sentiment Frequency Chart

Page 19: The Application of Sentiment Analysis and Text Analytics ... · sentimentprovidesatruerreflectionofthecustomersexperiencewithaserviceorproduct. 2. RELATED WoRK Textanalyticsprovidesasetoftechniquestomaketrendsvisible,dynamic,andtorevealrelationships

International Journal of Data Warehousing and MiningVolume 15 • Issue 4 • October-December 2019

39

2,3&4hasasignificantimpactonthemodels’accuracy.ItisnoteworthythatRatings1and5remainquitehighandthisispotentiallyduetothelargevolumesofRatingsinthesecategoriescomparedwiththeothercategories.

4.2. Normalisation of PolarityUsing themeanandstandarddeviationof thePolarity_Norm,each ratingwasmatchedwithitscorrespondingnormalisedpolarityscore.Figure21isinterestingformanyreasonsasitcanbeobservedthatacustomer’sratingof5willhaveapproximatelya60%probability that thesentimentscorewillbe4.Secondlythereisanoticeabledistributionbetween3and4whichisinandaroundtheneutralsentimentcategory.Thisslightshiftawayfromapolarityscoreof5suggeststhatwhilemostofsentimentmaybepositivethereisasignificantcohortofreviewsthathaveaslightnegativeorneutraltonewhichwouldindicateanareaforfurtherexplorationfromabusinessperspective.

Figure 17. Positive Review Word Cloud

Figure 18. Negative Review Word Cloud

Page 20: The Application of Sentiment Analysis and Text Analytics ... · sentimentprovidesatruerreflectionofthecustomersexperiencewithaserviceorproduct. 2. RELATED WoRK Textanalyticsprovidesasetoftechniquestomaketrendsvisible,dynamic,andtorevealrelationships

International Journal of Data Warehousing and MiningVolume 15 • Issue 4 • October-December 2019

40

ThesamemethodwasappliedwhenplottingthegraphinFigure22.ConsideringthatthesearetheprobabilitydistributionsforRatingsof1,onecouldbeforgivenforthinkingtheseweretheresultsforaRatingof3.Itissignificanttoobservethatthepolarityorsentimentof1Ratingsaretendingtowardsslightlynegativeandnatural,thiscouldbedowntomanyreasons.Themodelitselfmaybeunabletoidentifynegativewords,orcustomerstendtonotbeoverlycriticalwhencommenting.

Figure23showsthedistributionsofRatings1,3,4&5togiveanideaofthehowthesentimenthasbeenmeasuredfortheseratings.DuetospaceconstraintsitwasdecidedthatRating2wouldbeleftoutanditalsohadthelowestvolumeofalltheratings.Rating3and4outputgraphsseemtobeconsistentandinlinewiththeexpectationsintermsofthedistributionofpolarity.Forthatreason,the suggestionwouldbe thatno further analysis isneeded for these scoresand fromabusinessperspectivethemostvaluableinsightscomefromtheRatings1and5.

Themodelwasintuitiveenoughtoestablishthatsentimentanalysisisakeyindicatorofthecustomerexperienceandforthatreasonshouldbeseriouslyconsideredaspartofanybusinessesmetrics strategy.Thekeyanalysis andultimately the findingsof this researchcentre around thenormalisationofthepolarityscorefromthesentimentanalysistoalignwiththecorrespondingratinggivenbythecustomerforthatproduct.Itwasnoticeablethatalargedistributionofratingsfellinto

Figure 20. Accuracy and Weighted Average Table

Figure 19. Confusion Matrix (additional variables)

Page 21: The Application of Sentiment Analysis and Text Analytics ... · sentimentprovidesatruerreflectionofthecustomersexperiencewithaserviceorproduct. 2. RELATED WoRK Textanalyticsprovidesasetoftechniquestomaketrendsvisible,dynamic,andtorevealrelationships

International Journal of Data Warehousing and MiningVolume 15 • Issue 4 • October-December 2019

41

the‘1’and‘5’buckets.Forthatreason,itwasprudenttoexaminetheassociatedpolaritywiththesescoresmorecloselythantheotherratingscores.Theanalysis,supplementedwiththevisualisationsshowedthatforaRatingof5,thesentimentdistributionwasskewedtowardsascoreof4withaslightemphasison3also.Thisshowedthatwhilecustomerswereratingtheproducts5/5theiractualsentimentregardingtheproductwasnotsooverwhelminglypositive.Converselythesamecanbe

Figure 21. Normalised Polarity Distribution for Rating 5

Figure 22. Normalised Polarity Distribution for Rating 1

Page 22: The Application of Sentiment Analysis and Text Analytics ... · sentimentprovidesatruerreflectionofthecustomersexperiencewithaserviceorproduct. 2. RELATED WoRK Textanalyticsprovidesasetoftechniquestomaketrendsvisible,dynamic,andtorevealrelationships

International Journal of Data Warehousing and MiningVolume 15 • Issue 4 • October-December 2019

42

saidofcustomersgivingtheproductsaratingof‘1’.Theresultsshowedthatwhilecustomersratedsomeproductspoorlytheoverallsentimentfellintothesomewhatnegativeandneutralbands.WhatweshowisthattherecanbeanoverrelianceonusingtheRatingsscoreasasinglepointoftruthforabusiness,whenbigdataanalyticstechniquesareemployeditispossibletodrilldownintotherealfeelingsofcustomersandgetaclearerpictureofproductsandservices.ThepredictiveanalyticsmodelusedmultinomialBayesiannetworkstopredictcustomerratingsusingthesentimentreviewscore.Theresultsoftheresearchhighlighttheshort-comingsoftheratingsscoreasasinglecustomerexperiencemetric,therebysupportingthenotionthatabusinesscannotrelysolelyonthismetric.Theresultfromthepredictionphaseisamodelthatiscapableofcorrectlypredicting74percentofcustomersratingsbasedontheirreviews.Thisisnotremarkablyhighpercentageintermsofaccuracyandbacksuptheclaimabovethattheratingscannotbeusedasanaccuratepredictorofsentimentalone.Onthisbasisoftheresults,thisresearchrecommendsthatbusinessesarediscouragedfromusingtheCSATorratingsscoresasasinglepointoftruthforcustomerexperience.Althoughthesemetricsareimportant,theydonotprovideaninsightintotheopinionofcustomersregardingtheexperiencetheyhavehadwithabusinessoraproduct.Theapproachemployedheredemonstratesthatadata-drivenapproachtoexploringcustomersentimentviatextmininghelpsformaclearerpictureforthebusinessanditisimportantthatfirmscapitalizeonthistypeofdata.

5. CoNCLUSIoN

QuantitativesurveyquestionssuchasCustomerSatisfactionwillalwaysremainwidelypopularwithbusinesseswhenattemptingtotakethepulsefromtheircustomersonproductsandcustomerexperience(Hague&Hague,2018).Tooverlyrelyonsinglecustomermetricssuchascustomerexperienceornetpromoterscoreisariskystrategyandthesuggestionisthatbusinessshouldfocusuponamorenuancedmultidimensionalapproachtopredictingcustomerbehaviour(Zakietal.,2016).

Figure 23. Rating 1,3,4 & 5 Polarity Distribution

Page 23: The Application of Sentiment Analysis and Text Analytics ... · sentimentprovidesatruerreflectionofthecustomersexperiencewithaserviceorproduct. 2. RELATED WoRK Textanalyticsprovidesasetoftechniquestomaketrendsvisible,dynamic,andtorevealrelationships

International Journal of Data Warehousing and MiningVolume 15 • Issue 4 • October-December 2019

43

Weexploredthenotionoffocussingandadoptingamorescientificapproachtomeasurecustomersentimentbyminingverbatimfeedbackandapplyingasentimentanalysismethodologytodeterminehowcustomersfeelabouttheproductsasopposedtothescoretheygavetheproductbasedontheconstrictive1to5rating.WeprovidedaframeworktounderstandhowacustomerexperienceprocesscanbedevelopedtoprovideinsightsintotheVoiceoftheCustomer.

Keyrecommendationsareasfollows:

• Establishingandmaintainingapositivecustomerexperienceforcustomersisimperativeforanycompany’slong-termgoalsandultimatelytheirsuccessasabusiness.

• Thereneedstobebuy-inataseniormanagementleveltoembraceacustomer-centricmetricsstrategywhichincludesdevelopmentofarobustprocesstomeasurecustomersentiment.

• Textminingandsentimentanalysisplayakeyroleinunderstandinghowcustomersfeelabouttheproductstheypurchaseortheexperiencetheyhadwithacompany(Fan,2014).

• Whilesurvey-basedmetricsgatheringwillalwaysremainanimportantprocessinmeasuringleading indicators of customers’ behaviour, this should be supplemented with a formalcustomerexperienceprograms thatmonitorperformanceandguide improvement efforts(Homburgetal.,2015).

• Organizationsshouldbegintotapintotherichveinofcustomerfeedbackdataattheirdisposalintheformoffreeverbatimtext.

ThecentralemphasisoftheoutcomessuggeststhatitwouldbefollyforabusinesstoignorecustomerfeedbackfromsurveysorsocialmediachannelsandsolelyrelyuponquantitativesuchascustomersatisfactionCSAT.ThereisnodoubtthatCSATisanextremelyimportantmetricandshouldcontinuetobemeasuredhoweverbusinessesneedtocapitaliseoncustomerdatapointssuchassentimenttoreallygetanunderstandingofhowcustomersarefeeling.

Descriptiveanalyticsplaysasignificantroleforabusinessintermsofunderstandingwhatishappeningwiththeirproductsandservices.Itgivesasnapshot,usuallyafterthefactofperformanceandexperienceandthisresearchshowedthatbyvisualisingthesedatapointsandreportingoutonthemcanbevaluabletoabusiness.Theresultsgiveaclearinsightintocustomeropinionandthenuancesaroundwhichcustomersthinkasopposedtohowtheyrateaproductonalimited5-pointscale.Thefindingspointtowardsdeeperlearningaroundsentimentandhowitcanbeutilisedinotherareasofthebusinessandwithadditionaldatapointsnotpresentinthisresearch.Thisstrategycouldbeemployedinabusinessscenarioandtheresultsandfindingswouldprovevaluableinhelpingbuildacustomerexperiencestrategyforabusiness.ThekeyfindingsarethatitisunwisetorelysolelyonCSATorNPSscoreswithoutexaminingcustomerfeedback.

Page 24: The Application of Sentiment Analysis and Text Analytics ... · sentimentprovidesatruerreflectionofthecustomersexperiencewithaserviceorproduct. 2. RELATED WoRK Textanalyticsprovidesasetoftechniquestomaketrendsvisible,dynamic,andtorevealrelationships

International Journal of Data Warehousing and MiningVolume 15 • Issue 4 • October-December 2019

44

REFERENCES

ACSI.(2018).American Customer Satisfaction Index.Retrievedfromhttp://www.theacsi.org/about-acsi/the-science-of-customer-satisfaction

Ajenstat,F.(2017).TableaufiveyearsaleaderinGartner’sMagicQuadrantforAnalytics.Tableau.Retrievedfromhttps://www.tableau.com/about/blog/2017/2/tableau-five-years-leader-gartners-magic-quadrant-analytics-66133

Allahyari,M.,Pouriyeh,S.,Assefi,M.,Safaei,S.,Trippe,E.D.,Gutierrez,J.B.,&Kochut,K.(2017).Abriefsurveyoftextmining:Classification,clusteringandextractiontechniques.arXiv:1707.02919

Arokiasamy,A.(2013).Theimpactofcustomersatisfactiononcustomerloyalty.Journal of Commerce,5(1),14–21.

Aswani,S.(2017).AnalyzingCustomerFeedbackData:ManualAnalysisvsNLP.Clarabridge.Retrievedfromhttps://www.clarabridge.com/blog/analyzing-customer-feedback-data-manual-analysis-vs-nlp/

Baars,H.,&Kemper,H.G.(2008).Managementsupportwithstructuredandunstructureddata—anintegratedbusinessintelligenceframework.Information Systems Management,25(2),132–148.

Balahur,A.&Montoyo,A.(2008).Afeaturedependentmethodforopinionminingandclassification.

LaValle,S.,Lesser,E.,Shockley,R.,Hopkins,M.S.,&Kruschwitz,N.(2011).BigData,AnalyticsandthePathFromInsightstoValue.MIT Sloan Management Review,52(2),21–32.

Bird,S.,Klein,E.,&Loper,E.(2009).NaturallanguageprocessingwithPython:analyzingtextwiththenaturallanguagetoolkit.InNatural language processing with Python: analyzing text with the natural language toolkit(pp.1–34).O’ReillyMediaInc.

Broan,S.E.&Steger,A.J.(2013).Keyperformanceindicatorsarenotjustaboutprofit.CFMA.Retrievedfromhttp://www.cfma.org/content.cfm?ItemNumber=1899

Brownlee, J. (2016). Support Vector Machines for Machine Learning. Retrieved from https://machinelearningmastery.com/support-vector-machines-for-machine-learning/

Cappelli,A.(2017).Natural Language Processing with Stanford CoreNLP.Retrievedfromhttps://cloudacademy.com/blog/natural-language-processing-stanford-corenlp-2/

Chapman,P.,Clinton,J.,Kerber,R.,Khabaza,T.,Reinartz,T.,Sherer,C.,&Wirth,R.(2000).Crossindustrystandardprocessfordatamining(CRISP-DM)1.0.

Chen,H.,Chiang,R.,&Storey,V.(2012).Businessintelligenceandanalytics:Frombigdatatobigimpact.Management Information Systems Quarterly,1(1),1165–1188.doi:10.2307/41703503

Chetviorkin,I.&Loukachevitch,N.(2013).EvaluatingsentimentanalysissystemsinRussian.

Chowdhury,G.(2005).Natural languageprocessing.Annual Review of Information Science & Technology,37(1),51–89.doi:10.1002/aris.1440370103

Clarabridge.(2015).Retrievedfromwww.clarabridge.com

CranProject.(2018).CRAN Task View: Natural Language Processing.Retrievedfromhttps://cran.r-project.org/web/views/NaturalLanguageProcessing.html

Daume,H.III,&Marcu,D.(2006).Domainadaptationforstatisticalclassifiers.Journal of Artificial Intelligence Research,26,101–126.doi:10.1613/jair.1872

DepartmentofIndustry,AustralianGovernment.(2018).Understandyourcustomers.[REMOVEDHYPERLINKFIELD]Retrievedfromhttps://www.business.gov.au/info/plan-and-start/start-your-business/what-is-customer-service/understand-your-customers

Devika,M.D.C,S.&Ganesha,A.,2016.SentimentAnalysis:AComparativeStudyOnDifferentApproaches.Chennai,TamilNadu,India,DepartmentofCSE,VidyaAcademyofScienceandTechnology.

EMC.(n.d.).[REMOVEDHYPERLINKFIELD]BigDatastorage[whitepaper].Retrievedfromhttps://www.emc.com/collateral/white-papers/idg-bigdata-storage-wp.pdf

Page 25: The Application of Sentiment Analysis and Text Analytics ... · sentimentprovidesatruerreflectionofthecustomersexperiencewithaserviceorproduct. 2. RELATED WoRK Textanalyticsprovidesasetoftechniquestomaketrendsvisible,dynamic,andtorevealrelationships

International Journal of Data Warehousing and MiningVolume 15 • Issue 4 • October-December 2019

45

Schumaker,R.P.,Zhang,Y.,Huang,C.N.,&Chen,H.(2012).Evaluatingsentimentinfinancialnewsarticles.Decision Support Systems,53(3),458–464.doi:10.1016/j.dss.2012.03.001

Fiedler,L.,Großmaß,T.,Roth,M.,&Vetvik,O.J.(2016).[REMOVEDHYPERLINKFIELD]Whycustomeranalyticsmatter.McKinsey.Retrievedfromhttps://www.mckinsey.com/business-functions/marketing-and-sales/our-insights/why-customer-analytics-matter

FoundationP.S.(2018).Getit.Python.Retrievedfromhttp://www.python.org/getit/

Ghiassi, M., Skinner, J., & Zimbra, D. (2013). Twitter brand sentiment analysis: A hybrid system usingn-gramanalysisanddynamicartificialneuralnetwork.Expert Systems with Applications,40(16),6266–6282.doi:10.1016/j.eswa.2013.05.057

ThinkwithGoogle.(2016).Whycutomeranalyticsarethekeytocreatingvalue.Retrievedfromhttps://www.thinkwithgoogle.com/intl/en-gb/marketing-resources/data-measurement/why-customer-analytics-are-the-key-to-creating-value/

Harris,D.(2018).WhatIsTextAnalytics?WeAnalyzetheJargon.Software advice.Retrievedfromhttps://www.softwareadvice.com/resources/what-is-text-analytics/

Hauke,J.,&Kossowski,T.(2011).ComparisonofvaluesofPearson’sandSpearman’scorrelationcoefficientsonthesamesetsofdata.Quaestiones Geographicae,30(2),87–93.doi:10.2478/v10117-011-0021-1

Hollyoake, M. (2009). The four pillars: Developing a ‘bonded’ business-to-business customer experience.Journal of Database Marketing & Customer Strategy Management,16(2),132–158.doi:10.1057/dbm.2009.14

Hosseini,S.,Maleki,A.,&Gholamian,M.(2010).ClusteranalysisusingdataminingapproachtodevelopCRMmethodologytoassessthecustomerloyalty.Expert Systems with Applications,37(7),5259–5264.doi:10.1016/j.eswa.2009.12.070

Huang,O.(2017).Applying Multinomial Naive Bayes to NLP Problems: A Practical Explanation.Medium.Retrievedfromhttps://medium.com/@Synced/applying-multinomial-naive-bayes-to-nlp-problems-a-practical-explanation-4f5271768ebf

TelusInternational.(2017).CustomerFirstMagazine.Retrievedfromhttps://www.telusinternational.com/media/TI-CustomersFirstMagazine-I04.pdf

Kaisler, S. (2013). Big Data: Issues and Challenges Moving Forward. In Proceedings of the 46th Hawaii International Conference on System Sciences(pp.995-1004).doi:10.1109/HICSS.2013.645

Kim,E.(2017).Everything You Wanted to Know about the Kernel Trick.Retrievedfromhttp://www.eric-kim.net/eric-kim-net/posts/1/kernel_trick.html

Liu,B.,Hu,M.,&Cheng,J.(2005,May).Opinionobserver:analyzingandcomparingopinionsontheweb.InProceedings of the 14th international conference on World Wide Web(pp.342-351).ACM.

Loria,S.(2018).TextBlob.Retrievedfromhttp://textblob.readthedocs.io/en/dev/

Maiolo,A.(2015).Comparingn-grammodels.Recognize Speech.Retrievedfromhttp://recognize-speech.com/language-model/n-gram-model/comparison

Marbán,Ó.,Mariscal,G.,&Segovia,J.(2009).Adatamining&knowledgediscoveryprocessmodel.InData mining and knowledge discovery in real life applications.IntechOpen.

Maritz.(2018).Maximiseverbatimswithtextanalytics[whitepaper].Retrievedfromhttps://www.maritzcx.com/blog/wp-content/uploads/2014/11/Maritz-White-Paper-Maximise-verbatims-with-text-analytics.pdf

McCallumzy, A. K., & Nigamy, K. (1998, July). Employing EM and pool-based active learning for textclassification.InProc. International Conference on Machine Learning (ICML)(pp.359-367).

Medhat,W.,Hassan,A.,&Korashy,H.(2014).Sentimentanalysisalgorithmsandapplications:Asurvey.Ain Shams Engineering Journal,5(4),1093–1113.doi:10.1016/j.asej.2014.04.011

Mejova,Y.A.(2012).Sentiment analysis within and across social media.UniversityofIowa.doi:10.17077/etd.zze4b252

Page 26: The Application of Sentiment Analysis and Text Analytics ... · sentimentprovidesatruerreflectionofthecustomersexperiencewithaserviceorproduct. 2. RELATED WoRK Textanalyticsprovidesasetoftechniquestomaketrendsvisible,dynamic,andtorevealrelationships

International Journal of Data Warehousing and MiningVolume 15 • Issue 4 • October-December 2019

46

Miner,G.(2012).Practical text mining and statistical analysis for non-structured text data applications.Oxford:AcademicPress.

Molag,T.(2017).PowerBIvsTableau.Encore Business.Retrievedfromhttps://www.encorebusiness.com/blog/power-bi-vs-tableau/

Natarajan,K.,Li,J.,&Koronios,A.(2010).Dataminingtechniquesfordatacleaning.InEngineering Asset Lifecycle Management(pp.796–804).SpringerLondon.

NTLK.org.(2017).Retrievedfromhttp://www.nltk.org/

Ordenes, F., Theodoulidis, B., Burton, J., Gruber, T., & Zaki, M. (2014). Analyzing customer experiencefeedback using text mining: A linguistics-based approach. Journal of Service Research, 17(3), 278–295.doi:10.1177/1094670514524625

Ott, T. (2018). What is text analytics. Software Advice. Retrieved from https://www.softwareadvice.com/resources/what-is-text-analytics/

Pang,B.,&Lee,L.(2002).Thumbsup?SentimentClassificationusing.Machine Learning.

Parasuraman,A.,Berry,L.L.,&Zeithaml,V.A.(1991).UnderstandingCustomerExpectationsofService.Sloan Management Review,32(3),39.

Project,C.(n.d.).Cran r-project.Retrievedfromhttps://cran.r-project.org/

Qualtrics.(2018).What is the Net Promoter Score (NPS).Retrievedfromhttps://www.qualtrics.com/experience-management/customer/net-promoter-score/

Rapidminer.(2018).Rapidminer.Retrievedfromhttps://rapidminer.com/

ŘehůřekR.(2018).gensim.Retrievedfromhttps://radimrehurek.com/gensim/about.html

SAS. (2017). Natural Language Processing. Retrieved from https://www.sas.com/en_us/insights/analytics/what-is-natural-language-processing-nlp.html

Smith,E.(2016).Everythingyouneedtoknowaboutcustomersuccess.Cobloom.Retrievedfromhttps://www.cobloom.com/blog/everything-you-need-to-know-about-customer-success

Spacy.io.(n.d.).Industrial-Strength NLP.Retrievedfromhttps://spacy.io/

Srivastava,A.,&Singh,D.M.(2014).SupervisedSAofproductreviewsusingWeightedk-NNAlgorithm.In2014 11th International Conference on Information Technology.

Srivastava,T. (2014).Supportvectormachinesimplified.Retrievedfromhttps://www.analyticsvidhya.com/blog/2014/10/support-vector-machine-simplified/

Tamilselvi,A.&ParveenTaj,M.(2013).SentimentAnalysisofMicroblogsusingOpinionMiningClassificationAlgorithm.International Journal of Science and Research,2(10),2319–7064.

Taylor,C.(2018).[REMOVEDHYPERLINKFIELD]Structuredvsunstructureddata.Datamation.Retrievedfromhttps://www.datamation.com/big-data/structured-vs-unstructured-data.html

Tirunillai,S.,&Tellis,G.(2014).Miningmarketingmeaningfromonlinechatter:Strategicbrandanalysisofbigdatausinglatentdirichletallocation.JMR, Journal of Marketing Research,51(4),463–479.doi:10.1509/jmr.12.0106

Upadhyay,P.(2017).RemovingstopwordsNLTKpython.Geeksforgeeks.com.Retrievedfromhttps://www.geeksforgeeks.org/removing-stop-words-nltk-python/

Verint.(2016).Verint Text Analytics.Retrievedfromhttps://www.verint.com/Assets/resources/resource-types/datasheets/text-analytics-datasheet.pdf

Vryniotis,V.(2013).MachineLearningTutorial:TheMaxEntropyTextClassifier.Datumbox.Retrievedfromhttp://blog.datumbox.com/machine-learning-tutorial-the-max-entropy-text-classifier/

Page 27: The Application of Sentiment Analysis and Text Analytics ... · sentimentprovidesatruerreflectionofthecustomersexperiencewithaserviceorproduct. 2. RELATED WoRK Textanalyticsprovidesasetoftechniquestomaketrendsvisible,dynamic,andtorevealrelationships

International Journal of Data Warehousing and MiningVolume 15 • Issue 4 • October-December 2019

47

Kevin Curran is a Professor of Cyber Security, Executive Co-Director of the Legal innovation Centre and group leader of the Ambient Intelligence & Virtual Worlds Research Group at Ulster University. He is also a senior member of the IEEE. Prof. Curran is perhaps most well-known for his work on location positioning within indoor environments, blockchain and Internet security evidenced by over 800 published works. His expertise has been acknowledged by invitations to present his work at international conferences, overseas universities and research laboratories. He is a regular contributor on TV and radio and in trade and consumer IT magazines.

Wichmann,M.(2017).The Python Wiki.Retrievedfromhttps://wiki.python.org/moin/

Xia, R., Zong, C., & Li, S. (2011). Ensemble of feature sets and classification algorithms for sentimentclassification.Information Sciences,181(6),1138–1152.doi:10.1016/j.ins.2010.11.023

Yadav,V.,&Elchuri,H.(2013).Serendio:SimpleandPracticallexiconbasedapproachtoSentimentAnalysis.InProceedings of theSecond Joint Conference on Lexical and Computational Semantics(pp.543-548).

ENDNoTE

1 https://www.kaggle.com