Classification of Code-Mixed Bilingual Phonetic Text Using Sentiment Analysis
DOI: 10.4018/IJSWIS.2021040104
International Journal on Semantic Web and Information Systems
Volume 17 • Issue 2 • April-June 2021
Copyright © 2021, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
Shailendra Kumar Singh, Computer Science and Engineering, Sant Longowal Institute of Engineering and Technology, India
https://orcid.org/0000-0001-9658-1441
Manoj Kumar Sachan, Computer Science and Engineering, Sant Longowal Institute of Engineering and Technology, India
ABSTRACT
The rapid growth of internet facilities has increased the volume of comments, posts, blogs, feedback, etc., on social networking sites on a large scale. These social media data are available in an unstructured form, which includes images, text, and videos. Processing these data is difficult, but sentiment analysis, information retrieval, and recommender systems are used to process such unstructured data. To extract the opinions and sentiments of internet users from their written social media text, a sentiment analysis system is required that can work on both monolingual and bilingual phonetic text. Therefore, a sentiment analysis (SA) system is developed, which performs well on datasets from different domains. The system's performance is tested on four different datasets, and its accuracy is better than that of state-of-the-art techniques by 3% on social media datasets, 1.5% on movie reviews, 1.35% on Amazon product reviews, and 4.56% on large Amazon product reviews. Also, a stemmer (StemVerb) for verbs of the English language is proposed, which improves the SA system's performance.
KEYWORDS
Code-Mixed Phonetic Text, Opinion Verb, Sentiment Analysis, Sentiment Score, Social Media, Stemmer
1. INTRODUCTION
With the enhancement of internet services and facilities, the use of social networking sites such as YouTube, Google Plus, LinkedIn, Twitter, and Facebook has increased rapidly (Press Trust of India, 2013). These social networking sites provide facilities for users to share their feelings, emotions, comments, feedback, and reviews over the internet. Thus, the size of such content on social media (SM) increases exponentially day by day. Most SM text content is written using more than one language and is called code-mixed language. Text in languages other than English that is written using the alphabet of the Roman script is called phonetic text. The phonetic text is mixed with English language text, but there is no fixed format for these SM texts (Dutta et al., 2015). These contents are used as input text to extract information, opinion, text summarization, etc., using various linguistic computations, natural language processing, text mining, and information retrieval systems (S. K. Singh & Sachan, 2019b).
Opinion mining or sentiment analysis (SA) is a sub-field of text mining and is one of the most recent research topics of interest (Pang & Lee, 2008). SA is concerned with predicting and analyzing hidden information, emotion, and feelings from written text. SA is widely used to analyze feedback on proposed government regulations and policies, to analyze customers' likes and dislikes, to gauge product demand and brand reputation, to monitor real-world events, to analyze the demands of political parties, to analyze the merits and demerits of competitors' products, and as a subtask component of recommender systems (Bonadiman et al., 2017; D'Andrea et al., 2015). In April 2013, 90% of consumers decided to purchase things or services based on online reviews (Peng et al., 2014).
User-written texts are classified using SA into two or three classes based on different formats such as positive/negative/neutral, like/dislike, and good/bad (Hopken et al., 2017; S. K. Singh & Sachan, 2019b). Two approaches are mostly used to classify text using SA: (i) feature-based and (ii) bag-of-words. Machine learning techniques are based on features, while lexicon-based techniques use bag-of-words approaches. In SA of products and services, machine learning is widely used, but bag-of-words approaches are used for social issues (Karamibekr & Ghorbani, 2012). Machine learning systems are trained on labeled dataset(s) and classify the testing dataset(s) based on the trained system. Lexicon-based techniques are categorized into two approaches: corpus-based and dictionary-based (S. K. Singh & Sachan, 2019b). Text classification using the dictionary-based approach depends upon the sentiment scores of opinion words. The dictionary consists of opinion words and their sentiment scores (Alharbi & Alhalabi, 2020). The opinion words are nouns, verbs, adverbs, and adjectives, which act as features in the dictionary-based approach, as discussed in the articles (Hopken et al., 2017; Shamsudin et al., 2016; P. K. Singh et al., 2015; R. K. Singh et al., 2020; S. K. Singh & Sachan, 2019b). These opinion words are used to develop opinion dictionaries such as SentiWordNet 3.0 (Baccianella et al., 2010), the latest SenticNet 4 (Cambria et al., 2016), etc.
SM text is available in monolingual, bilingual, and multilingual forms. Processing multilingual text is more difficult than processing bilingual and monolingual text. Monolingual text can be processed easily, but bilingual text only to some extent. Taboada et al. (2011) developed the “Semantic Orientation CALculator (SO-CAL)” using a dictionary that includes opinion words (nouns, adverbs, verbs, and adjectives) along with their polarity and strength values, and achieved their best performance, an accuracy of 70.10%, on the movie dataset among other datasets. In 2012, Karamibekr & Ghorbani developed a sentiment classification system in which verbs are considered the core element, and created an opinion verb dictionary of 440 verbs and a term opinion dictionary of 1726 terms. A polarity value ranging from +2 to -2 is assigned to each word in the dictionary, and their system was tested on a dataset related to social issues with an accuracy of 65% (Karamibekr & Ghorbani, 2012). P. K. Singh et al. (2015) used negation handling rules and a dictionary-based approach to classify a social-issues dataset into positive or negative classes and achieved an accuracy of 79.16% for negative sentences. Iqbal et al. (2015) proposed a bias-aware thresholding (BAT) method combining AFINN and SentiStrength to reduce the bias in the lexicon-based method for SA and obtained 69% accuracy using a Naïve Bayes classifier.
In 2016, Bhargava et al. developed an SA system for code-mixed sentences of English with four Indian languages (Hindi, Telugu, Bengali, and Tamil) using a count-based approach and language-specific SentiWordNets to determine each word's polarity value. Their system classified sentences based on the number of negative and positive words present within the sentence and obtained an F score of 54.4% on the English_Hindi mixed dataset. In the same year, a lexical-based method was proposed by Shamsudin et al. (2016) to classify Facebook comments written in the Malay language using the verb, adverb, and negation. The Verb+Negation combination achieved the highest accuracy of 52.12%.
In 2017, an SA system for domain-specific text was proposed by Cruz et al. (2017), using a dictionary-based approach. The system was tested on different datasets such as kitchen, book, movie reviews, and agriculture, and achieved its best performance on movie reviews (F score of 66%). A multi-task tri-training model based on deep learning was proposed by Ruder & Plank (2018). They trained their model jointly with three model-specific outputs and achieved 79.15% accuracy on Amazon product reviews. Later, in 2018, Han et al. proposed a method to reduce the bias in the lexicon-based approach. They used SentiWordNet and tried to improve accuracy, but achieved 69.52% accuracy and a 69.65% F score (Han et al., 2018). Baoxin Wang (2018) proposed a deep learning-based Disconnected Recurrent Neural Networks model, and Yu & Liu (2018) proposed the Sliced Recurrent Neural Networks model. They tested their models on large Amazon product reviews, but their systems achieved less than 65.00% accuracy.
The SentiVerb system was developed by S. K. Singh & Sachan (2019b) and tested on the GST dataset and movie reviews. The system considered the verb as a feature and obtained 82.50% accuracy on the GST dataset and 71.30% accuracy on movie reviews, even with small datasets available. In 2020, an event named SemEval-2020 Task 9 was conducted on SA of code-mixed tweets, and the highest performance reported was a 75.00% F score by one participant on bilingual (English_Hindi) text using the XLM-R method (Patwa et al., 2020). Also in 2020, a method was proposed to automatically label documents with a polarity or sentiment class and thereby reduce human intervention, processing time, and cost (Kansal et al., 2020). Their polarity detection task, along with the sentiment classification system, achieved 81.98% accuracy on Amazon product reviews using a logistic regression approach.
It is observed that most research work on SA/opinion mining has been conducted on monolingual text rather than on bilingual and multilingual text for resource-scarce languages (Lo et al., 2017), as per Table 1 and recent survey articles (Drus & Khalid, 2019; Guellil & Boukhalfa, 2015; Hussein, 2018; Medhat et al., 2014; Ravi & Ravi, 2015; Serrano-Guerrero et al., 2015; R. K. Singh et al., 2020; Yue et al., 2019). The highest reported accuracy is 82.5% on the SM dataset and 71.3% on movie reviews (S. K. Singh & Sachan, 2019b), 81.98% on Amazon product reviews (Kansal et al., 2020), and 64.43% on large Amazon product reviews. Most of the studies on languages other than English are in Chinese, Japanese, German, Spanish, French, Italian, Swedish, Arabic, and Romanian (Lo et al., 2017). Comparatively few studies and experiments have been conducted for bilingual and multilingual text, and they have not shown adequate system performance. No work was found related to the sentiment classification of code-mixed bilingual (English and Punjabi) phonetic text. Machine learning techniques require a large amount of labeled data for training, and code-mixed phonetic text datasets are not sufficiently available. Hence, there is a need to design and develop an SA system that can give better performance for both monolingual and bilingual code-mixed phonetic text. With this perspective, a system is designed and developed for SA of bilingual (English and Punjabi) code-mixed phonetic text using a rule-based classifier and a dictionary-based approach. For the first time, a verb opinion dictionary (VOD) for Punjabi words is developed, motivated by the opinion verb dictionary (OVD) of the English language (S. K. Singh & Sachan, 2019b).
The significant contributions are as follows. For the first time, bilingual (English and Punjabi) code-mixed phonetic texts are considered for SA. In this article, a sentiment analysis system is proposed to classify code-mixed bilingual (English_Punjabi) phonetic text using handcrafted rules, rule-based classifiers, and dictionaries. The bilingual code-mixed testing datasets (86,400 reviews) are generated from monolingual text. The spell checker, VOD, negation words, positive words, negative prefixes, negation handling rules, stop words, and StemVerb system are developed and used to extract the writer's sentiment from their written text. The VOD of the Punjabi and English languages is developed, which includes 653 Punjabi words and 677 English words. A root verb list of 3582 words and an irregular verb list of 222 words are developed. The StemVerb system (a sub-system of the SA system) is proposed to extract the root verb from the inflected form of an English verb, and it obtains 22.83% better accuracy than the existing Snowball stemmer. The proposed SA system achieved better performance than state-of-the-art approaches on all four benchmark datasets for monolingual text (accuracy better by 3% on the SM dataset, 1.5% on movie reviews, 1.35% on Amazon product reviews, and 4.56% on large Amazon product reviews).
This article is organized as follows: Section 1 introduces SA and reviews previous research work on SA. Section 2 provides detailed information about the proposed system framework and explains the different phases of the system's implementation. Section 3 covers the performance evaluation of the proposed system on the different dataset(s) and the analysis of results, along with a comparison with state-of-the-art techniques. Finally, a brief conclusion of this article is provided along with the future scope in the last section.
2. PROPOSED SYSTEM FRAMEWORK AND IMPLEMENTATION
The proposed system classifies code-mixed phonetic text as positive or negative using a dictionary-based approach. The system considers verbs as a feature. It is designed to handle bilingual phonetic text and consists of (a) lower case conversion, (b) tokenization, (c) spell checker, (d) part-of-speech (POS) tagging, (e) stop words elimination, (f) stemming, (g) sentiment calculation, and (h) document classification, as shown in Figure 1.

Table 1. Analysis of existing works related to SA

| Authors | Method | Accuracy (%) | F Score (%) | Limitations |
|---|---|---|---|---|
| (Taboada et al., 2011) | Lexicon-based | 70.10 | | Small data size (5100); only for English language text |
| (Karamibekr & Ghorbani, 2012) | Dictionary-based | 65.00 | | Small data size (1016); only tested on social issue dataset; only for English language text |
| (Iqbal et al., 2015) | BAT with Lexicon-based | 69.00 | | System designed only for English language text |
| (P. K. Singh et al., 2015) | Dictionary-based | 79.00 | | Very small data size (48); considered only negative comments for testing; only for English language text |
| (Shamsudin et al., 2016) | Dictionary-based | 52.12 | | Very small data size (450 Facebook comments); limited to Malay language text; not sufficient accuracy |
| (Bhargava et al., 2016) | SentiWordNet | | 54.40 | Small dataset (637); Punjabi and English code-mixed text not considered; performance not sufficient |
| (Cruz et al., 2017) | Dictionary-based | | 66.00 | Small dataset (4183); only for English language text |
| (Ruder & Plank, 2018) | Multi-task tri-training | 79.15 | | Only for English language text |
| (Han et al., 2018) | Dictionary-based | 69.52 | 69.60 | Small dataset (2000); only for English language text; only Amazon product reviews |
| (Baoxin Wang, 2018) | Disconnected Recurrent Neural Networks | 64.43 | | Only for English language text; not sufficient accuracy |
| (Yu & Liu, 2018) | Sliced Recurrent Neural Networks | 61.65 | | Only for English language text; not sufficient accuracy |
| (S. K. Singh & Sachan, 2019b) | Dictionary-based | 82.50 | 87.04 | Small dataset (400); only for English language text |
| (S. K. Singh & Sachan, 2019b) | Dictionary-based | 71.30 | 71.39 | Small dataset (2000); only for English language text |
| (Patwa et al., 2020) | XLM-R | | 75.00 | Considered only Hindi and English code-mixed text |
| (Kansal et al., 2020) | Logistic regression | 81.98 | | Small dataset (8000); only for English language text |
2.1. Testing Datasets Generation
Four existing datasets in the English language are used to generate bilingual (English_Punjabi) code-mixed phonetic text datasets (shown in Table 2) with the help of a bilingual dictionary of English and Punjabi words. The Punjabi words are written phonetically using Roman script letters. The transliteration of Punjabi words into phonetic words is done using the GRT system (S. K. Singh & Sachan, 2019a). The GST implementation Facebook comments are taken from an article (S. K. Singh & Sachan, 2019b); the other datasets are movie reviews (Pang & Lee, 2004), Amazon product reviews (Book, DVD, Electronics, Kitchen) (Blitzer et al., 2007), and large Amazon product reviews (He & McAuley, 2016). The size of the testing dataset is 172,800 comments/reviews (86,400 each for English and English_Punjabi).
2.2. Conversion in Lower Case and Tokenization
The proposed system uses various dictionaries in which data are stored in lower case, so the testing dataset must be converted into lower case. The comments are split into sentences, and the sentences into words, because the proposed system works at the word level. The importance of tokenization (splitting) is discussed in the article (P. K. Singh et al., 2015).
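The lower-casing and splitting step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `tokenize_comment` and the regular expressions are assumptions, since the article does not state which tokenizer was used.

```python
import re

def tokenize_comment(comment):
    """Lower-case a comment, split it into sentences, then into words."""
    text = comment.lower()
    # Assumption: a sentence ends at '.', '!' or '?'.
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    # Assumption: a word is a run of Latin letters or apostrophes, which
    # covers English words and Romanized (phonetic) Punjabi words alike.
    return [re.findall(r"[a-z']+", s) for s in sentences]

print(tokenize_comment("I do not like this movie. Vadhiia photo!"))
```

Each inner list is one sentence, so later stages can work at the word level while still knowing where sentences begin and end.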
2.3. Spell Checker of Bilingual Text
The testing dataset is generated by inserting Punjabi (phonetic) words within English language sentences, and only correct Punjabi words without spelling mistakes are used. So, there is no need to check the spelling of Punjabi words. Misspelled English words are detected and corrected using the existing system for English words (S. K. Singh & Sachan, 2019b).
Figure 1. Proposed system framework for code-mixed bilingual phonetic text
2.4. POS Tagging
Python's NLTK tool is used to tag each word with its part of speech (POS). POS tagging makes it easy to identify the verbs, adverbs, and adjectives (Turney, 2002) in a sentence.
2.5. Elimination of Stop Words from the Bilingual Text
Stop words are words that are useless for some natural language processing tasks. Whether stop words are used depends upon the task in which these words appear. The significance of stop word removal and various methods to eliminate such words are explained in the article (Saif et al., 2014). After removing these words, the remaining words are processed further. There are 217 stop words in the dictionary, which include 134 English stop words from the article (S. K. Singh & Sachan, 2019b) and 83 selected Punjabi stop words from the article (Kaur & Saini, 2016).
2.6. Stemming
Stemming is the process of finding the root word from its inflectional form. During stemming, the suffixes and prefixes are generally removed from the inflectional form to get the root word (Jivani, 2011). For example, keeps, kept, and keeping are inflectional (grammatical) forms of the word ‘keep’. Here, the root word ‘keep’ is found by removing the suffixes from keeps and keeping. The database becomes bulky when all forms of a word are stored. Hence, the database stores only root words. Therefore, a verb stemmer system is developed for English language text, as shown in Figures 2 and 3 and in Algorithms 1 and 2. Stemming is done only for English verbs because the VOD contains only opinion verbs. StemVerb (Algorithm 1) is used to extract the root verb from the different forms of a verb. POS tags are attached to each word, and all verbs are identified for stemming. There are three sub-stemming functions based on verb forms: extraction from (a) the ‘ing’ form, (b) the “ies, es, s” forms, and (c) the past and past participle (ied, ed, en, d, n) forms.

Algorithm 1: StemVerb
Input: Word along with POS tag
Output: Word (root/original word)
Step 1: If the POS tag is VBP, VBG, VBZ, VBN, or VBD, then the word is a verb
Step 2: (a) if the word's last letters are ‘ing’, then call the stemming function to find the root word from the ‘ing’ verb form (shown in Figure 2)
  (b) if the word's last letters are ‘ies’, ‘es’, ‘s’, then call the stemming function to find the root word from the ‘ies’, ‘es’, ‘s’ verb form (shown in Figure 3)
  (c) if the word's last letters are ‘ied’, ‘ed’, ‘en’, ‘d’, ‘n’, or the POS tag is VBN/VBD, then call the stemming function to find the root word from the past and past participle verb form (Algorithm 2)
Step 3: Return word (root word/original word)
Step 4: Otherwise, the word is not a verb
Step 5: Return word.

Table 2. Testing datasets

| Sr. No. | Datasets | Size | Existing language | Generated language |
|---|---|---|---|---|
| 1 | GST Implementation | 400 comments | English | English and Punjabi |
| 2 | Movie Reviews | 2000 reviews | English | English and Punjabi |
| 3 | Amazon Product Reviews | 8000 reviews | English | English and Punjabi |
| 4 | Large Amazon Product Reviews | 76,000 reviews | English | English and Punjabi |
The process to extract the root verb from the ‘ing’ verb form is shown in Figure 2 and explained with suitable examples.
Example 1: W = the word ‘keeping’, the inflected form of ‘keep’. W ends with the characters ‘ing’, so remove ‘ing’ from W and check whether W (keep) exists in the root verb list (RVL). It does, so W (keep) is returned to the calling Algorithm 1.
Example 2: W = the word ‘lying’, the inflected form of ‘lie’. W ends with ‘ing’, so remove ‘ing’ from W and check whether W (ly) exists in RVL; it does not. Then check whether ‘y’ is the last character of W (ly); it is, so replace ‘y’ with ‘ie’. Now W is ‘lie’; check RVL again, and W exists in RVL. Finally, W (lie) is returned to the calling Algorithm 1.
Example 3: W = the word ‘getting’, the inflected form of ‘get’. W ends with ‘ing’, so remove ‘ing’ from W and check whether W (gett) exists in RVL; it does not. Then check whether ‘y’ is the last character of W (gett); it is not, so check whether the last two characters are the same and not vowels. They are, so remove the last character from W (gett). Finally, W is ‘get’, which is returned to the calling Algorithm 1.
Example 4: W = the word ‘choosing’, the inflected form of ‘choose’. W ends with ‘ing’, so remove ‘ing’ from W and check whether W (choos) exists in RVL; it does not. Then check whether ‘y’ is the last character of W (choos); it is not, so check whether the last two characters are the same and not vowels. They are not, so add ‘e’ at the end of W (choos). Now W is ‘choose’; check RVL, and W exists in RVL. Therefore, W (choose) is returned to the calling Algorithm 1.
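Examples 1 to 4 can be traced with the short sketch below. It is a reading of the worked examples only, since Figure 2 itself is not reproduced here: the function name `stem_ing`, the four-word `RVL`, and the fallback of returning the original word when nothing matches are all assumptions.

```python
VOWELS = set("aeiou")
# Hypothetical miniature root verb list (RVL); the article's list has 3582 words.
RVL = {"keep", "lie", "get", "choose"}

def stem_ing(word, rvl=RVL):
    """Return the root verb of an 'ing' form, or the word itself on failure."""
    if not word.endswith("ing"):
        return word
    stem = word[:-3]
    if stem in rvl:                       # keeping -> keep
        return stem
    if stem.endswith("y"):                # lying -> ly -> lie
        candidate = stem[:-1] + "ie"
        if candidate in rvl:
            return candidate
    # Doubled final consonant: getting -> gett -> get
    if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in VOWELS:
        return stem[:-1]
    if stem + "e" in rvl:                 # choosing -> choos -> choose
        return stem + "e"
    return word  # Assumption: give up and keep the original word.

for w in ("keeping", "lying", "getting", "choosing"):
    print(w, "->", stem_ing(w))
```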
Figure 3 shows the process to extract the root verb from the “ies, es, s” verb forms, which is explained with suitable examples.
Example 5: W = the word ‘applies’, an inflected form of ‘apply’. W ends with the characters ‘ies’. Therefore, ‘ies’ is replaced with ‘y’, and W (apply) is checked in RVL. W exists in RVL, so W (apply) is returned to the calling Algorithm 1.
Example 6: W = the word ‘bashes’, an inflected form of ‘bash’. W ends with the characters ‘es’. Remove ‘es’ from the end of W (bashes) and check W (bash) in RVL; W exists in RVL. Therefore, W is returned to the calling Algorithm 1.
Example 7: W = the word ‘awakes’, an inflected form of ‘awake’. W ends with the characters ‘es’. Remove ‘es’ from the end of W (awakes) and check W (awak) in RVL; W does not exist in RVL. Now remove ‘s’ from W (awakes) and check W (awake) in RVL; W exists in RVL. Finally, W (awake) is returned to the calling Algorithm 1.
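Examples 5 to 7 suggest a cascade of progressively shorter suffix removals, sketched below. The function name `stem_s`, the small `RVL`, and the choice to fall through from ‘ies’ to ‘es’ to ‘s’ whenever a candidate is missing from the list are assumptions; Figure 3 may order or guard the checks differently.

```python
# Hypothetical miniature root verb list (RVL) for illustration.
RVL = {"apply", "bash", "awake", "die"}

def stem_s(word, rvl=RVL):
    """Return the root verb of an 'ies'/'es'/'s' form, or the word itself."""
    candidates = []
    if word.endswith("ies"):
        candidates.append(word[:-3] + "y")   # applies -> apply
    if word.endswith("es"):
        candidates.append(word[:-2])         # bashes -> bash
    if word.endswith("s"):
        candidates.append(word[:-1])         # awakes -> awake
    for candidate in candidates:
        if candidate in rvl:
            return candidate
    return word  # Assumption: keep the original word if no candidate matches.

for w in ("applies", "bashes", "awakes"):
    print(w, "->", stem_s(w))
```

Checking every candidate against the root verb list is what prevents over-stemming: ‘awakes’ is not cut down to ‘awak’ because that string is not a known root.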
Figure 2. Extraction of root verb from ‘ing’ verb form
Figure 3. Extraction of root verb from (ies, es, s) verb form

Algorithm 2: Stemming verb from past and past participle verb form
Abbreviations used: W = word, RVL = root verb list, W_T = target word
Input: Word along with POS tag
Output: Word (root/original word)
Step 1: If the word's POS tag is VBN/VBD and the word exists in the irregular verb form list, then extract the root verb from the irregular verb list.
Step 2: If ‘ied’ are the last letters of W, then W_T = W minus ‘ied’, and add ‘y’ at the end of W_T
  (a) If W_T exists in RVL, then return W_T
  (b) Otherwise, return W
Step 3: If ‘ed’/‘en’ are the last letters of W, then W_T = W minus ‘ed’/‘en’ and check W_T in RVL
  (a) If W_T exists in RVL, then return W_T
  (b) If ‘d’/‘n’ is the last letter of W, then W_T = W minus ‘d’/‘n’ and check W_T in RVL
    (i) If W_T exists in RVL, then return W_T
  (c) If the last two letters of W_T are the same, then
    (i) If ‘en’ are the last letters of W, then remove the last letter from W_T
      (1) If W_T exists in RVL, then return W_T
      (2) Otherwise, add ‘e’ at the end of W_T and return W_T
    (ii) Otherwise, remove the last letter from W_T
      (1) If W_T exists in RVL, then return W_T
      (2) Otherwise, return W
Step 4: Otherwise, return W
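Algorithm 2 can be sketched as below. This is one possible reading rather than the authors' code: the word lists are tiny hypothetical stand-ins, and step 3(c) is assumed to test the doubled letter on W minus ‘ed’/‘en’ (not on W minus ‘d’/‘n’), which the algorithm text leaves ambiguous.

```python
# Hypothetical stand-ins for the article's 222-word irregular verb list
# and 3582-word root verb list.
IRREGULAR = {"kept": "keep", "chosen": "choose"}
RVL = {"apply", "bash", "like", "stop", "keep", "choose"}

def stem_past(word, tag, rvl=RVL, irregular=IRREGULAR):
    """Root verb of a past or past participle form, or the word itself."""
    # Step 1: irregular forms are looked up directly.
    if tag in ("VBN", "VBD") and word in irregular:
        return irregular[word]
    # Step 2: applied -> apply
    if word.endswith("ied"):
        target = word[:-3] + "y"
        return target if target in rvl else word
    # Step 3: 'ed' / 'en' endings
    if word.endswith(("ed", "en")):
        target = word[:-2]
        if target in rvl:                          # (a) bashed -> bash
            return target
        if word[:-1] in rvl:                       # (b) liked -> like
            return word[:-1]
        if len(target) >= 2 and target[-1] == target[-2]:   # (c)
            undoubled = target[:-1]
            if undoubled in rvl:                   # stopped -> stop
                return undoubled
            if word.endswith("en"):                # (c)(i)(2): add 'e'
                return undoubled + "e"
    # Step 4 (and step 3(c)(ii)(2)): keep the original word.
    return word

for w, t in (("applied", "VBD"), ("liked", "VBD"), ("stopped", "VBN")):
    print(w, "->", stem_past(w, t))
```

Note that branch (c)(i)(2) returns a word without checking it against the root verb list, exactly as the algorithm states, so ‘written’ (when absent from the irregular list) comes back as ‘write’.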
2.7. Sentiment Calculation
In this section, the sentiment score of each sentence is calculated. Some negation handling rules are proposed to determine the polarity of sentences or words whenever negation words, negative prefixes, and positive words occur within the sentence.
2.7.1 Negation Handling
The significance of negation words was first discussed by Polanyi & Zaenen (2006). Six negation handling rules are proposed and are used when the polarity of words changes due to the presence of negation words (footnote 1) or negative prefixes in the sentence (P. K. Singh et al., 2015). For example: I do not like this movie. Here ‘not’ is a negation word, due to which the polarity of the sentence changes from positive to negative. The threshold negative value (T_n) is fixed for all negation words and is equal to -0.25, as discussed in the article (S. K. Singh & Sachan, 2019b). The negative prefixes (dis, de, mis, ir, il, in, un) or suffixes also act as negation words, so the negation handling rules (shown in Table 5) are used to calculate the sentiment score whenever these negation words and negative prefixes or suffixes occur within a sentence (S. K. Singh & Sachan, 2019b). All 14 negation words and 7 negative prefixes of the English language discussed in the article (S. K. Singh & Sachan, 2019b), together with 8 negation words (shown in Table 3) and 4 negative prefixes (de, ka, da, a) of the Punjabi language, are used for the implementation of the proposed system.
A sentence's sentiment is expressed by the opinion words (positive words, negation words, and verbs) present within the sentence. The positive words are opinion words, but they are not opinion verbs. For example: Nice photo. Here the word ‘nice’ is not a verb, but it still expresses the sentiment of the sentence. Therefore, these kinds of opinion words are considered positive words (S. K. Singh & Sachan, 2019b). There are a total of 35 positive words, of which 20 English words are taken from the article (S. K. Singh & Sachan, 2019b) and 15 are words of the Punjabi language (shown in Table 4). The sentiment score value for these positive words is denoted as the threshold positive value (T_p), which is equal to 0.125, as discussed by S. K. Singh & Sachan (2019b).
2.7.2 Calculation of Sentiment Score
The VOD consists of opinion verbs of both languages (English and Punjabi) and their sentiment score values. The sentiment score ranges from -1 to 1. Negation and positive words are not included in this VOD because they may be adverbs, adjectives, determiners, or prepositions. There are 1330 opinion verbs, of which 677 are English words and 653 are Punjabi words. The sentiment scores of these words are assigned manually based on their polarity and with the help of the existing English SentiWordNet 3.0 (Baccianella et al., 2010). This VOD is domain-dependent, but it can be used in other domains after adding some words from that domain.
The opinion verbs are considered as a feature; they are extracted from the subjective sentences and searched for in the VOD. If a word is present in the VOD, the sentiment score of the opinion verb, denoted Word_score, is extracted. If Word_score is positive (Word_score > 0), the word is positive; if Word_score is negative (Word_score < 0), the word is negative. Within a sentence, the positive sentiment score (Pos_score) over all positive opinion words (verb/term) is calculated using Equation (1), and the negative sentiment score (Neg_score) over all negative opinion words (verb/term) using Equation (2). The sentence sentiment score (Sent_score) is calculated using Equation (3), in which the normalized values of Pos_score and Neg_score are added. To normalize them, Pos_score is divided by the total number of positive opinion words (p) and Neg_score by the total number of negative opinion words (n) in Equation (3). The document/comment sentiment score is calculated using Equation (4) by summing the sentiment scores (Sent_score) of all the sentences; the total number of sentences is denoted by ‘s’ in Equation (4). Each sentence's sentiment score is calculated using Algorithm 3, ‘Calculation of sentiment score’.

Table 3. Negation words of Punjabi language

| Sr. No. | Negation Word | Sr. No. | Negation Word |
|---|---|---|---|
| 1 | naahi | 5 | bekaar |
| 2 | nahiin | 6 | galat |
| 3 | viruddh | 7 | gair |
| 4 | bura | 8 | nakaaraatamak |

Table 4. Positive words of Punjabi language

| Sr. No. | Positive Word | Sr. No. | Positive Word | Sr. No. | Positive Word |
|---|---|---|---|---|---|
| 1 | haan | 6 | bharosa | 11 | piaar |
| 2 | thiik | 7 | changa | 12 | sahaaita |
| 3 | dhannavaad | 8 | shaanadaar | 13 | kadar |
| 4 | khair | 9 | vadhiia | 14 | vadhaaiiaan |
| 5 | mahaan | 10 | sahii | 15 | sakaaraatamak |

Table 5. Rules for Negation Handling

| S.No. | Opinion verb | Negative prefixes | Negation words | Positive word | Sentiment score (positive/negative) | Sentiment score calculation |
|---|---|---|---|---|---|---|
| 1 | Positive | Yes | Yes | No | Positive | Pos_score = Pos_score + Word_score |
| 2 | Positive | Yes | No | No | Negative | Neg_score = Neg_score − 1 × Word_score |
| 3 | Positive | No | Yes | No | Negative | Neg_score = Neg_score − 1 × Word_score |
| 4 | No | No | Yes | Yes | Negative | Neg_score = Neg_score − 1 × T_p |
| 5 | Verb out of the dictionary | No | Yes | No | Negative | Neg_score = Neg_score + T_n |
| 6 | Negative | No | Yes | No | Positive | Pos_score = Pos_score − 1 × Word_score |
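$$Pos_{score} = \sum_{i=0}^{p} Word_{score}(i), \quad \text{if } Word_{score} > 0 \tag{1}$$
$$Neg_{score} = \sum_{j=0}^{n} Word_{score}(j), \quad \text{if } Word_{score} < 0 \tag{2}$$
$$Sent_{score} = \frac{Pos_{score}}{p} + \frac{Neg_{score}}{n} \tag{3}$$
$$Doc_{score} = \sum_{k=0}^{s} Sent_{score}(k) \tag{4}$$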
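As a worked illustration of Equations (1) to (4), the sketch below scores a sentence from a list of word scores and sums sentence scores into a document score. The function names and the sample scores are made up for illustration, and skipping a term of Equation (3) when p or n is zero is an assumption; the article does not say how an empty class is handled.

```python
def sentence_score_from_words(word_scores):
    """Equations (1)-(3): averaged positive part plus averaged negative part."""
    positives = [w for w in word_scores if w > 0]   # terms of Equation (1)
    negatives = [w for w in word_scores if w < 0]   # terms of Equation (2)
    score = 0.0
    if positives:   # Assumption: skip the term when p == 0.
        score += sum(positives) / len(positives)
    if negatives:   # Assumption: skip the term when n == 0.
        score += sum(negatives) / len(negatives)
    return score

def document_score(sentences):
    """Equation (4): sum of the sentence scores."""
    return sum(sentence_score_from_words(s) for s in sentences)

# Two positive words (0.5, 0.25) and one negative word (-0.25):
# 0.75 / 2 + (-0.25) / 1 = 0.125
print(sentence_score_from_words([0.5, 0.25, -0.25]))
print(document_score([[0.5, 0.25, -0.25], [-0.5]]))
```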
Algorithm 3: Calculation of sentiment score
Notations: W -> Word, VOD -> Verb Opinion Dictionary, NW -> Negation Words, NHR -> Negation Handling Rules, NP -> Negative Prefixes, PWL -> Positive Word List
Input: Sentence (text)
Output: Sentence's sentiment score
Step 1: Read the sentence word by word (W) and repeat steps 2 to 5
Step 2: If the last W of a sentence is a NW, then add T_n to Neg_score and add 1 to n, go to step 1
Step 3: If W is found in VOD, then extract Word_score from VOD
  (a) if NW comes before W, then update Pos_score or Neg_score using NHR and add 1 to p/n, go to step 1
  (b) otherwise, update Pos_score or Neg_score using Equation (1) or (2) and add 1 to p/n, go to step 1
Step 4: If W is found in PWL, then
  (a) if NW comes before W, then update Neg_score using NHR and add 1 to n, go to step 1
  (b) otherwise, update Pos_score by adding T_p and add 1 to p, go to step 1
Step 5: If W is not found in VOD and PWL, then
  (a) if NW comes before W, then update Neg_score using NHR and add 1 to n, go to step 1
  (b) if NP exists in W and the root word is found in VOD, then extract Word_score from VOD
    (i) if NW comes before W, then update Pos_score using NHR and add 1 to p, go to step 1
    (ii) otherwise, update Neg_score using NHR and add 1 to n, go to step 1
Step 6: Sent_score is updated using Equation (3)
Step 7: Sent_score is returned
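A simplified sketch of Algorithm 3 follows. It covers steps 2, 3, 4, 5(a), and 6 only; negative-prefix handling (step 5(b)) is left out. The dictionary entries and their scores are hypothetical, "NW comes before W" is assumed to mean the immediately preceding token, and the input is assumed to be already lower-cased, stop-word-filtered, and stemmed.

```python
T_N = -0.25    # threshold negative value given in the text
T_P = 0.125    # threshold positive value given in the text

# Hypothetical dictionary contents; the scores are illustrative only.
VOD = {"like": 0.5, "hate": -0.75}
POSITIVE_WORDS = {"nice", "vadhiia"}
NEGATION_WORDS = {"not", "nahiin"}

def score_sentence(tokens):
    pos_score = neg_score = 0.0
    p = n = 0
    for i, word in enumerate(tokens):
        # Assumption: a negation word negates only the next token.
        negated = i > 0 and tokens[i - 1] in NEGATION_WORDS
        if word in NEGATION_WORDS:
            if i == len(tokens) - 1:       # Step 2: trailing negation word
                neg_score += T_N
                n += 1
            continue
        if word in VOD:                    # Step 3
            # Rules 3 and 6 of Table 5 flip the sign of a negated verb.
            score = -VOD[word] if negated else VOD[word]
            if score > 0:
                pos_score += score
                p += 1
            else:
                neg_score += score
                n += 1
        elif word in POSITIVE_WORDS:       # Step 4
            if negated:                    # Rule 4
                neg_score -= T_P
                n += 1
            else:
                pos_score += T_P
                p += 1
        elif negated:                      # Step 5(a), Rule 5
            neg_score += T_N
            n += 1
    # Step 6, Equation (3); assumption: skip a term whose count is zero.
    total = 0.0
    if p:
        total += pos_score / p
    if n:
        total += neg_score / n
    return total

print(score_sentence(["i", "do", "not", "like", "this", "movie"]))
print(score_sentence(["nice", "photo"]))
```

With these made-up scores, the article's example sentence "I do not like this movie" comes out negative because ‘not’ flips the positive score of ‘like’, while "Nice photo" comes out positive through the threshold value T_p.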
2.8. Document Classification
The document is classified into one of two classes (negative or positive) using binary classification. The document's sentiment score (Doc_score) is calculated using Equation (4), and the document is classified using the rule-based classifier of Equation (5). The document is classified as positive if its Doc_score value is greater than or equal to zero; otherwise, it is classified as negative. Documents whose Doc_score value is equal to zero are considered positive because, in most cases, positive words are present in place of an opinion verb (S. K. Singh & Sachan, 2019b).
$$Document = \begin{cases} Positive, & \text{if } Doc_{score} \geq 0 \\ Negative, & \text{otherwise} \end{cases} \tag{5}$$
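The rule of Equation (5), applied to a document score obtained as in Equation (4), amounts to a single comparison. In this small sketch the function name and the sample sentence scores are illustrative assumptions.

```python
def classify_document(sentence_scores):
    """Sum the sentence scores (Equation 4) and apply Equation (5)."""
    doc_score = sum(sentence_scores)
    # A score of exactly zero is treated as positive, as the text specifies.
    return "positive" if doc_score >= 0 else "negative"

print(classify_document([0.125, -0.5]))   # net score -0.375
print(classify_document([0.25, -0.25]))   # net score 0.0
```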
3. EXPERIMENTAL RESULTS AND DISCUSSION
The proposed SA system's performance is evaluated for English and English_Punjabi (bilingual) text on four different datasets (discussed in Section 2.1). The proposed system used a dictionary-based approach and various dictionaries of stop words, negative prefixes, negation words, and positive words of both the English and Punjabi languages. The threshold negative value T_n = -0.25 and the threshold positive value T_p = 0.125 were used (S. K. Singh & Sachan, 2019b). The performance of the proposed StemVerb system is also discussed in this section.
The system performance is measured using performance metrics such as recall, precision, accuracy, and F score (Equations 6 to 9). The related terms are defined as follows: true positive (T_p): positive texts predicted as positive; false positive (F_p): negative texts predicted as positive; true negative (T_n): negative texts predicted as negative; and false negative (F_n): positive texts predicted as negative. In Equations (6) to (8), T_p and T_n denote these counts, not the threshold values of Section 2.7.
$$Recall = \frac{T_p}{T_p + F_n} \tag{6}$$
$$Precision = \frac{T_p}{T_p + F_p} \tag{7}$$
$$Accuracy = \frac{T_p + T_n}{T_p + F_p + F_n + T_n} \tag{8}$$
$$F\ score = \frac{2 \times recall \times precision}{recall + precision} \tag{9}$$
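Equations (6) to (9) translate directly into code. In the sketch below, the function name and the confusion-matrix counts are invented for illustration and are not results from the article.

```python
def evaluation_metrics(tp, fp, tn, fn):
    """Recall, precision, accuracy, and F score from confusion-matrix counts."""
    recall = tp / (tp + fn)                                   # Equation (6)
    precision = tp / (tp + fp)                                # Equation (7)
    accuracy = (tp + tn) / (tp + fp + fn + tn)                # Equation (8)
    f_score = 2 * recall * precision / (recall + precision)   # Equation (9)
    return {"recall": recall, "precision": precision,
            "accuracy": accuracy, "f_score": f_score}

# Illustrative counts only: 3 true positives, 1 false positive,
# 4 true negatives, 1 false negative.
print(evaluation_metrics(tp=3, fp=1, tn=4, fn=1))
```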
3.1 Performance Evaluation of the System on Different Datasets
The performance of the proposed system on four different datasets is discussed in this section. The datasets differ in size and domain. The system performs better on the GST implementation dataset than on the other datasets (shown in Figures 4 and 5). The proposed system performs better on the English datasets than on the English_Punjabi datasets.
The proposed system is designed for code-mixed bilingual phonetic text, but its performance is evaluated on both monolingual (English) and bilingual (English_Punjabi) text. As per Table 6, the system performs better than the existing state-of-the-art methods on all datasets. For monolingual text, the system's accuracy is higher than that of existing methods by 3% to 33.38% on the social media dataset, 1.5% to 3.8% on movie reviews, 1.35% to 13.81% on Amazon product reviews, and 4.56% to 7.34% on large Amazon product reviews. For bilingual text, the system performs better than other existing methods in terms of F score (by 13.22% on social media and 17.03% on movie reviews).
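The lower ends of the accuracy ranges quoted above can be reproduced from Table 6 by subtracting the best previously reported accuracy from the proposed system's English accuracy on each dataset. The short check below uses only figures from Table 6; the variable names are illustrative.

```python
# Best prior accuracy (%) per dataset for English text, from Table 6.
best_prior = {
    "Social media": 82.50,                  # S. K. Singh & Sachan, 2019b
    "Movie reviews": 71.30,                 # S. K. Singh & Sachan, 2019b
    "Amazon product reviews": 81.98,        # Kansal et al., 2020
    "Large Amazon product reviews": 64.43,  # Baoxin Wang, 2018
}
# Proposed system accuracy (%) on English text, from Table 6.
proposed = {
    "Social media": 85.50,
    "Movie reviews": 72.80,
    "Amazon product reviews": 83.33,
    "Large Amazon product reviews": 68.99,
}

margins = {name: round(proposed[name] - best_prior[name], 2) for name in proposed}
for name, margin in margins.items():
    print(f"{name}: +{margin} percentage points")
```

The resulting differences (3.0, 1.5, 1.35, and 4.56) are differences in percentage points between two accuracy values, which is how the "%" improvements in the text should be read.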
3.2 Performance Evaluation of StemVerb System
A StemVerb system is proposed to extract the root verb from the inflectional forms of verbs. The framework and working process of the StemVerb system are discussed in Section 2.6. A root verb list of 3582 words and a list of 222 irregular verb forms are generated from different resources (footnotes 2 and 3). A total of 3992 words are used to test the performance of the StemVerb system and of two existing algorithms, Snowball (footnote 4) and Lancaster (footnote 5). The StemVerb system is proposed only for verbs of the English language. StemVerb performs better than the Snowball and Lancaster stemmers (shown in Figure 6).
The performance of the SA system is evaluated in terms of accuracy for bilingual text using all three stemmers; the SA system performed better using the StemVerb system than with the other two existing stemmers (shown in Figure 7). Also, the accuracy of the proposed SA system improved after stemming the verbs in the bilingual datasets (shown in Figure 8). The VOD contains only root verbs along with their sentiment scores. Therefore, stemming the verbs improves the performance of the proposed SA system.
4. CoNCLUSIoN ANd FUTURe SCoPe
Thesentimentanalysissystemforbilingualcode-mixedphonetictext isdevelopedandtestedonfourdatasets,writteninbothmonolingualandbilingual.TheexistingmonolingualtextinEnglishis converted into bilingual (English and Punjabi) code-mixed phonetic text. The spell checker,VOD,negationwords,positivewords,negativeprefixes,negationhandlingrules,stopwords,andStemVerbsystemisdevelopedandused toextract thewriter’ssentimentfromtheirwritten text.ThedevelopedVODforEnglishandPunjabilanguage,contains1330verbopinionwordsofboth
Figure 4. Performance of system on the different datasets (English)
Figure 5. Performance of system on the different datasets (English and Bilingual)
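Table 6. Experimental results compared with state-of-the-art methods

| Authors | Method | POS | Dataset | Language | Accuracy (%) | F Score (%) |
|---|---|---|---|---|---|---|
| (Taboada et al., 2011) | Lexicon-based | Adjective, adverb, verb, noun | Movie reviews | English | 70.10 | - |
| (Karamibekr & Ghorbani, 2012) | Dictionary-based | Verb, adverb, adjective | Social media | English | 65.00 | - |
| (Iqbal et al., 2015) | BAT with Lexicon-based | - | Movie reviews | English | 69.00 | - |
| (P. K. Singh et al., 2015) | Dictionary-based | Verb | Social media | English | 79.00 | - |
| (Shamsudin et al., 2016) | Dictionary-based | Verb, adverb, adjective | Social media | Malay | 52.12 | - |
| (Bhargava et al., 2016) | SentiWordNet | - | Movie reviews | English_Hindi | - | 54.40 |
| (Cruz et al., 2017) | Dictionary-based | Adjective, noun | Movie reviews | English | - | 66.00 |
| (Ruder & Plank, 2018) | Multi-task tri-training | - | Amazon product reviews | English | 79.15 | - |
| (Han et al., 2018) | Dictionary-based | - | Amazon product reviews | English | 69.52 | 69.60 |
| (Baoxin Wang, 2018) | Disconnected Recurrent Neural Networks | - | Large Amazon product reviews | English | 64.43 | - |
| (Yu & Liu, 2018) | Sliced Recurrent Neural Networks | - | Large Amazon product reviews | English | 61.65 | - |
| (S. K. Singh & Sachan, 2019b) | Dictionary-based | Verb | Social media | English | 82.50 | 87.04 |
| (S. K. Singh & Sachan, 2019b) | Dictionary-based | Verb | Movie reviews | English | 71.30 | 71.39 |
| (Patwa et al., 2020) | XLM-R | - | Social media | English_Hindi | - | 75.00 |
| (Kansal et al., 2020) | Logistic regression | - | Amazon product reviews | English | 81.98 | - |
| Proposed system | Dictionary-based | Verb | Social media | English | 85.50 | 89.18 |
| Proposed system | Dictionary-based | Verb | Social media | English_Punjabi | 84.25 | 88.22 |
| Proposed system | Dictionary-based | Verb | Movie reviews | English | 72.80 | 73.12 |
| Proposed system | Dictionary-based | Verb | Movie reviews | English_Punjabi | 71.00 | 71.43 |
| Proposed system | Dictionary-based | Verb | Amazon product reviews | English | 83.33 | 83.27 |
| Proposed system | Dictionary-based | Verb | Amazon product reviews | English_Punjabi | 80.87 | 81.20 |
| Proposed system | Dictionary-based | Verb | Large Amazon product reviews | English | 68.99 | 68.07 |
| Proposed system | Dictionary-based | Verb | Large Amazon product reviews | English_Punjabi | 65.34 | 65.84 |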
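For readers who want to recompute the two metrics reported in Table 6, the following sketch derives accuracy and F score from confusion counts. It assumes the usual definitions (F score as the harmonic mean of precision and recall for the positive class); the paper's own definitions appear in an earlier section, and the counts used here are hypothetical, not taken from the paper.

```python
def accuracy(tp, tn, fp, fn):
    """Percentage of all texts whose predicted class matches the gold class."""
    total = tp + tn + fp + fn
    return 100.0 * (tp + tn) / total if total else 0.0


def f_score(tp, fp, fn):
    """Harmonic mean of precision and recall for the positive class, in percent."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 100.0 * 2 * precision * recall / (precision + recall)


# Hypothetical confusion counts, used only to exercise the two functions.
counts = {"tp": 45, "tn": 40, "fp": 10, "fn": 5}
acc = accuracy(**counts)
f1 = f_score(counts["tp"], counts["fp"], counts["fn"])
print(f"accuracy = {acc:.2f}%, F score = {f1:.2f}%")
```

Because the F score ignores true negatives, it can be higher or lower than accuracy on the same predictions, which is consistent with the two columns of Table 6 differing for the same system and dataset.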
English and Punjabi language. The list of irregular verbs of 222 words and the root verb list of 3582 words are developed to improve the SA system’s performance. The SA system classifies text into the negative or positive class using a rule-based classifier and the sentiment score value extracted from the VOD. The SA system obtained an accuracy of 85.5% and 84.25% on the GST dataset; 72.8% and 71% on the movie reviews; 83.33% and 80.87% on Amazon product reviews; and 68.99% and 65.34% on large Amazon product reviews for monolingual and bilingual text, respectively. This system will automatically classify customers’ feedback into positive and negative classes and will help customers to decide before purchasing a product; manufacturers can enhance the quality of products and services
Figure 6. Performance of stemmers (Snowball, Lancaster, and StemVerb)
Figure 7. Sentiment analysis system’s accuracy using all three stemmers
based on it and can keep an eye on their competitors; and the mood of the public can be gauged ahead of elections and government policies. The proposed SA system for bilingual text can be further extended to other code-mixed bilingual texts by modifying some components easily. The developed VOD can also be extended to other parts of speech and to other languages.
Figure 8. Sentiment analysis system’s accuracy with and without stemming
REFERENCES
Alharbi, J. R., & Alhalabi, W. S. (2020). Hybrid Approach for Sentiment Analysis of Twitter Posts Using a Dictionary-based Approach and Fuzzy Logic Methods. International Journal on Semantic Web and Information Systems, 16(1), 116–145. doi:10.4018/IJSWIS.2020010106
Baccianella, S., Esuli, A., & Sebastiani, F. (2010). SENTIWORDNET 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. 7th International Conference on Language Resources and Evaluation, 2200–2204. http://nmis.isti.cnr.it/sebastiani/Publications/LREC10.pdf
Wang, B. (2018). Disconnected Recurrent Neural Networks for Text Categorization. 56th Annual Meeting of the Association for Computational Linguistics, 2311–2320. doi:10.18653/v1/P18-1215
Bhargava, R., Sharma, Y., & Sharma, S. (2016). Sentiment analysis for mixed script Indic sentences. 2016 International Conference on Advances in Computing, Communications and Informatics, 524–529. doi:10.1109/ICACCI.2016.7732099
Blitzer, J., Dredze, M., & Pereira, F. (2007). Biographies, bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. ACL 2007 - Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, 440–447. https://www.aclweb.org/anthology/P07-1056.pdf
Bonadiman, D., Castellucci, G., Favalli, A., Romagnoli, R., & Moschitti, A. (2017). Neural Sentiment Analysis for a Real-World Application. In Proceedings of the Fourth Italian Conference on Computational Linguistics CLiC-it 2017 (pp. 42–47). Accademia University Press. doi:10.4000/books.aaccademia.2357
Cambria, E., Poria, S., Bajpai, R., & Schuller, B. (2016). SenticNet 4: A semantic resource for sentiment analysis based on conceptual primitives. COLING 2016 - 26th International Conference on Computational Linguistics, Proceedings of COLING 2016: Technical Papers, 2666–2677. https://www.aclweb.org/anthology/C16-1251.pdf
Cruz, L., Ochoa, J., Roche, M., & Poncelet, P. (2017). Dictionary-based sentiment analysis applied to a specific domain. In A.-S. H. Lossio-Ventura J. (Ed.), Communications in Computer and Information Science: Vol. 656 CCIS (pp. 57–68). Springer. doi:10.1007/978-3-319-55209-5_5
D’Andrea, A., Ferri, F., Grifoni, P., & Guzzo, T. (2015). Approaches, Tools and Applications for Sentiment Analysis Implementation. International Journal of Computers and Applications, 125(3), 26–33. doi:10.5120/ijca2015905866
Drus, Z., & Khalid, H. (2019). Sentiment Analysis in Social Media and Its Application: Systematic Literature Review. Procedia Computer Science, 161, 707–714. doi:10.1016/j.procs.2019.11.174
Dutta, S., Saha, T., Banerjee, S., & Naskar, S. K. (2015). Text normalization in code-mixed social media text. 2015 IEEE 2nd International Conference on Recent Trends in Information Systems, 378–382. doi:10.1109/ReTIS.2015.7232908
Guellil, I., & Boukhalfa, K. (2015). Social big data mining: A survey focused on opinion mining and sentiments analysis. 2015 12th International Symposium on Programming and Systems, 1–10. doi:10.1109/ISPS.2015.7244976
Han, H., Zhang, Y., Zhang, J., Yang, J., & Zou, X. (2018). Improving the performance of lexicon-based review sentiment analysis method by reducing additional introduced sentiment bias. PLoS One, 13(8), e0202523. doi:10.1371/journal.pone.0202523 PMID:30142154
He, R., & McAuley, J. (2016). Ups and Downs: Modeling the Visual Evolution of Fashion Trends with One-Class Collaborative Filtering. International World Wide Web Conference Committee (IW3C2), 507–517. doi:10.1145/2872427.2883037
Hopken, W., Fuchs, M., Menner, T., & Lexhagen, M. (2017). Sensing the Online Social Sphere Using a Sentiment Analytical Approach. In Z. Xiang & D. R. Fesenmaier (Eds.), Analytics in Smart Tourism Design Concepts and Methods (pp. 129–146). Springer. doi:10.1007/978-3-319-44263-1_8
Hussein, D. M. E.-D. M. (2018). A survey on sentiment analysis challenges. Journal of King Saud University - Engineering Sciences, 30(4), 330–338. doi:10.1016/j.jksues.2016.04.002
Iqbal, M., Karim, A., & Kamiran, F. (2015). Bias-aware lexicon-based sentiment analysis. Proceedings of the 30th Annual ACM Symposium on Applied Computing - SAC ’15, 845–850. doi:10.1145/2695664.2695759
Jivani, A. G. (2011). A Comparative Study of Stemming Algorithms. International Journal of Computer Technology and Applications, 2(6), 1930–1938. https://pdfs.semanticscholar.org/1c0c/0fa35d4ff8a2f925eb955e48d655494bd167.pdf
Kansal, N., Goel, L., & Gupta, S. (2020). Cross-domain sentiment classification initiated with Polarity Detection Task. EAI Endorsed Transactions on Scalable Information Systems. doi:10.4108/eai.26-5-2020.165965
Karamibekr, M., & Ghorbani, A. A. (2012). Verb oriented sentiment classification. 2012 IEEE/WIC/ACM International Conference on Web Intelligence, 327–331. doi:10.1109/WI-IAT.2012.122
Kaur, J., & Saini, J. R. (2016). Punjabi Stop Words: A Gurmukhi, Shahmukhi and Roman Scripted Chronicle. ACM Symposium on Women in Research 2016, 32–37. doi:10.1145/2909067.2909073
Lo, S. L., Cambria, E., Chiong, R., & Cornforth, D. (2017). Multilingual sentiment analysis: From formal to informal and scarce resource languages. Artificial Intelligence Review, 48(4), 499–527. doi:10.1007/s10462-016-9508-4
Medhat, W., Hassan, A., & Korashy, H. (2014). Sentiment analysis algorithms and applications: A survey. Ain Shams Engineering Journal, 5(4), 1093–1113. doi:10.1016/j.asej.2014.04.011
Pang, B., & Lee, L. (2004). A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. Proceedings of the ACL. https://arxiv.org/abs/cs/0409058
Pang, B., & Lee, L. (2008). Opinion Mining and Sentiment Analysis. Foundations and Trends® in Information Retrieval, 2(1–2), 1–135. doi:10.1561/1500000011
Patwa, P., Aguilar, G., Kar, S., & Pandey, S. (2020). SemEval-2020 Task 9: Overview of Sentiment Analysis of Code-Mixed Tweets. https://arxiv.org/abs/2008.04277
Peng, L., Cui, G., Zhuang, M., & Li, C. (2014). What do seller manipulations of online product reviews mean to consumers? In Digital Commons @ Lingnan University (HKIBS/WPS/070-1314). https://commons.ln.edu.hk/hkibswp/70
Polanyi, L., & Zaenen, A. (2006). Contextual Valence Shifters. In Computing Attitude and Affect in Text: Theory and Applications (pp. 1–10). Springer-Verlag. doi:10.1007/1-4020-4102-0_1
Press Trust of India. (2013, July 10). India to have the highest internet traffic growth rate. Business Standard. https://www.business-standard.com/article/technology/india-to-have-the-highest-internet-traffic-growth-rate-113071000014_1.html
Ravi, K., & Ravi, V. (2015). A survey on opinion mining and sentiment analysis: Tasks, approaches and applications. Knowledge-Based Systems, 89, 14–46. doi:10.1016/j.knosys.2015.06.015
Ruder, S., & Plank, B. (2018). Strong baselines for neural semi-supervised learning under domain shift. ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers), 1, 1044–1054. doi:10.18653/v1/P18-1096
Saif, H., Fernandez, M., He, Y., & Alani, H. (2014). On stopwords, filtering and data sparsity for sentiment analysis of twitter. 9th International Conference on Language Resources and Evaluation, 810–817. http://oro.open.ac.uk/id/eprint/40666
Serrano-Guerrero, J., Olivas, J. A., Romero, F. P., & Herrera-Viedma, E. (2015). Sentiment analysis: A review and comparative analysis of web services. Information Sciences, 311, 18–38. doi:10.1016/j.ins.2015.03.040
Shamsudin, N. F., Basiron, H., & Sa’aya, Z. (2016). Lexical Based Sentiment Analysis – Verb, Adverb & Negation. Journal of Telecommunication, Electronic and Computer Engineering, 8(2), 161–166. https://journal.utem.edu.my/index.php/jtec/article/view/976/566
Singh, P. K., Singh, S. K., & Paul, S. (2015). Sentiment classification of social issues using contextual valence shifters. International Journal of Engineering and Technology, 7(4), 1443–1452. http://www.enggjournals.com/ijet/docs/IJET15-07-04-335.pdf
Shailendra Kumar Singh is currently a Research Scholar in the Department of Computer Science and Engineering at Sant Longowal Institute of Engineering and Technology (SLIET), Sangrur, Punjab, India. He obtained his B.Tech degree in Information Technology from UPTU (Lucknow) in 2012 and Master of Engineering in Software Engineering from Birla Institute of Technology (BIT), Mesra, Ranchi (India). He has qualified national level exams (GATE-13 & UGC-NET-2018, 2019). His research interests include handwriting recognition, sentiment analysis, natural language processing, data mining, human stress level detection and personality detection.
Manoj Kumar Sachan (PhD) is currently a Professor at Sant Longowal Institute of Engineering and Technology (SLIET), India. He is associated with the Department of Computer Science and Engineering. He did his B.Tech in Computer Science from Punjabi University, Patiala, India. He did M.E in Computer Science from Thapar Institute of Engineering & Technology, Patiala, and Ph.D from Punjab Technical University, Jalandhar, India. His research interests include handwriting recognition, stress detection, opinion mining, medical image processing, natural language processing, and data mining.
Singh, R. K., Sachan, M. K., & Patel, R. B. (2020). 360 degree view of cross-domain opinion classification: A survey. Artificial Intelligence Review. Advance online publication. doi:10.1007/s10462-020-09884-9
Singh, S. K., & Sachan, M. K. (2019a). GRT: Gurmukhi to Roman Transliteration System using Character Mapping and Handcrafted Rules. International Journal of Innovative Technology and Exploring Engineering, 8(9), 2758–2763. doi:10.35940/ijitee.I8636.078919
Singh, S. K., & Sachan, M. K. (2019b). SentiVerb system: Classification of social media text using sentiment analysis. Multimedia Tools and Applications, 78(22), 32109–32136. doi:10.1007/s11042-019-07995-2
Taboada, M., Brooke, J., Tofiloski, M., Voll, K., & Stede, M. (2011). Lexicon-Based Methods for Sentiment Analysis. Computational Linguistics, 37(2), 267–307. doi:10.1162/COLI_a_00049
Turney, P. D. (2002). Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. 40th Annual Meeting on Association for Computational Linguistics, 417–424. doi:10.3115/1073083.1073153
Yu, Z., & Liu, G. (2018). Sliced Recurrent Neural Networks. 27th International Conference on Computational Linguistics, 2953–2964. https://www.aclweb.org/anthology/C18-1250
Yue, L., Chen, W., Li, X., Zuo, W., & Yin, M. (2019). A survey of sentiment analysis in social media. Knowledge and Information Systems, 60(2), 617–663. doi:10.1007/s10115-018-1236-4
ENDNOTES
1 Negation words reverse the polarity of a word or sentence if they appear before a negative or positive word in the sentence.
2 https://www.worldclasslearning.com/english/five-verb-forms.html
3 https://www.enchantedlearning.com/wordlist/verbs.shtml
4 https://snowballstem.org/demo.html
5 https://text-processing.com/demo/stem/