Classification of Code-Mixed Bilingual Phonetic Text Using Sentiment Analysis



DOI: 10.4018/IJSWIS.2021040104

International Journal on Semantic Web and Information Systems
Volume 17 • Issue 2 • April-June 2021

Copyright © 2021, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.


Shailendra Kumar Singh, Computer Science and Engineering, Sant Longowal Institute of Engineering and Technology, India

https://orcid.org/0000-0001-9658-1441

Manoj Kumar Sachan, Computer Science and Engineering, Sant Longowal Institute of Engineering and Technology, India

ABSTRACT

The rapid growth of internet facilities has increased comments, posts, blogs, feedback, etc., on a large scale on social networking sites. These social media data are available in an unstructured form, which includes images, text, and videos. Processing these data is difficult, but sentiment analysis, information retrieval, and recommender systems are used to process such unstructured data. To extract the opinion and sentiment of internet users from their written social media text, a sentiment analysis system is needed that can work on both monolingual and bilingual phonetic text. Therefore, a sentiment analysis (SA) system is developed, which performs well on datasets from different domains. The system's performance is tested on four different datasets, and it achieves accuracy improvements over state-of-the-art techniques of 3% on social media datasets, 1.5% on movie reviews, 1.35% on Amazon product reviews, and 4.56% on large Amazon product reviews. Also, a stemmer (StemVerb) for verbs of the English language is proposed, which improves the SA system's performance.

KEYWORDS

Code-Mixed Phonetic Text, Opinion Verb, Sentiment Analysis, Sentiment Score, Social Media, Stemmer

1. INTRODUCTION

With the enhancement of internet services and facilities, social networking sites such as YouTube, Google Plus, LinkedIn, Twitter, and Facebook have grown rapidly (Press Trust of India, 2013). These social networking sites provide facilities for users to share their feelings, emotions, comments, feedback, and reviews over the internet. Thus, the size of such content on social media (SM) increases exponentially day by day. Most SM text content is written using more than one language and is called code-mixed language. Text of languages other than English that is written using the Roman script's alphabet is called phonetic text. The phonetic text is mixed with English language text, but there is no fixed format for these SM texts (Dutta et al., 2015). These contents are used as input text to extract information and opinion, to perform text summarization, etc., using various linguistic computations, natural language processing, text mining, and information retrieval systems (S. K. Singh & Sachan, 2019b).

Opinion mining or sentiment analysis (SA) is a sub-field of text mining and is one of the most recent research topics of interest (Pang & Lee, 2008). SA is concerned with predicting and analyzing hidden information, emotion, and feelings from written text. SA is widely used to analyze feedback on proposed government regulations and policies, to analyze customers' likes and dislikes, to know product demand and brand reputation, for real-world event monitoring and analysis of political party demand, to analyze the merits or demerits of competitors' products, and as a subtask component of recommender systems (Bonadiman et al., 2017; D'Andrea et al., 2015). In April 2013, 90% of consumers decided to purchase things or services based on online reviews (Peng et al., 2014).

User-written texts are classified using SA into two or three classes based on different formats such as positive/negative/neutral, like/dislike, and good/bad (Hopken et al., 2017; S. K. Singh & Sachan, 2019b). Two approaches are mostly used to classify text using SA: (i) feature-based and (ii) bag-of-words. Machine learning techniques are based on features, while lexicon-based techniques use bag-of-words approaches. In SA of products and services, machine learning is widely used, but bag-of-words approaches are used for social issues (Karamibekr & Ghorbani, 2012). Machine learning systems are trained on labeled dataset(s) and classify the testing dataset(s) based on the trained system. Lexicon-based techniques are categorized into two approaches: corpus-based and dictionary-based (S. K. Singh & Sachan, 2019b). Text classification using the dictionary-based approach depends upon the opinion words' sentiment scores. The dictionary consists of opinion words and their sentiment scores (Alharbi & Alhalabi, 2020). The opinion words are nouns, verbs, adverbs, and adjectives, which act as features in the dictionary-based approach, as discussed in the articles (Hopken et al., 2017; Shamsudin et al., 2016; P. K. Singh et al., 2015; R. K. Singh et al., 2020; S. K. Singh & Sachan, 2019b). These opinion words are used to develop opinion dictionaries such as SentiWordNet 3.0 (Baccianella et al., 2010), the latest SenticNet 4 (Cambria et al., 2016), etc.

SM text is available in monolingual, bilingual, and multilingual forms. Processing multilingual text is a difficult task compared to bilingual and monolingual text. Monolingual text can be easily processed, but bilingual text only to some extent. Taboada et al. (2011) developed the "Semantic Orientation CALculator (SO-CAL)" using a dictionary that includes opinion words (nouns, adverbs, verbs, and adjectives) along with their polarity and strength values, and achieved its best accuracy of 70.10% on the movie dataset among the datasets tested. In 2012, Karamibekr & Ghorbani developed a sentiment classification system in which verbs are considered the core element of the system, and created an opinion verb dictionary of 440 verbs and a term opinion dictionary of 1726 terms. A polarity value ranging from +2 to -2 is assigned to each word in the dictionary, and their system was tested on a dataset related to social issues with an accuracy of 65% (Karamibekr & Ghorbani, 2012). P. K. Singh et al. (2015) used negation handling rules and a dictionary-based approach to classify a social-issues dataset into positive or negative classes and achieved an accuracy of 79.16% for negative sentences. Iqbal et al. (2015) proposed a Bias-aware thresholding (BAT) method combining AFINN and SentiStrength to reduce the bias in the lexicon-based method for SA and obtained 69% accuracy using a Naïve Bayes classifier.

In 2016, Bhargava et al. developed an SA system for code-mixed sentences of the English language with four Indian languages (Hindi, Telugu, Bengali, and Tamil) using a count-based approach and language-specific SentiWordNets to determine each word's polarity value. Their system classified sentences based on the number of negative and positive words present within the sentence and obtained an F score of 54.4% on the English_Hindi mixed dataset. In the same year, a lexical-based method was proposed by Shamsudin et al. (2016) to classify Facebook comments written in the Malay language using verbs, adverbs, and negation. The Verb+Negation combination achieved the highest accuracy of 52.12%.

In 2017, an SA system for domain-specific text was proposed by Cruz et al. (2017), using a dictionary-based approach. The system was tested on different datasets such as kitchen, book, movie reviews, and agriculture, and achieved its best performance on movie reviews (F score of 66%). A multi-task tri-training model based on deep learning was proposed by Ruder & Plank (2018). They trained their model jointly with three model-specific outputs and achieved 79.15% accuracy on Amazon product reviews. Later, in 2018, Han et al. proposed a method to reduce the bias in the lexicon-based approach. They used SentiWordNet and tried to improve accuracy but achieved 69.52% accuracy and a 69.65% F score (Han et al., 2018). Baoxin Wang (2018) proposed a deep learning-based Disconnected Recurrent Neural Networks model, and Yu & Liu (2018) proposed a Sliced Recurrent Neural Networks model. They tested their models on large Amazon product reviews, but their systems achieved less than 65.00% accuracy.

The SentiVerb system was developed by S. K. Singh & Sachan (2019b) and tested on the GST dataset and movie reviews. The system considered the verb as a feature and obtained 82.50% accuracy on the GST dataset and 71.30% accuracy on movie reviews, even with small dataset availability. In 2020, an event named SemEval-2020 Task 9 was conducted on SA of code-mixed tweets, and the highest performance was reported as a 75.00% F score by one participant on bilingual (English_Hindi) text using the XLM-R method (Patwa et al., 2020). Also in 2020, a method was proposed to automatically label documents with a polarity or sentiment class and thereby reduce human intervention, processing time, and cost (Kansal et al., 2020). Their polarity detection task, along with the sentiment classification system, achieved 81.98% accuracy on Amazon product reviews using a logistic regression approach.

It is observed that most research work on SA/opinion mining has been conducted for monolingual text rather than bilingual and multilingual text for resource-scarce languages (Lo et al., 2017), as per Table 1 and recent survey articles (Drus & Khalid, 2019; Guellil & Boukhalfa, 2015; Hussein, 2018; Medhat et al., 2014; Ravi & Ravi, 2015; Serrano-Guerrero et al., 2015; R. K. Singh et al., 2020; Yue et al., 2019). The highest accuracy is 82.5% on the SM dataset and 71.3% on movie reviews (S. K. Singh & Sachan, 2019b), 81.98% on Amazon product reviews (Kansal et al., 2020), and 64.43% on large Amazon product reviews. Most of the studies on languages other than English are in Chinese, Japanese, German, Spanish, French, Italian, Swedish, Arabic, and Romanian (Lo et al., 2017). Therefore, only a small number of studies and experiments have been conducted for bilingual and multilingual text, and they have not shown adequate system performance. No work was found related to the sentiment classification of code-mixed bilingual (English and Punjabi) phonetic text. Machine learning techniques require a large amount of labeled data for training purposes, and code-mixed phonetic text datasets are not sufficiently available. Hence, there is a need to design and develop an SA system that can give better performance for both monolingual and bilingual code-mixed phonetic text. With this perspective, a system is designed and developed for SA of bilingual (English and Punjabi) code-mixed phonetic text using a rule-based classifier and a dictionary-based approach. For the first time, a verb opinion dictionary (VOD) for Punjabi words is developed, motivated by the opinion verb dictionary (OVD) of the English language (S. K. Singh & Sachan, 2019b).

The significant contributions are as follows. For the first time, bilingual (English & Punjabi) code-mixed phonetic texts are considered for SA. In this article, a sentiment analysis system is proposed to classify code-mixed bilingual (English_Punjabi) phonetic text using handcrafted rules, rule-based classifiers, and dictionaries. The bilingual code-mixed testing datasets (86,400 reviews) are generated from monolingual text. A spell checker, the VOD, negation words, positive words, negative prefixes, negation handling rules, stop words, and the StemVerb system are developed and used to extract the writer's sentiment from their written text. The VOD of the Punjabi and English languages is developed, which includes 653 Punjabi words and 677 English words. A root verb list of 3582 words and an irregular verb list of 222 words are developed. The StemVerb system (a sub-system of the SA system) is proposed to extract the root verb from the inflected form of an English verb and obtains 22.83% better accuracy than the existing Snowball stemmer. The proposed SA system achieved better performance than state-of-the-art approaches on all four benchmark datasets for monolingual text (better accuracy by 3% on the SM dataset, 1.5% on movie reviews, 1.35% on Amazon product reviews, and 4.56% on large Amazon product reviews).

This article is organized as follows: Section 1 introduces SA and reviews previous research work on SA. Section 2 provides detailed information about the proposed framework of the system and explains the different phases of the system's implementation. Section 3 covers the performance evaluation of the proposed system on the different datasets and the result analysis, along with a comparison of performance with state-of-the-art techniques. Finally, a brief conclusion of this article is provided along with the future scope in the last section.


2. PROPOSED SYSTEM FRAMEWORK AND IMPLEMENTATION

The proposed system classifies code-mixed phonetic text as positive or negative using a dictionary-based approach. The system considers verbs as features. It has been designed to handle bilingual phonetic text and consists of (a) lower case conversion, (b) tokenization, (c) spell checker, (d) part-of-speech (POS) tagging, (e) stop words elimination, (f) stemming, (g) sentiment calculation, and (h) document classification, as shown in Figure 1.

Table 1. Analysis of existing works related to SA

| Authors | Method | Accuracy (%) | F Score (%) | Limitations |
| --- | --- | --- | --- | --- |
| (Taboada et al., 2011) | Lexicon-based | 70.10 | - | Small data size (5100); only for English language text |
| (Karamibekr & Ghorbani, 2012) | Dictionary-based | 65.00 | - | Small data size (1016); only tested on social issue dataset; only for English language text |
| (Iqbal et al., 2015) | BAT with lexicon-based | 69.00 | - | System designed only for English language text |
| (P. K. Singh et al., 2015) | Dictionary-based | 79.00 | - | Very small data size (48); considered only negative comments for testing; only for English language text |
| (Shamsudin et al., 2016) | Dictionary-based | 52.12 | - | Very small data size (450 Facebook comments); limited to Malay language text; insufficient accuracy |
| (Bhargava et al., 2016) | SentiWordNet | - | 54.40 | Small dataset (637); did not consider Punjabi and English code-mixed text; insufficient performance |
| (Cruz et al., 2017) | Dictionary-based | - | 66.00 | Small dataset (4183); only for English language text |
| (Ruder & Plank, 2018) | Multi-task tri-training | 79.15 | - | Only for English language text |
| (Han et al., 2018) | Dictionary-based | 69.52 | 69.60 | Small dataset (2000); only for English language text; only Amazon product reviews |
| (Baoxin Wang, 2018) | Disconnected Recurrent Neural Networks | 64.43 | - | Only for English language text; insufficient accuracy |
| (Yu & Liu, 2018) | Sliced Recurrent Neural Networks | 61.65 | - | Only for English language text; insufficient accuracy |
| (S. K. Singh & Sachan, 2019b) | Dictionary-based | 82.50 | 87.04 | Small dataset (400); only for English language text |
| (S. K. Singh & Sachan, 2019b) | Dictionary-based | 71.30 | 71.39 | Small dataset (2000); only for English language text |
| (Patwa et al., 2020) | XLM-R | - | 75.00 | Considered only Hindi and English code-mixed text |
| (Kansal et al., 2020) | Logistic regression | 81.98 | - | Small dataset (8000); only for English language text |

2.1. Testing Datasets Generation

The four existing datasets in the English language are used to generate bilingual (English_Punjabi) code-mixed phonetic text datasets (shown in Table 2) with the help of a bilingual dictionary of English and Punjabi words. The Punjabi words are written phonetically using Roman script letters. The transliteration of Punjabi words into phonetic words is done using the GRT system (S. K. Singh & Sachan, 2019a). The GST implementation Facebook comments are taken from an article (S. K. Singh & Sachan, 2019b); the other datasets are movie reviews (Pang & Lee, 2004), Amazon product reviews (Book, DVD, Electronics, Kitchen) (Blitzer et al., 2007), and large Amazon product reviews (He & McAuley, 2016). The size of the dataset for testing is 172,800 comments/reviews (86,400 each for English and English_Punjabi).

2.2. Conversion in Lower Case and Tokenization

The proposed system uses various dictionaries in which data are stored in lower case, so the testing dataset must be converted into lower case. The comments are split into sentences, and the sentences into words, because the proposed system works at the word level. The importance of tokenization (splitting) is discussed in the article (P. K. Singh et al., 2015).
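The lower-casing and splitting steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `tokenize_comment` and both regular expressions are hypothetical, and the article does not say which tokenizer was actually used.

```python
import re


def tokenize_comment(comment: str) -> list[list[str]]:
    """Lower-case a comment, split it into sentences, then into word tokens."""
    lowered = comment.lower()
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", lowered) if s]
    # Assumption: a word is a run of letters, optionally with one inner apostrophe.
    return [re.findall(r"[a-z]+(?:'[a-z]+)?", s) for s in sentences]


print(tokenize_comment("I do not like this movie. Nice photo!"))
# [['i', 'do', 'not', 'like', 'this', 'movie'], ['nice', 'photo']]
```

Because phonetic Punjabi words are written in Roman letters, the same letter-based pattern would also pick them up as tokens.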

2.3. Spell Checker of Bilingual Text

The testing dataset is generated by inserting Punjabi (phonetic) words within English language sentences, and only correct Punjabi words without spelling mistakes are used. So, there is no need to check the spelling of Punjabi words. Misspelled English words are detected and corrected using the existing system for English words (S. K. Singh & Sachan, 2019b).

Figure 1. Proposed system framework for code-mixed bilingual phonetic text


2.4. POS Tagging

Python's NLTK tool is used to tag each word with its part of speech (POS). POS tagging makes it easy to identify the verbs, adverbs, and adjectives (Turney, 2002) in a sentence.

2.5. Elimination of Stop Words from the Bilingual Text

Stop words are words that are useless for some natural language processing tasks. Whether stop words are used depends upon the task in which these words occur. The significance of stop word removal and various methods to eliminate such words are explained in the article (Saif et al., 2014). After removing these words, the remaining words are further processed. There are 217 stop words in the dictionary, which include 134 English stop words from the article (S. K. Singh & Sachan, 2019b) and 83 selected Punjabi stop words from the article (Kaur & Saini, 2016).
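Stop word elimination amounts to a set-membership filter over the tokens, as in the sketch below. The names `SAMPLE_STOP_WORDS` and `remove_stop_words` are hypothetical, and the small English word set is an illustrative stand-in: the 217-entry dictionary used in the article is not reproduced here.

```python
# Illustrative subset only (an assumption); the article's dictionary has 217 entries.
SAMPLE_STOP_WORDS = {"a", "an", "the", "is", "this", "of", "to"}


def remove_stop_words(tokens, stop_words=SAMPLE_STOP_WORDS):
    """Return the tokens that are not stop words, preserving their order."""
    return [token for token in tokens if token not in stop_words]


print(remove_stop_words(["i", "do", "not", "like", "this", "movie"]))
# ['i', 'do', 'not', 'like', 'movie']
```

Negation words such as 'not' are deliberately kept out of the sample stop list, since the later negation handling step depends on them.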

Table 2. Testing datasets

| Sr. No. | Datasets | Size | Language (existing) | Language (generated) |
| --- | --- | --- | --- | --- |
| 1 | GST Implementation | 400 comments | English | English and Punjabi |
| 2 | Movie Reviews | 2000 reviews | English | English and Punjabi |
| 3 | Amazon Product Reviews | 8000 reviews | English | English and Punjabi |
| 4 | Large Amazon Product Reviews | 76,000 reviews | English | English and Punjabi |

2.6. Stemming

Stemming is the process of finding the root word from its inflectional form. During stemming, the suffixes and prefixes are generally removed from the inflectional form to get the root word (Jivani, 2011). For example, keeps, kept, and keeping are inflectional (grammatical) forms of the word 'keep'. Here, the root word 'keep' is found by removing the suffixes from the words keeps and keeping. The database becomes bulky when all forms of a word are stored. Hence, the database stores only root words. Therefore, a verb stemmer system is developed for English language text, as shown in Figures 2 and 3 and in Algorithms 1 and 2. Stemming is done only for English verbs because the VOD contains only opinion verbs. StemVerb (Algorithm 1) is used to extract the root verb from the different forms of a verb. POS tags are attached to each word, and all verbs are identified for stemming. There are three sub-stemming functions based on verb forms: extraction from (a) the 'ing' form, (b) the "ies, es, s" forms, and (c) the past and past participle (ied, ed, en, d, n) forms.

Algorithm 1: StemVerb
Input: Word along with POS tag
Output: Word (root/original word)
Step 1: If the POS tag is VBP, VBG, VBZ, VBN, or VBD, then the word is a verb
Step 2: (a) If the word's last letters are 'ing', then call the stemming function to find the root word from the 'ing' verb form (shown in Figure 2)
        (b) If the word's last letters are 'ies', 'es', 's', then call the stemming function to find the root word from the 'ies', 'es', 's' verb form (shown in Figure 3)
        (c) If the word's last letters are 'ied', 'ed', 'en', 'd', 'n' or the POS tag is VBN/VBD, then call the stemming function to find the root word from the past and past participle verb form (Algorithm 2)
Step 3: Return word (root word/original word)
Step 4: Otherwise, the word is not a verb
Step 5: Return word

The process of extracting the root verb from the 'ing' verb form is shown in Figure 2 and explained with the following examples.

Example 1: W = 'keeping' is the inflected form of 'keep'. W ends with 'ing', so remove 'ing' from W and check whether W (keep) exists in the root verb list (RVL). It does, so W (keep) is returned to the calling Algorithm 1.

Example 2: W = 'lying' is the inflected form of 'lie'. W ends with 'ing', so remove 'ing' from W and check whether W (ly) exists in the RVL. It does not. Then check whether 'y' is the last character of W (ly). It is, so replace 'y' with 'ie'; W is now 'lie'. Check the RVL again: W exists in the RVL. Finally, W (lie) is returned to the calling Algorithm 1.

Example 3: W = 'getting' is the inflected form of 'get'. W ends with 'ing', so remove 'ing' from W and check whether W (gett) exists in the RVL. It does not. Then check whether 'y' is the last character of W (gett). It is not, so check whether the last two characters are the same and are not vowels. They are, so remove the last character from W (gett). Finally, W is 'get', which is returned to the calling Algorithm 1.

Example 4: W = 'choosing' is the inflected form of 'choose'. W ends with 'ing', so remove 'ing' from W and check whether W (choos) exists in the RVL. It does not. Then check whether 'y' is the last character of W (choos). It is not, so check whether the last two characters are the same and are not vowels. They are not, so add 'e' at the end of W (choos). Finally, W is 'choose'; check the RVL: W exists in the RVL. Therefore, W (choose) is returned to the calling Algorithm 1.
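Examples 1 to 4 can be traced with the short Python sketch below. It is a reading of the examples rather than a transcription of Figure 2, which is not reproduced in the text: the function name `stem_ing`, the RVL check after each candidate, and the fallback of returning the original word are all assumptions, and `SAMPLE_RVL` is a four-word stand-in for the 3582-word root verb list.

```python
VOWELS = set("aeiou")


def stem_ing(word, rvl):
    """Try to recover the root verb of an 'ing' form, following Examples 1-4."""
    if not word.endswith("ing"):
        return word
    stem = word[:-3]
    if stem in rvl:                      # keeping -> keep
        return stem
    if stem.endswith("y"):               # lying -> ly -> lie
        candidate = stem[:-1] + "ie"
        if candidate in rvl:
            return candidate
    if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in VOWELS:
        candidate = stem[:-1]            # getting -> gett -> get
        if candidate in rvl:
            return candidate
    candidate = stem + "e"               # choosing -> choos -> choose
    if candidate in rvl:
        return candidate
    return word                          # assumption: fall back to the original word


SAMPLE_RVL = {"keep", "lie", "get", "choose"}   # hypothetical miniature RVL
for w in ("keeping", "lying", "getting", "choosing"):
    print(w, "->", stem_ing(w, SAMPLE_RVL))
```

Checking every candidate against the RVL keeps the sketch from returning a non-word, at the cost of leaving verbs outside the list unstemmed.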

Figure 3 shows the process of extracting the root verb from the "ies, es, s" verb forms, which is explained with the following examples.

Example 5: W = 'applies' is an inflected form of 'apply'. W ends with 'ies'. Therefore, 'ies' is replaced with 'y', and W (apply) is checked in the RVL. W exists in the RVL, so W (apply) is returned to the calling Algorithm 1.

Example 6: W = 'bashes' is an inflected form of 'bash'. W ends with 'es'. Remove 'es' from the end of W (bashes) and check W (bash) in the RVL. W exists in the RVL, so W is returned to the calling Algorithm 1.

Example 7: W = 'awakes' is an inflected form of 'awake'. W ends with 'es'. Remove 'es' from the end of W (awakes) and check W (awak) in the RVL. W does not exist in the RVL. Now remove only 's' from W (awakes) and check W (awake) in the RVL. W exists in the RVL. Finally, W (awake) is returned to the calling Algorithm 1.
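Examples 5 to 7 suggest trying the longest suffix first and falling through to shorter ones when the candidate is not in the RVL. The sketch below follows that reading. The function name `stem_s`, the fall-through from 'ies' to 'es' to 's', and the final fallback to the original word are assumptions, since Figure 3 itself is not available in the text.

```python
def stem_s(word, rvl):
    """Try to recover the root verb of an 'ies'/'es'/'s' form, following Examples 5-7."""
    if word.endswith("ies"):
        candidate = word[:-3] + "y"      # applies -> apply
        if candidate in rvl:
            return candidate
    if word.endswith("es"):
        candidate = word[:-2]            # bashes -> bash
        if candidate in rvl:
            return candidate
    if word.endswith("s"):
        candidate = word[:-1]            # awakes -> awake
        if candidate in rvl:
            return candidate
    return word                          # assumption: fall back to the original word


SAMPLE_RVL = {"apply", "bash", "awake"}  # hypothetical miniature RVL
for w in ("applies", "bashes", "awakes"):
    print(w, "->", stem_s(w, SAMPLE_RVL))
```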

Figure 2. Extraction of root verb from 'ing' verb form

Figure 3. Extraction of root verb from (ies, es, s) verb form

Algorithm 2: Stemming verb from past and past participle verb form
Abbreviations used: W = word, RVL = root verb list, WT = target word
Input: Word along with POS tag
Output: Word (root/original word)
Step 1: If the word's POS tag is VBN/VBD and the word exists in the irregular verb form list, then extract the root verb from the irregular verb list
Step 2: If 'ied' are the last letters of W, then WT = W minus 'ied', and add 'y' at the end of WT
    (a) If WT exists in RVL, then return WT
    (b) Otherwise, return W
Step 3: If 'ed'/'en' are the last letters of W, then WT = W minus 'ed'/'en', and check WT in RVL
    (a) If WT exists in RVL, then return WT
    (b) If 'd'/'n' is the last letter of W, then WT = W minus 'd'/'n', and check WT in RVL
        (i) If WT exists in RVL, then return WT
    (c) If the last two letters of WT are the same, then
        (i) If 'en' are the last letters of W, then remove the last letter from WT
            (1) If WT exists in RVL, then return WT
            (2) Otherwise, add 'e' at the end of WT and return WT
        (ii) Otherwise, remove the last letter from WT
            (1) If WT exists in RVL, then return WT
            (2) Otherwise, return W
Step 4: Otherwise, return W
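Algorithm 2 can be sketched in Python as below. Several points are assumptions rather than statements from the article: the function name `stem_past`, the representation of the irregular verb list as a dictionary from inflected form to root, and the reading of Step 3(c), where WT is taken to be W minus 'ed'/'en' (the algorithm text leaves open whether Step 3(b) has already changed WT). The sample lists are tiny stand-ins for the 3582-word RVL and 222-word irregular verb list.

```python
def stem_past(word, pos_tag, rvl, irregular):
    """Recover the root verb of a past or past participle form (sketch of Algorithm 2)."""
    # Step 1: irregular verbs are looked up directly.
    if pos_tag in ("VBN", "VBD") and word in irregular:
        return irregular[word]
    # Step 2: 'ied' -> 'y' (tried -> try).
    if word.endswith("ied"):
        target = word[:-3] + "y"
        return target if target in rvl else word
    # Step 3: 'ed' / 'en' endings.
    if word.endswith(("ed", "en")):
        target = word[:-2]
        if target in rvl:                       # 3(a): walked -> walk
            return target
        target_d = word[:-1]                    # 3(b): strip only 'd'/'n'
        if target_d in rvl:                     # loved -> love
            return target_d
        # 3(c): doubled final consonant, e.g. stopped -> stopp, written -> writt
        if len(target) >= 2 and target[-1] == target[-2]:
            shorter = target[:-1]
            if shorter in rvl:                  # stopped -> stop
                return shorter
            if word.endswith("en"):             # written -> writ -> write
                return shorter + "e"
            return word
    # Step 4: nothing matched.
    return word


SAMPLE_RVL = {"try", "walk", "love", "stop", "write", "go"}   # hypothetical
SAMPLE_IRREGULAR = {"went": "go"}                              # hypothetical
for w, tag in (("tried", "VBD"), ("loved", "VBD"), ("stopped", "VBD"), ("went", "VBD")):
    print(w, "->", stem_past(w, tag, SAMPLE_RVL, SAMPLE_IRREGULAR))
```

Note that Step 3(c)(i)(2) returns a form with 'e' appended without a further RVL check, as the algorithm text states, so the sketch does the same.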

2.7. Sentiment Calculation

In this section, the sentiment score of each sentence is calculated. Some negation handling rules are proposed to determine the polarity of sentences or words whenever negation words, negative prefixes, and positive words occur within the sentence.

2.7.1 Negation Handling

The significance of negation words was first discussed by Polanyi & Zaenen (2006). Six negation handling rules are proposed and are used when the polarity of words changes due to the presence of negation words or negative prefixes in the sentence (P. K. Singh et al., 2015). For example, in "I do not like this movie", 'not' is a negation word due to which the polarity of the sentence changes from positive to negative. The threshold negative value ($T_n$) is fixed for all negation words and is equal to -0.25, as discussed in the article (S. K. Singh & Sachan, 2019b). The negative prefixes (dis, de, mis, ir, il, in, un) or suffixes also act as negation words, so the negation handling rules (shown in Table 5) are used to calculate the sentiment score whenever these negation words and negative prefixes or suffixes occur within a sentence (S. K. Singh & Sachan, 2019b). All 14 negation words and 7 negative prefixes of the English language discussed in the article (S. K. Singh & Sachan, 2019b), together with 8 negation words (shown in Table 3) and 4 negative prefixes (de, ka, da, a) of the Punjabi language, are used for the implementation of the proposed system.

A sentence's sentiment is expressed by the opinion words (positive words, negation words, and verbs) present within the sentence. The positive words are opinion words, but they are not opinion verbs. For example, in "Nice photo", the word 'nice' is not a verb, but it still expresses the sentiment of the sentence. Therefore, these kinds of opinion words are considered positive words (S. K. Singh & Sachan, 2019b). There are a total of 35 positive words, of which 20 English words are taken from the article (S. K. Singh & Sachan, 2019b) and 15 are Punjabi words (shown in Table 4). The sentiment score for these positive words is denoted as the threshold positive value ($T_p$), which is equal to 0.125, as discussed by S. K. Singh & Sachan (2019b).

2.7.2 Calculation of Sentiment Score

The VOD consists of opinion verbs of both languages (English and Punjabi) and their sentiment score values. The sentiment score ranges from -1 to 1. The negation and positive words are not included in this VOD because they may be adverbs, adjectives, determiners, and prepositions. There are 1330 opinion verbs, of which 677 are English words and 653 are Punjabi words. The sentiment scores of these words are assigned manually based on their polarity and with the help of the existing English SentiWordNet 3.0 (Baccianella et al., 2010). This VOD is domain-dependent, but it can be used in other domains after adding some words from that domain.

Table 3. Negation words of Punjabi language

| Sr. No. | Negation Word | Sr. No. | Negation Word |
| --- | --- | --- | --- |
| 1 | naahi | 5 | bekaar |
| 2 | nahiin | 6 | galat |
| 3 | viruddh | 7 | gair |
| 4 | bura | 8 | nakaaraatamak |

Table 4. Positive words of Punjabi language

| Sr. No. | Positive Word | Sr. No. | Positive Word | Sr. No. | Positive Word |
| --- | --- | --- | --- | --- | --- |
| 1 | haan | 6 | bharosa | 11 | piaar |
| 2 | thiik | 7 | changa | 12 | sahaaita |
| 3 | dhannavaad | 8 | shaanadaar | 13 | kadar |
| 4 | khair | 9 | vadhiia | 14 | vadhaaiiaan |
| 5 | mahaan | 10 | sahii | 15 | sakaaraatamak |

Table 5. Rules for Negation Handling

| S. No. | Opinion verb | Negative prefixes | Negation words | Positive word | Sentiment score (positive/negative) | Sentiment score calculation |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Positive | Yes | Yes | No | Positive | $Pos_{score} = Pos_{score} + Word_{score}$ |
| 2 | Positive | Yes | No | No | Negative | $Neg_{score} = Neg_{score} - 1 \times Word_{score}$ |
| 3 | Positive | No | Yes | No | Negative | $Neg_{score} = Neg_{score} - 1 \times Word_{score}$ |
| 4 | No | No | Yes | Yes | Negative | $Neg_{score} = Neg_{score} - 1 \times T_p$ |
| 5 | Verb out of the dictionary | No | Yes | No | Negative | $Neg_{score} = Neg_{score} + T_n$ |
| 6 | Negative | No | Yes | No | Positive | $Pos_{score} = Pos_{score} - 1 \times Word_{score}$ |

The opinion verbs are considered as features; they are extracted from the subjective sentences and searched for in the VOD. If a word is present in the VOD, the sentiment score of the opinion verb, denoted $Word_{score}$, is extracted. If $Word_{score} > 0$, the word is positive; if $Word_{score} < 0$, the word is negative. Within a sentence, the positive sentiment score ($Pos_{score}$) over all positive opinion words (verb/term) is calculated using Equation (1), and the negative sentiment score ($Neg_{score}$) over all negative opinion words (verb/term) using Equation (2). The sentence sentiment score ($Sent_{score}$) is calculated using Equation (3), in which the normalized values of $Pos_{score}$ and $Neg_{score}$ are added. To normalize them, $Pos_{score}$ is divided by the total number of positive opinion words ($p$) and $Neg_{score}$ by the total number of negative opinion words ($n$) in Equation (3). The document/comment sentiment score is calculated using Equation (4) by summing the sentiment score ($Sent_{score}$) of all the sentences, where the total number of sentences is denoted by $s$. Each sentence's sentiment score is calculated using Algorithm 3, 'Calculation of sentiment score'.

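$$Pos_{score} = \sum_{i=0}^{p} Word_{score}(i), \quad \text{if } Word_{score} > 0 \tag{1}$$

$$Neg_{score} = \sum_{j=0}^{n} Word_{score}(j), \quad \text{if } Word_{score} < 0 \tag{2}$$

$$Sent_{score} = \frac{Pos_{score}}{p} + \frac{Neg_{score}}{n} \tag{3}$$

$$Doc_{score} = \sum_{k=0}^{s} Sent_{score}(k) \tag{4}$$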
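Equations (1) to (4) translate directly into a few lines of Python, shown here as a sketch. The function names and the sample scores are hypothetical, and treating an empty positive or negative group as contributing zero is an assumption: the article does not state what Equation (3) yields when $p$ or $n$ is zero.

```python
def sentence_score(word_scores):
    """Equation (3): sum of the mean positive score and the mean negative score."""
    positives = [w for w in word_scores if w > 0]   # terms of Equation (1)
    negatives = [w for w in word_scores if w < 0]   # terms of Equation (2)
    # Assumption: an empty group contributes 0 instead of dividing by zero.
    pos_part = sum(positives) / len(positives) if positives else 0.0
    neg_part = sum(negatives) / len(negatives) if negatives else 0.0
    return pos_part + neg_part


def document_score(sentences):
    """Equation (4): sum of the sentence scores of a document."""
    return sum(sentence_score(scores) for scores in sentences)


# Hypothetical word scores for a two-sentence comment.
print(sentence_score([0.5, 0.25, -0.125]))               # 0.25
print(document_score([[0.5, 0.25, -0.125], [-0.25]]))    # 0.0
```

In the first sentence the positive part is (0.5 + 0.25) / 2 = 0.375 and the negative part is -0.125 / 1, giving 0.25; adding the second sentence's score of -0.25 brings the document score to 0.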

Algorithm 3: Calculation of sentiment score
Notations: W -> Word, VOD -> Verb Opinion Dictionary, NW -> Negation Words, NHR -> Negation Handling Rules, NP -> Negative Prefixes, PWL -> Positive Word List
Input: Sentence (text)
Output: Sentence's sentiment score
Step 1: Read the sentence word (W) by word and repeat steps 2 to 5
Step 2: If the last W of the sentence is a NW, then add $T_n$ to $Neg_{score}$ and add 1 to n, go to step 1
Step 3: If W is found in VOD, then extract $Word_{score}$ from VOD
  (a) if NW comes before W, then update $Pos_{score}$ or $Neg_{score}$ using NHR and add 1 to p/n, go to step 1
  (b) otherwise, update $Pos_{score}$ or $Neg_{score}$ using Equation (1) or (2) and add 1 to p/n, go to step 1
Step 4: If W is found in PWL, then
  (a) if NW comes before W, then update $Neg_{score}$ using NHR and add 1 to n, go to step 1
  (b) otherwise, update $Pos_{score}$ by adding $T_p$ and add 1 to p, go to step 1
Step 5: If W is not found in VOD and PWL, then
  (a) if NW comes before W, then update $Neg_{score}$ using NHR and add 1 to n, go to step 1
  (b) if NP exists in W and the root word is found in VOD, then extract $Word_{score}$ from VOD
    (i) if NW comes before W, then update $Pos_{score}$ using NHR and add 1 to p, go to step 1
    (ii) otherwise, update $Neg_{score}$ using NHR and add 1 to n, go to step 1
Step 6: $Sent_{score}$ is updated using Equation (3)
Step 7: $Sent_{score}$ is returned
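The steps of Algorithm 3 can be sketched in Python as follows. This is a minimal illustration under several assumptions, not the authors' implementation. The negation handling rules are reduced to a simple sign flip, which is a simplification of the full NHR table. The prefix case of Step 5(b) is checked before Step 5(a). A polarity with no words contributes zero in Equation (3), to avoid dividing by zero. The function names and the toy dictionaries are hypothetical; only the thresholds $T_n = -0.25$ and $T_p = 0.125$ come from the paper.

```python
from typing import Dict, List, Optional, Set

T_N = -0.25   # threshold negative value reported in the paper
T_P = 0.125   # threshold positive value reported in the paper


def strip_negative_prefix(word: str, prefixes: Set[str],
                          vod: Dict[str, float]) -> Optional[str]:
    """Return the root if removing a negative prefix yields a VOD entry."""
    for prefix in prefixes:
        if word.startswith(prefix) and word[len(prefix):] in vod:
            return word[len(prefix):]
    return None


def sentence_score(words: List[str], vod: Dict[str, float], pwl: Set[str],
                   negations: Set[str], prefixes: Set[str]) -> float:
    pos = neg = 0.0   # running Pos_score and Neg_score
    p = n = 0         # counts of positive and negative opinion words
    for i, w in enumerate(words):
        negated = i > 0 and words[i - 1] in negations
        score = 0.0
        if w in negations:
            # Step 2: a negation word that closes the sentence adds T_n
            if i == len(words) - 1:
                score = T_N
        elif w in vod:
            # Step 3: assumed NHR simplification, negation flips the sign
            score = -vod[w] if negated else vod[w]
        elif w in pwl:
            # Step 4: positive word list entry
            score = T_N if negated else T_P
        else:
            root = strip_negative_prefix(w, prefixes, vod)
            if root is not None:
                # Step 5(b): prefixed word is negative unless negated
                magnitude = abs(vod[root])
                score = magnitude if negated else -magnitude
            elif negated:
                # Step 5(a): unknown word preceded by a negation word
                score = T_N
        if score > 0:
            pos, p = pos + score, p + 1
        elif score < 0:
            neg, n = neg + score, n + 1
    # Equation (3), with empty polarities contributing zero (assumption)
    return (pos / p if p else 0.0) + (neg / n if n else 0.0)


# Toy resources for illustration only; not the published dictionaries
VOD = {"love": 0.75, "hate": -0.75, "like": 0.5}
PWL = {"good"}
NEGATIONS = {"not", "never"}
PREFIXES = {"dis", "un"}

print(sentence_score("i do not love it".split(), VOD, PWL, NEGATIONS, PREFIXES))
```

In this sketch a negated "love" contributes its VOD score with the sign reversed, and "dislike" is scored through its root "like" after the prefix is removed.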

2.8. Document Classification

The document is classified into one of two classes (negative or positive) using binary classification. The document's sentiment score ($Doc_{score}$) is calculated using Equation (4), and the document is classified using the rule-based classifier in Equation (5). The document is assigned to the positive class if its $Doc_{score}$ value is greater than or equal to zero; otherwise, it is assigned to the negative class. Documents whose $Doc_{score}$ value equals zero are considered positive because, in most such cases, positive words are present in place of an opinion verb (S. K. Singh & Sachan, 2019b).

$$Document = \begin{cases} Positive & \text{if } Doc_{score} \geq 0 \\ Negative & \text{otherwise} \end{cases} \tag{5}$$
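The classification rule can be illustrated with a short Python sketch. The function names are hypothetical, and the sentence scores passed in are assumed to have been computed already.

```python
from typing import List


def document_score(sentence_scores: List[float]) -> float:
    """Equation (4): sum of the sentiment scores of all sentences."""
    return sum(sentence_scores)


def classify(sentence_scores: List[float]) -> str:
    """Equation (5): a score of zero or above is labelled positive."""
    return "positive" if document_score(sentence_scores) >= 0 else "negative"


print(classify([0.5, -0.25]))    # sum is 0.25
print(classify([0.25, -0.25]))   # sum is 0.0, the tie goes to positive
print(classify([-0.5, 0.25]))    # sum is -0.25
```

The second call shows the tie-breaking behaviour described in the text: a document whose score is exactly zero is placed in the positive class.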

3. EXPERIMENTAL RESULTS AND DISCUSSION

The proposed SA system's performance is evaluated for English and English_Punjabi (bilingual) text on four different datasets (discussed in Section 2.1). The proposed system uses a dictionary-based approach and various dictionaries, such as stop words, negative prefixes, negation words, and positive words of both the English and Punjabi languages. A threshold negative value of $T_n = -0.25$ and a threshold positive value of $T_p = 0.125$ were used (S. K. Singh & Sachan, 2019b). The performance of the proposed StemVerb system is also discussed in this section.

The system performance is measured using performance metrics such as recall, precision, accuracy, and F-score (Equations 6 to 9). The related terms are defined as follows: true positive ($T_p$): positive texts predicted as positive; false positive ($F_p$): negative texts predicted as positive; true negative ($T_n$): negative texts predicted as negative; and false negative ($F_n$): positive texts predicted as negative. In Equations (6) to (9), $T_p$ and $T_n$ denote these counts, not the threshold values given above.


$$Recall = \frac{T_p}{(T_p + F_n)} \tag{6}$$

$$Precision = \frac{T_p}{(T_p + F_p)} \tag{7}$$

$$Accuracy = \frac{(T_p + T_n)}{(T_p + F_p + F_n + T_n)} \tag{8}$$

$$F\text{-}score = \frac{2 \times recall \times precision}{recall + precision} \tag{9}$$
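Equations (6) to (9) can be computed from gold and predicted labels as in the following sketch. The function name, the label strings, and the five-document example are assumptions made for illustration. Returning zero when a denominator is empty is also an assumption, since the paper does not discuss that case.

```python
from typing import Dict, Sequence


def evaluate(gold: Sequence[str], predicted: Sequence[str],
             positive: str = "positive") -> Dict[str, float]:
    tp = fp = tn = fn = 0
    for g, y in zip(gold, predicted):
        if g == positive and y == positive:
            tp += 1          # positive text predicted as positive
        elif g != positive and y == positive:
            fp += 1          # negative text predicted as positive
        elif g != positive and y != positive:
            tn += 1          # negative text predicted as negative
        else:
            fn += 1          # positive text predicted as negative
    total = tp + fp + fn + tn
    recall = tp / (tp + fn) if tp + fn else 0.0                # Equation (6)
    precision = tp / (tp + fp) if tp + fp else 0.0             # Equation (7)
    accuracy = (tp + tn) / total if total else 0.0             # Equation (8)
    f_score = (2 * recall * precision / (recall + precision)   # Equation (9)
               if recall + precision else 0.0)
    return {"recall": recall, "precision": precision,
            "accuracy": accuracy, "f_score": f_score}


gold = ["positive", "positive", "positive", "negative", "negative"]
pred = ["positive", "positive", "negative", "positive", "negative"]
print(evaluate(gold, pred))
```

For this toy input there are two true positives, one false negative, one false positive, and one true negative, so recall and precision are both 2/3 and accuracy is 3/5.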

3.1 Performance of System Evaluation on Different Datasets

The performance of the proposed system on four different datasets is discussed in this section. The datasets differ in size and domain. The system performs better on the GST implementation dataset than on the other datasets (shown in Figures 4 and 5), and it performs better on the English datasets than on the English_Punjabi datasets.

The proposed system is designed for code-mixed bilingual phonetic text, but its performance is evaluated on both monolingual (English) and bilingual (English_Punjabi) text. As per Table 6, the system performs better than the existing state-of-the-art methods on all datasets. For monolingual text, its accuracy is higher by 3% to 33.38% on the social media dataset, 1.5% to 3.8% on movie reviews, 1.35% to 13.81% on Amazon product reviews, and 4.56% to 7.34% on large Amazon product reviews. For bilingual text, its F-score is higher than that of the other existing methods by 13.22% on social media and 17.03% on movie reviews.

3.2 Performance Evaluation of StemVerb System

A StemVerb system is proposed to extract the root verb from the inflectional forms of verbs. The framework and working process of the StemVerb system are discussed in Section 2.6. A root verb list of 3582 words and an irregular verb forms list of 222 words were generated from different resources (see endnotes 2 and 3). A total of 3992 words were used to test the performance of the StemVerb system and of two existing algorithms, Snowball (endnote 4) and Lancaster (endnote 5). The StemVerb system is proposed only for verbs of the English language. The performance of StemVerb is better than that of the Snowball and Lancaster stemmers (shown in Figure 6).

The performance of the SA system is evaluated in terms of accuracy for bilingual text using all three stemmers; the SA system performed better with the StemVerb system than with the other two existing stemmers (shown in Figure 7). Also, the accuracy of the proposed SA system improved after the stemming of verbs on bilingual datasets (shown in Figure 8). The VOD contains only root verbs along with their sentiment scores, so an inflected verb form matches a VOD entry only after it has been reduced to its root. Therefore, the stemming of verbs improves the performance of the proposed SA system.
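The effect of stemming on dictionary lookup can be illustrated with a toy example. The stemmer below is not the StemVerb algorithm, whose rules are described in Section 2.6. It is a hypothetical stand-in that consults a small irregular-verb map and a root-verb list, then tries a few suffix removals. All words, scores, and suffix rules here are assumptions made for the illustration.

```python
from typing import Dict, Optional, Set

# Hypothetical miniature resources; the paper's lists hold 222 and 3582 entries
IRREGULAR: Dict[str, str] = {"went": "go", "gone": "go", "took": "take"}
ROOTS: Set[str] = {"go", "take", "hate", "love", "like", "stop"}
SUFFIXES = ("ing", "ed", "es", "s", "d")

# Toy verb opinion dictionary keyed by root verbs only
VOD: Dict[str, float] = {"hate": -0.75, "love": 0.75}


def toy_stem(word: str) -> str:
    """Return a root verb for `word`, or `word` itself if none is found."""
    if word in ROOTS:
        return word
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            base = word[:-len(suffix)]
            # Try the bare base, a restored final 'e', and an undoubled consonant
            for candidate in (base, base + "e", base[:-1]):
                if candidate in ROOTS:
                    return candidate
    return word


def lookup(word: str, stem: bool) -> Optional[float]:
    """Look the word up in the VOD, optionally stemming it first."""
    return VOD.get(toy_stem(word) if stem else word)


print(lookup("hated", stem=False))  # inflected form is absent from the VOD
print(lookup("hated", stem=True))   # the root 'hate' is found
```

Without stemming, "hated" is missed and contributes nothing to the sentence score; with stemming, it resolves to "hate" and its score is used.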


4. CONCLUSION AND FUTURE SCOPE

The sentiment analysis system for bilingual code-mixed phonetic text is developed and tested on four datasets, written in both monolingual and bilingual text. The existing monolingual text in English is converted into bilingual (English and Punjabi) code-mixed phonetic text. The spell checker, VOD, negation words, positive words, negative prefixes, negation handling rules, stop words, and StemVerb system are developed and used to extract the writer's sentiment from their written text. The developed VOD for the English and Punjabi languages contains 1330 verb opinion words of both

Figure 4. Performance of system on the different dataset(s) (English)

Figure 5. Performance of system on the different datasets (English and Bilingual)


Table 6. Experimental results are compared with state-of-the-art

| Authors | Method | POS | Dataset | Language | Accuracy (%) | F Score (%) |
|---|---|---|---|---|---|---|
| (Taboada et al., 2011) | Lexicon-based | Adjective, adverb, verb, noun | Movie reviews | English | 70.10 | |
| (Karamibekr & Ghorbani, 2012) | Dictionary-based | Verb, adverb, adjective | Social media | English | 65.00 | |
| (Iqbal et al., 2015) | BAT with Lexicon-based | | Movie reviews | English | 69.00 | |
| (P. K. Singh et al., 2015) | Dictionary-based | Verb | Social media | English | 79.00 | |
| (Shamsudin et al., 2016) | Dictionary-based | Verb, adverb, adjective | Social media | Malay | 52.12 | |
| (Bhargava et al., 2016) | SentiWordNet | | Movie reviews | English_Hindi | | 54.40 |
| (Cruz et al., 2017) | Dictionary-based | Adjective, noun | Movie reviews | English | | 66.00 |
| (Ruder & Plank, 2018) | Multi-task tri-training | | Amazon product reviews | English | 79.15 | |
| (Han et al., 2018) | Dictionary-based | | Amazon product reviews | English | 69.52 | 69.60 |
| (Baoxin Wang, 2018) | Disconnected Recurrent Neural Networks | | Large Amazon product reviews | English | 64.43 | |
| (Yu & Liu, 2018) | Sliced Recurrent Neural Networks | | Large Amazon product reviews | English | 61.65 | |
| (S. K. Singh & Sachan, 2019b) | Dictionary-based | Verb | Social media | English | 82.50 | 87.04 |
| (S. K. Singh & Sachan, 2019b) | Dictionary-based | Verb | Movie reviews | English | 71.30 | 71.39 |
| (Patwa et al., 2020) | XLM-R | | Social media | English_Hindi | | 75.00 |
| (Kansal et al., 2020) | Logistic regression | | Amazon product reviews | English | 81.98 | |
| Proposed system | Dictionary-based | Verb | Social media | English | 85.50 | 89.18 |
| | | | Social media | English_Punjabi | 84.25 | 88.22 |
| | | | Movie reviews | English | 72.80 | 73.12 |
| | | | Movie reviews | English_Punjabi | 71.00 | 71.43 |
| | | | Amazon product reviews | English | 83.33 | 83.27 |
| | | | Amazon product reviews | English_Punjabi | 80.87 | 81.20 |
| | | | Large Amazon product reviews | English | 68.99 | 68.07 |
| | | | Large Amazon product reviews | English_Punjabi | 65.34 | 65.84 |


English and Punjabi language. The list of irregular verbs (222 words) and the root verb list (3582 words) were developed to improve the SA system's performance. The SA system classifies text into the negative or positive class using a rule-based classifier and the sentiment score values extracted from the VOD. The SA system obtained an accuracy of 85.5% and 84.25% on the GST dataset; 72.8% and 71% on the movie reviews; 83.33% and 80.87% on Amazon product reviews; and 68.99% and 65.34% on large Amazon product reviews for monolingual and bilingual text, respectively. This system will automatically classify customers' feedback into positive and negative classes and will help customers to decide before purchasing a product; manufacturers can enhance the quality of products and services

Figure 6. Performance of stemmers (Snowball, Lancaster, and StemVerb)

Figure 7. Sentiment analysis system’s accuracy using all three stemmers


based on it, and can keep an eye on their competitors; the mood of the public can also be gauged before elections and government policies. The proposed SA system for bilingual text can be extended to other code-mixed bilingual texts by modifying some of its components. The developed VOD can also be extended to other parts of speech and to other languages.

Figure 8. Sentiment analysis system’s accuracy with and without stemming


REFERENCES

Alharbi, J. R., & Alhalabi, W. S. (2020). Hybrid Approach for Sentiment Analysis of Twitter Posts Using a Dictionary-based Approach and Fuzzy Logic Methods. International Journal on Semantic Web and Information Systems, 16(1), 116–145. doi:10.4018/IJSWIS.2020010106

Baccianella, S., Esuli, A., & Sebastiani, F. (2010). SENTIWORDNET 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. 7th International Conference on Language Resources and Evaluation, 2200–2204. http://nmis.isti.cnr.it/sebastiani/Publications/LREC10.pdf

Wang, B. (2018). Disconnected Recurrent Neural Networks for Text Categorization. 56th Annual Meeting of the Association for Computational Linguistics, 2311–2320. doi:10.18653/v1/P18-1215

Bhargava, R., Sharma, Y., & Sharma, S. (2016). Sentiment analysis for mixed script Indic sentences. 2016 International Conference on Advances in Computing, Communications and Informatics, 524–529. doi:10.1109/ICACCI.2016.7732099

Blitzer, J., Dredze, M., & Pereira, F. (2007). Biographies, bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. ACL 2007 - Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, 440–447. https://www.aclweb.org/anthology/P07-1056.pdf

Bonadiman, D., Castellucci, G., Favalli, A., Romagnoli, R., & Moschitti, A. (2017). Neural Sentiment Analysis for a Real-World Application. In Proceedings of the Fourth Italian Conference on Computational Linguistics CLiC-it 2017 (pp. 42–47). Accademia University Press. doi:10.4000/books.aaccademia.2357

Cambria, E., Poria, S., Bajpai, R., & Schuller, B. (2016). SenticNet 4: A semantic resource for sentiment analysis based on conceptual primitives. COLING 2016 - 26th International Conference on Computational Linguistics, Proceedings of COLING 2016: Technical Papers, 2666–2677. https://www.aclweb.org/anthology/C16-1251.pdf

Cruz, L., Ochoa, J., Roche, M., & Poncelet, P. (2017). Dictionary-based sentiment analysis applied to a specific domain. In A.-S. H. Lossio-Ventura J. (Ed.), Communications in Computer and Information Science: Vol. 656 CCIS (pp. 57–68). Springer. doi:10.1007/978-3-319-55209-5_5

D’Andrea, A., Ferri, F., Grifoni, P., & Guzzo, T. (2015). Approaches, Tools and Applications for Sentiment Analysis Implementation. International Journal of Computers and Applications, 125(3), 26–33. doi:10.5120/ijca2015905866

Drus, Z., & Khalid, H. (2019). Sentiment Analysis in Social Media and Its Application: Systematic Literature Review. Procedia Computer Science, 161, 707–714. doi:10.1016/j.procs.2019.11.174

Dutta, S., Saha, T., Banerjee, S., & Naskar, S. K. (2015). Text normalization in code-mixed social media text. 2015 IEEE 2nd International Conference on Recent Trends in Information Systems, 378–382. doi:10.1109/ReTIS.2015.7232908

Guellil, I., & Boukhalfa, K. (2015). Social big data mining: A survey focused on opinion mining and sentiments analysis. 2015 12th International Symposium on Programming and Systems, 1–10. doi:10.1109/ISPS.2015.7244976

Han, H., Zhang, Y., Zhang, J., Yang, J., & Zou, X. (2018). Improving the performance of lexicon-based review sentiment analysis method by reducing additional introduced sentiment bias. PLoS One, 13(8), e0202523. doi:10.1371/journal.pone.0202523 PMID:30142154

He, R., & McAuley, J. (2016). Ups and Downs: Modeling the Visual Evolution of Fashion Trends with One-Class Collaborative Filtering. International World Wide Web Conference Committee (IW3C2), 507–517. doi:10.1145/2872427.2883037

Hopken, W., Fuchs, M., Menner, T., & Lexhagen, M. (2017). Sensing the Online Social Sphere Using a Sentiment Analytical Approach. In Z. Xiang & D. R. Fesenmaier (Eds.), Analytics in Smart Tourism Design Concepts and Methods (pp. 129–146). Springer. doi:10.1007/978-3-319-44263-1_8

Hussein, D. M. E.-D. M. (2018). A survey on sentiment analysis challenges. Journal of King Saud University - Engineering Sciences, 30(4), 330–338. doi:10.1016/j.jksues.2016.04.002


Iqbal, M., Karim, A., & Kamiran, F. (2015). Bias-aware lexicon-based sentiment analysis. Proceedings of the 30th Annual ACM Symposium on Applied Computing - SAC ’15, 845–850. doi:10.1145/2695664.2695759

Jivani, A. G. (2011). A Comparative Study of Stemming Algorithms. International Journal of Computer Technology and Applications, 2(6), 1930–1938. https://pdfs.semanticscholar.org/1c0c/0fa35d4ff8a2f925eb955e48d655494bd167.pdf

Kansal, N., Goel, L., & Gupta, S. (2020). Cross-domain sentiment classification initiated with Polarity Detection Task. EAI Endorsed Transactions on Scalable Information Systems. doi:10.4108/eai.26-5-2020.165965

Karamibekr, M., & Ghorbani, A. A. (2012). Verb oriented sentiment classification. 2012 IEEE/WIC/ACM International Conference on Web Intelligence, 327–331. doi:10.1109/WI-IAT.2012.122

Kaur, J., & Saini, J. R. (2016). Punjabi Stop Words: A Gurmukhi, Shahmukhi and Roman Scripted Chronicle. ACM Symposium on Women in Research 2016, 32–37. doi:10.1145/2909067.2909073

Lo, S. L., Cambria, E., Chiong, R., & Cornforth, D. (2017). Multilingual sentiment analysis: From formal to informal and scarce resource languages. Artificial Intelligence Review, 48(4), 499–527. doi:10.1007/s10462-016-9508-4

Medhat, W., Hassan, A., & Korashy, H. (2014). Sentiment analysis algorithms and applications: A survey. Ain Shams Engineering Journal, 5(4), 1093–1113. doi:10.1016/j.asej.2014.04.011

Pang, B., & Lee, L. (2004). A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. Proceedings of the ACL. https://arxiv.org/abs/cs/0409058

Pang, B., & Lee, L. (2008). Opinion Mining and Sentiment Analysis. Foundations and Trends® in Information Retrieval, 2(1–2), 1–135. doi:10.1561/1500000011

Patwa, P., Aguilar, G., Kar, S., & Pandey, S. (2020). SemEval-2020 Task 9: Overview of Sentiment Analysis of Code-Mixed Tweets. https://arxiv.org/abs/2008.04277

Peng, L., Cui, G., Zhuang, M., & Li, C. (2014). What do seller manipulations of online product reviews mean to consumers? In Digital Commons @ Lingnan University (HKIBS/WPS/070-1314). https://commons.ln.edu.hk/hkibswp/70

Polanyi, L., & Zaenen, A. (2006). Contextual Valence Shifters. In Computing Attitude and Affect in Text: Theory and Applications (pp. 1–10). Springer-Verlag. doi:10.1007/1-4020-4102-0_1

Press Trust Of India. (2013, July 10). India to have the highest internet traffic growth rate. Business Standard. https://www.business-standard.com/article/technology/india-to-have-the-highest-internet-traffic-growth-rate-113071000014_1.html

Ravi, K., & Ravi, V. (2015). A survey on opinion mining and sentiment analysis: Tasks, approaches and applications. Knowledge-Based Systems, 89, 14–46. doi:10.1016/j.knosys.2015.06.015

Ruder, S., & Plank, B. (2018). Strong baselines for neural semi-supervised learning under domain shift. ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers), 1, 1044–1054. doi:10.18653/v1/P18-1096

Saif, H., Fernandez, M., He, Y., & Alani, H. (2014). On stopwords, filtering and data sparsity for sentiment analysis of twitter. 9th International Conference on Language Resources and Evaluation, 810–817. http://oro.open.ac.uk/id/eprint/40666

Serrano-Guerrero, J., Olivas, J. A., Romero, F. P., & Herrera-Viedma, E. (2015). Sentiment analysis: A review and comparative analysis of web services. Information Sciences, 311, 18–38. doi:10.1016/j.ins.2015.03.040

Shamsudin, N. F., Basiron, H., & Sa’aya, Z. (2016). Lexical Based Sentiment Analysis – Verb, Adverb & Negation. Journal of Telecommunication, Electronic and Computer Engineering, 8(2), 161–166. https://journal.utem.edu.my/index.php/jtec/article/view/976/566

Singh, P. K., Singh, S. K., & Paul, S. (2015). Sentiment classification of social issues using contextual valence shifters. International Journal of Engineering and Technology, 7(4), 1443–1452. http://www.enggjournals.com/ijet/docs/IJET15-07-04-335.pdf


Shailendra Kumar Singh is currently a Research Scholar in the Department of Computer Science and Engineering at Sant Longowal Institute of Engineering and Technology (SLIET), Sangrur, Punjab, India. He obtained his B.Tech degree in Information Technology from UPTU (Lucknow) in 2012 and Master of Engineering in Software Engineering from Birla Institute of Technology (BIT), Mesra, Ranchi (India). He has qualified national level exams (GATE-13 & UGC-NET-2018, 2019). His research interests include handwriting recognition, sentiment analysis, natural language processing, data mining, human stress level detection and personality detection.

Manoj Kumar Sachan (PhD) is currently a Professor at Sant Longowal Institute of Engineering and Technology (SLIET), India. He is associated with the Department of Computer Science and Engineering. He did his B.Tech in Computer Science from Punjabi University, Patiala, India. He did M.E in Computer Science from Thapar Institute of Engineering & Technology, Patiala, and Ph.D from Punjab Technical University, Jalandhar, India. His research interests include handwriting recognition, stress detection, opinion mining, medical image processing, natural language processing, and data mining.

Singh, R. K., Sachan, M. K., & Patel, R. B. (2020). 360 degree view of cross-domain opinion classification: A survey. Artificial Intelligence Review. Advance online publication. doi:10.1007/s10462-020-09884-9

Singh, S. K., & Sachan, M. K. (2019a). GRT: Gurmukhi to Roman Transliteration System using Character Mapping and Handcrafted Rules. International Journal of Innovative Technology and Exploring Engineering, 8(9), 2758–2763. doi:10.35940/ijitee.I8636.078919

Singh, S. K., & Sachan, M. K. (2019b). SentiVerb system: Classification of social media text using sentiment analysis. Multimedia Tools and Applications, 78(22), 32109–32136. doi:10.1007/s11042-019-07995-2

Taboada, M., Brooke, J., Tofiloski, M., Voll, K., & Stede, M. (2011). Lexicon-Based Methods for Sentiment Analysis. Computational Linguistics, 37(2), 267–307. doi:10.1162/COLI_a_00049

Turney, P. D. (2002). Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. 40th Annual Meeting on Association for Computational Linguistics, 417–424. doi:10.3115/1073083.1073153

Yu, Z., & Liu, G. (2018). Sliced Recurrent Neural Networks. 27th International Conference on Computational Linguistics, 2953–2964. https://www.aclweb.org/anthology/C18-1250

Yue, L., Chen, W., Li, X., Zuo, W., & Yin, M. (2019). A survey of sentiment analysis in social media. Knowledge and Information Systems, 60(2), 617–663. doi:10.1007/s10115-018-1236-4

ENDNOTES

1 Negation words reverse the polarity of a word or sentence if they appear before a negative or positive word in the sentence.

2 https://www.worldclasslearning.com/english/five-verb-forms.html

3 https://www.enchantedlearning.com/wordlist/verbs.shtml

4 https://snowballstem.org/demo.html

5 https://text-processing.com/demo/stem/