Gwc 2016


A Language-independent Model for Introducing a New Semantic Relation Between Adjectives and Nouns in a WordNet

Miljana Mladenović, Faculty of Mathematics, University of Belgrade, [email protected]

Jelena Mitrović, Faculty of Philology, University of Belgrade, [email protected]

Cvetana Krstev, Faculty of Philology, University of Belgrade, [email protected]

Abstract

The aim of this paper is to show a language-independent process of creating a new semantic relation between adjectives and nouns in wordnets. The existence of such a relation is expected to improve the detection of figurative language and sentiment analysis (SA). The proposed method uses an annotated corpus to explore the semantic knowledge contained in linguistic constructs performing as the rhetorical figure Simile. Based on the frequency of occurrence of similes in an annotated corpus, we propose a new relation, which connects the noun synset with the synset of an adjective representing that noun’s specific attribute. We elaborate on adding this new relation in the case of the Serbian WordNet (SWN). The proposed method is evaluated by human judgement in order to determine the relevance of automatically selected relation items. The evaluation has shown that 84% of the automatically selected and the most frequent linguistic constructs, whose frequency threshold was equal to 3, were also selected by humans.

1 Introduction

In this paper, we want to demonstrate that a WordNet (WN) can be expanded by a new semantic relation between adjectives and nouns in a way that could allow for its usage in detecting figurative language and in existing methods of sentiment analysis. WN is used successfully for analysis of literal meaning of texts using SA methods (Pease et al., 2012), (Reyes and Rosso, 2012), (Rademaker et al., 2014). Resources that came out of the Princeton WordNet (PWN), such as SentiWordNet (Esuli and Sebastiani, 2006), (Baccianella et al., 2010), WordNetAffect (Strapparava and Valitutti, 2004) and others, which define the prior sentiment polarity (taken out of the context) of synsets, are also being used. Still, the intensity of sentiment polarity of the lexical representation of synsets can be reduced, increased or completely changed in a given context with the usage of rhetorical figures from the group of Tropes, that is, figures that change the meaning of words or phrases over which the figure itself is formed. These figures can be metaphor, metonymy, irony, sarcasm, oxymoron, simile, dysphemism, euphemism, hyperbole, litotes etc. (Mladenović and Mitrović, 2013). Analysing the usage of figurative language in the form of ironic similes, Hao and Veale (2010) noticed that they act similarly to valence shifters (Kennedy and Inkpen, 2006) “not”, “never” and “avoid” in text, because they change the polarity of sentiment words or phrases. In general, modifiers decrease, increase or change the sentiment polarity of words or phrases. Tropes work in a similar way. By definition, irony and sarcasm change the polarity, dysphemism and hyperbole increase the existing level of sentiment expressiveness, while litotes and euphemism decrease that expressiveness. Metaphor, metonymy, oxymoron and simile have a more complex mechanism of affecting both directions of change regarding the strength and polarity of sentiment.

Automatic detection of figurative language is a new area of interest in the field of SA that can improve the existing SA methods. Reyes and Rosso (2012) showed that the precision of classification in an SA task can be improved significantly (from 54% to 89.05% max.) when predictors detecting figurative speech are involved, compared to a set of predictors that treat the text literally. Similarly, Rentoumi et al. (2010) improved the SA method of machine learning by integrating it with a rule-based method which detects the usage of figurative language, so the integrated methods achieved better precision than the baseline.

2 Related work

WordNet is a dynamic, flexible structure that can be expanded in different ways and for various purposes. In certain cases, introducing morpho-semantic relations results in solving the problems that stem from specificities of a language with rich morphology and derivation (Koeva et al., 2008). Otherwise, introducing new semantic relations can lead to the improvement of the representation of relations between synsets, e.g. Kuti et al. (2008) present a semantic relation scalar middle with which the antonymy relation of two descriptive adjective synsets is transformed into a triple gradable structure lower-upper-middle. Angioni et al. (2008) define a new relation Commonsense with which a literal in a synset is connected with the Wikipedia links in which it is described, while Maziarz et al. (2012) introduce a series of relations pertinent to adjectives, e.g. the derivational relations comparative and superlative define gradable forms of descriptive adjectives. The derivational relation similarity defines a relation between an adjective and a noun such that, based on a given adjective, the structure or form of the object described by the noun can be discovered. Similarly, the derivational relation characteristic defines a relation between an adjective and a noun where the contents or quality of an object described by the noun is known based on the adjective, e.g. based on the statement “If someone is famous, then he is characterised by fame” the relation characteristic will be set between the noun fame and the adjective famous.

The new semantic relation between nouns and adjectives in the Portuguese WordNet is described in (Marrafa et al., 2006) and (Mendes, 2006). This relation is given in the form of a pair of inverse relations is a characteristic of / has as a characteristic. According to the authors, although the purpose of the relation is to mark significant characteristics of a noun expressed by an adjective (e.g. ‘{carnivorous} is a characteristic of {shark}’), the status of this relation in the sense of lexical knowledge is not completely clear. The authors also point out that introducing this new relation enriches a WordNet, that it can contribute to the process of determining the semantic domain of an adjective and that it can be included in reasoning applications. Veale and Hao also suggest specific enrichment of WordNet in their papers (Veale and Hao, 2008) and (Hao and Veale, 2010). As a source to be used in that enrichment, the authors suggest semantic knowledge contained in language constructs of the form as ADJ as a NOUN which, in fact, are similes (e.g. “as free as a bird”, “as busy as a bee”). In order to obtain examples of simile, the authors first extracted all antonymous pairs of adjectives in PWN and made a list of candidate adjectives. For each adjective ADJ from that list, a query in the form as ADJ as a * was made and sent to the Google search engine. Out of the obtained results, the first 200 snippets were kept. A collection of as ADJ as a NOUN constructs was made and a task of disambiguation was performed over it. In this process, one noun (e.g. peacock) can semantically be connected to many adjectives based on different semantic grounds. The structure, named by the authors frame:slot:filler, consists of a noun (frame), a property of the noun (slot) and an adjective as a value of the property (filler). For one noun there can be a number of instances of such a structure. The authors point out that the average number of slot:filler constructs per noun obtained in this particular research was 8. For instance, the noun peacock contains the following set of slot:filler values: {Has feather: brilliant; Has plumage: extravagant; Has strut: proud; Has tail: elegant; Has display: colorful; Has manner: stately; Has appearance: beautiful}; therefore the suggested enrichment of WordNet only for the noun peacock leads to the addition of 7 relations, out of which the first one is of the form ‘{peacock} Has feather {brilliant}’.

3 Motivation

The research described in this paper is based on the previously mentioned research results by Marrafa et al. (2006) and Mendes (2006), because we are searching for specific relations between nouns and adjectives. However, unlike the relation has as a characteristic, which connects a number of nouns {shark, cobra, orca, predator,...} to the same adjective {carnivorous}, we consider those descriptive adjectives that are specific to a small set of nouns, or only to a single noun. In the process of generating the new relation, we propose using the rhetorical figure simile, which has a relatively high frequency of occurrence in texts written in a natural language. In that case, the relation ‘{peludo} is a characteristic of {abelha}’, meaning ‘{furry} is a characteristic of {bee}’, which exists in the Portuguese WordNet, would not be an adequate example; instead, the new relation would be created based on the common simile “as busy as a bee”, in which case the relation would be ‘{busy} specific of {bee}’.

The work described in this paper also leans on the research of Veale and Hao (2008) and Hao and Veale (2010) on the development of automatic methods for extracting semantic knowledge from examples of simile usage. We suggest extraction of linguistic constructs of the form as ADJ as a NOUN from a corpus annotated with PoS and lemmas, which means that, in contrast to the results of the Google search engine, the search would be faster and more precise, because in one step we would obtain the set of those potential similes that have only nouns positioned at the end of the observed linguistic structure. Furthermore, if we do not take into account all of the attributes that are characteristic of a certain noun, but only those that are used the most in everyday language (measured by the frequency of occurrence of the corresponding simile in the observed corpus), we can describe the set of “noun-adjective” candidates for expansion of the existing structure of WordNet with one unique relation (specificOf/specifiedBy). Introduction of a single relation would eliminate the risk pointed out in (Veale and Hao, 2008) that the introduction of a large number of relations expressed by the structure slot:filler would reduce the system’s ability to recognize similar properties. In the case of one relation, for example, {frame: Has strut: proud} and {frame: Has gait: majestic} would be transformed into {frame: specifiedBy: proud} and {frame: specifiedBy: majestic}. Apart from that, since only the most frequent constructs are taken into account, the described transformation would not involve all of the slot:filler structures of a certain noun, but only the most frequent one, which would, in the case of the noun peacock, result in generating only one relation ‘{peacock} specifiedBy {proud}’, and not all seven of them. If we introduce the frequency threshold as a parameter, changing it can affect the number of specificOf/specifiedBy relations for a single noun synset, as well as the total number of relations of that type.

With the suggested relation specificOf/specifiedBy we can determine the nature of the semantic connection between the concepts arrow, light and rabbit, which cannot be achieved with the existing PWN relations. Namely, from the simile constructs brz kao zec “as fast as a rabbit”, brz kao svetlost “as fast as light” and brz kao strela “as fast as an arrow”, obtained by querying the Corpus of Contemporary Serbian, we can confirm that ‘{strela, svetlost, zec} specifiedBy {brz}’, i.e. ‘{arrow, light, rabbit} specifiedBy {fast}’, holds true.

4 Language-independent model for WordNet Expansion

The procedure of expansion with the relation specificOf/specifiedBy that we are proposing will be shown on the example of expansion of the Serbian WordNet (SWN) (Krstev, 2008), but it can also be used for other wordnets. The procedure consists of the following steps:

1) From the annotated corpus of a natural language $K_l$ extract linguistic constructs of the form pridev kao imenica (in the case of English, as ADJ as a NOUN) and create the set $\mathit{Sims}$ such that:

$\mathit{Sims} = \{\text{“as ADJ as a NOUN”}\}, \quad \mathit{sims} \in \mathit{Sims} \subset K_l$

In our case, from the Corpus of Contemporary Serbian Language¹ (Utvić, 2014) 59 concordances of the form “<as ADJ as a NOUN>” were generated, such as the following:

ri vise.-Kakva je?-<Bela kao mleko>. Ona trazi isto
  ... <White as milk>. ...

crnog mrezastog sala, <lakog kao pero>, smele zelene dan
  ... <light as a feather>, ...

od zatvorenika; lica <zuta kao limun>, radosno polete
  ... <yellow as a lemon>, ...

2) Eliminate all elements from the set $\mathit{Sims}$ whose adjectives are not descriptive:

$\mathit{SimsRedycByAdj} = \{\mathit{sims} \in \mathit{Sims} \mid \text{ADJ is descriptive}\}$

The following constructs, in which the adjectives are possessive, are examples of eliminated elements:

za taj dan. Jer rec je <ljudska kao glad>. Nema za
  ... <human as hunger>. ...

Drugog? Ljubav <majcinska kao vernost>, ljubav musko-
  ... <motherly as loyalty>, ...

¹ http://www.korpus.matf.bg.ac.rs/index.html/


In our case, the result was $|\mathit{SimsRedycByAdj}| = 2030$ elements.

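3) From the set $\mathit{SimsRedycByAdj}$, eliminate all elements whose nouns are proper names or have been replaced by acronyms:

$\mathit{SimsRedycByNoun} = \{\mathit{sims} \in \mathit{SimsRedycByAdj} \mid \text{NOUN is a common noun}\}$

The following constructs are examples of eliminated elements (the third one contains an acronym):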

Pljevlja bi bila bogata i <blestava kao Las> Vegas
  ... <glistening as Las> Vegas

da bude slavna i <bogata kao Monika> Seles. Kako
  ... <rich as Monika> Seles. ...

zatvoru u Beogradu, <opstepoznatom kao CZ>, naci u
  ... <generally known as CZ>, ...

In our case, the result was $|\mathit{SimsRedycByNoun}| = 1059$.

4) From the set $\mathit{SimsRedycByNoun}$ generate a subset of the most frequent elements:

$\mathit{SimsMostFreq} = \{\mathit{sims} \in \mathit{SimsRedycByNoun} \mid \mathit{freq}(\mathit{sims}) \geq k\}$

where $k$ is the minimal frequency of occurrence of as ADJ as a NOUN in the observed corpus $K_l$. In our case, for the value $k = 1$, the total number of ADJ-NOUN pairs that are candidates for wordnet expansion is $|\mathit{SimsMostFreq}| = 1059$.

5) From the set SimsMostFreq create a text file Adjective As Noun with ADJ-NOUN pairs, over which an algorithm for wordnet expansion is executed (see Algorithm).
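Steps 1) to 4) amount to a small filtering pipeline. The following Python sketch is only an illustration under stated assumptions, not the authors' implementation: the token layout (word form, lemma, PoS tag, subtype), the tag values "A" and "N", the subtype labels and the function names are all hypothetical stand-ins for whatever the annotated corpus actually provides.

```python
from collections import Counter

# Hypothetical token layout: (word form, lemma, PoS tag, subtype).
# The tags "A"/"N" and the subtype labels are assumptions, not the corpus tagset.
TOKENS = [
    ("Bela", "beo", "A", "descriptive"), ("kao", "kao", "C", ""), ("mleko", "mleko", "N", "common"),
    ("bela", "beo", "A", "descriptive"), ("kao", "kao", "C", ""), ("mleko", "mleko", "N", "common"),
    ("lakog", "lak", "A", "descriptive"), ("kao", "kao", "C", ""), ("pero", "pero", "N", "common"),
    ("ljudska", "ljudski", "A", "possessive"), ("kao", "kao", "C", ""), ("glad", "glad", "N", "common"),
    ("bogata", "bogat", "A", "descriptive"), ("kao", "kao", "C", ""), ("Monika", "Monika", "N", "proper"),
]


def extract_similes(tokens, marker="kao"):
    """Step 1: collect every ADJ <marker> NOUN trigram as lemma pairs plus subtypes."""
    found = []
    for left, mid, right in zip(tokens, tokens[1:], tokens[2:]):
        if left[2] == "A" and mid[1] == marker and right[2] == "N":
            found.append((left[1], right[1], left[3], right[3]))
    return found


def candidate_pairs(tokens, k=1):
    """Steps 2-4: keep descriptive adjectives and common nouns, then apply threshold k."""
    sims = extract_similes(tokens)
    by_adj = [s for s in sims if s[2] == "descriptive"]   # step 2
    by_noun = [s for s in by_adj if s[3] == "common"]     # step 3
    freq = Counter((adj, noun) for adj, noun, _, _ in by_noun)
    return {pair: n for pair, n in freq.items() if n >= k}  # step 4


if __name__ == "__main__":
    print(candidate_pairs(TOKENS, k=1))
    print(candidate_pairs(TOKENS, k=2))
```

Raising k shrinks the candidate set, which is the effect studied later in the evaluation.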

Algorithm
Input: Adjective As Noun text file
Output: 1. a pair of mutually inverse WordNet semantic relations (specificOf/specifiedBy) for each input adjective-noun pair
        2. file containing adjectives and all their senses
        3. file containing nouns and all their senses

foreach adjective-noun pair in adjective-noun pairs {
    if ((adjective exists in Wordnet.adjective.literals)
        and (noun exists in Wordnet.noun.literals)) {
        if ((Wordnet.senses(adjective).Count==1)
            and (Wordnet.senses(noun).Count==1)
            and (Wordnet.sense(adjective).FirstSense)
            and (Wordnet.sense(noun).FirstSense)) {
            Create Relation(specificOf,adjective,noun);
            Create Relation(specifiedBy,noun,adjective);
        }
        else {
            foreach (sense in Wordnet.senses(adjective)) {
                add to adjective senses(adjective,sense,synsetId)
            }
            foreach (sense in Wordnet.senses(noun)) {
                add to noun senses(noun,sense,synsetId)
            }
        }
    }
}

The presented algorithm is used for sequential processing of input candidate ADJ-NOUN pairs. For each pair, it checks whether in a given wordnet there are synsets of adjectives and nouns which are lexicalized by literals of the observed adjective and noun. After that, the relation specificOf/specifiedBy is created automatically between the synsets of the adjective and the noun under one restriction: both of them have to be lexicalized by only one literal whose sense is the first sense. The first sense of a literal is considered to be the sense of a word in a certain language which is defined by a relevant dictionary or a corpus as the most commonly used one. The intuition behind this restriction is that pairing errors are minimal when there are no synonyms in the observed synsets and the sense of the literals is the first sense. In that case, the possibility of error exists only if at least one of the synsets is not correctly complemented with synonyms and there are no correctly assigned senses, or the desired sense is not the first one and it does not exist. In this regard, since the source of errors is known in advance, it is possible to check for it before applying the algorithm. On the other hand, if at least one of the synsets has more than one synonym, or has one but its sense is not the first one, the new relation is not created and the adjective-noun pair is separated into two independent files: the file containing adjectives and all their senses from a wordnet (named adjective senses) and the file containing nouns and all their senses (named noun senses). These resources are later used in a web application for manual pairing of adjectives and nouns and their connection through the desired relation. Finally, pairs for which it is determined at the very beginning of the process that they do not exist in the form of literals in a given wordnet become candidates for later regular wordnet expansion, by adding new synsets.
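The control flow of the Algorithm can be sketched in Python as follows. This is a toy illustration under assumptions: the `ToyWordnet` class, its dictionary layout (literal mapped to a list of (synset id, sense number) entries) and the synset identifiers are hypothetical, and, like the pseudocode, the sketch tests only the number of senses of each literal and whether that sense is the first one; it does not inspect synonyms inside a synset.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWordnet:
    # Hypothetical layout: literal -> list of (synset_id, sense_number).
    adjectives: dict
    nouns: dict
    relations: list = field(default_factory=list)


def expand(wn, pairs):
    """Create specificOf/specifiedBy for unambiguous pairs; set the rest aside."""
    adjective_senses, noun_senses, missing = [], [], []
    for adj, noun in pairs:
        if adj not in wn.adjectives or noun not in wn.nouns:
            # Not lexicalized in the wordnet: candidate for regular expansion.
            missing.append((adj, noun))
            continue
        a, n = wn.adjectives[adj], wn.nouns[noun]
        if len(a) == 1 and len(n) == 1 and a[0][1] == 1 and n[0][1] == 1:
            wn.relations.append(("specificOf", a[0][0], n[0][0]))
            wn.relations.append(("specifiedBy", n[0][0], a[0][0]))
        else:
            # Ambiguous: keep all senses for later manual pairing.
            adjective_senses += [(adj, sense, sid) for sid, sense in a]
            noun_senses += [(noun, sense, sid) for sid, sense in n]
    return adjective_senses, noun_senses, missing


if __name__ == "__main__":
    wn = ToyWordnet(
        adjectives={"brz": [("A1", 1)], "hladan": [("A2", 1), ("A3", 2)]},
        nouns={"zec": [("N1", 1)], "led": [("N2", 1)]},
    )
    print(expand(wn, [("brz", "zec"), ("hladan", "led"), ("zut", "limun")]))
    print(wn.relations)
```

The three return values correspond to the two sense files of the Algorithm and to the pairs left for later wordnet expansion.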

Prior to the implementation of the given algorithm, we examined the SWN in order to determine its structure in terms of the previously described restrictions. SWN has more than 22,000 synsets; it contains 1660 adjective synsets with one sense, in 1452 of which that sense is the first sense, while the number of noun synsets with one sense, where that sense is the first sense, is 15,035. By implementing the suggested algorithm, out of a total of 1059 ADJ-NOUN pairs, 69 pairs were found in which both members have one sense and that sense is the first sense. A further 302 ADJ-NOUN pairs are present in SWN, but with more than one sense or with a sense that is not the first sense. The remaining 688 pairs pertain to those cases in which at least one member of the ADJ-NOUN pair does not exist as a literal in SWN. Therefore, the proposed method produces 371 candidates (69 + 302) that can be connected in SWN by the relation specificOf/specifiedBy after approval.

For the 302 ADJ-NOUN pairs present in SWN, but with many senses or with one sense that is not the first sense, a web page was created in the SWNE² application (Mladenović et al., 2014). It allows users to input an adjective, which generates a column with the synsets lexicalized by that adjective, while inputting a noun generates a second column, with the synsets lexicalized by that noun. New relations can be generated by looking for appropriate synsets and senses in the adjective senses and noun senses files, and by choosing the desired relation from the third column.

5 Evaluation

In order to assess whether the frequency of occurrence is a valid parameter for finding ADJ-NOUN pairs which are parts of similes used in everyday life, we used an online survey carried out through Google Forms. We compared two lists. The first list (marked here as List1) was automatically generated from the Corpus, filtered using steps 1-4 explained in Section 4, and sorted in decreasing order of pair frequency. The second list (marked as List2) is the subset of List1 containing those pairs that were marked positively in the anonymized survey. By comparing them, we wanted to assess which frequency threshold value best corresponds to the results obtained in the survey.

The survey itself was conducted over a period of 5 days, during which a total of 4 forms were published successively. Anonymous users of the social network Facebook were asked to answer each question generated on the basis of ADJ-NOUN pairs from List1, with the goal of finding out whether “in everyday language we can say that someone/something is ADJ as NOUN?”. The answers were Yes or No, and answering all questions in a form was mandatory. Table 1 gives an overview of the distribution of questions in each form, as well as the number of participants who were involved in answering the questions.

² http://resursi.mmiljana.com/

Google form   Number of questions per form   Participants per form
1             30                             46
2             42                             138
3             41                             150
4             41                             100
Total         154                            434

Table 1: Distribution of questions and participants per form.

A PhD student at the Faculty of Philology, acting as a linguistic expert, manually selected 154 items from List1 for which it could be presumed with some degree of certainty that they may be used in everyday language; namely, we retrieved a lot of noisy data from the Corpus, and some items stopped carrying meaning when taken out of context. Linguistic constructs chosen from the given List1 included cist kao apoteka “clean as a pharmacy”; cist kao suza “pure as a teardrop”; hladan kao led “cold as ice”; lak kao pero “as light as a feather”; veran kao pas “as faithful as a dog”, whereas constructs such as dobar kao oblik “good as shape”; dobar kao pisac “good as a writer”; poznat kao vodja “famous as a leader” were not used, as they represented occasional occurrences. As we could not predict how willing to help the potential participants would be, we were aiming for at least 30 participants. Also, the first form had fewer constructs than the rest (30), as we wanted to test the method and to see what would be an optimal number of fields in the form. We obviously wanted to test as many constructs as possible, but also had to keep the forms interesting and easy to fill in. The rest of the forms were balanced unit-wise. The number of participants was not pre-chosen; it depended on the turnout on the particular day.

The problem with this kind of participant involvement, and with posts on Facebook in general, is that the novelty wears off fast: a post that is very popular today might not be popular at all tomorrow. The call for participation in this project did receive a lot of attention in the first few hours after being posted on Facebook. The privacy for the post was set to Public, which meant that everyone could participate and share the link leading to the Google Forms. Because people did share the link, and some of their friends did the same, we could see that the forms were being filled in quickly and that our research was getting a lot of attention. In the following three days, we posted another three forms at the same URL (precisely because the post received a lot of attention and shares) and we were able to get enough responses to obtain valid results. On the fourth day, the novelty wore off and we were getting significantly fewer responses, which confirmed our assumption that we had to move fast and post new forms every day.

First, we measured the contribution of participants and determined the set of those participants whose results were to be taken into account as relevant, on the basis that there was no substantial difference between the arithmetic means of their answers. In order to measure the participants’ contribution, we generated 7 subsets of questions and answers, each with at most 30 questions (units), using the four spreadsheets containing participants’ answers, as shown in Table 2 (each Google Form, except the first one, was divided into two parts). All 7 subsets were converted into matrices where each row represented the answers of one participant and each column represented one question of the form <adjective> as <noun>. Each cell of the matrix had the value 1 if the participant marked a certain expression with “Yes” and the value 0 if the participant marked that expression with “No”. Rows of the matrix were compared against each other with a paired t-test in order to determine that there was no substantial difference between the arithmetic means of participants’ answers. From each set we selected, among all participants belonging to that set, the five participants whose difference in the paired t-test was the slightest.
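The row-against-row comparison can be illustrated with a small Python sketch. It computes the standard paired t statistic for two 0/1 answer vectors; the selection rule in `closest_participants` (rank participants by the sum of absolute t values against all others and keep the lowest) is an assumption made for illustration, since the text does not spell out exactly how the five participants were chosen.

```python
import math
from itertools import combinations


def paired_t(x, y):
    """Paired t statistic for two equally long answer vectors (1 = Yes, 0 = No)."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / (n - 1)
    if var == 0:
        # All differences are identical: no difference at all, or a constant offset.
        return 0.0 if mean == 0 else math.copysign(math.inf, mean)
    return mean / math.sqrt(var / n)


def closest_participants(matrix, keep=5):
    """Assumed rule: keep the rows with the smallest summed |t| against all other rows."""
    scores = [0.0] * len(matrix)
    for i, j in combinations(range(len(matrix)), 2):
        t = abs(paired_t(matrix[i], matrix[j]))
        scores[i] += t
        scores[j] += t
    ranked = sorted(range(len(matrix)), key=lambda i: (scores[i], i))
    return sorted(ranked[:keep])


if __name__ == "__main__":
    answers = [
        [1, 0, 1, 1],  # participant 0
        [1, 0, 1, 1],  # participant 1
        [0, 1, 0, 0],  # participant 2
    ]
    print(closest_participants(answers, keep=2))
```

A statistics package would normally also report a p-value; the sketch stops at the t statistic to stay within the standard library.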

After that, inter-annotator (participant) agreement was evaluated using the Krippendorff α coefficient (Kalpha). When the value of α is in the [0, 1] interval, it represents an agreement level which ranges from complete disagreement, when α = 0, to complete agreement, when α = 1. The α measure can also have a negative value, down to -1, when two kinds of error are present: sampling error and systematic disagreement. Regarding an acceptable level of reliability, the works of Hayes and Krippendorff (2007), Lombard et al. (2002) and Maggetti (2013) show that agreements whose values are α ≥ 0.667 are reliable, and that agreements whose values are α ≥ 0.8 can be considered very reliable. The results we obtained using the Kalpha test over the set of 5 annotators for each of the subsets of the forms are given in Table 2.

Form set   No of participants   No of questions   Kalpha value   No of questions annotated with Yes
1          5                    30                α = 0.757∗     16
2a         5                    21                α = 0.713∗     17
2b         5                    21                α = 0.698∗     15
3a         5                    21                α = 0.688∗     5
3b         5                    20                α = 0.484
4a         5                    21                α = 0.434
4b         5                    19                α = 0.375
Total                           154                              53

Table 2: Inter-annotator agreement over Google Forms and number of items which belong to reliable forms and were annotated with “Yes”.

Given that for the first two forms and a part of the third one the value of Kalpha was such that the annotator agreement could be considered reliable, for all of the constructs in those forms, if a majority of annotators (3 or more out of 5) annotated a certain question with “Yes”, that item was taken as an element of List2’. Thus, we obtained 53 items in total, and their distribution over form sets is given in the last column of Table 2. Furthermore, we want to draw attention to a phenomenon which we did not study in depth and which is visible in Table 2: the decline of the Kalpha coefficient over the same questionnaire structure, related to the time period when the participants filled in the Google Forms.
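For readers who want to reproduce this kind of check, the following Python sketch computes Krippendorff's α for nominal data without missing values, using the usual coincidence-matrix formulation α = 1 - D_o / D_e, and then applies the majority rule (3 or more "Yes" out of 5). The function names and the toy answer matrix are illustrative assumptions; the survey data themselves are not reproduced here.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(matrix):
    """matrix[r][u] is the label given by rater r to unit u (no missing values)."""
    n_raters = len(matrix)
    if n_raters < 2:
        raise ValueError("at least two raters are needed")
    totals = Counter()
    observed = 0.0
    for unit in zip(*matrix):  # one column = one question
        counts = Counter(unit)
        totals.update(counts)
        # Ordered pairs of different labels within the unit.
        for c, k in permutations(counts, 2):
            observed += counts[c] * counts[k] / (n_raters - 1)
    n = sum(totals.values())
    expected = sum(totals[c] * totals[k] for c, k in permutations(totals, 2))
    if expected == 0:
        raise ValueError("alpha is undefined when only one label occurs")
    return 1.0 - (n - 1) * observed / expected


def majority_yes(matrix, threshold=3):
    """Indices of questions marked 1 (Yes) by at least `threshold` raters."""
    return [u for u, unit in enumerate(zip(*matrix)) if sum(unit) >= threshold]


if __name__ == "__main__":
    toy = [
        [1, 0, 1, 1],
        [1, 0, 1, 0],
        [1, 0, 0, 0],
        [1, 1, 1, 0],
        [1, 0, 1, 0],
    ]
    print(krippendorff_alpha_nominal(toy))
    print(majority_yes(toy))
```

A form set would count as reliable in the sense used above when the returned value is at least 0.667.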

Finally, we wanted to assess how much the change of the frequency threshold influenced the relevance of automatically selected ADJ-NOUN pairs, measured against the results obtained through the surveys. The list List1 was reduced so that it contains only forms 1, 2a, 2b and 3a, which amounted to 93 elements, that is to say, all ADJ-NOUN pairs for which the evaluation by the participants proved reliable. That list was named List1’. In contrast, the list named List2’ contained only those ADJ-NOUN pairs from List1’ that were marked positively. First, we set the frequency threshold to k = 4, which meant that the algorithm was used to process only those pairs whose frequency of occurrence in the Corpus was at least 4. There were 23 such pairs in the list List1’. Out of those 23, 19 pairs were present in the list List2’, which meant that the participants in the survey did not recognize 4 pairs that were recognized by the algorithm. The complete statistics showing the percentage of pairs we obtained using the algorithm as well as human judgement are given in Table 3, and the graph showing the relation between human selection and automatic selection as the frequency threshold changes is given in Figure 1.

Frequency    by          by        humans /
threshold    algorithm   humans    algorithm
k = 1        93          53        57%
k = 2        44          32        73%
k = 3        32          27        84%
k = 4        23          19        83%

Table 3: Relationship of manually and automatically selected pairs depending on the frequency threshold.

Figure 1: Relationship of selected pairs obtained with the survey method compared to the ones obtained with the method of the most frequent occurrence for different frequency thresholds.

Figure 1 shows how, on the sample of 93 ADJ-NOUN pairs contained in the list List1’ (Kalpha reliable), the share of manually selected pairs changes within the subset of pairs whose frequency is equal to or higher than the set threshold, as the threshold changes. The achieved result of 84% gives us the manually measured accuracy of the Algorithm for automatic WordNet expansion at the frequency threshold k = 3.
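The threshold comparison can be sketched as follows. The function name, the toy frequencies and the toy set of approved pairs are assumptions introduced for illustration; only the counts in `table3` are taken from Table 3, and the last lines simply recompute the reported percentages from those counts.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (adjective, noun)


def agreement_at_threshold(freq: Dict[Pair, int],
                           approved: Set[Pair],
                           k: int) -> Tuple[int, int, float]:
    """Count pairs with frequency >= k and how many of them humans approved."""
    by_algorithm = {pair for pair, f in freq.items() if f >= k}
    by_humans = by_algorithm & approved
    ratio = len(by_humans) / len(by_algorithm) if by_algorithm else 0.0
    return len(by_algorithm), len(by_humans), ratio


# Hypothetical corpus frequencies and human-approved pairs (illustrative only).
freq = {("busy", "bee"): 5, ("slow", "turtle"): 3, ("red", "car"): 1}
approved = {("busy", "bee"), ("slow", "turtle")}
print(agreement_at_threshold(freq, approved, k=3))

# Counts reported in Table 3: threshold -> (by algorithm, by humans).
table3 = {1: (93, 53), 2: (44, 32), 3: (32, 27), 4: (23, 19)}
percentages = {k: round(100 * humans / algorithm)
               for k, (algorithm, humans) in table3.items()}
print(percentages)
```

Recomputing the ratios from the counts reproduces the last column of Table 3 (57%, 73%, 84% and 83%).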

6 Conclusions

In this work, we presented a general method for the automatic expansion of a WordNet with the semantic relation specificOf/specifiedBy, which was produced by extracting the semantic knowledge contained in the relation of comparison from an annotated corpus. The results of the proposed method, which selects the most frequent ADJ-NOUN pairs extracted from linguistic constructs of the form “as ADJ as a NOUN”, matched the results obtained from anonymous evaluators in 84% of cases for the frequency threshold k ≥ 3, on identical sets of ADJ-NOUN pairs. The Algorithm for automatic WordNet expansion can be improved in step 5) by including a word sense disambiguation (WSD) method. That would enable literals with more than one sense to be used when the new relation is added automatically. In future work, we plan to implement WSD and to use other linguistic constructs which indicate Simile.

Using the relation specificOf/specifiedBy between a noun and its specific adjective, the hidden meaning of another word or phrase can be detected. For example, in sentences such as “My sister is like a bee” or “My sister is a bee”, based on the relation specificOf/specifiedBy between the noun bee and its specific adjective busy, the sentiment-neutral noun sister can take the same sentiment polarity as the adjective busy, i.e. positive polarity. If we say “My sister is like a lizard”, based on the same principle, the same noun changes its sentiment polarity to negative, since the noun lizard is connected by the relation specifiedBy with the adjective lazy. In the example “My sister is as fast as a turtle”, the indirect connection of the antonymous pair fast-slow in the construct “as fast as a turtle” indicates the existence of the rhetorical figure irony; therefore, in the given context, the noun sister can have a negative sentiment polarity. In our future work, we plan to analyse whether the process of sentiment classification can be improved by changing the default sentiment polarity of n-gram predictors, depending on the figurative context detected in the previously described way.
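The examples above can be summarised in a minimal sketch. The dictionaries below are hypothetical stand-ins for WordNet data (they are not the Serbian WordNet or any real API), the polarity values assigned to fast and slow are assumptions, and the function names are introduced only for illustration.

```python
from typing import Optional

# Hypothetical toy data standing in for the specifiedBy relation,
# prior adjective polarity and antonymy (not taken from a real WordNet).
SPECIFIED_BY = {"bee": "busy", "lizard": "lazy", "turtle": "slow"}
ADJ_POLARITY = {"busy": 1, "lazy": -1, "slow": -1, "fast": 1}  # assumed values
ANTONYMS = {"fast": "slow", "slow": "fast"}


def polarity_from_noun(vehicle: str) -> Optional[int]:
    """Polarity implied by "X is like a <vehicle>", or None if unknown."""
    adjective = SPECIFIED_BY.get(vehicle)
    if adjective is None:
        return None
    return ADJ_POLARITY.get(adjective)


def is_ironic_simile(adjective: str, vehicle: str) -> bool:
    """True when "as <adjective> as a <vehicle>" uses the antonym of the
    adjective that is specific for the vehicle noun."""
    specific = SPECIFIED_BY.get(vehicle)
    return specific is not None and ANTONYMS.get(adjective) == specific


print(polarity_from_noun("bee"))           # "My sister is like a bee"
print(polarity_from_noun("lizard"))        # "My sister is like a lizard"
print(is_ironic_simile("fast", "turtle"))  # "as fast as a turtle"
```

With literals that have more than one sense, such a lookup would first need word sense disambiguation to choose the right synset.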

References

Adam Pease, John Li, and Karen Nomorosa. 2012. WordNet and SUMO for Sentiment Analysis. Proceedings of the 6th International Global Wordnet Conference (GWC2012).

Alexandre Rademaker, Valeria de Paiva, Gerard de Melo, Livy Maria Real Coelho, and Maira Gatti. 2014. OpenWordNet-PT: A Project Report. Proceedings of the 7th Global WordNet Conference (GWC2014), 383–390.

Alistair Kennedy and Diana Inkpen. 2006. Sentiment Classification of Movie Reviews Using Contextual Valence Shifters. Computational Intelligence, 22(2):110–125.

Andrea Esuli and Fabrizio Sebastiani. 2006. Sentiwordnet: A Publicly Available Lexical Resource for Opinion Mining. Proceedings of the 5th Conference on Language Resources and Evaluation (LREC 2006), 417–422.

Andrew F. Hayes and Klaus Krippendorff. 2007. Answering the Call for a Standard Reliability Measure for Coding Data. Communication Methods and Measures, 1(1):77–89.

Antonio Reyes and Paolo Rosso. 2012. Making objective decisions from subjective data: Detecting irony in customer reviews. Decision Support Systems, 53(4):754–760.

Carlo Strapparava and Alessandro Valitutti. 2004. Wordnet-affect: An Affective Extension of Wordnet. Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC 2004), 1083–1086.

Cvetana Krstev. 2008. Processing of Serbian - Automata, Texts and Electronic dictionaries. Faculty of Philology, University of Belgrade, Belgrade.

Judit Kuti, Karoly Varasdi, Agnes Gyarmati, and Peter Vajda. 2008. Language Independent and Language Dependent Innovations in the Hungarian WordNet. Proceedings of the 4th International Global Wordnet Conference (GWC2008), 254–269.

Manuela Angioni, Roberto Demontis, Massimo Deriu, and Franco Tuveri. 2008. Semanticnet: a WordNet-based Tool for the Navigation of Semantic Information. Proceedings of the 4th International Global Wordnet Conference (GWC2008), 21–34.

Marek Maziarz, Stanisław Szpakowicz, and Maciej Piasecki. 2012. Semantic Relations among Adjectives in Polish WordNet 2.0: A New Relation Set, Discussion and Evaluation. Cognitive Studies / Etudes Cognitives, 12:149–179.

Martino Maggetti. 2013. Regulation in Practice: The de facto Independence of Regulatory. Swiss Political Science Review, 19(1):111–113.

Matthew Lombard, Jennifer Snyder-Duch, and Cheryl Campanella Bracken. 2002. Content analysis in mass communication: Assessment and reporting of intercoder reliability. Human Communication Research, 28(4):587–604.

Miljana Mladenovic and Jelena Mitrovic. 2013. Ontology of rhetorical figures for Serbian. LNAI, Springer, 8082:386–393.

Miljana Mladenovic, Jelena Mitrovic, and Cvetana Krstev. 2014. Developing and Maintaining a WordNet: Procedures and Tools. Proceedings of the 7th International Global Wordnet Conference (GWC2014), 55–62.

Milos Utvic. 2014. Liste ucestanosti Korpusa savremenog srpskog jezika [Corpus of Contemporary Serbian – Frequency Lists]. Naucni sastanak slavista u Vukove dane, 241–262. Faculty of Philology, University of Belgrade, Belgrade.

Palmira Marrafa, Raquel Amaro, Rui Pedro Chaves, Susana Lourosa, Catarina Martins, and Sara Mendes. 2006. WordNet.PT new directions. Proceedings of the 3rd International Global Wordnet Conference (GWC2006), 319–321.

Sara Mendes. 2006. Adjectives in WordNet. Proceedings of the 3rd International Global Wordnet Conference (GWC2006), 225–230.

Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. 2010. SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining. Proceedings of the 7th Language Resources and Evaluation Conference (LREC 2010), 2200–2204.

Svetla Koeva, Cvetana Krstev, and Dusko Vitas. 2008. Morpho-semantic Relations in WordNet. A Case Study for two Slavic Languages. Proceedings of the 4th International Global Wordnet Conference (GWC2008), 239–253.

Tony Veale and Yanfen Hao. 2008. Enriching WordNet with folk knowledge and stereotypes. Proceedings of the 4th International Global Wordnet Conference (GWC2008), 453–461.

Vassiliki Rentoumi, Stefanos Petrakis, Manfred Klenner, George A. Vouros, and Vangelis Karkaletsis. 2010. United we stand - improving sentiment analysis by joining machine learning and rule based methods. Proceedings of the 7th Language Resources and Evaluation Conference (LREC 2010).

Yanfen Hao and Tony Veale. 2010. An Ironic Fist in a Velvet Glove: Creative Mis-Representation in the Construction of Ironic Similes. Journal Minds and Machines, 20(4):635–650.