


Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Long Papers), pages 2450–2461, Melbourne, Australia, July 15 - 20, 2018. ©2018 Association for Computational Linguistics


SemAxis: A Lightweight Framework to Characterize Domain-Specific Word Semantics Beyond Sentiment

Jisun An†  Haewoon Kwak†  Yong-Yeol Ahn§
†Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
§Indiana University, Bloomington, IN, USA
[email protected] [email protected] [email protected]

Abstract

Because word semantics can substantially change across communities and contexts, capturing domain-specific word semantics is an important challenge. Here, we propose SEMAXIS, a simple yet powerful framework to characterize word semantics using many semantic axes in word-vector spaces beyond sentiment. We demonstrate that SEMAXIS can capture nuanced semantic representations in multiple online communities. We also show that, when the sentiment axis is examined, SEMAXIS outperforms the state-of-the-art approaches in building domain-specific sentiment lexicons.

1 Introduction

In lexicon-based text analysis, a common, tacit assumption is that the meaning of each word does not change significantly across contexts. This approximation, however, falls short because context can strongly alter the meaning of words (Fischer, 1958; Eckert and McConnell-Ginet, 2013; Hovy, 2015; Hamilton et al., 2016b). For instance, the word kill may be used much more positively in the context of video games than it would be in a news story; the word soft may be used much more negatively in the context of sports than it is in the context of toy animals (Hamilton et al., 2016a). Thus, lexicon-based analysis exhibits a clear limitation when two groups with strongly dissimilar lexical contexts are compared.

Recent breakthroughs in vector-space representation, such as word2vec (Mikolov et al., 2013b), provide new opportunities to tackle this challenge of context-dependence, because in these approaches the representation of each word is learned from its context. For instance, a recent study shows that a propagation method on the vector-space embedding can infer context-dependent sentiment values of words (Hamilton et al., 2016a). Yet, it remains to be seen whether it is possible to generalize this idea to general word semantics other than sentiment.

In this work, we propose SEMAXIS, a lightweight framework to characterize domain-specific word semantics beyond sentiment. SEMAXIS characterizes word semantics with respect to many semantic perspectives in a domain-specific word-vector space. To systematically discover the manifold of word semantics, we induce 732 semantic axes based on the antonym pairs from ConceptNet (Speer et al., 2017). We would like to emphasize that, although some of the induced axes can be considered an extended version of sentiment analysis, such as an axis of 'respectful' (positive) and 'disrespectful' (negative), some cannot be mapped to a positive and negative relationship, such as 'exogenous' and 'endogenous,' and 'loose' and 'monogamous.' Based on this rich set of semantic axes, SEMAXIS captures nuanced semantic representations across corpora.

The key contributions of this paper are:

• We propose a general framework to characterize the manifold of domain-specific word semantics.
• We systematically identify semantic axes based on the antonym pairs in ConceptNet.
• We demonstrate that SEMAXIS can capture semantic differences between two corpora.
• We provide a systematic evaluation in comparison to the state-of-the-art, domain-specific sentiment lexicon construction methodologies.

Although the idea of defining a semantic axis and assessing the meaning of a word with a vector projection is not new, it has not been demonstrated that this simple method can effectively induce context-aware semantic lexicons. All of the inferred lexicons, along with code for SEMAXIS and all methods evaluated, are made available in the SEMAXIS package released with this paper.¹

2 Related Work

For decades, researchers have been developing computational techniques for text analysis, including sentiment analysis (Pang and Lee, 2004), stance detection (Biber and Finegan, 1988), point of view (Wiebe, 1994), and opinion mining (Pang and Lee, 2004). The practice of creating and sharing large-scale annotated lexicons also accelerated the research (Stone et al., 1966; Bradley and Lang, 1999; Pennebaker et al., 2001; Dodds et al., 2011; Mohammad et al., 2016, 2017).

These approaches can be roughly grouped into two major categories: the lexicon-based approach (Turney, 2002; Taboada et al., 2011) and the classification-based approach (Pang et al., 2002; Kim, 2014; Socher et al., 2013). Although the recent advancement of neural networks increases the potential of the latter approach, the former has been widely used for its simplicity and transparency. ANEW (Bradley and Lang, 1999), LIWC (Pennebaker et al., 2001), SO-CAL (Taboada et al., 2011), SentiWordNet (Esuli and Sebastiani, 2006), and LabMT (Dodds et al., 2011) are well-known lexicons.

A clear limitation of the lexicon-based approach is that it overlooks context-dependent semantic changes. Scholars have reported that the meaning of a word can be altered by context, such as communities (Yang and Eisenstein, 2015; Hamilton et al., 2016a), diachronic changes (Hamilton et al., 2016b), demographics (Rosenthal and McKeown, 2011; Eckert and McConnell-Ginet, 2013; Green, 2002; Hovy, 2015), geography (Trudgill, 1974), political and cultural attitudes (Fischer, 1958), and personal variations (Yang and Eisenstein, 2017). Recently, a few studies have shown the importance of taking such context into account. For example, Hovy et al. (2015) showed that, by including author demographics such as age and gender, the accuracy of sentiment analysis and topic classification can be improved. It was also shown that, without domain-specific lexicons, the performance of sentiment analysis can be significantly degraded (Hamilton et al., 2016a).

Building domain-specific sentiment lexicons through human input (crowdsourcing or experts) requires not only significant resources but also careful control of biases (Dodds et al., 2011; Mohammad and Turney, 2010). The challenge is exacerbated because 'context' is difficult to concretely operationalize and there can be numerous contexts of interest. For resource-scarce languages, such problems become even more critical (Hong et al., 2013). Automatically building lexicons from web-scale resources (Velikovich et al., 2010; Tang et al., 2014) may solve this problem but poses a severe risk of unintended biases (Loughran and McDonald, 2011).

¹ https://github.com/ghdi6758/SemAxis

Inducing domain-specific lexicons from unlabeled corpora reduces the cost of dictionary building (Hatzivassiloglou and McKeown, 1997; Rothe et al., 2016). Although earlier research utilized syntactic (grammatical) structures (Hatzivassiloglou and McKeown, 1997; Widdows and Dorow, 2002), the approach of learning word-vector representations has gained a lot of momentum (Rothe et al., 2016; Hamilton et al., 2016a; Velikovich et al., 2010; Fast et al., 2016).

The most relevant work is SENTPROP (Hamilton et al., 2016a), which constructs domain-specific sentiment lexicons using graph propagation techniques (Velikovich et al., 2010; Rao and Ravichandran, 2009). In contrast to SENTPROP's sentiment-focused approach, we provide a framework to understand the semantics of words with respect to 732 semantic axes based on ConceptNet (Speer et al., 2017).

3 SEMAXIS Framework

Our framework, SEMAXIS, involves three steps: constructing a word embedding, defining a semantic axis, and mapping words onto the semantic axis. Although they seem straightforward, the complexities and challenges in each step can add up. In particular, we tackle the issues of treating small corpora and selecting pole words.

3.1 The Basics of SEMAXIS

3.1.1 Building word embeddings

The first step in our approach is to obtain word vectors from a given corpus. In principle, any standard method, such as Positive Pointwise Mutual Information (PPMI), Singular-Value Decomposition (SVD), or word2vec, can be used. Here, we use the word2vec model because word2vec is easier to train and is known to be more robust than competing methods (Levy et al., 2015).

3.1.2 Defining a semantic axis and computing the semantic axis vector

A semantic axis is defined by the vector between two sets of 'pole' words that are antonymous to each other. For instance, a sentiment axis can be defined by a set of the most positive words on one end and the most negative words on the other end. Similarly, any antonymous word pair can be used to define a semantic axis (e.g., 'clean' vs. 'dirty' or 'respectful' vs. 'disrespectful').

Once two sets of pole words for the corresponding axis are chosen, we compute the average vector for each pole. Then, by subtracting one vector from the other, we obtain the semantic axis vector that encodes the antonymous relationship between the two sets of words. More formally, let $S^+ = \{v_1^+, v_2^+, \ldots, v_n^+\}$ and $S^- = \{v_1^-, v_2^-, \ldots, v_m^-\}$ be two sets of pole word vectors that have an antonym relationship. The average vectors for each set are computed as $V^+ = \frac{1}{n}\sum_{i=1}^{n} v_i^+$ and $V^- = \frac{1}{m}\sum_{j=1}^{m} v_j^-$. From the two average vectors, the semantic axis vector $V_{axis}$ (from $S^-$ to $S^+$) can be defined as:

$$V_{axis} = V^+ - V^- \quad (1)$$

3.1.3 Projecting words onto a semantic axis

Once the semantic axis vector is obtained, we can compute the cosine similarity between the axis vector and a word vector. The resulting cosine similarity captures how closely the word is aligned to the semantic axis. Given a word vector $v_w$ for a word $w$, the score of the word $w$ along the given semantic axis $V_{axis}$ is computed as:

$$\mathrm{score}(w)_{V_{axis}} = \cos(v_w, V_{axis}) = \frac{v_w \cdot V_{axis}}{\lVert v_w \rVert \, \lVert V_{axis} \rVert} \quad (2)$$

A higher score means that the word $w$ is more closely aligned to $S^+$ than to $S^-$. When $V_{axis}$ is a sentiment axis, a higher score corresponds to more positive sentiment.
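Equations (1) and (2) can be sketched in a few lines of plain Python. The 3-dimensional pole and word vectors below are invented toy values; real pole vectors would come from a trained word2vec model:

```python
import math

def mean_vector(vectors):
    # Component-wise average of the pole word vectors (V+ or V-).
    return [sum(dims) / len(vectors) for dims in zip(*vectors)]

def axis_vector(pos_vectors, neg_vectors):
    # Eq. (1): V_axis = V+ - V-
    v_pos, v_neg = mean_vector(pos_vectors), mean_vector(neg_vectors)
    return [p - q for p, q in zip(v_pos, v_neg)]

def score(word_vector, v_axis):
    # Eq. (2): cosine similarity between the word vector and the axis vector.
    dot = sum(a * b for a, b in zip(word_vector, v_axis))
    norms = math.sqrt(sum(a * a for a in word_vector)) * \
            math.sqrt(sum(b * b for b in v_axis))
    return dot / norms

# Invented 3-d toy vectors (in practice, 300-d word2vec vectors).
pos_pole = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]    # e.g. 'good', 'lovely'
neg_pole = [[-0.9, 0.1, 0.0], [-0.8, 0.0, 0.2]]  # e.g. 'bad', 'horrible'
v_axis = axis_vector(pos_pole, neg_pole)
print(score([0.7, 0.3, 0.1], v_axis) > 0)  # positive-leaning word -> True
```

A word vector pointing toward the negative pole yields a score below zero on the same axis.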

3.2 SEMAXIS for Comparative Text Analysis

Although the aforementioned steps are conceptually simple, there are two practical challenges: 1) dealing with small corpora and 2) finding good pole words for building a semantic axis.

3.2.1 Semantic relations encoded in word embeddings

Since semantic relations are particularly important in our method, we need to ensure that our word embedding maintains general semantic relations. This can be evaluated by analogy tasks. In particular, we use the Google analogy test dataset (Mikolov et al., 2013a), which contains 19,544 questions (8,869 semantic and 10,675 syntactic) in 14 relation types, such as capital-world pairs and opposite relationships.
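An analogy question of this kind is typically answered by vector arithmetic: for "a is to b as c is to ?", the predicted word maximizes cosine similarity to v_b − v_a + v_c, excluding the query words. A minimal sketch with invented 2-d vectors:

```python
import math

# Invented 2-d toy embedding; the real test uses trained word vectors.
emb = {
    "france": [1.0, 0.0], "paris": [1.0, 1.0],
    "italy": [0.2, 0.0], "rome": [0.2, 1.0],
    "germany": [2.0, 0.0], "berlin": [2.0, 1.0],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def analogy(a, b, c):
    # "a is to b as c is to ?": argmax of cosine(v_b - v_a + v_c, v_x)
    # over the vocabulary, excluding the three query words themselves.
    target = [emb[b][i] - emb[a][i] + emb[c][i] for i in range(len(emb[a]))]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, emb[w]))

print(analogy("france", "paris", "italy"))  # -> rome
```

The analogy-test accuracy is simply the fraction of such questions answered correctly.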

3.2.2 Dealing with small corpora

As in other machine learning tasks, the amount of data critically influences the performance of word embedding methods. However, the corpora of our interest are often too small to facilitate the learning of rich semantic relationships. To mitigate this issue, we propose to pre-train a word embedding using a background corpus and update it with the target corpora. In doing so, we capture the semantic changes while maintaining the general semantic relations offered by the large reference model.

The vector-space embedding drifts from the reference model as we train with the target corpus. If trained too much with the smaller target corpus, it will lose the 'good' initial embedding from the huge reference corpus. If trained too little, it will not be able to capture context-dependent semantic changes. Our goal is thus to minimize the loss in general semantic relations while maximizing the characteristic semantic relations in the new texts.

Consider a corpus of our interest C and a reference corpus R. The model M is pre-trained on R, and then we start training it on C. We use the superscript e to represent the e-th epoch of training; that is, M^e_C is the model after the e-th epoch of training on C. Then, we evaluate the model regarding two aspects: general semantic relations and context-dependent semantic relations. The former is measured by the overall accuracy of the analogy test (Mikolov et al., 2013a). The latter is measured by tracking the semantic changes of the top k words in the given corpus C. The semantic changes of the words are measured by the changes in their scores, Δ, on a certain axis (for instance, a sentiment axis) between consecutive epochs. We stop learning when two conditions are satisfied: (1) when the accuracy of the analogy test drops by α; and (2) when Δ is lower than β. In principle, the model can be updated with the target corpus as long as the accuracy does not drop. We then use β to control the epochs. When Δ is low, the gain from updating the model becomes negligible compared to the cost, and thus we can stop updating.
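The stopping rule can be sketched as a small helper. Here `acc_by_epoch` and `delta_by_epoch` stand in for the per-epoch analogy accuracy and the mean score change Δ of the top-k words, which in practice would be measured after each training epoch:

```python
def stop_epoch(initial_acc, acc_by_epoch, delta_by_epoch, alpha=0.3, beta=0.001):
    """Return the first epoch satisfying both stopping conditions."""
    for epoch, (acc, delta) in enumerate(zip(acc_by_epoch, delta_by_epoch), start=1):
        drop = initial_acc - acc             # loss of general semantic relations
        if drop >= alpha and delta < beta:   # conditions (1) and (2)
            return epoch
    return len(acc_by_epoch)  # conditions never met: use all epochs

# Toy measurements: accuracy decays while Delta shrinks over three epochs.
print(stop_epoch(0.70, [0.60, 0.45, 0.35], [0.01, 0.002, 0.0005]))  # -> 3
```

With these toy values, the accuracy drop first exceeds α = 0.3 at the third epoch, where Δ has also fallen below β = 0.001, so training stops there.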

3.3 Identifying rich semantic axes for SEMAXIS

The primary advantage of SEMAXIS is that, as a lightweight framework, it can be easily extended to examine diverse perspectives of texts. Although an axis can, in principle, be defined by any pair of words, we propose a systematic way to define the axes.

3.3.1 732 Pre-defined Semantic Axes

We begin with a pair of antonyms, called initial pole words. For instance, a sentiment axis, which is a basis of sentiment analyses, can be defined by a pair of sentiment antonyms, such as 'good' and 'bad.'

To build a comprehensive set of initial pole words, we compile a list of antonyms from ConceptNet 5.5, the latest release of a knowledge graph among concepts (Speer et al., 2017). We extract all the antonym concepts marked with '/r/Antonym' edges. Then, we filter out non-English concepts and multi-word concepts. In addition, we eliminate duplicated antonyms that involve synonyms. For instance, only one of (empower, prohibit) and (empower, forbid) needs to be kept because 'prohibit' and 'forbid' are synonyms, marked as '/r/Synonym' in ConceptNet.
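The synonym-based deduplication might look like the following sketch; the tiny pair and synonym lists here are illustrative, not taken from ConceptNet:

```python
def dedupe_antonyms(antonym_pairs, synonym_sets):
    # Keep one antonym pair per synonym-equivalence class, e.g. keep
    # (empower, prohibit) and drop (empower, forbid) when prohibit/forbid
    # are marked as synonyms.
    def canon(word):
        for group in synonym_sets:
            if word in group:
                return min(group)  # canonical representative of the class
        return word

    seen, kept = set(), []
    for a, b in antonym_pairs:
        key = frozenset((canon(a), canon(b)))
        if key not in seen:
            seen.add(key)
            kept.append((a, b))
    return kept

pairs = [("empower", "prohibit"), ("empower", "forbid"), ("clean", "dirty")]
synonyms = [{"prohibit", "forbid"}]
print(dedupe_antonyms(pairs, synonyms))  # -> [('empower', 'prohibit'), ('clean', 'dirty')]
```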

To further refine the antonym pairs, we create a crowdsourcing task on Figure Eight, formerly known as CrowdFlower. Specifically, we ask crowdworkers: Do these two words have opposite meanings? We include those word pairs that a majority of crowdworkers agree have an opposite meaning. The word pairs on which the majority of crowdworkers disagreed were mostly erroneous antonym pairs, such as '5' and '3', and 'have' and 'has.' We then filter out the antonyms that are highly similar to each other. For example, ('advisedly' and 'accidentally') and ('purposely' and 'accidentally') show a cosine similarity of 0.5148, while 'advisedly' and 'purposely' are not marked as synonyms in ConceptNet. Although we use a threshold of 0.4 in this work, a different threshold can be chosen depending on the purpose. Finally, we eliminate concepts that do not appear in the pre-trained Google News 100B word embeddings. As a result, we obtain 732 pairs of antonyms. Each pair of antonyms becomes the initial pole words that define one of the diverse axes for SEMAXIS.

We assess the semantic diversity of the axes by computing the cosine similarity between every possible pair of axes. The absolute mean value of the cosine similarity is 0.062, and the standard deviation is 0.050. These low cosine similarity and standard deviation values indicate that the chosen axes have a variety of directions, covering diverse and distinct semantics.
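The diversity check can be reproduced as follows; one plausible reading of the "absolute mean value" is the mean of absolute pairwise cosine similarities, which is what this sketch computes on toy axis vectors:

```python
import itertools
import math
import statistics

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def axis_diversity(axes):
    # Mean and standard deviation of |cos| over every pair of axis vectors;
    # values near 0 mean the axes point in largely independent directions.
    sims = [abs(cosine(u, v)) for u, v in itertools.combinations(axes, 2)]
    return statistics.mean(sims), statistics.stdev(sims)

# Three toy axes: two orthogonal ones and one nearly parallel to the second.
axes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.1, 1.0, 0.0]]
mean_sim, std_sim = axis_diversity(axes)
print(mean_sim < 0.5)  # mostly independent directions -> True
```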

3.3.2 Augmenting pole words

We then expand the two initial pole words to larger sets of pole words, called expanded pole words, to obtain more robust results. If we use only two initial pole words to define the corresponding axis, the result will be sensitive to the choice of those words. Since the initial pole words are not necessarily the best combinations possible, we would like to augment them so that the axis is more robust to the choice of the initial pole words.

To address this issue, we find the l closest words of each initial pole word in the word embedding. We then compute the geometric center (average vector) of the l+1 words (including the initial pole word) and regard it as the vector representation of that pole of the axis. For instance, to refine an axis representing a 'good' and 'bad' relation, we first find the l closest words for each of 'good' and 'bad' and then compute the geometric center of each group. The newly computed geometric centers then become the two ends of the axis representing the 'good' and 'bad' relation. We demonstrate how this approach improves the explanatory power of an axis describing a corresponding antonym relation in Section 4.3.
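A sketch of the pole expansion, assuming a small toy embedding: the geometric center of the seed word and its l closest neighbors (by cosine similarity) becomes the pole vector.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def expand_pole(emb, seed, l=2):
    # Geometric center of the seed word and its l closest words, used as
    # the vector representation of one pole of the axis.
    neighbors = sorted((w for w in emb if w != seed),
                       key=lambda w: cosine(emb[seed], emb[w]),
                       reverse=True)[:l]
    group = [seed] + neighbors
    dim = len(emb[seed])
    return [sum(emb[w][i] for w in group) / len(group) for i in range(dim)]

# Invented 2-d toy embedding.
emb = {"good": [1.0, 0.0], "great": [0.9, 0.1], "nice": [0.8, 0.2],
       "bad": [-1.0, 0.0], "awful": [-0.9, -0.1]}
print(expand_pole(emb, "good"))  # center of 'good', 'great', 'nice'
```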

4 SEMAXIS Validation

In this section, we quantitatively evaluate our approach using ground-truth data and by comparing our method against standard baselines and state-of-the-art approaches. We reproduce the evaluation task introduced by Hamilton et al. (2016a), recreating Standard English and Twitter sentiment lexicons for evaluation. We then compare the accuracy of sentiment classification with three other methods that generate domain-specific sentiment lexicons.

It is worth noting that we validate SEMAXIS on a sentiment axis mainly due to the availability of well-established ground-truth data and evaluation processes. Nevertheless, as the sentiment axis in the SEMAXIS framework is not specifically or manually designed but established from the corresponding pole words, the validation on the sentiment axis can be generalized to other axes that are similarly established from other corresponding pole words.

Standard English: We use the well-known General Inquirer lexicon (Stone et al., 1966) and continuous valence scores collected by Warriner et al. (2013) to evaluate the performance of SEMAXIS compared to other state-of-the-art methods. We test all of the methods using the off-the-shelf Google News embedding constructed from 10^11 tokens (Google, 2013).

Twitter: We evaluate our approach with the test dataset from the 2015 SemEval task 10E competition (Rosenthal et al., 2015), using the embedding constructed by Rothe et al. (2016).

Domain   | Positive pole words | Negative pole words
Standard | good, lovely, excellent, fortunate, pleasant, delightful, perfect, loved, love, happy | bad, horrible, poor, unfortunate, unpleasant, disgusting, evil, hated, hate, unhappy
Twitter  | love, loved, loves, awesome, nice, amazing, best, fantastic, correct, happy | hate, hated, hates, terrible, nasty, awful, worst, horrible, wrong, sad

Table 1: Manually selected pole words used for the evaluation task in (Hamilton et al., 2016a). These pole words are called seed words in (Hamilton et al., 2016a).

4.1 Evaluation Setup

We compare our method against state-of-the-art approaches that generate domain-specific sentiment lexicons.

State-of-the-art approaches: Our baseline for Standard English is a WordNet-based method, which performs label propagation over a WordNet-derived graph (San Vicente et al., 2014). For Twitter, we use Sentiment140, a distantly supervised approach that uses signals from emoticons (Mohammad and Turney, 2010). Moreover, on both datasets, we compare against two state-of-the-art sentiment induction methods: DENSIFIER, a method that learns orthogonal transformations of word vectors (Rothe et al., 2016), and SENTPROP, a method with a label propagation approach on word embeddings (Hamilton et al., 2016a). Seed words, which are called pole words in our work, are listed in Table 1.

Evaluation metrics: We evaluate the aforementioned approaches according to (i) their binary classification accuracy (positive and negative), (ii) ternary classification performance (positive, neutral, and negative), and (iii) Kendall τ rank-correlation with continuous human-annotated polarity scores. Since all methods produce sentiment scores of words rather than assigning a sentiment class, we label words as positive, neutral, or negative using the class-mass normalization method (Zhu et al., 2003). This normalization uses knowledge of the label distribution of a test dataset and simply assigns labels to best match this distribution. For the implementation of the other methods, we directly use the source code (SocialSent, 2016) used in (Hamilton et al., 2016a) without any modification or tuning.
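One simple reading of class-mass normalization is to rank words by score and cut the ranking so the label counts match the known test-set distribution; the sketch below uses that reading with invented scores:

```python
def class_mass_normalize(scores, proportions):
    # scores: {word: continuous sentiment score}
    # proportions: (negative, neutral, positive) fractions from the test set.
    # Rank words by score and cut the ranking so that label counts match
    # the known distribution (one simple reading of Zhu et al., 2003).
    ranked = sorted(scores, key=scores.get)
    n = len(ranked)
    n_neg = round(proportions[0] * n)
    n_neu = round(proportions[1] * n)
    labels = {}
    for i, word in enumerate(ranked):
        if i < n_neg:
            labels[word] = "negative"
        elif i < n_neg + n_neu:
            labels[word] = "neutral"
        else:
            labels[word] = "positive"
    return labels

# Invented scores; suppose the test set is 25% negative, 25% neutral, 50% positive.
print(class_mass_normalize({"bad": -0.8, "meh": 0.0, "ok": 0.1, "good": 0.9},
                           (0.25, 0.25, 0.5)))
```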

4.2 Evaluation Results

Table 2 summarizes the performance. Surprisingly, SEMAXIS, the simplest approach, outperforms the others on both the Standard English and Twitter datasets across all measures.

Standard English
Method    | AUC  | Ternary F1 | Tau
SEMAXIS   | 92.2 | 61.0 | 0.48
DENSIFIER | 91.0 | 58.2 | 0.46
SENTPROP  | 88.4 | 56.1 | 0.41
WordNet   | 89.5 | 58.7 | 0.34

Twitter
Method       | AUC  | Ternary F1 | Tau
SEMAXIS      | 90.0 | 59.2 | 0.57
DENSIFIER    | 88.5 | 58.8 | 0.55
SENTPROP     | 85.0 | 58.2 | 0.50
Sentiment140 | 86.2 | 57.7 | 0.51

Table 2: Evaluation results. Our method performs best on both Standard English and Twitter.

4.3 Sensitivity to Pole Words

As discussed in Section 3.3.2, because the axes are derived from pole words, the choice of the pole words can significantly affect the performance. We compare the robustness of three methods for selecting pole words: 1) using sentiment lexicons; 2) using two pole words only (initial pole words); and 3) using the l closest words in the word2vec model in addition to the two initial pole words (expanded pole words). For the first, we choose two sets of pole words that have the highest and lowest scores in two widely used sentiment lexicons, ANEW (Bradley and Lang, 1999) and LabMT (Dodds et al., 2011). Then, for the two-pole-words method, we match 1-of-10 positive pole words and 1-of-10 negative pole words in Table 1, resulting in 100 pairs of pole words. For these 100 pairs, in addition to the two initial pole words, we then use the l closest words (l = 10) of each of them to evaluate the third method.

We compare these three methods by quantifying how well SEMAXIS performs on the evaluation task. The average AUC for the two-pole-words method is 78.2. We find that one of the 100 pairs, 'good' and 'bad', shows the highest AUC (92.4). However, another random pair ('happy' and 'evil') results in the worst performance, with an AUC of 67.2. In other words, an axis defined by only two pole words is highly sensitive to the choice of the word pair. By contrast, when an axis is defined by aggregating the l closest words, the average AUC increases to 80.6 (the minimum performance is above 71.2). Finally, using pre-established sentiment lexicons results in the worst performance (an AUC of 77.8 for ANEW and 67.5 for LabMT). These results show that identifying an axis is a crucial step in SEMAXIS, and using the l closest words in addition to the initial pole words is a more robust method to define the axis.

5 SEMAXIS in the Wild

We now demonstrate how SEMAXIS can be used in comparative text analysis to capture nuanced linguistic representations beyond sentiment. As an example, we use Reddit (Reddit, 2005), one of the most popular online communities. Reddit is known to serve diverse sub-communities with different linguistic styles (Zhang et al., 2017). We focus on a few pairs of subreddits that are known to express different views. We also choose them to capture a wide range of topics, from politics to religion, entertainment, and daily life, to demonstrate the broad applicability of SEMAXIS.

5.1 Dataset, Pre-processing, Reference Model, and Hyper-parameters

We use the Reddit 2016 comment datasets that are publicly available (/u/Dewarim, 2017). We build a corpus from each subreddit by extracting all the comments posted on that subreddit. When the sizes of two corpora used for comparison are considerably different, we undersample the bigger corpus for a fair comparison. Every corpus then undergoes the same pre-processing, where we first remove punctuation and stop words, and then replace URLs with their domain names.
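The pre-processing step might be sketched as below; the stop-word list and regular expressions are illustrative simplifications, not the paper's exact implementation:

```python
import re

STOP_WORDS = {"the", "a", "is", "on"}  # tiny illustrative list, not the real one

def preprocess(comment):
    # Replace URLs with their domain name, strip punctuation, drop stop words.
    comment = re.sub(r"https?://([^/\s]+)\S*", r"\1", comment)
    comment = re.sub(r"[^\w\s.]", " ", comment)  # keep dots so domains survive
    return [tok for tok in comment.lower().split() if tok not in STOP_WORDS]

print(preprocess("The article is on https://example.com/path?q=1 now!"))
# -> ['article', 'example.com', 'now']
```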

Reference model for Reddit data: As we discussed earlier, many datasets of our interest are likely too small to obtain good vector representations. For example, two popular subreddits, /r/The_Donald and /r/SandersForPresident,² show only 59.8% and 42.1% accuracy in the analogy test, respectively.³ Therefore, as we proposed, we first create a pre-trained word embedding with a larger background corpus and perform additional training with target subreddits. We sample 1 million comments from each of the top 200 subreddits, resulting in 20 million comments. Using this sample, we build a word embedding, denoted as Reddit20M, using the CBOW model with a window size of five, a minimum word count of 10, negative sampling, and down-sampling of frequent terms, as suggested in (Levy et al., 2015). For the subsequent training with the target corpora, we train the model with a small starting learning rate, 0.005; using different rates, such as 0.01 and 0.001, did not make much difference. We further tune the model with a dimension size of 300 and 100 epochs using the analogy test results.

Category | Reddit20M | Google300D
World    | 28.34 | 70.20
family   | 94.58 | 90.06
Gram1-9  | 70.21 | 73.40
Total    | 67.88 | 77.08

Table 3: Results of analogy tests, comparing 20M sample texts from Reddit vs. Google 100B News.

Table 3 shows the results of the analogy test using our Reddit20M in comparison with Google300D, which is the Google News embedding used in the previous sections. As one might expect, Reddit20M shows worse performance than Google300D. However, the four categories (capital-common-countries, capital-world, currency, and city-in-state, denoted by World), which require some general knowledge about the world, drive the 10% decrease in overall accuracy. Other categories show comparable or even better accuracy. For example, Reddit20M outperforms Google300D by 4.52% in the family category. Since Reddit is a US-based online community, the Reddit model may not be able to properly capture semantic relationships in the World category. By contrast, for the categories testing grammar (denoted by Gram1-9), Reddit20M shows performance comparable to Google300D (70.21 vs. 73.40). In this study, we use Reddit20M as a reference model and update it with new target corpora.

² /r/ is a common notation for indicating subreddits.
³ For both corpora, the continuous bag-of-words (CBOW) model with a dimension size of 300 achieves the highest accuracy in the analogy test.

Figure 1: Changes of word semantics (box plot) and accuracy (line graph) over epochs for the model for /r/SandersForPresident

Updating the reference model: As we explained in Section 3.2.2, we stop updating the reference model based on the accuracy of the analogy test and the semantic changes of the top 1000 words of the given corpus. In our experiments, we set α = 0.3 and β = 0.001. Figure 1 shows the accuracy of the analogy test over epochs as a line plot and the semantic changes of words as a box plot for the model for /r/SandersForPresident. The model gradually loses general semantic relations over epochs, and the characteristic semantic changes stabilize after about 10 epochs. Given the α and β, we use the embedding after 10 epochs of training with the target subreddit data. We note that the results are consistent when the number of epochs is greater than 10. We choose the number of epochs for other corpora using the same strategy.

5.2 Confirming well-known language representations

Once we have word embeddings for given subreddits by updating the pre-trained model, we can compare the languages of two subreddits. As a case study, we compare supporters of Donald Trump (/r/The_Donald) and Bernie Sanders (/r/SandersForPresident)4, and examine the semantic differences on diverse issues, such as gun and minority, based on different axes. This can easily be compared with our educated guess learned from the 2016 U.S. Election.

Starting from a topic word (e.g., ‘gun’) and its closest word (e.g., ‘guns’), we compute the average vector of the two words. We then find the closest word to the computed average vector and repeat this process to collect 30 topical terms in each word embedding. Then, we remove words that appear fewer than n times in both corpora. A higher n leads to less coverage of topical terms but eliminates more noise. We set n = 100 in the following experiments. We consider the remaining words as topic words.
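This expansion procedure can be sketched as follows. This is an illustrative reconstruction, not the authors' code: it assumes "closest word" means highest cosine similarity, and it reads "repeat this process" as averaging all terms collected so far (the frequency filter with n = 100 is omitted):

```python
import numpy as np

def nearest(emb, vec, exclude):
    # Most cosine-similar vocabulary word to `vec`, skipping `exclude`.
    best_w, best_s = None, -np.inf
    for w, v in emb.items():
        if w in exclude:
            continue
        s = np.dot(vec, v) / (np.linalg.norm(vec) * np.linalg.norm(v))
        if s > best_s:
            best_w, best_s = w, s
    return best_w

def collect_topic_terms(emb, seed, n_terms=30):
    # Seed word plus its nearest neighbor, then repeatedly add the
    # word closest to the centroid of the terms collected so far.
    terms = [seed, nearest(emb, emb[seed], {seed})]
    while len(terms) < n_terms:
        centroid = np.mean([emb[w] for w in terms], axis=0)
        terms.append(nearest(emb, centroid, set(terms)))
    return terms
```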

Figure 2 compares how the minority-related terms are depicted in the two subreddits. Figures 2(a) and 2(b) show how minority issues are perceived in the two communities with respect to ‘Sentiment’ and ‘Respect’. The x-axis is the value of each word on the Sentiment axis for Trump supporters, and the y-axis is the difference between the values for Trump and Sanders supporters. If the y-value is greater than 0, the word is more ‘positive’ among Trump supporters than among Sanders supporters.
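The per-word values plotted here are SemAxis scores: the cosine similarity between the word vector and the axis vector V+ − V−, where a higher score means the word is more closely aligned to the positive pole. A minimal sketch; the `community_diff` helper and its toy embeddings are illustrative, not the authors' code:

```python
import numpy as np

def axis_score(word_vec, pos_vec, neg_vec):
    # SemAxis score: cos(v_w, V_axis) with V_axis = V+ - V-.
    # Higher means the word leans toward the positive pole.
    axis = pos_vec - neg_vec
    return float(np.dot(word_vec, axis) /
                 (np.linalg.norm(word_vec) * np.linalg.norm(axis)))

def community_diff(word, emb_a, emb_b, pos, neg):
    # The y-axis quantity described above: the word's score in
    # community A's embedding minus its score in community B's
    # embedding, on the same (pos, neg) antonym axis.
    return (axis_score(emb_a[word], emb_a[pos], emb_a[neg]) -
            axis_score(emb_b[word], emb_b[pos], emb_b[neg]))
```

Because each community has its own embedding, the same word can project very differently on the same axis in the two spaces.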

Some terms are perceived more positively (e.g., ‘immigration’ and ‘minorities’) while other terms are perceived more negatively (‘black’, ‘latino’, ‘hispanic’) among Trump supporters (Figure 2(a)). As this positive perception of immigration and minorities is unexpected, we examine the actual comments. Through manual inspection of relevant comments, we find that Trump supporters often mention that they ‘agree’ with or ‘support’ the idea of banning immigration, resulting in the term ‘immigration’ being more positive than among Sanders supporters. However, when examining those words on the ‘Disrespect’ vs. ‘Respect’ axis (Figure 2(b)), most of the minority groups are considered more disrespectful by Trump supporters than by Sanders supporters, demonstrating a benefit of examining multiple semantic axes that can reflect rich semantics beyond basic sentiment.

Then, one would expect that ‘gun’ would be perceived more positively by Trump supporters than by Sanders supporters. Beyond sentiment, we examine how ‘gun’ is perceived in

4 Both subreddits have a policy of banning users who post content critical of the candidate. Thus, we assume most of the users in these subreddits are supporters of the candidate.


(a) Minority (Hate vs. Amazing) (b) Minority (Disrespect vs. Respect)

Figure 2: Trump supporters vs. Sanders supporters on the minority issue

the two communities on the ‘Arousal’ and ‘Safety’ axes. We find that Trump supporters are generally positive about gun-related issues, while Sanders supporters associate ‘gun’ with more arousal and danger.

We also examine two other subreddits: /r/atheism and /r/Christianity. As their names indicate, the former is the subreddit for atheists and the latter for Christians. We expect the two groups to have different perspectives regarding ‘god’ and ‘evolution.’ When examining the four words ‘god,’ ‘pray,’ ‘evolution,’ and ‘science’ on the ‘Unholy’ vs. ‘Holy’ axis, ‘god’ and ‘pray’ appear more ‘holy’ in /r/Christianity, while ‘evolution’ and ‘science’ appear more ‘unholy’ than in /r/atheism, which fits our intuition.

As another example, we examine the /r/PS4 and /r/NintendoSwitch subreddits. PS4 is a video game console released by Sony, and Nintendo Switch is released by Nintendo. Although both video game consoles originated in Japan, Nintendo Switch targets families (children) and casual gamers with more playful and easier games, while the games for PS4 target adults and thus tend to be more violent and more difficult to play. We examine three terms (‘Nintendo,’ ‘Mario,’ and ‘Zelda’) from Nintendo Switch and three terms (‘Sony,’ ‘Uncharted,’ and ‘Killzone’) from Sony on the ‘Casual’ vs. ‘Hardcore’ axis.5

We find that ‘Mario’ and ‘Zelda’ are perceived as more casual in /r/PS4, and ‘Uncharted’ and ‘Killzone’ as more hardcore in /r/PS4 than in /r/NintendoSwitch. Although both ‘Nintendo’ and ‘Sony’ have negative values, ‘Nintendo’ was considered more casual than ‘Sony’ in /r/PS4. Overall, our method effectively captures context-dependent semantic changes beyond basic sentiment.

5 Mario and Zelda are popular Nintendo Switch games, and Uncharted and Killzone are popular PS4 games.

5.3 Comparative Text Analysis with Diverse Axes

Let us show how SEMAXIS can find, for a given word, the set of axes that best describe its semantics. We map the word onto our predefined 732 axes, which are explained in Section 3.3.1, and rank the axes by the word's projection value on each axis. In other words, the top axes describe the word with the highest strength.
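This ranking step can be sketched as follows, reusing the SemAxis score (cosine similarity to the axis vector V+ − V−). The `axes` dictionary of antonym-pair pole vectors is a hypothetical stand-in for the 732 ConceptNet-induced axes:

```python
import numpy as np

def axis_score(word_vec, pos_vec, neg_vec):
    # cos(v_w, V+ - V-): the word's SemAxis score on one axis.
    axis = pos_vec - neg_vec
    return float(np.dot(word_vec, axis) /
                 (np.linalg.norm(word_vec) * np.linalg.norm(axis)))

def rank_axes(word_vec, axes):
    # axes: {(neg_pole, pos_pole): (neg_vec, pos_vec), ...}
    # Returns (axis, score) pairs, strongest positive projection first.
    scored = [(pair, axis_score(word_vec, pv, nv))
              for pair, (nv, pv) in axes.items()]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)
```

Taking the top 20 entries of this ranking in each community's embedding yields comparisons like those in Figure 3.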

Figure 3(a) shows the top 20 axes with the largest projection values for ‘Men’ in /r/AskWomen and /r/AskMen, which are the subreddits where people expect replies from each gender. In /r/AskWomen, compared with /r/AskMen, ‘Men’ seems to be perceived as more vanishing, more established, less systematic, less monogamous, more enthusiastic, less social, more uncluttered, less vulnerable, and more unapologetic. This observed perception of men from women's perspective seems to concur with common gender stereotypes, demonstrating the strong potential of SEMAXIS.

Likewise, in Figure 3(b), we examine how the word ‘Mario’ is perceived in two subreddits, /r/NintendoSwitch and /r/PS4. In /r/NintendoSwitch, ‘Mario’ is perceived, compared with /r/PS4, as more luxurious, famous, unobjectionable, open, capable, likable, successful, loving, honorable, and controllable. On the other hand, users in /r/PS4 consider ‘Mario’ to be more virtual, creative, durable,


(a) Men in /r/AskWomen and /r/AskMen (b) Mario in /r/NintendoSwitch and /r/PS4

Figure 3: An example of comparative text analysis using SEMAXIS: (a) ‘Men’ in /r/AskWomen and /r/AskMen and (b) ‘Mario’ in /r/NintendoSwitch and /r/PS4

satisfying, popular, undetectable, and unstoppable. ‘Mario’ is perceived more positively in /r/NintendoSwitch than in /r/PS4, as expected. Furthermore, SEMAXIS reveals detailed and nuanced perceptions of different communities.

6 Discussion and Conclusion

We have proposed SEMAXIS to examine a nuanced representation of words based on diverse semantic axes. We have shown that SEMAXIS can construct good domain-specific sentiment lexicons by projecting words on the sentiment axis. We have also demonstrated that our approach can reveal the nuanced context-dependence of words through the lens of numerous semantic axes.

There are two major limitations. First, we performed the quantitative evaluation only with the sentiment axis, even though we supplemented it with more qualitative examples. We used the sentiment axis because it is better studied and more methods exist, but ideally it would be better to perform the evaluation across many semantic axes. We hope that SEMAXIS can facilitate research on other semantic axes so that we will have labeled datasets for other axes as well. Second, Gaffney and Matias (2018) recently reported that the Reddit data used in this study is incomplete. The authors suggest using the data with caution, particularly when analyzing user interactions. Although our work examines communities on Reddit, we focus on the differences in word semantics. Thus, we believe the effect of deleted comments would be marginal in our analyses.

Despite these limitations, we identify the following key implications. First, SEMAXIS offers a framework to examine texts on diverse semantic axes beyond the sentiment axis, through the 732 systematically induced semantic axes that capture common antonyms. Our study may facilitate further investigations of context-dependent text analysis techniques and applications. Second, the unsupervised nature of SEMAXIS provides a powerful way to build lexicons for any semantic axis, including the sentiment axis, in non-English languages, particularly resource-scarce languages.

7 Acknowledgements

The authors thank Jaehyuk Park for his helpful comments. This research has been supported in part by the Volkswagen Foundation and in part by the Defense Advanced Research Projects Agency (DARPA), W911NF-17-C-0094. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of DARPA or the U.S. Government.

Page 10: SemAxis: A Lightweight Framework to Characterize Domain ... · A higher score means that the word wis more closely aligned to S+ than S . When V axis is a sentiment axis, a higher

2459

References

Douglas Biber and Edward Finegan. 1988. Adverbial stance types in English. Discourse Processes, 11(1):1–34.

Margaret M. Bradley and Peter J. Lang. 1999. Affective norms for English words (ANEW): Instruction manual and affective ratings. Technical Report C-1, The Center for Research in Psychophysiology, University of Florida.

Peter Sheridan Dodds, Kameron Decker Harris, Isabel M. Kloumann, Catherine A. Bliss, and Christopher M. Danforth. 2011. Temporal patterns of happiness and information in a global social network: Hedonometrics and Twitter. PLOS ONE, 6(12):e26752.

Penelope Eckert and Sally McConnell-Ginet. 2013. Language and Gender. Cambridge University Press.

A. Esuli and F. Sebastiani. 2006. SentiWordNet: A publicly available lexical resource for opinion mining. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC'06). European Language Resources Association (ELRA).

Ethan Fast, Binbin Chen, and Michael S. Bernstein. 2016. Empath: Understanding topic signals in large-scale text. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, pages 4647–4657. ACM.

John L. Fischer. 1958. Social influences on the choice of a linguistic variant. WORD, 14(1):47–56.

Devin Gaffney and J. Nathan Matias. 2018. Caveat emptor, computational social science: Large-scale missing data in a widely-published Reddit corpus. arXiv preprint arXiv:1803.05046.

Google. 2013. Google word2vec. https://code.google.com/archive/p/word2vec/. [Online; accessed February 23, 2018].

Lisa J. Green. 2002. African American English: A Linguistic Introduction. Cambridge University Press.

William L. Hamilton, Kevin Clark, Jure Leskovec, and Dan Jurafsky. 2016a. Inducing domain-specific sentiment lexicons from unlabeled corpora. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 595–605. Association for Computational Linguistics.

William L. Hamilton, Jure Leskovec, and Dan Jurafsky. 2016b. Diachronic word embeddings reveal statistical laws of semantic change. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1489–1501. Association for Computational Linguistics.

Vasileios Hatzivassiloglou and Kathleen R. McKeown. 1997. Predicting the semantic orientation of adjectives. In 8th Conference of the European Chapter of the Association for Computational Linguistics.

Yoonsung Hong, Haewoon Kwak, Youngmin Baek, and Sue Moon. 2013. Tower of Babel: A crowdsourcing game building sentiment lexicons for resource-scarce languages. In Proceedings of the 22nd International Conference on World Wide Web, WWW '13 Companion, pages 549–556, New York, NY, USA. ACM.

Dirk Hovy. 2015. Demographic factors improve classification performance. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 752–762. Association for Computational Linguistics.

Yoon Kim. 2014. Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1746–1751. Association for Computational Linguistics.

Omer Levy, Yoav Goldberg, and Ido Dagan. 2015. Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics, 3:211–225.

Tim Loughran and Bill McDonald. 2011. When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. The Journal of Finance, 66(1):35–65.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013a. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013b. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111–3119.

Saif Mohammad, Svetlana Kiritchenko, Parinaz Sobhani, Xiaodan Zhu, and Colin Cherry. 2016. SemEval-2016 Task 6: Detecting stance in tweets. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), pages 31–41. Association for Computational Linguistics.

Saif Mohammad and Peter Turney. 2010. Emotions evoked by common words and phrases: Using Mechanical Turk to create an emotion lexicon. In Proceedings of the NAACL HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text, pages 26–34. Association for Computational Linguistics.

Saif M. Mohammad, Parinaz Sobhani, and Svetlana Kiritchenko. 2017. Stance and sentiment in tweets. ACM Transactions on Internet Technology (TOIT), 17(3):26:1–26:23.


Bo Pang and Lillian Lee. 2004. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04).

Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002).

James W. Pennebaker, Martha E. Francis, and Roger J. Booth. 2001. Linguistic Inquiry and Word Count: LIWC 2001. Mahwah: Lawrence Erlbaum Associates, 71.

Delip Rao and Deepak Ravichandran. 2009. Semi-supervised polarity lexicon induction. In Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009), pages 675–682. Association for Computational Linguistics.

Reddit. 2005. Reddit: The front page of the internet. https://www.reddit.com/. [Online; accessed February 23, 2018].

Sara Rosenthal and Kathleen McKeown. 2011. Age prediction in blogs: A study of style, content, and online behavior in pre- and post-social media generations. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 763–772. Association for Computational Linguistics.

Sara Rosenthal, Preslav Nakov, Svetlana Kiritchenko, Saif Mohammad, Alan Ritter, and Veselin Stoyanov. 2015. SemEval-2015 Task 10: Sentiment analysis in Twitter. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pages 451–463. Association for Computational Linguistics.

Sascha Rothe, Sebastian Ebert, and Hinrich Schütze. 2016. Ultradense word embeddings by orthogonal transformation. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 767–777. Association for Computational Linguistics.

Iñaki San Vicente, Rodrigo Agerri, and German Rigau. 2014. Simple, robust and (almost) unsupervised generation of polarity lexicons for multiple languages. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, pages 88–97. Association for Computational Linguistics.

Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1631–1642. Association for Computational Linguistics.

SocialSent. 2016. Code and data for inducing domain-specific sentiment lexicons. https://github.com/williamleif/socialsent. [Online; accessed February 23, 2018].

Robert Speer, Joshua Chin, and Catherine Havasi. 2017. ConceptNet 5.5: An open multilingual graph of general knowledge. In AAAI, pages 4444–4451.

Philip J. Stone, Dexter C. Dunphy, and Marshall S. Smith. 1966. The General Inquirer: A Computer Approach to Content Analysis. MIT Press.

Maite Taboada, Julian Brooke, Milan Tofiloski, Kimberly Voll, and Manfred Stede. 2011. Lexicon-based methods for sentiment analysis. Computational Linguistics, 37(2).

Duyu Tang, Furu Wei, Bing Qin, Ming Zhou, and Ting Liu. 2014. Building large-scale Twitter-specific sentiment lexicon: A representation learning approach. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pages 172–182. Dublin City University and Association for Computational Linguistics.

Peter Trudgill. 1974. Linguistic change and diffusion: Description and explanation in sociolinguistic dialect geography. Language in Society, 3(2):215–246.

Peter Turney. 2002. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics.

/u/Dewarim. 2017. Reddit comment dataset. https://files.pushshift.io/. [Online; downloaded July 23, 2017].

Leonid Velikovich, Sasha Blair-Goldensohn, Kerry Hannan, and Ryan McDonald. 2010. The viability of web-derived polarity lexicons. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 777–785. Association for Computational Linguistics.

Amy Beth Warriner, Victor Kuperman, and Marc Brysbaert. 2013. Norms of valence, arousal, and dominance for 13,915 English lemmas. Behavior Research Methods, 45(4):1191–1207.

Dominic Widdows and Beate Dorow. 2002. A graph model for unsupervised lexical acquisition. In COLING 2002: The 19th International Conference on Computational Linguistics.

Janyce M. Wiebe. 1994. Tracking point of view in narrative. Computational Linguistics, 20(2).


Yi Yang and Jacob Eisenstein. 2015. Putting things in context: Community-specific embedding projections for sentiment analysis. CoRR, abs/1511.06052.

Yi Yang and Jacob Eisenstein. 2017. Overcoming language variation in sentiment analysis with social attention. Transactions of the Association for Computational Linguistics, 5:295–307.

Justine Zhang, William L. Hamilton, Cristian Danescu-Niculescu-Mizil, Dan Jurafsky, and Jure Leskovec. 2017. Community identity and user engagement in a multi-community landscape. In Proceedings of the International Conference on Web and Social Media.

Xiaojin Zhu, Zoubin Ghahramani, and John D. Lafferty. 2003. Semi-supervised learning using Gaussian fields and harmonic functions. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), pages 912–919.