Polarity Inducing Latent Semantic Analysis

Transcript of Polarity Inducing Latent Semantic Analysis

Page 1: Polarity Inducing Latent Semantic Analysis

Polarity Inducing Latent Semantic Analysis

Scott Wen-tau Yih
Joint work with Geoffrey Zweig & John Platt
Microsoft Research

A vector space model that can distinguish Antonyms from Synonyms!

Page 2: Polarity Inducing Latent Semantic Analysis

Vector Space Model

Text objects (e.g., words, phrases, sentences, or documents) are represented as vectors:

- High-dimensional sparse term-vectors
- Concept vectors from topic models or projection methods
- Constructed compositionally from word vectors [Socher et al. 12]

Relations of the text objects are estimated by functions in the vector space

Relatedness is measured by some distance function (e.g., cosine)

[Figure: a query vector v_q and a document vector v_d in the vector space; their relatedness is cos(θ) = cos(v_q, v_d).]
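For concreteness (an editor's sketch, not from the slides), a minimal Python implementation of the cosine measure; the term vectors are made-up toy values:

```python
import numpy as np

def cosine(u, v):
    """Cosine of the angle between two vectors: u.v / (|u||v|)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy term vectors for a query q and a document d (hypothetical counts).
v_q = np.array([1.0, 0.0, 2.0, 1.0])
v_d = np.array([0.0, 1.0, 2.0, 1.0])
print(cosine(v_q, v_d))  # ~0.83: fairly related
```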

Page 3: Polarity Inducing Latent Semantic Analysis

Applications of Vector Space Models

Document Level

- Information Retrieval [Salton & McGill 83]
- Document Clustering [Deerwester et al. 90]
- Search Relevance Measurement [Baeza-Yates & Ribeiro-Neto 99]
- Cross-lingual document retrieval [Platt et al. 10; Yih et al. 11]

Word Level

- Language modeling [Bellegarda 00]
- Word similarity and relatedness [Deerwester et al. 90; Lin 98; Turney 01; Turney & Littman 05; Agirre et al. 09; Reisinger & Mooney 10; Yih & Qazvinian 12]

Page 4: Polarity Inducing Latent Semantic Analysis

Beyond General Similarity

Existing VSMs cannot distinguish finer relations.

The “antonym” issue of distributional similarity

The co-occurrence or distributional hypotheses:

- Apply to near-synonyms, hypernyms, and other semantically related words, including antonyms [Mohammad et al. 08]
- e.g., "hot" and "cold" occur in similar contexts

LSA does not solve the issue:

- It might assign a high degree of similarity to opposites as well as synonyms [Landauer & Laham 98]

Page 5: Polarity Inducing Latent Semantic Analysis

Approaches for Detecting Antonyms

Separate antonyms from distributionally similar word pairs [Lin et al. 03]

- Patterns: "from X to Y", "either X or Y"

WordNet graph [Harabagiu et al. 06]

- Synsets connected by is-a links and exactly one antonymy link

WordNet + affix rules + heuristics [Mohammad et al. 08]

Distinguishing synonyms and antonyms is still perceived as a difficult open problem… [Poon & Domingos 09]

Page 6: Polarity Inducing Latent Semantic Analysis

Our Contributions

Polarity Inducing Latent Semantic Analysis (PILSA)

A vector space model that encodes polarity information:

- Synonyms cluster together in this space
- Antonyms lie at the opposite ends of a unit sphere

[Figure: a unit sphere with "hot" and "burning" at one end and "cold" and "freezing" at the other.]

Page 7: Polarity Inducing Latent Semantic Analysis

Our Contributions

Polarity Inducing Latent Semantic Analysis (PILSA)

A vector space model that encodes polarity information:

- Synonyms cluster together in this space
- Antonyms lie at the opposite ends of a unit sphere

Significantly improved prediction accuracy on a benchmark GRE dataset

Page 8: Polarity Inducing Latent Semantic Analysis

Roadmap

- Introduction
- Polarity Inducing Latent Semantic Analysis
  - Basic construction
  - Extension 1: Improving accuracy
  - Extension 2: Improving coverage
- Experimental evaluation
  - Task & datasets
  - Results
- Conclusion

Page 9: Polarity Inducing Latent Semantic Analysis

The Core Method

Input: a thesaurus (with synonyms & antonyms)

- Create a "document"-term matrix: each group of words (synonyms and antonyms) is treated as a "document"
- Induce polarity by making antonyms have negative weights
- Apply SVD as in regular Latent Semantic Analysis

Page 10: Polarity Inducing Latent Semantic Analysis

Matrix Construction

Acrimony: rancor, conflict, bitterness; goodwill, affection
Affection: goodwill, tenderness, fondness; acrimony, rancor

                        acrimony   rancor   goodwill   affection   …
Group 1: "acrimony"         4.73     6.01       5.81        4.86   …
Group 2: "affection"        3.78     5.23       6.21        5.15   …
…

Each row vector is a "document" (one word group); each column vector is a term. Entries are TF-IDF scores.

Page 11: Polarity Inducing Latent Semantic Analysis

Matrix Construction

Acrimony: rancor, conflict, bitterness; goodwill, affection
Affection: goodwill, tenderness, fondness; acrimony, rancor

                        acrimony   rancor   goodwill   affection   …
Group 1: "acrimony"         4.73     6.01      -5.81       -4.86   …
Group 2: "affection"       -3.78    -5.23       6.21        5.15   …
…

Inducing polarity: antonyms of the group receive negated TF-IDF scores.

Cosine score: $\cos(u, v) = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}$
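A minimal sketch of this construction (editor's illustration: the entry structure is assumed and the TF-IDF weights are placeholders, not the real Encarta scores):

```python
import numpy as np

# Hypothetical thesaurus entries: synonyms and antonyms per word group.
entries = {
    "acrimony":  {"syn": ["acrimony", "rancor"],    "ant": ["goodwill", "affection"]},
    "affection": {"syn": ["goodwill", "affection"], "ant": ["acrimony", "rancor"]},
}
vocab = ["acrimony", "rancor", "goodwill", "affection"]
col = {w: i for i, w in enumerate(vocab)}

def tfidf(word):
    return 1.0  # placeholder for a real TF-IDF weight

W = np.zeros((len(entries), len(vocab)))
for r, entry in enumerate(entries.values()):
    for w in entry["syn"]:
        W[r, col[w]] = +tfidf(w)   # synonyms: positive weight
    for w in entry["ant"]:
        W[r, col[w]] = -tfidf(w)   # antonyms: negated weight (polarity)
print(W)   # rows: word groups ("documents"); columns: terms
```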

Page 12: Polarity Inducing Latent Semantic Analysis

Effect of Inducing Polarity

                        acrimony   rancor   goodwill   affection
Group 1: "acrimony"         4.73     6.01       5.81        4.86
Group 2: "affection"        3.78     5.23       6.21        5.15

Page 13: Polarity Inducing Latent Semantic Analysis

Effect of Inducing Polarity

(Weights simplified to 1 for illustration:)

                        acrimony   rancor   goodwill   affection
Group 1: "acrimony"            1        1          1           1
Group 2: "affection"           1        1          1           1

Cosine similarity (e.g., "acrimony" vs. "goodwill" column vectors) = 1

Page 14: Polarity Inducing Latent Semantic Analysis

Effect of Inducing Polarity

                        acrimony   rancor   goodwill   affection
Group 1: "acrimony"            1        1          1           1
Group 2: "affection"           1        1          1           1

Cosine similarity (e.g., "acrimony" vs. "goodwill" column vectors) = 1

Cannot distinguish antonyms from synonyms!

Page 15: Polarity Inducing Latent Semantic Analysis

Effect of Inducing Polarity

Without polarity:

                        acrimony   rancor   goodwill   affection
Group 1: "acrimony"            1        1          1           1
Group 2: "affection"           1        1          1           1

Cosine similarity ("acrimony" vs. "goodwill") = 1

With polarity:

                        acrimony   rancor   goodwill   affection
Group 1: "acrimony"            1        1         -1          -1
Group 2: "affection"          -1       -1          1           1

Page 16: Polarity Inducing Latent Semantic Analysis

Effect of Inducing Polarity

Without polarity:

                        acrimony   rancor   goodwill   affection
Group 1: "acrimony"            1        1          1           1
Group 2: "affection"           1        1          1           1

With polarity:

                        acrimony   rancor   goodwill   affection
Group 1: "acrimony"            1        1         -1          -1
Group 2: "affection"          -1       -1          1           1

Cosine similarity ("acrimony" vs. "goodwill") = -1
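The sign flip is all it takes; a quick numeric check of the two tables above (editor's sketch, toy values):

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Column vectors of "acrimony" and "goodwill" from the tables above.
acrimony_plain,  goodwill_plain  = np.array([1.0, 1.0]),  np.array([1.0, 1.0])
acrimony_signed, goodwill_signed = np.array([1.0, -1.0]), np.array([-1.0, 1.0])

print(cosine(acrimony_plain,  goodwill_plain))   #  1.0: indistinguishable
print(cosine(acrimony_signed, goodwill_signed))  # -1.0: recognized opposites
```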

Page 17: Polarity Inducing Latent Semantic Analysis

Mapping to Latent Space via SVD

$\mathbf{W}_{d \times n} \;\approx\; \mathbf{U}_{d \times k}\, \mathbf{S}_{k \times k}\, \mathbf{V}^{\mathsf T}_{k \times n}$

(columns of $\mathbf{W}$ correspond to words)

- Word similarity: cosine of two columns
- SVD generalizes and smooths the original data
- Uncovers relationships not explicit in the thesaurus

Page 18: Polarity Inducing Latent Semantic Analysis

Mapping to Latent Space via SVD

$\mathbf{W}_{d \times n} \;\approx\; \mathbf{U}_{d \times k}\, \mathbf{S}_{k \times k}\, \mathbf{V}^{\mathsf T}_{k \times n}$

(columns of $\mathbf{W}$ correspond to words)

As $\mathbf{U}^{\mathsf T}\mathbf{W} \approx \mathbf{S}\mathbf{V}^{\mathsf T}$, $\mathbf{U}^{\mathsf T}$ can be viewed as the projection matrix that maps a raw column-vector to the $k$-dimensional latent space.
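A sketch of the projection step (editor's illustration; random data stands in for the signed TF-IDF matrix, and k = 10 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((50, 200))   # stand-in for the signed d x n matrix
k = 10                               # latent dimensionality

U, s, Vt = np.linalg.svd(W, full_matrices=False)
U_k = U[:, :k]                       # d x k

def embed(raw_column):
    """Project a raw column-vector (length d) into the k-dim latent space."""
    return U_k.T @ raw_column

latent = embed(W[:, 0])              # latent vector of the first word
# Sanity check: U_k^T W recovers the top-k rows of S V^T.
assert np.allclose(U_k.T @ W, s[:k, None] * Vt[:k])
```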

Page 19: Polarity Inducing Latent Semantic Analysis

Extension 1: Improve Accuracy

Refine the projection matrix by discriminative training.

S2Net [Yih et al. 11]: very similar to RankNet [Burges et al. 05] but focuses on learning concept vectors

[Figure: the S2Net model — a raw term vector $\mathbf{f}_p$ over terms $t_1, \dots, t_d$ is mapped by the projection matrix $\mathbf{A}_{d \times k}$ to a concept vector $\mathbf{v}_p = \mathbf{A}^{\mathsf T}\mathbf{f}_p$ with components $c_1, \dots, c_k$; a pair is scored by $\mathrm{sim}(\mathbf{v}_p, \mathbf{v}_q)$.]

Page 20: Polarity Inducing Latent Semantic Analysis

Applying S2Net

- Training data: antonym pairs from the thesaurus
- Initialize the model with the PILSA projection matrix
- Learning objective: the cosine score of antonyms should be lower than that of other word pairs

$L(\Delta_{ij}; \mathbf{A}) = \log\bigl(1 + \exp(-\gamma\, \Delta_{ij})\bigr)$

$\Delta_{ij} \equiv \cos(\mathbf{A}^{\mathsf T}\mathbf{f}_{p_i}, \mathbf{A}^{\mathsf T}\mathbf{f}_{q_j}) - \cos(\mathbf{A}^{\mathsf T}\mathbf{f}_{p_i}, \mathbf{A}^{\mathsf T}\mathbf{f}_{q_i})$

where $(p_i, q_i)$ is an antonym pair and $(p_i, q_j)$ is some other word pair.

[Figure: the logistic loss plotted over $\Delta_{ij} \in [-2, 2]$, separating "antonyms" from "other word pairs".]
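A sketch of this loss in code (editor's illustration; gamma = 10 is an assumed value, not from the slides):

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def pilsa_rank_loss(A, f_pi, f_qi, f_qj, gamma=10.0):
    """Logistic loss on the margin: the antonym pair (p_i, q_i) should score
    lower than the other pair (p_i, q_j). gamma is an assumed value."""
    v_pi, v_qi, v_qj = A.T @ f_pi, A.T @ f_qi, A.T @ f_qj
    delta = cosine(v_pi, v_qj) - cosine(v_pi, v_qi)
    return np.log1p(np.exp(-gamma * delta))
```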

Page 21: Polarity Inducing Latent Semantic Analysis

Extension 2: Improve Coverage

What to do with out-of-thesaurus words?

Some lexical variations:

- The Encarta thesaurus contains "corruptible" and "corruption", but not "corruptibility"
- Morphological analysis and stemming find alternatives of an out-of-thesaurus target word

Rare or offensive words:

- e.g., "froward" and "moronic"

Embedding out-of-thesaurus words by leveraging a general corpus

Page 22: Polarity Inducing Latent Semantic Analysis

Embedding Out-of-thesaurus Words

Create a context vector space model using a collection of documents (e.g., Wikipedia):

- Context: words within a window of [-10, +10]

Embed the target word into the PILSA space by k-NN (see the sketch below):

- Find nearby in-thesaurus words in the context space
- Remove words with inconsistent polarity
- Use the centroid of the corresponding PILSA vectors to represent the target word
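A sketch of the k-NN step (editor's illustration; the names and k = 5 are assumptions, and the polarity-consistency filter is omitted for brevity):

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def embed_oov(target_ctx, ctx_vecs, pilsa_vecs, k=5):
    """target_ctx: context vector of the out-of-thesaurus word.
    ctx_vecs / pilsa_vecs: dicts mapping in-thesaurus words to their
    context-space and PILSA-space vectors, respectively."""
    # Nearest in-thesaurus neighbors in the *context* space...
    neighbors = sorted(ctx_vecs, key=lambda w: -cosine(target_ctx, ctx_vecs[w]))[:k]
    # ...and the centroid of their PILSA vectors places the target word.
    return np.mean([pilsa_vecs[w] for w in neighbors], axis=0)
```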

Page 23: Polarity Inducing Latent Semantic Analysis

Embedding Out-of-thesaurus Words

Create a context vector space model using a collection of documents (e.g., Wikipedia):

- Context: words within a window of [-10, +10]

Embed the target word into the PILSA space by k-NN.

[Figure: the out-of-thesaurus word "sweltering" lies near "burning" and "hot" in the context vector space, so it is embedded near "hot"/"burning" and opposite "cold" in the PILSA space.]

Page 24: Polarity Inducing Latent Semantic Analysis

Roadmap

- Introduction
- Polarity Inducing Latent Semantic Analysis
  - Basic construction
  - Extension 1: Improving accuracy
  - Extension 2: Improving coverage
- Experimental evaluation
  - Task & datasets
  - Results
- Conclusion

Page 25: Polarity Inducing Latent Semantic Analysis

Data for Building PILSA Models

Encarta Thesaurus (for basic PILSA):

- 47k word categories (i.e., the "documents")
- Vocabulary of 50k words
- 125,724 pairs of antonyms

Wikipedia (for embedding out-of-thesaurus words):

- Sentences from a Nov-2010 snapshot
- 917M words after preprocessing

Page 26: Polarity Inducing Latent Semantic Analysis

Experimental Evaluation

Task: GRE closest-opposite questions

- Which is the closest opposite of "adulterate"?
  (a) renounce (b) forbid (c) purify (d) criticize (e) correct
- Dev / Test: 162 / 950 questions [Mohammad et al. 08]
- The dev set is used for tuning the dimensionality of PILSA

Evaluation metric (see the sketch below):

- Accuracy: #correct / #total questions
- Questions with unresolved out-of-thesaurus target words are treated as answered incorrectly
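How a question would be scored in this setup, assuming every word has a PILSA vector (a hedged sketch, not the authors' code):

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def answer_gre(target, candidates, vec):
    """vec: dict mapping words to PILSA vectors. The closest opposite is
    the candidate with the LOWEST (most negative) cosine to the target."""
    return min(candidates, key=lambda w: cosine(vec[target], vec[w]))

# e.g., answer_gre("adulterate",
#                  ["renounce", "forbid", "purify", "criticize", "correct"],
#                  vec)  ->  "purify"
```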

Page 27: Polarity Inducing Latent Semantic Analysis

Results on Test Set

Method                 Accuracy
Lookup                     0.56
Raw TFIDF                  0.57
PILSA                      0.74
PILSA + S2Net              0.77
 + OOV Embedding           0.80
Mohammad et al. 08         0.64

Page 28: Polarity Inducing Latent Semantic Analysis

Examples

Target word: admirable

No polarity (LSA):

- Most similar: commendable, creditable, despicable
- Least similar: uninviting, dessert, seductive

With polarity (PILSA):

- Most similar: commendable, creditable, laudable
- Least similar: despicable, shameful, unworthy

Full results on GRE test set are available online

Page 29: Polarity Inducing Latent Semantic Analysis

Conclusion

Polarity Inducing LSA:

- Solves the open problem of distinguishing antonyms from synonyms by constructing a vector space that can tell opposites apart
- The vector space is designed so that synonyms tend to have positive cosine similarity and antonyms negative

Future Work:

- New methods or representations for other word relations, e.g., Part-Whole, Is-A, Attribute
- Applications, e.g., Textual Entailment or Sentence Completion