From Paraphrase Database to Compositional Paraphrase Model and Back John Wieting University of...

From Paraphrase Database to Compositional Paraphrase Model and Back

John WietingUniversity of Illinois

Joint work with Mohit Bansal, Kevin Gimpel, Karen Livescu, and Dan Roth

The PPDB (Ganitkevitch et. al, 2013) is a vast collection of paraphrase pairs

Motivation

that allow the which enable thebe given the opportunity to have the possibility of

i can hardly hear you . you 're breaking up .and the establishment as well as the developmentlaying the foundations pave the way

making every effort to do its utmost… …

Motivation

• Improve coverage

• Have a parametric model

• Improve phrase pair scores

Contributions

• Powerful word embeddings that have human-level performance on SimLex999 and WordSim353

• Phrase embeddings• Model can re-rank phrases in PPDB 1.0 (Improve human

correlation from 25 to 52 ρ.)• Parameterization of PPDB that can be used downstream

• New datasets

Datasets

Wanted clean way to evaluate paraphrase composition

Two new datasets: One for bigram paraphrases and one for short-phrase paraphrases from PPDB

WordSim353

Topical Paraphrastic

SimLex-999Words

Bigrams MLSim(Mitchell and Lapata, 2010)

MLSim BigramPara

television programme tv set 5.8 1.0

training programme education course 5.7 5.0

bedroom window education officer 1.3 1.0

WordSim353

SimLex-999Words

MLPara(this talk)

MLSim MLPara

television programme tv set 5.8 1.0

training programme education course 5.7 5.0

bedroom window education officer 1.3 1.0

WordSim353

SimLex-999Words

MLPara(this talk)

Spearman’s rho Cohen’s kappa

adjective noun 0.87 0.79

noun noun 0.64 0.58

verb noun 0.73 0.73

WordSim353

SimLex-999Words

MLPara(this talk)

Phrases AnnoPPDB(this talk)

AnnoPPDB(this talk)

AnnoPPDB

can not be separated from is inseparable from 5.0

hoped to be able to looked forward to 3.4

come on , think about it people , please 2.2

how do you mean that what worst feelings 1.6

Phrases

AnnoPPDB(this talk)

AnnoPPDB

Phrases

Mean Deviation: 0.60

AnnoPPDB(this talk)

AnnoPPDB

Phrases

Dev and test sets were designed to have:

1) Variety of lengths2) Variety of quality3) Low word overlap

AnnoPPDB(this talk)

AnnoPPDB

Phrases

See Pavlick et al., 2015 for similar but larger dataset

Learning EmbeddingsWe now have datasets to test paraphrase similarity. Next we learn to embed words and phrases

All similarities are computed using cosine distance

Learning Embeddings

Related work on using PPDB to improve word embeddings: Yu and Dredze, 2014; Faruqui et al., 2015

We now have datasets to test paraphrase similarity. Next we learn to embed words and phrases

All similarities are computed using cosine distance

Training examples (word pairs from PPDB):

contamination pollution

converged convergence

captioned subtitled

outwit thwart

bad villain

broad general

permanent permanently

bed sack

carefree reckless

absolutely urgently

… …

Loss Function for Learning

sums over word pairs in PPDB

positive example

negative examples

positive example

Choosing Negative Examples?

only do argmax over current mini-batch

(for efficiency)

Choosing Negative Examples?

only do argmax over current mini-batch

(for efficiency)

we regularize by penalizing squared L2 distance to initial embeddings

113k word pairs from PPDB (XL)Training:

WordSim353Tuning:

SimLex-999Test:Notes: 1. trained with AdaGrad, tuned stepsize, mini-batch size, and regularization 2. initialized with 25-dim skip-gram vectors trained on Wikipedia 3. statistical significance computed using one-tailed method of Steiger (1980) 4. output of training: “paragram” embeddings

contamination pollutionconverged convergencecaptioned subtitled

… …

Results: SimLex-999

Series110

skip-gram (25-dim)

skip-gram (1000-dim)

Hill et al. (2014)

Average Human

Spearman’s ρ × 100

Results: SimLex-999

Series110

skip-gram (25-dim)

skip-gram (1000-dim)

Hill et al. (2014)

paragram (25-dim)

Average Human

WordSim353Tuning:

SimLex-999Test:Notes: 1. replaced dot product in objective with cosine distance 2. trained with AdaGrad, tuned stepsize, mini-batch size, margin and regularization 3. initialized with 300-dim GloVe common crawl embeddings 4. output of training: “paragram-ws353” embeddings (“paragram-sl999” if tuned on SimLex-999)

… …Scaling up to 300 dimensions

WordSim353Tuning:

SimLex-999Test:Notes: 1. replaced dot product in objective with cosine distance 2. trained with AdaGrad, tuned stepsize, mini-batch, margin and regularization 3. initialized with 300-dim GloVe common crawl embeddings 4. output of training: “paragram-ws353” embeddings (“paragram-sl999” if tuned on SimLex-999)

… …

Results: SimLex-999

Series110

56.3 57.8

Schwartz et al. 2015

Faruqui and Dyer 2015

Average Human

Results: SimLex-999

Series110

56.3 57.8

66.7 65.1

Schwartz et al. 2015

Faruqui and Dyer 2015

paragram-ws353

Average HumanPa

Results: SimLex-999

Series110

56.3 57.8

66.7 68.565.1

GloVeSchwartz et al. 2015Faruqui and Dyer 2015 paragram-ws353paragram-sl999Average Human

Results: WordSim-353

Series110

68.171.3

Faruqui et al. 2015

Huang et al. 2012

Average Human

Tune on SimLex-999, test on WordSim-353

Series110

68.171.3 72

Faruqui et al. 2015

Huang et al. 2012

paragram-sl999

Average Human

Series110

68.171.3 72

76.9 75.6

GloVeFaruqui et al. 2015 Huang et al. 2012paragram-sl999paragram-ws353Average Human

Extrinsic Evaluation: Sentiment Analysis

word vectors dimensionality accuracy

skip-gram 25 77.0

skip-gram 50 79.6

paragram 25 80.9

Stanford Sentiment Treebank, binary classification

convolutional neural network (Kim, 2014) with 200 unigram filtersstatic: no fine-tuning of word vectors

25 dimension case

skip-gram 25 77.0

skip-gram 50 79.6

paragram 25 80.9

GloVe 300 81.4

paragram-ws353 300 83.9

paragram-sl999 300 84.0

300 dimension case

GloVe 300 81.4

paragram-ws353 300 83.9

paragram-sl999 300 84.0

We compare standard approaches:

vector addition

recursive neural network (RvNN) (Socher et al., 2011)

recurrent neural networks (RtNN)

Embedding Phrases?

requires binarized parse;we use Stanford parser

Loss Functions for Phrases

replace word vectors by phrase vectors

(computed by RvNN, RtNN, etc.)sum over phrase

pairs in PPDB

we regularize by penalizing squared L2 distance to initial (skip-gram) embeddings and L2 regularization on the composition

parameters

bigram pairs extracted from PPDBTraining:

MLSim (Mitchell & Lapata, 2010)Tuning:

MLParaTest:

adjective noun (134k) noun noun (36k) verb noun (63k)

easy job simple task town meeting town council achieve goal achieve aim

Notes: we extract bigram pairs of each type from PPDB using a part-of-speech tagger when tuning/testing on one subset, we only train on bigram pairs for that subset

Series110

skip-gram (25), +

skip-gram (1000), +

Hashimoto et al. (2014)

Average Human

Results: MLPara

averages over three data splits: adj noun, noun noun, verb noun

Series110

skip-gram (25), +

skip-gram (1000), +

Hashimoto et al. (2014)

paragram (25), +

Average Human

Results: MLPara

averages over three data splits: adj noun, noun noun, verb nounHu

Series110

skip-gram (25), +skip-gram (1000), +Hashimoto et al. (2014)paragram (25), +paragram (25), RNNAverage Human

Results: MLPara

Series110

paragram (25), RNN

Average Human

Results: MLPara

300 dimension case

Series110

paragram (25), RNN

Average Human

Results: MLPara

Series110

51 52 52

paragram-ws353,+

paragram-sl999,+

paragram (25), RNN

Average Human

Results: MLPara

60k phrase pairs from PPDBTraining:

260 annotated phrase pairsTuning:

1000 annotated phrase pairsTest:

that allow the which enable thebe given the opportunity to have the possibility of

i can hardly hear you . you 're breaking up .and the establishment as well as the developmentlaying the foundations pave the way

making every effort to do its utmost… …

Series10

33skip-gram (25)PPDBPPDB (tuned)

Results: AnnoPPDB

support vector regression to predict gold similarities5-fold cross validation on 260-example dev set

Series10

33 32skip-gram (25)

PPDB (tuned)

paragram (25), +

Results: AnnoPPDB

Series10

skip-gram (25)PPDBPPDB (tuned)paragram (25), +paragram (25), RtNNparagram (25), RvNN

Results: AnnoPPDB

Series10

40PPDB

paragram (25), RtNN

Results: AnnoPPDB

300 dimension case

Series10

40PPDB

paragram (25), RtNN

Results: AnnoPPDB

Series10

41 PPDB

paragram (25), RtNN

paragram-ws353

paragram-sl999

Results: AnnoPPDB

Series10

PPDBparagram (25), RtNNparagram-ws353paragram-sl999RtNN (300)LSTM (300)

Results: AnnoPPDB

gold RvNN +

does not exceed is no more than 5.0 4.8 3.5

could have an impact on may influence 4.6 4.2 3.2

earliest opportunity early as possible 4.4 4.3 2.9

gold RcNN +

scheduled to be held in that will take place in 4.6 2.9 4.4

according to the paper , the newspaper reported that 4.6 2.8 4.1

’s surname family name of 4.4 2.8 4.1

RvNN is better:

addition is better:

Qualitative Analysis: For positive examples, addition model outperforms

RvNN when phrases 1) have similar length

2) have more “synonyms” in common

gold RvNN +

does not exceed is no more than 5.0 4.8 3.5

could have an impact on may influence 4.6 4.2 3.2

earliest opportunity early as possible 4.4 4.3 2.9

gold RvNN +

scheduled to be held in that will take place in 4.6 2.9 4.4

according to the paper , the newspaper reported that 4.6 2.8 4.1

’s surname family name of 4.4 2.8 4.1

RvNN is better:

Addition is better:

Conclusion

Our work shows how to use PPDB to:1) Create word embeddings that have human level performance on Simlex-999 and WordSim-353 2) Create compositonal paraphrase models that can improve correlation of PPDB 1.0 from 25 to 52 ρ.

We have also released two new datasets for evaluation of short-phrase paraphrasing models

Ongoing work: Phrase model improvements, off-the-shelf testing on downstream tasks

Thanks!

From Paraphrase Database to Compositional Paraphrase Model and Back John Wieting University of...

Documents

Transcript of From Paraphrase Database to Compositional Paraphrase Model and Back John Wieting University of...

Eu paraphrase

LEARNING AND APPLICATIONS OF PARAPHRASTICjwieting/wieting-proposal.pdf · LEARNING AND APPLICATIONS OF PARAPHRASTIC REPRESENTATIONS FOR NATURAL LANGUAGE john wieting Ph.D. Thesis

METRIC SPACES Thomas Wieting Reed College 2015people.reed.edu/~wieting/essays/MetricSpaces.pdf · Chapter 1 METRIC SPACES 01 The theory of metric spaces provides a general context

Chris Dyer - Kevin Gimpel Waleed Ammar - Noah Smith

CCP Dec. Paraphrase

Synthesize and Paraphrase

Bass - Strauss Paraphrase

Paraphrase summarize

Plagiarism, paraphrase, summary

Paraphrase - Weeblyvbnwebsite.weebly.com/uploads/1/1/5/8/11582038/paraphrase.pdf · Paraphrase Five steps for a good paraphrase: 3. Write the paraphrase from your note without looking

Sleeping Beauty Paraphrase

Chapter 11 Chemical Reactions. Note Taking Tips! Paraphrase, paraphrase, paraphrase! Use symbols, arrows, pictures, and abbreviations whenever possible.

Paraphrase. Cleanth Brooks, “The Heresy of Paraphrase” (1947) the resistance which any good poem sets up against all attempts to paraphrase it.

Summary Paraphrase Quote

“Gimpel the Fool” by Isaac Bashevis Singer (Israel)

Against the Paraphrase

Nonnos Paraphrase Evangile

94351351 Paraphrase Summary

Liszt-Verdi - Aida Paraphrase

Bashevis Singer, Isaac-gimpel Luda