Retrofitting Contextualized Word Embeddings with Paraphrases

Weijia Shi1*, Muhao Chen1*, Pei Zhou2, Kai-Wei Chang1

1University of California, Los Angeles2University of Southern California

Contextualized Word Embeddings

Representations that considers the difference of lexical semantics under different linguistic contexts

Such representations have become the backbone of many StoANLU systems for

• Sentence classification, textual inference, QA, EDL, NMT, SRL, …

Contextualized Word Embeddings

Aggregating context information in a word vector with a pre-trained deep neural language model.

Key benefits:

• More refined semantic representations of lexemes

• Automatically capturing polysemy

• Apples have been grown for thousands of years in Asia and Europe.

• With that market capacity, Apple is worth over 1% of the world's GDP.

Embedding space

The Paraphrased Context Problem

The pre-trained language models are not aware of the semantic relatedness of contexts

The same word can be represented more differently than opposite words in unrelated contexts

Contexts L2 distance by ELMo

How can I make bigger my arms?

How do I make my arms bigger?6.42

Some people believe earth is flat, why?

Why do people still believe in flat earth?7.59

It is a very small window.

I have a large suitcase.5.44

The Paraphrased Context Problem

Consider ELMo distances of the same words (excluding stop words) in paraphrased sentence pairs from MRPC:

28.30%

41.50%

>d(good, bad) >d(big, small)

ELMo encoding of shared words in MRPC paraphrases

>d(good, bad) >d(big, small)

Contextualization:

• Can be oversensitive to

paraphrasing,

• and further impair sentence

representations.

Outline

• Background

• Paraphrase-aware retrofitting

• Evaluation

• Future Work

Paraphrase-aware Retrofitting (PAR)

Method

•An orthogonal transformation M to retrofit the input space

•Minimizing the variance of word representations on paraphrased contexts

•Without compromising the varying representations on unrelated contexts

Orthogonal constraint:

Keeping the relative distance of raw embeddings before contextualization

Paraphrase-aware Retrofitting (PAR)

Learning objective

Loss Function:

Input:

Paraphrase 1: What is prison life like?

Paraphrase 2: How is life in prison?

Negative sample: I have life insurance.

𝐿 =

(𝑆1,𝑆2)∈𝑃

𝑤∈𝑆1∩𝑆2

𝑑𝑠1,𝑠2 𝐌𝐰 + 𝛾 − 𝑑𝑆1,𝑆2 𝐌𝐰 ++ 𝜆𝐿𝑂

Intuition: the shared words in paraphrases should be embedded closer than those in non-paraphrases.

Orthogonal constraint

Experiment Settings

Paraphrase pair datasets

• The positive training cases of MRPC (2,753 pairs)

• Sampled Quora (20,000 pairs) and PAN (5,000 pairs)

• Sentence classification: MPQA, MR, CR, SST-2

• Textual inference: MRPC, SICK-E

• Sentence relatedness scoring: SICK-R, STS-15, STS-16, STS-Benchmark

• Adversarial SQuAD

* The first three categories of tasks follow the settings in SentEval [Conneau et al, 2018].

Text Classification/Inference/Relatedness Tasks

PAR leads to performance improvement of ELMo by

• 2.59-4.21% in accuracy on sentence classification tasks

• 2.60-3.30% in accuracy on textual inference tasks

• 3-5% in Pearson correlation in text similarity tasks

PAR improves ELMo on

sentence representation tasks.

SST-2 (acc) SST-Benchmark (ρ) SICK-E (acc)

Comparison of ELMo w/o and W/ on Three SentEval Tasks

ELMo ELMo-PAR

Adversarial SQuAD

PAR improves the

robustness of a

downstream QA

model against

adversarial

examples.

Bi-Directional Attention Flow (BiDAF) [Seo et al. 2017] on two challenge settings

• AddOneSent: add one human-paraphrased sentence

• AddvSent: add one adversarial example sentence that is semantically similar to

the question

AddOneSent AddvSent

F1 scores (%) w/ and w/o PAR

ELMo-BiDAF ELMo-PAR-BiDAF

Word Representations

Paraphrase Non-paraphrase

Average distances of shared words in MRPC test set sentence pairs before and after applying PAR

ELMo (all layers) ELMo-PAR

PAR minimizes the differences of a word’s representations in paraphrased

contexts and preserves the differences in non-paraphrased contexts.

Future Work

Applying PAR on other contextualized embedding models

To modify contextualized word embeddings linguistic knowledge

• Context simplicity aware embeddings

• Incorporating lexical definitions in the word contextualization process

Thank You

Retrofitting Contextualized Word Embeddings with Paraphrases · Retrofitting Contextualized Word...

Transcript of Retrofitting Contextualized Word Embeddings with Paraphrases · Retrofitting Contextualized Word...

Retrofitting Contextualized Word Embeddings with Paraphrases · Retrofitting Contextualized Word...

Documents

Transcript of Retrofitting Contextualized Word Embeddings with Paraphrases · Retrofitting Contextualized Word...

CONTEXTUALIZED STRATEGIC INTERVENTION MATERIALS IN …

Dolores: Deep Contextualized Knowledge Graph Embeddings · Dolores: Deep Contextualized Knowledge Graph Embeddings \random walk". More speci cally, a truncated random walk of length

Deep Contextualized Word Embeddings in Transition-Based ...This has led to substantial accuracy improvements for both transition-based and graph-based parsers and raises the question

Optimizing Rehabilitation Outcomes Through Contextualized ...

Contextualized Customer Journeys

Interpreting noun compounds using paraphrases

Using Paraphrases for Information Extractionshilpaa/shilpa_arora_sma_thesis.pdf · 2007-11-29 · Using Paraphrases for Information Extraction by Shilpa Arora BE in Computer Engineering

Generating Paraphrases of Human Intransitive Adjective … · Generating Paraphrases of Human Intransitive Adjective Constructions with Port4NooJ Cristina Mota1, Paula Carvalho2;3,

From Word Embeddings To Document Distancesmkusner.github.io/presentations/From_Word_Embeddings_To... · 2020-05-16 · From Word Embeddings To Document Distances ... word embeddings

Contextualized Learning Activities

Paraphrases for Textual Inference for pdf

Contextualized Instruction Impacts the Workplace

Contextualized on Newton's third Law

Contextualized Diachronic Word Representations

Quotations, Paraphrases, and Summaries Dr. Karen Petit.

Integrated and Contextualized Instruction - calpro-online.orgcalpro-online.org/NAEAOvideolibrary/presentations/AParkPPT... · Excerpts from CALPRO’s Integrated and Contextualized

Contextualized Weak Supervision for Text Classification · 3 Document Contextualization We leverage contextualized representation tech-niques to create a contextualized corpus. The

[RELO] The Contextualized English Camp

Understanding Event Predictions via Contextualized ...

Paraphrases for Statistical Machine Translation