What’s so Hard about Natural Language Understanding?

Alan Ritter
Computer Science and Engineering, The Ohio State University

Collaborators: Jiwei Li, Dan Jurafsky (Stanford); Bill Dolan, Michel Galley, Jianfeng Gao (MSR); Colin Cherry (Google); Jeniya Tabassum, Alexander Konovalov, Wei Xu (Ohio State); Brendan O’Connor (UMass)


Q: Why are we so good at Speech and MT (but bad at NLU)?
A: People naturally translate and transcribe.

Q: Large, End-to-End Datasets for NLU?
• Web-scale Conversations?
• Web-scale Structured Data?

Data-Driven Conversation

• Twitter: ~500 million public SMS-style conversations per month
• Goal: learn conversational agents directly from massive volumes of data.

Noisy Channel Model
[Ritter, Cherry, Dolan EMNLP 2011]

Input: Who wants to come over for dinner tomorrow?
Output: { Yum ! I } { want to } { be there } { tomorrow ! }

(The reply is built up phrase by phrase, treating response generation as phrase-based statistical machine translation: a channel model scores how well the reply "translates" the input, and a language model keeps the reply fluent.)
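Concretely, the noisy channel decomposition ranks a candidate reply r for an input s by log p(s | r) + log p(r). A minimal sketch of that ranking step, assuming hypothetical `translation_logprob` (phrase table) and `lm_logprob` (response language model) functions:

```python
def channel_score(input_msg, reply, translation_logprob, lm_logprob):
    """Noisy channel score: log p(input | reply) + log p(reply).

    translation_logprob and lm_logprob are hypothetical stand-ins for
    a phrase-based translation model and a response language model.
    """
    return translation_logprob(input_msg, reply) + lm_logprob(reply)


def best_reply(input_msg, candidates, translation_logprob, lm_logprob):
    # Rank candidate replies by the noisy channel score and keep the best.
    return max(candidates,
               key=lambda r: channel_score(input_msg, r,
                                           translation_logprob, lm_logprob))
```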

Neural Conversation

[Sordoni et al. 2015] [Li et al. 2016] [Xu et al. 2016] [Wen et al. 2016] [Kannan et al. 2016] [Serban et al. 2016]

How old are you?
i 'm 16 .
16 ?
i don 't know what you 're talking about
you don 't know what you 're saying
i don 't know what you 're talking about
…

Slide Credit: Jiwei Li

Bad Action → Outcome

Framed as reinforcement learning: the encoded dialogue history ("how old are you") is the state, each generated reply is an action, and a repetitive reply like the one above is a bad action that leads to a bad outcome.

Deep Reinforcement Learning
[Li, Monroe, Ritter, Galley, Gao, Jurafsky EMNLP 2016]

[Diagram: a seq2seq model. Encoding: the dialogue history "how old are you" is read in and encoded. Decoding: the reply "I’m 16 . EOS" is generated token by token; the generated reply is the action.]
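The architecture in the diagram is a standard encoder-decoder; a minimal PyTorch sketch, with layer sizes and names chosen for illustration rather than taken from the talk:

```python
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal encoder-decoder: encode the dialogue history, decode a reply."""

    def __init__(self, vocab_size, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, src_ids, tgt_ids):
        # Encoding: read "how old are you" into a final hidden state.
        _, state = self.encoder(self.embed(src_ids))
        # Decoding: generate "I'm 16 . EOS" conditioned on that state.
        dec_out, _ = self.decoder(self.embed(tgt_ids), state)
        return self.out(dec_out)  # per-step vocabulary logits
```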

Learning: Policy Gradient

REINFORCE Algorithm (Williams, 1992)

What we want to learn: a policy that maps a dialogue state ("How old are you?") to an action (the reply "i 'm 16 .").

Q: Rewards? A: Turing Test
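REINFORCE treats the decoder as a stochastic policy: sample a reply, observe a scalar reward, and scale the gradient of the sampled tokens' log-likelihood by that reward. A minimal sketch, assuming a model like the one above and hypothetical `sample_reply` and `reward_fn` helpers:

```python
def reinforce_step(model, optimizer, src_ids, sample_reply, reward_fn,
                   baseline=0.0):
    """One REINFORCE update (Williams, 1992) for a seq2seq policy.

    sample_reply(model, src_ids) -> (token_ids, log_probs) is assumed to
    sample a reply token by token and return per-token log-probabilities;
    reward_fn(src_ids, token_ids) -> float is the scalar reward.
    """
    tokens, log_probs = sample_reply(model, src_ids)
    reward = reward_fn(src_ids, tokens)
    # Policy gradient: maximize E[reward], i.e. minimize -(reward - b) * sum(log p).
    loss = -(reward - baseline) * log_probs.sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward
```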

Adversarial Learning (Goodfellow et al., 2014)

Adversarial Learning for Neural Dialogue
[Li, Monroe, Shi, Jean, Ritter, Jurafsky EMNLP 2016]

[Diagram: real-world conversations supply sampled human responses; the Response Generator generates responses; a Discriminator judges each response: Real or Fake?]

(Alternate between training the generator and the discriminator; the generator is updated with the REINFORCE algorithm (Williams, 1992), using the discriminator as the reward signal.)
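A minimal sketch of the alternating loop, reusing the `reinforce_step` above and assuming a binary `disc(context, reply)` that returns p(real); all names are illustrative, not the paper's code:

```python
import torch
import torch.nn.functional as F

def adversarial_epoch(gen, disc, gen_opt, disc_opt, data, sample_reply):
    for context, human_reply in data:
        # --- Discriminator step: tell human replies from generated ones. ---
        fake_reply, _ = sample_reply(gen, context)
        real_p = disc(context, human_reply)
        fake_p = disc(context, fake_reply)
        d_loss = (F.binary_cross_entropy(real_p, torch.ones_like(real_p))
                  + F.binary_cross_entropy(fake_p, torch.zeros_like(fake_p)))
        disc_opt.zero_grad()
        d_loss.backward()
        disc_opt.step()

        # --- Generator step: REINFORCE, with p(real) as the reward. ---
        reward_fn = lambda ctx, reply: disc(ctx, reply).item()
        reinforce_step(gen, gen_opt, context, sample_reply, reward_fn)
```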

Adversarial Learning Improves Response Generation

Machine Evaluator — Adversarial Success (how often you can fool a machine):
• Adversarial Learning: 8.0%
• Standard Seq2Seq model: 4.9%

Human Evaluator — vs. vanilla generation model:
• Adversarial Win: 62%  • Adversarial Lose: 18%  • Tie: 20%

Slide Credit: Jiwei Li

[Bowman et al. 2016]

Q: Why are we so good at Speech and MT (but bad at NLU)?
A: People naturally translate and transcribe.

Q: Large, End-to-End Datasets for NLU?
• Web-scale Conversations? → Generates fluent open-domain replies. But is this really natural language understanding?
• Web-scale Structured Data?

Learning from Distant Supervision
[Mintz et al. 2009]

1) Named Entity Recognition — Challenge: highly ambiguous labels [Ritter et al. EMNLP 2011]
2) Relation Extraction — Challenge: missing data [Ritter et al. TACL 2013]
3) Time Normalization — Challenge: diversity in noisy text [Tabassum, Ritter, Xu EMNLP 2016]
4) Event Extraction — Challenge: lack of negative examples [Ritter et al. WWW 2015] [Konovalov et al. WWW 2017]

The training objective combines a log-likelihood term with label and L2 regularization:

$$O(\theta) \;=\; \underbrace{\sum_{i}^{N} \log p_\theta(y_i \mid x_i)}_{\text{log likelihood}} \;-\; \underbrace{\lambda_U \, D\big(\tilde{p} \,\big\|\, \hat{p}^{\,\text{unlabeled}}_\theta\big)}_{\text{label regularization}} \;-\; \underbrace{\lambda_{L2} \sum_j w_j^2}_{\text{L2 regularization}}$$
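A minimal PyTorch sketch of this objective, assuming a generic classifier `model` that returns label logits; `p_tilde` is the prior label distribution targeted by the label-regularization term, and the λ values are hyperparameters (names are mine):

```python
import torch
import torch.nn.functional as F

def objective_loss(model, x_labeled, y_labeled, x_unlabeled, p_tilde,
                   lambda_u=1.0, lambda_l2=1e-4):
    """Negative of O(theta), suitable for gradient-descent minimization."""
    # Log likelihood on (distantly) labeled examples.
    log_lik = -F.cross_entropy(model(x_labeled), y_labeled, reduction="sum")

    # Label regularization: KL(p_tilde || mean predicted label distribution
    # over unlabeled data), discouraging degenerate label proportions.
    p_hat = F.softmax(model(x_unlabeled), dim=-1).mean(dim=0)
    label_reg = torch.sum(p_tilde * (p_tilde.log() - p_hat.log()))

    # L2 regularization over all model weights.
    l2 = sum((w ** 2).sum() for w in model.parameters())

    return -(log_lik - lambda_u * label_reg - lambda_l2 * l2)
```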

Time Normalization: Distant Supervision (no human labels or rules!)
[Tabassum, Ritter, Xu EMNLP 2016]

[Example: a time expression resolved to the calendar date 1 Jan 2016.]

State-of-the-art time resolvers: { TempEX, HeidelTime, SUTime, UWTime }

Distant Supervision Assumption

[Diagram: the Mercury Transit, event date May 9, 2016; tweets mentioning the event are posted on 8 May, 9 May, and 10 May. The assumption: time expressions in tweets that mention a known event can be resolved against the event's database date.]
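Under this assumption, a tweet about a known event can be labeled automatically by comparing the event's database date to the tweet's timestamp. A small sketch of how such distant labels could be derived (the field names are illustrative, not the paper's):

```python
from datetime import date

def distant_labels(event_date: date, tweet_date: date) -> dict:
    """Sentence-level tags implied by a known event date."""
    if tweet_date < event_date:
        tense = "Future"       # tweet written before the event
    elif tweet_date == event_date:
        tense = "Present"
    else:
        tense = "Past"
    return {
        "tense": tense,
        "day_of_month": event_date.day,
        "month": event_date.strftime("%B"),
        "day_of_week": event_date.strftime("%a"),
    }

# A tweet about the Mercury Transit (2016-05-09) posted the day before:
print(distant_labels(date(2016, 5, 9), date(2016, 5, 8)))
# {'tense': 'Future', 'day_of_month': 9, 'month': 'May', 'day_of_week': 'Mon'}
```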

Multiple Instance Learning Tagger
[Hoffmann et al. 2011]

[Diagram: words $w_1 \dots w_n$ of a tweet, matched to an event database entry [ Mercury, 5/9/2016 ]. A local classifier $\exp(\theta \cdot f(w_i, z_i))$ assigns each word a latent word-level tag $z_1 \dots z_n$; a deterministic OR aggregates the word-level tags into sentence-level tags $t_1 \dots t_4$, which range over day of month (1–31), day of week (Mon–Sun), month (1–12), and tense (Past / Present / Future).]

Learning maximizes the conditional likelihood, marginalizing over the latent word-level tags:

$$\sum_{z} P(z, t \mid w, \theta)$$
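The deterministic OR says a sentence-level tag value fires iff at least one word-level tag selects it. A minimal sketch of that aggregation at prediction time (training instead marginalizes over the latent z, per the likelihood above); `word_tag_scores` is a hypothetical stand-in for the local classifier:

```python
def sentence_tags(words, word_tag_scores, tag_values):
    """Aggregate word-level tags into sentence-level tags via deterministic OR."""
    fired = set()
    for w in words:
        scores = word_tag_scores(w)          # {tag value: score} for this word
        z = max(scores, key=scores.get)      # hard word-level tag assignment
        if z != "NA":
            fired.add(z)
    # A sentence-level tag value is on iff some word's tag selected it.
    return {t: (t in fired) for t in tag_values}
```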

Sentence-Level Tags (example): TL = Future, MOY = May, DOM = 9, DOW = Mon

Missing Data Problem

[Diagram: aggregated sentence-level tags $t_1 \dots t_4$ sit on top of word-level tags $z_1 \dots z_n$ and words $w_1 \dots w_n$, and are clamped to the event database.]

Missing Data Extension
[Missing data problem in distant supervision: Ritter et al. TACL 2013]

[Diagram: the tags mentioned in the text and the tags $t'_1 \dots t'_4$ implied by the event date are connected through latent variables $m_1 \dots m_4$ that encourage agreement between the two sides rather than forcing it, since either side may be missing.]
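One way to read the extension: a latent "mentioned" switch lets either side be missing, so text/database disagreement is penalized instead of forbidden. A toy scoring sketch under that reading (my simplification, not the paper's exact factor graph):

```python
def alignment_score(text_tag, db_tag, penalty=2.0):
    """Log-score for one tag slot with a latent 'mentioned' switch.

    Agreement costs nothing; disagreement (a database fact not mentioned in
    the text, or a text tag not implied by the event date) pays a penalty
    rather than making the assignment impossible.
    """
    return 0.0 if text_tag == db_tag else -penalty

def sentence_score(text_tags, db_tags):
    # Encourage (soft) agreement across all sentence-level tag slots.
    return sum(alignment_score(t, d) for t, d in zip(text_tags, db_tags))
```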


Example Tags

Word: Im | Hella | excited | for | tomorrow
Tag:  NA | NA    | Future  | NA  | Future

Word: Thnks | for | a  | Christmas | party | on | fri
Tag:  NA    | NA  | NA | December  | NA    | NA | Friday

Evaluation: 17% increase in F-score over SUTime

Where can we find NLU? Follow the data!

Opportunistically Gathered Data:
• Twitter Events (Time Normalization)
• Billions of Internet Conversations

Design Models for the Data (rather than the other way around)

Thank You!