Globally Normalized Transition-Based Neural Networks

Daniel Andor, Chris Alberti, David Weiss, Aliaksei Severyn, Alessandro Presta, Kuzman Ganchev, Slav Petrov, Michael Collins

Parsey McParseface Now Has 40 Multi-lingual Cousins!

Transition-Based Parsing

Alice saw Bob eat pizza

[Figure: the parser state is split into a Stack and a Buffer. The first slide asks which action comes next ("?"); the following slides show the actions RIGHT-ARC, LEFT-ARC, and SHIFT.]
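The three actions on the parsing slides can be sketched in a few lines. This is a minimal illustration that assumes an arc-standard transition system; the `ParserState` class, the `apply_action` and `parse` helpers, and the action sequence are hypothetical names and values chosen for the example, not taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class ParserState:
    """Parser configuration: a stack, a buffer, and the arcs built so far."""
    stack: list
    buffer: list
    arcs: list = field(default_factory=list)  # (head, dependent) pairs


def apply_action(state, action):
    """Apply one arc-standard action in place and return the state."""
    if action == "SHIFT":
        # Move the next buffer word onto the stack.
        state.stack.append(state.buffer.pop(0))
    elif action == "LEFT-ARC":
        # The second-from-top word becomes a dependent of the top word.
        dependent = state.stack.pop(-2)
        state.arcs.append((state.stack[-1], dependent))
    elif action == "RIGHT-ARC":
        # The top word becomes a dependent of the word below it.
        dependent = state.stack.pop()
        state.arcs.append((state.stack[-1], dependent))
    else:
        raise ValueError(f"unknown action: {action}")
    return state


def parse(words, actions):
    """Run a fixed action sequence from the initial state."""
    state = ParserState(stack=[], buffer=list(words))
    for action in actions:
        apply_action(state, action)
    return state


# Hypothetical action sequence for a three-word prefix of the slide's sentence.
final = parse(["Alice", "saw", "Bob"],
              ["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC"])
print(final.arcs)  # [('saw', 'Alice'), ('saw', 'Bob')]
```

A parser of this kind needs 2n - 1 actions for an n-word sentence, which is why the choice of the next action at each state is the only thing the model has to learn.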

Transition-Based Neural Networks

[Figure: the parser state (Stack and Buffer over "Alice saw Bob eat pizza", next action "?") feeds a feed-forward network. Layers, bottom to top: Embeddings (4), ReLU 1 (3), ReLU 2 (2), Activations (1), Action Softmax.]

Locally normalized model: P(action | context)

• Locally normalized models are often easy to train

• Globally normalized models using the same #params can be much more accurate

• Applies to multiple tasks
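The layer stack named on the network slides (Embeddings, ReLU 1, ReLU 2, Activations, Action Softmax) can be sketched in plain Python. Everything concrete here is an assumption made for illustration: the layer sizes, the random initialization, the three feature ids, and the helper names are hypothetical, and a real parser would use many more features and a tensor library.

```python
import math
import random

ACTIONS = ["SHIFT", "LEFT-ARC", "RIGHT-ARC"]


def relu(v):
    return [max(0.0, x) for x in v]


def affine(weights, bias, v):
    """Compute weights @ v + bias for a list-of-rows weight matrix."""
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, bias)]


def softmax(scores):
    """Normalize scores into a distribution (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(rng, n_out, n_in):
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def action_distribution(feature_ids, params):
    """Locally normalized model: P(action | context) for one parser state."""
    # Embeddings: look up one vector per feature and concatenate them.
    x = [value for f in feature_ids for value in params["embeddings"][f]]
    h1 = relu(affine(*params["relu1"], x))     # ReLU 1
    h2 = relu(affine(*params["relu2"], h1))    # ReLU 2
    activations = affine(*params["out"], h2)   # one score per action
    return softmax(activations)                # Action Softmax


# Toy sizes, chosen only for the example.
rng = random.Random(0)
EMB_DIM, N_FEATURES, VOCAB, HIDDEN = 4, 3, 10, 8
params = {
    "embeddings": [[rng.uniform(-0.5, 0.5) for _ in range(EMB_DIM)]
                   for _ in range(VOCAB)],
    "relu1": random_layer(rng, HIDDEN, EMB_DIM * N_FEATURES),
    "relu2": random_layer(rng, HIDDEN, HIDDEN),
    "out": random_layer(rng, len(ACTIONS), HIDDEN),
}
probs = action_distribution([1, 5, 7], params)
print(dict(zip(ACTIONS, probs)))
```

The softmax is applied separately at every parser state, which is what "locally normalized" refers to: each decision gets its own distribution that sums to one.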

Locally Normalized Training

[Chen & Manning ’14, Weiss et al. ’15]

Alice saw Bob eat pizza with Charlie

Oracle maps gold structures to gold action sequences:

[Figure: gold sentences are converted into gold action sequences, which are then grouped into mini-batches.]

Some advantages:

• Trivially Parallelizable

• SGD Training recipes

• Standard NN Packages
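An oracle of the kind mentioned in the training slides can be sketched as follows. This is a minimal static oracle that assumes an arc-standard system and a projective gold tree; the `oracle_actions` function and the `heads` encoding (head index per word, -1 for the root) are hypothetical choices made for the example.

```python
def oracle_actions(heads):
    """Map a gold tree to a gold action sequence.

    heads[i] is the index of the gold head of word i, or -1 for the root word.
    """
    n = len(heads)
    # Number of dependents each word still has to collect.
    pending = [0] * n
    for head in heads:
        if head >= 0:
            pending[head] += 1

    stack, next_word, actions = [], 0, []
    while next_word < n or len(stack) > 1:
        if len(stack) >= 2:
            top, below = stack[-1], stack[-2]
            if heads[below] == top and pending[below] == 0:
                # 'below' is a finished dependent of 'top'.
                actions.append("LEFT-ARC")
                stack.pop(-2)
                pending[top] -= 1
                continue
            if heads[top] == below and pending[top] == 0:
                # 'top' is a finished dependent of 'below'.
                actions.append("RIGHT-ARC")
                stack.pop()
                pending[below] -= 1
                continue
        if next_word == n:
            raise ValueError("no action applies; the tree may be non-projective")
        actions.append("SHIFT")
        stack.append(next_word)
        next_word += 1
    return actions


# "Alice saw Bob" with "saw" as the root and head of both other words.
print(oracle_actions([1, -1, 1]))
# ['SHIFT', 'SHIFT', 'LEFT-ARC', 'SHIFT', 'RIGHT-ARC']
```

Each (state, gold action) pair produced this way is an independent training example, which is what makes local training easy to shuffle into mini-batches.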

Alice saw Bob eat pizza with Charlie

Locally Normalized Inference

How Important is Lookahead?

Alice saw Bob eat pizza with Charlie

[Figure: UAS on §22 of the WSJ (y-axis, 75 to 95) against lookahead (x-axis, 0 to 4) for the Local model. The last slides draw a Bi-LSTM over the sentence and add LSTM [Kiperwasser & Goldberg '16] to the plot.]

Beam Search with Local Model

Alice saw Bob eat pizza with Charlie

[Figure (schematic): a beam of partial analyses advancing across the sentence, with a vertical axis labeled "Better".]
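Beam search over a locally normalized model can be sketched as below. The `beam_search` function is a generic illustration, and `toy_local_model` is a hypothetical two-action model with made-up probabilities, chosen so that greedy decoding and a beam of size 2 disagree.

```python
import math


def beam_search(step_log_probs, num_steps, beam_size):
    """Keep the beam_size highest-scoring action prefixes at every step."""
    beam = [((), 0.0)]  # (action prefix, summed log-probability)
    for _ in range(num_steps):
        candidates = []
        for prefix, score in beam:
            for action, log_p in step_log_probs(prefix).items():
                candidates.append((prefix + (action,), score + log_p))
        candidates.sort(key=lambda item: item[1], reverse=True)
        beam = candidates[:beam_size]
    return beam


def toy_local_model(prefix):
    """Hypothetical locally normalized model: log P(action | prefix)."""
    if prefix == ():
        probs = {"A": 0.6, "B": 0.4}
    elif prefix == ("A",):
        probs = {"A": 0.5, "B": 0.5}
    else:  # prefix == ("B",)
        probs = {"A": 0.9, "B": 0.1}
    return {action: math.log(p) for action, p in probs.items()}


greedy = beam_search(toy_local_model, num_steps=2, beam_size=1)[0]
best = beam_search(toy_local_model, num_steps=2, beam_size=2)[0]
print(greedy[0], round(math.exp(greedy[1]), 2))  # ('A', 'A') 0.3
print(best[0], round(math.exp(best[1]), 2))      # ('B', 'A') 0.36
```

The beam recovers a sequence that greedy decoding discards after the first step, but the scores being compared are still products of per-step probabilities from the local model.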

Beam Search with Local Model

[Figure: UAS (y-axis, 75 to 95) against lookahead (x-axis, 0 to 4) for two series: Local and +Beam.]

Training with Early Updates

[Collins and Roark ’04, Zhou et al.’15]

[Figure: a beam of four paths with summed scores $\sum_i \rho^{(1)}_i, \ldots, \sum_i \rho^{(4)}_i$, together with the gold path with summed score $\sum_i \rho^{(*)}_i$.]

Globally normalized with respect to the beam:

$$\frac{\exp \sum_i \rho^{(*)}_i}{\sum_{j=1}^{|\mathrm{Beam}|} \exp \sum_i \rho^{(j)}_i}$$

BACKPROP

Backpropagate through all steps, paths, and layers
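The beam-normalized objective can be sketched numerically. This is a minimal illustration, not the authors' implementation: the path scores are made up, the function names are hypothetical, and the example assumes the gold path is counted as one of the entries in the normalizer. With early updates, the summed scores would run only up to the step at which the gold path falls off the beam.

```python
import math


def log_sum_exp(values):
    """Stable log(sum(exp(v))) over a list of summed path scores."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def beam_loss(gold_index, path_scores):
    """Negative log of exp(score[gold]) / sum_j exp(score[j])."""
    return log_sum_exp(path_scores) - path_scores[gold_index]


def beam_loss_gradients(gold_index, path_scores):
    """d(loss)/d(score[j]) = softmax(scores)[j] - 1 if j is the gold path."""
    log_z = log_sum_exp(path_scores)
    return [math.exp(s - log_z) - (1.0 if j == gold_index else 0.0)
            for j, s in enumerate(path_scores)]


# Hypothetical summed scores: four beam paths followed by the gold path.
path_scores = [2.0, 1.5, 0.5, 0.1, 1.0]
GOLD = 4
loss = beam_loss(GOLD, path_scores)
grads = beam_loss_gradients(GOLD, path_scores)
print(round(loss, 4))
print([round(g, 4) for g in grads])
```

The gradient pushes the gold path's score up and every competing path's score down in proportion to its share of the beam. Because each path score is a sum of per-step network outputs, these gradients flow back through all steps, paths, and layers.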

Globally Normalized Model

[Figure: UAS (y-axis, 75 to 95) against lookahead (x-axis, 0 to 4) for three series: Local, +Beam, and Global.]

English WSJ Results

[Figure: bar chart of UAS (y-axis, 90 to 95). Systems shown: This Work: Global (supervised); NN Perceptron (Weiss et al. ’15); Zhou et al. ’15; LSTM (Dyer et al. ’15); Zhang & McDonald ’14; Zhang & Nivre ’11; Local (Weiss et al. ’15); Chen & Manning ’14; LSTM (Kiperwasser & Goldberg ’16). Bar values as extracted, not matched to systems: 94.61, 93.90, 93.99, 92.83, 93.20, 93.19, 91.80, 93.00, 93.22.]

CoNLL’09 POS Tagging and Parsing Results

[Figure, two panels over the languages Ca, Ch, Cz, En, Ge, Jp, Sp. Tagging: accuracy (y-axis, 94 to 100) for LSTM (Ling et al. '15) and This Work. Parsing: UAS (y-axis, 80 to 95) for Bohnet and Nivre '12, Alberti et al. '15, and This Work.]

Sentence Compression Results

Transition System decides to KEEP or DROP words

Input: In Pakistan, former leader Pervez Musharraf has appeared in court for the first time, on treason charges.

Output: Pervez Musharraf has appeared in court on treason charges.

                                      Whole-sentence   Human eval   Relative
                                      test accuracy    rating       throughput
Seq2seq LSTM (Filippova et al. ’15)   35.36            4.66         1x
Global model (This work)              35.16            4.67         100x
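The KEEP or DROP transition system can be illustrated on the slide's own sentence. The `compress` helper is hypothetical, and splitting on whitespace is an assumption made to keep the example short; a real system would use its own tokenization. The action sequence is written by hand to reproduce the compression shown on the slides.

```python
def compress(sentence, actions):
    """Apply one KEEP or DROP action per whitespace-separated word."""
    words = sentence.split()
    if len(words) != len(actions):
        raise ValueError("need exactly one action per word")
    return " ".join(word for word, action in zip(words, actions)
                    if action == "KEEP")


SENTENCE = ("In Pakistan, former leader Pervez Musharraf has appeared in court "
            "for the first time, on treason charges.")

# Hand-written actions: drop the opening phrase and "for the first time,".
ACTIONS = ["DROP"] * 4 + ["KEEP"] * 6 + ["DROP"] * 4 + ["KEEP"] * 3

compressed = compress(SENTENCE, ACTIONS)
print(compressed)
# Pervez Musharraf has appeared in court on treason charges.
```

Since there is exactly one decision per word, decoding is a single left-to-right pass, which is consistent with the throughput advantage reported in the table.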

Sentence Compression: Label Bias

In Pakistan, former leader Pervez Musharraf has appeared in court for the first time, on treason charges.

[Each row of the slide marks its predicted compression on the sentence above.]

Model     Sequence probability under Local   Sequence probability under Global
Local     0.13                               0.05
+Beam     0.16                               <10^-4
Global    0.06                               0.07

Why does it work?

1. Global Models are More Expressive

Let

• $P_L$: set of distributions under a Local model

• $P_G$: set of distributions under a Global model

Theorem: $P_L \subsetneq P_G$

Therefore there are some distributions over sequences that cannot be captured in a finite-lookahead locally-normalized model.

[This work, Smith and Johnson ’07]
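The two model families in the expressiveness theorem can be written out side by side. The notation is an assumption carried over from the accompanying paper rather than from the slide itself: $d_{1:n}$ is a sequence of decisions and $\rho(d_{1:j-1}, d_j)$ is the score the network assigns to decision $d_j$ given its history.

```latex
% Local model: one softmax per decision, multiplied along the sequence.
p_L(d_{1:n}) = \prod_{j=1}^{n}
  \frac{\exp \rho(d_{1:j-1}, d_j)}{\sum_{d'} \exp \rho(d_{1:j-1}, d')}

% Global model: a single normalizer over complete sequences.
p_G(d_{1:n}) =
  \frac{\exp \sum_{j=1}^{n} \rho(d_{1:j-1}, d_j)}
       {\sum_{d'_{1:n}} \exp \sum_{j=1}^{n} \rho(d'_{1:j-1}, d'_j)}
```

In the local model every step must distribute a total probability of one among its continuations, whatever evidence arrives later. The global model has no such per-step constraint, which is the informal reason it can represent distributions the local model cannot.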

2. Backprop with a Beam

[Figure: bar chart of UAS (y-axis, 92 to 95). The global-training bars add trained layers one at a time: Activations, then ReLU 2, then ReLU 1, then Embeddings.]

Training          Configuration            UAS
Local training    Greedy                   92.85
Local training    +Beam                    93.32
Global training   Train only Activations   93.45
Global training   +ReLU 2                  94.01
Global training   +ReLU 1                  94.09
Global training   +Embeddings              94.38

Conclusions

Global models:

• can be taught to do search better

• more accurate, in exchange for more training time

• same wicked fast decoding

• applicable to multiple tasks

Open Source: SyntaxNet

Parsey McParseface + 40 languages

https://github.com/tensorflow/models/tree/master/syntaxnet

ACL 2016 Google Booth

And check out the Natural Language Understanding

team page: g.co/NLUTeam

Come by for demos, info and swag

Thank You!

[Nivre ’06] [Nivre ’09]

[Bohnet and Nivre ’12] [Martins et al.’13]

[Chen and Manning ’14] [Zhang and McDonald ’14]

[Alberti et al.’15] [Ballesteros et al.’15]

[Dyer et al.’15] [Weiss et al.’15]

[Yazdani and Henderson ’15] [Zhou et al.’15]

[Vaswani and Sagae ’16]

[Henderson ’03] [Henderson ’04]

[Durrett and Klein ’15] [Vinyals et al.’15]

[Watanabe and Sumita ’15]

[Ross et al.’11] [Yao et al.’14]

[Zheng et al.’15] [Zhou and Xu ’15] [Lei et al.’14]

[Ling et al.’15] [Peng et al.’09]

[Do and Artières ’10] [Filippova et al.’15]

[Goldberg and Nivre ’13] [Hochreiter and Schmidhuber ’97]

[Huang et al.’15]

[Collins and Roark ’04] [Collins ’99]

[Liang et al.’08] [Daume III et al.’09]

[Abney et al.’99] [Chi ’99]

[Smith and Johnson ’07]

[Bottou ’91] [Bottou et al.’97]

[Lafferty et al.’01] [Bottou and LeCun ’05]

[Le Cun et al.’98]

Appendix

Longer examples of ambiguity