Concavity and Initialization for Unsupervised Dependency Parsing
Kevin Gimpel and Noah A. Smith
Unsupervised learning in NLP ⇒ non-convex optimization (typically)
[Figure: scatter plot of Attachment Accuracy (%) against log-likelihood (per sentence) for the Dependency Model with Valence (Klein & Manning, 2004), trained with EM from 50 random initializers.]

Likelihood and accuracy correlate across the 50 runs (Pearson's r = 0.63, a strong correlation), yet the accuracy range is 20%! The initializer from K&M04 is marked on the plot.
How has this been addressed?

- Scaffolding / staged training (Brown et al., 1993; Elman, 1993; Spitkovsky et al., 2010)
- Curriculum learning (Bengio et al., 2009)
- Deterministic annealing (Smith & Eisner, 2004); structural annealing (Smith & Eisner, 2006)
- Continuation methods (Allgower & Georg, 1990)
Example: Word Alignment

Staged training (Brown et al., 1993): IBM Model 1 → HMM Model → IBM Model 4.
IBM Model 1, the starting point of the pipeline, is CONCAVE.
So: unsupervised learning in NLP is (typically) non-convex optimization, except IBM Model 1 for word alignment, which has a concave log-likelihood function.
IBM Model 1 (Brown et al., 1993)

Each target word is generated by an alignment probability times a translation probability. IBM Model 1 is CONCAVE. IBM Model 2, which makes the alignment probability a learned parameter as well, is NOT CONCAVE: it places a product of parameters within the log-sum.

For concavity:
- One parameter is permitted for each atomic piece of latent structure.
- No atomic piece of latent structure can affect any other piece.
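In symbols (the slides' equations did not survive extraction; these are the standard forms of Brown et al.'s models, with the two labeled factors made explicit):

Model 1 (CONCAVE):
  \log p(f \mid e) \;=\; \sum_{j=1}^{m} \log \sum_{i=0}^{l} \underbrace{\tfrac{1}{l+1}}_{\text{alignment probability}} \, \underbrace{t(f_j \mid e_i)}_{\text{translation probability}} \;+\; \text{const.}

Model 2 (NOT CONCAVE):
  \log p(f \mid e) \;=\; \sum_{j=1}^{m} \log \sum_{i=0}^{l} \underbrace{q(i \mid j, l, m)}_{\text{alignment probability}} \, \underbrace{t(f_j \mid e_i)}_{\text{translation probability}} \;+\; \text{const.}

In Model 1 the inner sum is linear in the parameters t, and the log of a nonnegative linear function is concave; in Model 2 the product q · t of two parameters within the log-sum breaks concavity.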
What models can we build without sacrificing concavity?

Take the atomic piece of latent structure to be a single dependency arc. Then, for concavity:
- Only one parameter is allowed per dependency arc.
- Every dependency arc must be independent of the others, so we cannot use a tree constraint.

Our Model: like IBM Model 1, but we generate the same sentence again, aligning its words to the original sentence (cf. Brody, 2010).
Example (each word of the copy aligns to a word of the original, its head):

$ Vikings came in longboats from Scandinavia in 1000 AD   (original, with wall symbol $)
$ Vikings came in longboats from Scandinavia in 1000 AD   (copy)

[Figure: alignment links drawn one at a time from the copy to the original, tracing out dependency arcs.]

Cycles, multiple roots, and non-projectivity are all permitted by this model.
Only one parameter per dependency arc. We cannot look at other dependency arcs, but we can condition on (properties of) the sentence. We condition on direction ("Concave Model A"):

$ NNPS VBD IN NNS IN NNP IN CD NN
$ NNPS VBD IN NNS IN NNP IN CD NN

Note: we've been using words in our examples, but in our model we follow standard practice and use gold POS tags.
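In symbols (a sketch in notation of my choosing; the slides' own equations did not survive extraction): for a tag sequence t_1, ..., t_n with wall t_0 = $, the per-sentence log-likelihood of Concave Model A is

  \log p \;=\; \sum_{j=1}^{n} \log \sum_{\substack{i=0 \\ i \neq j}}^{n} \tfrac{1}{n} \, c\big(t_j \mid t_i, \operatorname{dir}(i,j)\big)

where dir(i, j) is left or right and c is the single per-arc parameter. The inner sum is linear in c, so the objective is concave, exactly as in IBM Model 1; direction is a property of the sentence positions, not of other arcs, so conditioning on it is safe.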
Results for Concave Model A:

Model             Initializer   Accuracy*
Attach Right      N/A           31.7
DMV               Uniform       17.6
DMV               K&M           32.9
Concave Model A   Uniform       25.6

* Penn Treebank test set, sentences of all lengths; WSJ10 used for training.

Note: IBM Model 1 is not strictly concave (Toutanova & Galley, 2011).
We can also use hard constraints while preserving concavity: the only tags that can align to $ are verbs (Mareček & Žabokrtský, 2011; Naseem et al., 2010). We call this "Concave Model B".
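Since training is just IBM Model 1 style EM over self-alignments, it fits in a few lines. A minimal sketch (hypothetical code in my own naming, not the authors' implementation):

from collections import defaultdict

WALL = "$"

def em_concave_model(corpus, n_iters=10, wall_heads_only=None):
    """EM for the concave dependency model: each tag in a copy of the sentence
    aligns to one tag of the original (its head), with one parameter per
    (head tag, child tag, direction) triple.

    corpus: list of POS-tag sequences (the wall $ is prepended here).
    wall_heads_only: if given (e.g. the verb tags), only these tags may align
        to the wall, which is the hard constraint of Concave Model B.
    """
    def dir_of(i, j):
        return "R" if i < j else "L"

    # Uniform start; the objective is concave, so up to ties among optima
    # (cf. Toutanova & Galley, 2011) the initializer does not matter.
    theta = defaultdict(lambda: 1.0)
    for _ in range(n_iters):
        counts = defaultdict(float)
        for tags in corpus:
            sent = [WALL] + list(tags)
            for j in range(1, len(sent)):  # child position in the copy
                heads = [i for i in range(len(sent))
                         if i != j and not (i == 0 and wall_heads_only is not None
                                            and sent[j] not in wall_heads_only)]
                if not heads:
                    continue
                weights = [theta[(sent[i], sent[j], dir_of(i, j))] for i in heads]
                z = sum(weights)
                for i, w in zip(heads, weights):  # E-step: posterior over heads
                    counts[(sent[i], sent[j], dir_of(i, j))] += w / z
        # M-step: normalize child-tag counts for each (head tag, direction).
        totals = defaultdict(float)
        for (h, c, d), v in counts.items():
            totals[(h, d)] += v
        theta = defaultdict(float, {(h, c, d): v / totals[(h, d)]
                                    for (h, c, d), v in counts.items()})
    return theta

For Concave Model B one would pass something like wall_heads_only={"VB", "VBD", "VBZ", "VBP", "VBG", "VBN", "MD"}; the exact verb-tag set is an assumption here, not taken from the slides.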
Results with the hard constraint:

Model             Initializer   Accuracy*
Attach Right      N/A           31.7
DMV               Uniform       17.6
DMV               K&M           32.9
Concave Model A   Uniform       25.6
Concave Model B   Uniform       28.6

* Penn Treebank test set, sentences of all lengths; WSJ10 used for training.
Can these concave models be useful?
Just as IBM Model 1 is used to initialize later word alignment models, we can use our concave models to initialize the DMV.
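How that handoff might look (an illustrative assumption, not the paper's exact protocol): renormalize the learned arc parameters into the DMV's initial child-attachment distributions, leaving its stop/continue distributions uniform.

def init_dmv_attach_probs(theta, tagset):
    # Hypothetical helper (name and mapping are mine): turn concave-model
    # parameters theta[(head, child, dir)] into initial DMV child-attachment
    # probabilities over a fixed tag set.
    attach = {}
    for d in ("L", "R"):
        for h in list(tagset) + [WALL]:
            z = sum(theta.get((h, c, d), 0.0) for c in tagset)
            for c in tagset:
                attach[(h, c, d)] = (theta.get((h, c, d), 0.0) / z
                                     if z > 0 else 1.0 / len(tagset))
    return attach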
Model          Initializer       Accuracy*
Attach Right   N/A               31.7
DMV            Uniform           17.6
DMV            K&M               32.9
DMV            Concave Model A   34.4
DMV            Concave Model B   43.0

* Penn Treebank test set, sentences of all lengths; WSJ10 used for training.
Comparison with other reported results:

Model                                                  Initializer       Accuracy*
DMV, trained on sentences of length ≤ 20               Concave Model B   53.1
Shared Logistic Normal (Cohen & Smith, 2009)           K&M               41.4
Posterior Regularization (Gillenwater et al., 2010)    K&M               53.3
LexTSG-DMV (Blunsom & Cohn, 2010)                      K&M               55.7
Punctuation/UnsupTags (Spitkovsky et al., 2011),
  trained on sentences of length ≤ 45                  K&M'              59.1

* Penn Treebank test set, sentences of all lengths.
Multilingual Results (averages across 18 languages):

Model   Initializer       Avg. Accuracy*   Avg. Log-Likelihood†
DMV     Uniform           25.7             -15.05
DMV     K&M               29.4             -14.84
DMV     Concave Model A   30.9             -14.93
DMV     Concave Model B   35.5             -14.45

* Sentences of all lengths from each test set.
† Micro-averaged across sentences in all training sets (sentences of ≤ 10 words used for training).
Summary:
- Unsupervised learning in NLP is (typically) non-convex optimization, except IBM Model 1 for word alignment, which has a concave log-likelihood function.
- We asked what models we can build without sacrificing concavity, and whether those concave models can be useful.
- As in word alignment, we can use simple, concave models to initialize more complex models for grammar induction.

Thanks!