Transition Based Dependency Parsing with Deep Learning

Ömer Kırnap
Koç University
okirnap@ku.edu.tr

September 27, 2018

Ömer Kırnap (Koç University) MSc Thesis September 27, 2018 1 / 123

Overview

1. Introduction
   - Overview of Dependency Parsing
   - Transition Based Dependency Parsing
2. Related Work
   - Linear Models and their Drawbacks
   - Neural Network Models
3. Model
   - Language Model
   - MLP Parser
   - Tree-stack LSTM Parser
4. Results
   - MLP vs Tree-stack LSTM
   - Morphological Feature Embeddings
   - Static vs Dynamic Oracle Training
   - Transfer Learning
5. Conclusion
6. Future Work & Discussions


1 Introduction


Introduction

What is dependency parsing?

Dependency parsing aims to detect word relations by finding the tree structure of a sentence, inspired by dependency grammar.

Figure: Dependency annotations for the sentence "Economic news had little effect on financial markets." [1]

[1] Figure from S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.


Introduction

Why do we need dependency parsing?

Dependencies resolve ambiguity.

Useful for some downstream tasks in NLP.

[2] Figure from http://www.phontron.com/slides/nlp-programming-en-11-depend.pdf

Introduction

Dependency Parsing Categorization

Grammar-based: Relying on a formal grammar defining a formal language; asking whether a given input sentence is in the language defined by the grammar or not.

Data-driven: Making essential use of machine learning from linguistic data in order to parse new sentences.

[3] From S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.


Introduction

Data-driven Dependency Parsing

Graph Based Algorithms

Using maximum spanning tree algorithms from graph theory

Transition Based Algorithms

Capitalizing on greedy stack-based algorithms to build the dependency tree with incremental steps in linear time.

[4] From S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.


Introduction

Transition Based Dependency Parsing

Transition System: an abstract machine with a set of configurations (states) and transitions. We use the Arc-Hybrid transition system [Kuhlmann et al., 2011].

Configurations (σ, β, A):
• σ: a stack of tree fragments, initially empty
• β: a buffer of words, initially containing the whole sentence
• A: a set of dependency arcs (head, relation, modifier), initially empty

Transitions:
• shift(σ, b|β, A) = (σ|b, β, A)
• left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
• right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

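The Arc-Hybrid transitions above can be sketched in a few lines. This is a minimal illustration assuming integer word indices and `(stack, buffer, arcs)` tuples; the function names and the tiny example are illustrative, not the thesis code:

```python
# Minimal sketch of the Arc-Hybrid transition system described above.
# A configuration is (stack, buffer, arcs); arcs are (head, relation, modifier).

def shift(stack, buffer, arcs):
    # shift(σ, b|β, A) = (σ|b, β, A): move the buffer front onto the stack
    return stack + [buffer[0]], buffer[1:], arcs

def left(stack, buffer, arcs, d):
    # left_d: attach stack top s as a d-dependent of the buffer front b
    s, b = stack[-1], buffer[0]
    return stack[:-1], buffer, arcs | {(b, d, s)}

def right(stack, buffer, arcs, d):
    # right_d: attach stack top t as a d-dependent of the element s below it
    t, s = stack[-1], stack[-2]
    return stack[:-1], buffer, arcs | {(s, d, t)}

# Parse a fragment "Economic news had": 0=Economic, 1=news, 2=had
config = ([], [0, 1, 2], frozenset())
config = shift(*config)        # σ=[0], β=[1,2]
config = left(*config, "ATT")  # news -ATT-> Economic
config = shift(*config)        # σ=[1], β=[2]
config = left(*config, "SBJ")  # had -SBJ-> news
print(sorted(config[2]))       # [(1, 'ATT', 0), (2, 'SBJ', 1)]
```

Each transition returns a fresh configuration, mirroring the set notation on the slide.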

An example parsing of a sentence


Problem Definition

Find a model that learns to decide the correct transition from the current state.


2 Related Work


Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearity. However, they are impractical for high-dimensional inputs: they scale linearly in input dimensions (in both time and space, assuming a fixed number of hidden units).


Related Work

Solution: Using dense embeddings for input features.
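A hedged sketch of why dense embeddings help: a huge one-hot feature vector becomes a small dense lookup. The vocabulary size and embedding dimension below are illustrative:

```python
# A dense embedding lookup is mathematically a one-hot vector times the
# embedding table, but costs O(DIM) instead of O(VOCAB).
import numpy as np

rng = np.random.default_rng(3)
VOCAB, DIM = 50_000, 50
E = rng.normal(size=(VOCAB, DIM))    # embedding table

word_id = 12345
one_hot = np.zeros(VOCAB)
one_hot[word_id] = 1.0
dense = E[word_id]                   # lookup: same result as one_hot @ E

assert np.allclose(one_hot @ E, dense)
print(one_hot.size, dense.size)      # 50000 50
```

The downstream network then sees a 50-dimensional input instead of a 50,000-dimensional one.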



3 Model


Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17
• Koç-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings



a Language Model


Language Model (LM)

The LM is used to obtain context and word embeddings with two components:

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors


Language Model - Word vectors

The character-based LSTM generates word vectors.

Figure: Character LSTM, from Kırnap et al., 2017.


Language Model - Context Vectors

The word-based BiLSTM generates context vectors.

Figure: Word BiLSTM, from Kırnap et al., 2017.
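The two LM components can be illustrated with a toy sketch. A plain tanh RNN stands in for the LSTMs here, and all dimensions, names, and the word list are illustrative, not the thesis implementation:

```python
# Toy sketch: a character-level encoder yields one word vector per word,
# and a word-level bidirectional pass yields one context vector per word.
import numpy as np

rng = np.random.default_rng(0)
H = 8  # hidden size

def rnn_step(h, x, Wh, Wx):
    return np.tanh(Wh @ h + Wx @ x)

def word_vector(word, Wh, Wx, emb):
    # Character-based encoder: run over characters, keep the last hidden state.
    h = np.zeros(H)
    for ch in word:
        h = rnn_step(h, emb[ord(ch) % 64], Wh, Wx)
    return h

def context_vectors(word_vecs, Wh_f, Wx_f, Wh_b, Wx_b):
    # Word-based bidirectional pass: concatenate forward and backward states.
    fwd, h = [], np.zeros(H)
    for v in word_vecs:
        h = rnn_step(h, v, Wh_f, Wx_f)
        fwd.append(h)
    bwd, h = [], np.zeros(H)
    for v in reversed(word_vecs):
        h = rnn_step(h, v, Wh_b, Wx_b)
        bwd.append(h)
    return [np.concatenate([f, b]) for f, b in zip(fwd, reversed(bwd))]

emb = rng.normal(size=(64, H)) * 0.1
Wh, Wx = rng.normal(size=(H, H)) * 0.1, rng.normal(size=(H, H)) * 0.1
words = ["economic", "news", "had"]
wvecs = [word_vector(w, Wh, Wx, emb) for w in words]
cvecs = context_vectors(wvecs, Wh, Wx, Wh, Wx)
print(len(cvecs), cvecs[0].shape)  # one 2H-dim context vector per word
```

Each word thus gets a context-independent word vector and a sentence-dependent context vector, matching the two roles described above.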


b MLP Parser (CoNLL17)


MLP Parser

The MLP Parser consists of 4 components:

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition


MLP Parser - Feature Extraction

Feature extractor describes current state

Figure: Kırnap et al., 2017.

MLP Parser - Decision Module

Decision module (MLP) decides the next transition


Experiments & Dataset (MLP), CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations


Experiments - Evaluation Metric

Labeled Attachment Score (LAS): the percentage of words correctly assigned both the correct syntactic head and the correct dependency label.

Figure: Example on "Economic news had": the gold tree has arcs SBJ and ATT; Prediction 1 (arcs PRED, OBJ) has LAS 0; Prediction 2 (arcs OBJ, ATT) has LAS (1/2)·100 = 50.
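The LAS metric above is easy to state in code. A hedged sketch where a parse is a dict mapping each word to its `(head, label)` arc; the names and toy parses are illustrative:

```python
# LAS: percentage of words with both the correct head and the correct label.

def las(gold, pred):
    correct = sum(1 for w, arc in gold.items() if pred.get(w) == arc)
    return 100.0 * correct / len(gold)

# Gold arcs for "Economic news had": news -ATT-> Economic, had -SBJ-> news.
gold = {"Economic": ("news", "ATT"), "news": ("had", "SBJ")}
pred1 = {"Economic": ("had", "OBJ"), "news": ("had", "PRED")}
pred2 = {"Economic": ("news", "ATT"), "news": ("had", "OBJ")}
print(las(gold, pred1), las(gold, pred2))  # 0.0 50.0
```

Prediction 2 gets the ATT arc right but mislabels the other arc, giving 50%.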


Experiments (MLP)

CoNLL 2017 Results (all treebanks, LAS)

Ranked 1st among transition-based parsers. [5]

[5] Source: CoNLL17 official results page.

Contributions in CoNLL17


Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2



Context vectors provide an independent contribution on top of POS tags.



Our BiLSTM language model word vectors perform better than Facebook (FB) vectors (the p-fb row).



Both POS tags and context vectors have significant contributions on top of word vectors.


Issues with MLP

However

Choosing the right features of the parser state remains critical.

We are unable to represent the whole parsing history with feature extraction.


Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack.



c Tree-stack LSTM Parser (CoNLL18)


Related Work - Stack LSTM

Figure: Stack LSTM [Dyer et al., 2015]

Represent each component (σ, β, A) with an LSTM, modifying the head word's embedding with the dependent's embedding.


Problems with Stack LSTM

They only modify the stack's word embeddings.

Hidden states of the LSTMs are not updated except on reduce transitions.

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]


Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy


Tree-stack LSTM Overview

Figure: Tree-stack LSTM overview (β-LSTM, σ-LSTM, Action-LSTM, and t-RNN outputs concatenated and fed to an MLP).

We propose the Tree-stack LSTM model with 4 components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN


Tree-stack LSTM


Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector.

Every dependency relation is represented with a continuous vector.


Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character-based LSTM's word vectors

Word-based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors
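The concatenation above can be sketched directly. This is a hedged illustration: all dimensions and lookup tables are made up for the example, not taken from the thesis configuration:

```python
# Each word's input vector is the concatenation of its word, context,
# POS, and morph-feat embeddings.
import numpy as np

rng = np.random.default_rng(1)
word_vec = rng.normal(size=100)      # from the character LSTM
context_vec = rng.normal(size=200)   # from the word BiLSTM
pos_emb = {"NOUN": rng.normal(size=20)}
feat_emb = {"Number=Sing": rng.normal(size=20)}

x = np.concatenate([word_vec, context_vec,
                    pos_emb["NOUN"], feat_emb["Number=Sing"]])
print(x.shape)  # (340,)
```

The concatenated vector `x` is what the downstream LSTMs consume for each word.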


Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure: Morph-feat Embeddings


Tree-stack LSTM

Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN



β-LSTM

Figure: The buffer's β-LSTM running over words w_i, w_i+1, w_i+2.



σ-LSTM

Figure: The stack's σ-LSTM running over stack elements s_i, s_i+1, s_i+2.



Action-LSTM

Figure: The Action-LSTM running over the sequence of past transitions.


How are the components of the tree-stack LSTM connected?


Tree-RNN (t-RNN)

Figure: t-RNN combining the dependent word, dependency relation, and head word embeddings.

w_head_new = tanh(W_rnn · [w_head_old ; d_l ; w_dep] + b_rnn)    (1)
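Equation (1) is a single tanh affine map over the concatenation of the old head embedding, the relation embedding, and the dependent embedding. A hedged sketch with illustrative dimensions:

```python
# t-RNN head update: w_head_new = tanh(W_rnn @ [w_head_old; d_l; w_dep] + b_rnn)
import numpy as np

rng = np.random.default_rng(2)
D, R = 8, 4                          # word / relation embedding sizes (illustrative)
W_rnn = rng.normal(size=(D, 2 * D + R)) * 0.1
b_rnn = np.zeros(D)

def trnn_update(w_head, d_rel, w_dep):
    # Concatenate the three embeddings and apply the affine map + tanh.
    return np.tanh(W_rnn @ np.concatenate([w_head, d_rel, w_dep]) + b_rnn)

w_head, w_dep = rng.normal(size=D), rng.normal(size=D)
d_rel = rng.normal(size=R)
new_head = trnn_update(w_head, d_rel, w_dep)
print(new_head.shape)  # (8,)
```

The output has the same dimensionality as a word embedding, so the updated head can flow back into the σ-LSTM or β-LSTM.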


Tree-RNN with:
1. Left Transition
2. Right Transition


Left Transition


Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Step-by-step (figures):
1. Each embedding is initialized by concatenating POS, language, and morph-feat embeddings.
2. The stack's top LSTM element is reduced.
3. The t-RNN calculates the new head embedding.
4. The β-LSTM recalculates its hidden state based on the new input.
5. The Tree-stack LSTM is ready to predict the next transition.

Right Transition


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Step-by-step (figures):
1. Each embedding is initialized by concatenating POS, language, and morph-feat embeddings.
2. The stack's top LSTM element is reduced.
3. The t-RNN calculates the new head embedding.
4. The σ-LSTM recalculates its hidden state from the new input.
5. The Tree-stack LSTM is ready to predict the next transition.

Final overview of Tree-stack LSTM

Figure: Full Tree-stack LSTM architecture (β-LSTM, σ-LSTM, Action-LSTM, and t-RNN outputs concatenated and fed to the MLP).


4 Results & Comparisons


Results & Comparisons

Dataset

CoNLL17:
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koç-University ranked 7th out of 33 participants (1st among transition-based parsers)

CoNLL18:
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koç-University ranked 16th out of 30 participants (2nd among transition-based parsers)

Differences from CoNLL17 to CoNLL18: 1. train/test split changes, 2. annotation changes.

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets.


MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1. If the annotation of the treebank has been improved, the older parser is handicapped.

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.


MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru_taiga (10k)   58.89  60.55
hu_szeged (20k)  66.21  68.18
tr_imst (50k)    56.78  58.75
ar_padt (120k)   67.83  68.14
en_ewt (205k)    74.87  75.77
cs_cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP


Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM


MLP Parser

Figure: Initial model (MLP only).


Only Action LSTM

Figure: Only the Action-LSTM feeding the MLP.


Only β-LSTM

Figure: Only the β-LSTM feeding the MLP.


Only σ-LSTM

Figure: Only the σ-LSTM feeding the MLP.


Ablation Analysis Results

Lang Code   MLP    Only Action  Only-β  Only-σ
hu_szeged   66.21  66.87        66.94   67.03
sv_lines    71.12  72.05        72.17   72.45
tr_imst     57.12  56.87        57.02   57.12
ar_padt     67.83  66.67        66.89   66.92
cs_cac      83.89  82.23        83.13   83.17
en_ewt      75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models.



Ablation of t-RNN

Comparison of tree-stack LSTMs with and without the t-RNN:

Lang Code           without t-RNN  with t-RNN
no_nynorsklia (3k)  51.78          53.33
ru_taiga (11k)      59.13          60.55
gl_treegal (15k)    69.76          70.45
hu_szeged (20k)     66.12          68.18
sv_lines (49k)      74.04          75.46
tr_imst (50k)       58.12          58.75
ar_padt (120k)      68.04          68.14
en_ewt (204k)       74.87          75.77
cs_cac (473k)       82.89          83.57
cs_pdt (1M)         81.17          81.16

The t-RNN provides a comparative advantage for low-resource languages.


Ablation Analysis

Overall results of ablation analysis

Lang        MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu_szeged   66.21  66.87   66.94   67.03   66.12      68.18
sv_lines    71.12  72.05   72.17   74.04   72.17      75.46
tr_imst     57.12  56.87   57.02   57.12   58.12      58.75
ar_padt     67.83  66.67   66.89   66.92   68.04      68.14
cs_cac      83.89  82.23   83.13   83.17   82.89      83.57
en_ewt      75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations


Ablation Analysis

Conclusions of Ablation Experiments

The t-RNN's performance contribution increases as the training size decreases.

The σ-LSTM provides more useful information independent of dataset size.

Interconnecting the model's components with the t-RNN makes the tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers).


What do Morphological Feature Embeddings provide?


Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD v2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k and less than 50k tokens

Languages having more than 50k and less than 100k tokens

Languages having 100k tokens or more


Contribution of Morph-feat embeddings

Morph-feat experiments for languages with fewer than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no_nynorsklia  51.13        53.33             3,583
ru_taiga       58.32        60.55            10,479
sme_giella     52.78        53.39            16,385
la_perseus     49.93        51.60            18,184
ug_udt         52.78        53.39            19,262
sl_sst         46.72        48.77            19,473
hu_szeged      66.23        68.18            20,166

Not useful for languages having less than 20k training tokens


Contribution of Morph-feat embeddings

Morph-feat experiments for languages with between 50k and 100k training tokens:

Lang code    Morph-Feats  no Morph-Feats  # of tokens
sv_lines     72.18        74.81           48,325
fr_sequoia   84.36        82.17           50,543
en_gum       76.44        75.34           53,686
ko_gsd       73.74        72.54           56,687
eu_bdt       74.55        73.32           72,974
nl_lassymal  76.70        75.80           75,134
gl_ctg       79.02        79.018          79,327
lv_lvtb      72.33        72.24           80,666
id_gsd       75.76        73.97           97,531

Beneficial for languages with 50k-100k training tokens


Contribution of Morph-feat embeddings

Morph-feat experiments for languages with more than 100k training tokens:

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa_seraji  81.18        81.12             121,064
bg_btb     84.53        84.55             124,336
en_ewt     75.77        75.682            204,585
ar_padt    68.02        68.14             223,881
de_gsd     71.59        71.32             263,804
ca_ancora  85.89        85.874            417,587
es_ancora  84.99        84.78             444,617
cs_cac     83.57        83.63             472,608
cs_pdt     81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens


Static vs Dynamic Oracle Training

Static oracle: training transitions follow the gold moves.
Dynamic oracle: training transitions follow the predicted moves.

In both cases, the log probability of the gold moves is maximized.
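The difference between the two regimes is only in which move the parser follows after each update. A hedged, toy illustration of the control flow, where the "parser" acts on an integer state and all functions are illustrative stand-ins, not the thesis training code:

```python
# Toy static vs. dynamic oracle training loops: same loss on the gold
# move, but the dynamic loop advances with the model's own prediction.
import math
import random

random.seed(0)

def step(state, move):
    return state + move                  # toy transition

def logp(state, move):
    return math.log(0.5)                 # toy uniform model over 2 moves

def predict(state):
    return random.choice([0, 1])         # toy model prediction

def oracle(state):
    return state % 2                     # toy "best move from current state"

def train_static(state, gold_moves):
    loss = 0.0
    for gold in gold_moves:
        loss -= logp(state, gold)        # maximize log p of the gold move
        state = step(state, gold)        # static: always follow gold
    return loss

def train_dynamic(state, n_steps):
    loss = 0.0
    for _ in range(n_steps):
        gold = oracle(state)             # dynamic: gold recomputed per state
        loss -= logp(state, gold)
        state = step(state, predict(state))  # follow the model's move
    return loss

print(round(train_static(0, [1, 0, 1]), 3), round(train_dynamic(0, 3), 3))
```

With a dynamic oracle the model is trained on configurations it actually reaches at test time, including ones produced by its own mistakes.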


Static vs Dynamic Oracle Training

Figure: Results are very close for training sets with fewer than 20k tokens.


Static vs Dynamic Oracle Training

Figure: Results are very close for training sets with between 20k and 50k tokens.


Static vs Dynamic Oracle Training

Figure: Results are very close for training sets with more than 50k tokens.


How about languages with less than 20k training tokens?


Transfer Learning

There are 4 possible types of transfer learning:
1. Using very limited data to train the LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al., 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af_afribooms  not provided  75.46  77.43  78.12
kk_ktb        20.19         22.31  21.96  23.86
bxr_bdt        7.64          9.76   9.93   8.98
kmr_mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4).


Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial.

Training the LM from scratch does not yield useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al., 2017].


Projectivity

Transition-based parsers can only build projective trees. [6]

[6] Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf


Projective vs Non-projective

We compared our model with the best model for different projectivity ratios.

Language     Projectivity  Best (LAS)  Our (LAS)
grc_perseus   90.7%        79.39       55.03 (20)
eu_bdt        95.13%       84.22       74.13 (17)
hu_szeged     97.8%        82.66       68.18 (14)
da_ddt        98.26%       86.28       76.40 (17)
en_gum        99.6%        85.05       76.44 (15)
gl_treegal   100%          74.25       70.45 (10)
gl_ctg       100%          82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases. [7]

[7] From the official results page and our projectivity table.

Conclusions


Conclusion

In conclusion: We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition-based dependency parsing.

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.

The Tree-stack LSTM performed better on low-resource languages.

As the training dataset size increases, the Tree-stack LSTM loses its advantage.


Future Research Direction

End-to-End Training

Systems jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention over the σ-LSTM, β-LSTM, or Action-LSTM states may bring performance improvements.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.


Publications

Ömer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Ömer Kırnap, Berkay Furkan Önder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.


References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, pages 673-682. Association for Computational Linguistics.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.


Thank you for your attention


Questions


  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 2: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 2 123

1 Introduction

Omer Kırnap (Koc University) MSc Thesis September 27 2018 3 123

Introduction

What is dependency parsing

Dependency parsing aims to detect word relations by finding the treestructure of a sentence inspired by dependency grammar

Figure Dependency annotations for a sentence ldquo Economic news had little effecton financial marketsrdquo

1

1Figure from S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Omer Kırnap (Koc University) MSc Thesis September 27 2018 4 123

Introduction

Why do we need dependency parsing

Dependencies resolve ambiguity

Useful for some down-stream tasks in NLP

2

2Figure from httpwwwphontroncomslidesnlp-programming-en-11-dependpdfOmer Kırnap (Koc University) MSc Thesis September 27 2018 5 123

Introduction

Dependency Parsing Categorization

Grammar BasedRelying on a formal grammardefining a formal languageasking whether a given inputsentence is in the languagedefined by the grammar or not

Data-drivenMaking essential use of machinelearning from linguistic data in orderto parse new sentences

3. From S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Introduction

Data-driven Dependency Parsing

Graph Based Algorithms

Using maximum spanning tree algorithms from graph theory

Transition Based Algorithms

Capitalizing on greedy stack-based algorithms to build the dependency tree with incremental steps in linear time

4. From S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Introduction

Transition Based Dependency Parsing

Transition system: an abstract machine with a set of configurations (states) and transitions. We use the arc-hybrid transition system [Kuhlmann et al., 2011].

Configurations (σ, β, A):
• σ: stack of tree fragments, initially empty
• β: buffer of words, initially containing the whole sentence
• A: set of dependency arcs (head, relation, modifier), initially empty

Transitions:
• shift(σ, b|β, A) = (σ|b, β, A)
• left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
• right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
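The three transitions above can be sketched in a few lines of Python. This is an illustrative toy using word indices, not the thesis implementation; arcs are (head, relation, modifier) triples.

```python
# Toy arc-hybrid transitions: σ is a list (top = last element), β is a
# list of upcoming word indices, A is a set of (head, rel, modifier) arcs.

def shift(stack, buffer, arcs):
    # shift(σ, b|β, A) = (σ|b, β, A)
    return stack + [buffer[0]], buffer[1:], arcs

def left(stack, buffer, arcs, d):
    # left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}): the first buffer
    # word becomes the head of the stack top
    return stack[:-1], buffer, arcs | {(buffer[0], d, stack[-1])}

def right(stack, buffer, arcs, d):
    # right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}): the second stack
    # item becomes the head of the stack top
    return stack[:-1], buffer, arcs | {(stack[-2], d, stack[-1])}

# "Economic news": word 1 modifies word 2
stack, buffer, arcs = [], [1, 2], set()
stack, buffer, arcs = shift(stack, buffer, arcs)         # σ=[1], β=[2]
stack, buffer, arcs = left(stack, buffer, arcs, "amod")  # arc 2 -amod-> 1
stack, buffer, arcs = shift(stack, buffer, arcs)         # σ=[2], β=[]
```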

An example parsing of a sentence

Problem Definition

Find a model that learns to decide the correct transition from the current state.

2 Related Work

Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearity. However, they are impractical for high-dimensional inputs: they scale linearly in input dimensions (in both time and space, assuming a fixed number of hidden units).

Related Work

Solution: use dense embeddings for input features.

3 Model

Model Overview

2 shared tasks on Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17
• Koc-University team with the MLP Parser using context embeddings

CoNLL18
• KParse team with the Tree-stack LSTM Parser using context and morph-feat embeddings

a Language Model

Language Model (LM)

The LM is used to obtain context and word embeddings, with two components:

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors
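The two components above can be sketched at the interface level. This is a toy stand-in with made-up recurrences and dimensions, not the thesis LM: a character-level function maps each word to a word vector, and a forward/backward pass over the sentence yields one context vector per word.

```python
import numpy as np

def char_lstm(word, dim=4):
    # stand-in for a character LSTM: a toy recurrence over characters
    vec = np.zeros(dim)
    for ch in word:
        vec = np.tanh(vec + ord(ch) / 128.0)
    return vec

def word_bilstm(word_vecs):
    # stand-in for forward and backward LSTM passes over the sentence:
    # cumulative summaries from the left and from the right
    fwd = np.cumsum(word_vecs, axis=0)
    bwd = np.cumsum(word_vecs[::-1], axis=0)[::-1]
    return np.concatenate([fwd, bwd], axis=1)  # one context vector per word

words = ["economic", "news"]
word_vecs = np.stack([char_lstm(w) for w in words])   # one word vector per word
context_vecs = word_bilstm(word_vecs)                 # fwd+bwd context per word
```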

Language Model - Word vectors

A character-based LSTM generates word vectors.

Figure: Character LSTM, from Kırnap et al. 2017

Language Model - Context Vectors

A word-based BiLSTM generates context vectors.

Figure: Word BiLSTM, from Kırnap et al. 2017

b MLP Parser (CoNLL17)

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure: Kırnap et al. 2017

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Experiments & Dataset (MLP), CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations

Experiments - Evaluation Metric

Labeled Attachment Score (LAS): the percentage of words assigned both the correct syntactic head and the correct dependency label.

Example with the fragment "Economic news had":
• Gold tree (arcs ATT and SBJ): LAS = 1
• Prediction 1 (arcs PRED and OBJ): LAS = 0
• Prediction 2 (arcs OBJ and ATT): LAS = (1/2) · 100 = 50
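The metric itself is straightforward to compute; a hedged sketch over per-word (head, label) pairs:

```python
def las(gold, pred):
    """gold, pred: one (head, label) pair per word. Returns LAS in percent."""
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return 100.0 * correct / len(gold)

gold = [(2, "ATT"), (3, "SBJ")]   # Economic -> news (ATT), news -> had (SBJ)
pred = [(2, "ATT"), (3, "OBJ")]   # heads right, one label wrong
score = las(gold, pred)           # 50.0
```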

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition-based parsers 5

5. Source: CoNLL17 official results page

Contributions in CoNLL17

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), context vector (c):

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63.0
c       72.2        76.0        63.5
v-c     76.0        79.0        67.6
p-c     78.0        82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2

Context and Word Embeddings


Context vectors provide an independent contribution on top of POS tags

Context and Word embeddings


Our BiLSTM language model word vectors perform better than FB vectors

Context and Word embeddings


Both POS tags and context vectors make significant contributions on top of word vectors

Issues with MLP

However

Choosing the correct state representation of the parser remains critical

We are unable to represent the whole parsing history with feature extraction

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack.

c Tree-stack LSTM Parser (CoNLL18)

Related Work - Stack LSTM

Figure: Stack LSTM [Dyer et al., 2015]

Represent each component (σ, β, A) with an LSTM; the head word's embedding is modified with the dependent's embedding.

Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of the LSTMs are not updated unless a reduce occurs

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al., 2013]

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Tree-stack LSTM Overview

Figure: Tree-stack LSTM overview: the σ-, β-, and Action-LSTM states are concatenated and fed to an MLP; a t-RNN composes the head word, dependent word, and dependency relation.

We propose the Tree-stack LSTM model with 4 components:
• β-LSTM
• σ-LSTM
• Action-LSTM
• Tree-RNN

Tree-stack LSTM

Input Representation

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector

Every dependency relation is represented with a continuous vector

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character-based LSTM's word vectors

Word-based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings
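One plausible way to embed such a feature string is shown below. This is an illustrative sketch: the dimensions, the on-demand lookup table, and the sum-pooling choice are assumptions, not the thesis design.

```python
import numpy as np

feats = "Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs"
pairs = [f.split("=", 1) for f in feats.split("|")]

rng = np.random.default_rng(0)
dim = 8
table = {}  # one embedding per feature=value pair, created on demand

def embed(pair):
    key = "=".join(pair)
    if key not in table:
        table[key] = rng.normal(size=dim)
    return table[key]

# pool the per-feature embeddings into one morph-feat vector for the word
morph_vec = np.sum([embed(p) for p in pairs], axis=0)
```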

Tree-stack LSTM

Model components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN


β-LSTM

Figure: The buffer's β-LSTM runs over the upcoming words w_i, w_i+1, w_i+2


σ-LSTM

Figure: The stack's σ-LSTM runs over the stack elements s_i, s_i+1, s_i+2


Action-LSTM

Figure: The Action-LSTM runs over the sequence of past transitions

How are the components of the tree-stack LSTM connected?

Tree-RNN

Tree-RNN (t-RNN)

Figure: t-RNN combines the head word, dependent word, and dependency relation embeddings

w_head_new = tanh(W_rnn · [w_head_old; d_l; w_dep] + b_rnn)    (1)
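Eq. (1) can be sketched directly. The dimensions below are illustrative, and W_rnn and b_rnn would be learned parameters in practice:

```python
import numpy as np

d_word, d_rel = 4, 2
rng = np.random.default_rng(0)
W_rnn = rng.normal(size=(d_word, 2 * d_word + d_rel))  # learned in practice
b_rnn = np.zeros(d_word)

def trnn_update(w_head_old, d_l, w_dep):
    # Eq. (1): w_head_new = tanh(W_rnn · [w_head_old; d_l; w_dep] + b_rnn)
    x = np.concatenate([w_head_old, d_l, w_dep])
    return np.tanh(W_rnn @ x + b_rnn)

w_head_new = trnn_update(rng.normal(size=d_word),
                         rng.normal(size=d_rel),
                         rng.normal(size=d_word))
```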

Tree-RNN with

1. Left transition
2. Right transition

Left Transition

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: The stack's top LSTM is reduced

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: The t-RNN calculates the new head embedding

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: The β-LSTM recalculates its hidden state based on the new input

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: The tree-stack LSTM is ready to predict the next transition

Right Transition

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: The stack's top LSTM is reduced

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: The t-RNN calculates the new head embedding

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: The σ-LSTM recalculates its hidden state from the new input

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: The tree-stack LSTM is ready to predict the next transition

Final overview of Tree-stack LSTM

Figure: The complete Tree-stack LSTM: σ-, β-, and Action-LSTM outputs are concatenated and fed to an MLP, with the t-RNN composing head and dependent embeddings

4 Results & Comparisons

Results & Comparisons

Dataset

CoNLL17:
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation (17 universal part-of-speech tags, 37 universal dependency relations)
• Koc-University ranked 7th out of 33 participants (1st among transition-based parsers)

CoNLL18:
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation (17 universal part-of-speech tags, 37 universal dependency relations)
• Koc-University ranked 16th out of 30 participants (2nd among transition-based parsers)

Changes from CoNLL17 to CoNLL18: 1. train/test split change, 2. annotation

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets.

MLP vs Tree-stack LSTM

2 possible problems with the official comparison:

1. If the annotation of the treebank is improved, the older parser is handicapped
2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang code         MLP     Tree-stack
ru_taiga (10k)    58.89   60.55
hu_szeged (20k)   66.21   68.18
tr_imst (50k)     56.78   58.75
ar_padt (120k)    67.83   68.14
en_ewt (205k)     74.87   75.77
cs_cac (473k)     83.39   83.57

Tree-stack LSTM outperforms MLP

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

MLP Parser

Figure: Initial model (MLP)

Only Action LSTM

Figure: Only the action LSTM

Only β-LSTM

Figure: Only the β-LSTM

Only σ-LSTM

Figure: Only the σ-LSTM

Ablation Analysis Results

Lang code    MLP     Only Action   Only-β   Only-σ
hu_szeged    66.21   66.87         66.94    67.03
sv_lines     71.12   72.05         72.17    72.45
tr_imst      57.12   56.87         57.02    57.12
ar_padt      67.83   66.67         66.89    66.92
cs_cac       83.89   82.23         83.13    83.17
en_ewt       75.54   75.43         75.56    75.67

Table: Comparison between MLP and "Only" models

Ablation of t-RNN


Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang code            without t-RNN   with t-RNN
no_nynorsklia (3k)   51.78           53.33
ru_taiga (11k)       59.13           60.55
gl_treegal (15k)     69.76           70.45
hu_szeged (20k)      66.12           68.18
sv_lines (49k)       74.04           75.46
tr_imst (50k)        58.12           58.75
ar_padt (120k)       68.04           68.14
en_ewt (204k)        74.87           75.77
cs_cac (473k)        82.89           83.57
cs_pdt (1M)          81.17           81.16

t-RNN provides comparative advantage for low-resourcelanguages

Ablation Analysis

Overall results of ablation analysis

Lang        MLP     Only A   Only-β   Only-σ   w/o t-RNN   all
hu_szeged   66.21   66.87    66.94    67.03    66.12       68.18
sv_lines    71.12   72.05    72.17    74.04    72.17       75.46
tr_imst     57.12   56.87    57.02    57.12    58.12       58.75
ar_padt     67.83   66.67    66.89    66.92    68.04       68.14
cs_cac      83.89   82.23    83.13    83.17    82.89       83.57
en_ewt      75.54   75.43    75.56    75.67    74.87       75.77

Tree-stack LSTM beats other model variations

Ablation Analysis

Conclusions of Ablation Experiments

The t-RNN's performance contribution increases when the training size decreases

The σ-LSTM provides more useful information independent of dataset size

Interconnecting the model's components with the t-RNN makes the tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers)

What does Morphological Feature Embedding provide?

Contribution of Morph-feat Embeddings

Experimental settings: we divide the CoNLL18 UD v2.2 dataset into 4 parts based on the number of training tokens per language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k, less than 50k tokens

Languages having more than 50k, less than 100k tokens

Languages having 100k tokens or more

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang code       Morph-Feats   no Morph-Feats   # of tokens
no_nynorsklia   51.13         53.33            3583
ru_taiga        58.32         60.55            10479
sme_giella      52.78         53.39            16385
la_perseus      49.93         51.60            18184
ug_udt          52.78         53.39            19262
sl_sst          46.72         48.77            19473
hu_szeged       66.23         68.18            20166

Not useful for languages having less than 20k training tokens

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code       Morph-Feats   no Morph-Feats   # of tokens
sv_lines        72.18         74.81            48325
fr_sequoia      84.36         82.17            50543
en_gum          76.44         75.34            53686
ko_gsd          73.74         72.54            56687
eu_bdt          74.55         73.32            72974
nl_lassysmall   76.70         75.80            75134
gl_ctg          79.02         79.018           79327
lv_lvtb         72.33         72.24            80666
id_gsd          75.76         73.97            97531

Beneficial for languages with 50k-100k training tokens

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code   Morph-Feats   no Morph-Feats   # of tokens
fa_seraji   81.18         81.12            121064
bg_btb      84.53         84.55            124336
en_ewt      75.77         75.682           204585
ar_padt     68.02         68.14            223881
de_gsd      71.59         71.32            263804
ca_ancora   85.89         85.874           417587
es_ancora   84.99         84.78            444617
cs_cac      83.57         83.63            472608
cs_pdt      81.43         82.12            1173282

Neutral for languages having more than 100k training tokens

Static vs Dynamic Oracle Training

Static oracle: training transitions follow gold moves.
Dynamic oracle: training transitions follow predicted moves.

In both cases, the log-probability of the gold moves is maximized.
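The distinction can be sketched as follows. This is a toy: the scores dictionary stands in for the MLP outputs, and the 0.9 exploration rate is an assumption, not the thesis setting.

```python
import math
import random

def log_prob(scores, move):
    # log-softmax over the transition scores
    z = sum(math.exp(s) for s in scores.values())
    return math.log(math.exp(scores[move]) / z)

def oracle_step(scores, gold_move, dynamic, rng):
    loss = -log_prob(scores, gold_move)   # both oracles maximize log p(gold)
    predicted = max(scores, key=scores.get)
    # the static oracle always follows the gold move to the next state;
    # the dynamic oracle (usually) follows the model's own prediction
    follow = predicted if dynamic and rng.random() < 0.9 else gold_move
    return loss, follow

scores = {"shift": 2.0, "left": 0.5, "right": -1.0}
loss_s, follow_s = oracle_step(scores, "left", dynamic=False, rng=random.Random(0))
loss_d, follow_d = oracle_step(scores, "left", dynamic=True, rng=random.Random(0))
```

The loss is identical in both modes; only the state actually visited next differs.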


Static vs Dynamic Oracle Training

Figure: Results are very close for training sets of fewer than 20k tokens

Static vs Dynamic Oracle Training

Figure: Results are very close for training sets of between 20k and 50k tokens

Static vs Dynamic Oracle Training

Figure: Results are very close for training sets of more than 50k tokens

How about languages with less than 20k training tokens?

Transfer Learning

There are 4 possible types of transfer learning:
1. Using very limited data to train the LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al., 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language       (1)            (2)     (3)     (4)
af_afribooms   not provided   75.46   77.43   78.12
kk_ktb         20.19          22.31   21.96   23.86
bxr_bdt        7.64           9.76    9.93    8.98
kmr_mg         20.12          22.57   22.78   23.39

Table: LAS values for strategies (1), (2), (3), and (4)

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

Training the LM from scratch does not produce useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al., 2017]

Projectivity

Transition-based parsers can only build projective trees 6

6. Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
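Projectivity is easy to check: a tree is projective iff no two dependency arcs cross. A small sketch, where heads[i] is the head of word i+1 and 0 denotes the root:

```python
def is_projective(heads):
    # build arcs as (left, right) index pairs; dependents are numbered from 1
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for l1, r1 in arcs:
        for l2, r2 in arcs:
            if l1 < l2 < r1 < r2:  # strictly interleaved endpoints: crossing
                return False
    return True

projective = is_projective([2, 0, 2])    # 1 <- 2 -> 3: no crossing arcs
crossing = is_projective([3, 4, 0, 3])   # arcs (1,3) and (2,4) cross
```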

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios.

Language      Projectivity %   Best (LAS)   Our (LAS)
grc_perseus   90.7             79.39        55.03 (20)
eu_bdt        95.13            84.22        74.13 (17)
hu_szeged     97.8             82.66        68.18 (14)
da_ddt        98.26            86.28        76.40 (17)
en_gum        99.6             85.05        76.44 (15)
gl_treegal    100              74.25        70.45 (10)
gl_ctg        100              82.12        79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases 7

7. From the official results page and our projectivity table

Conclusions

Conclusion

In conclusion, we introduced "context, word, and morph-feat" embeddings and showed their contribution to transition-based dependency parsing.

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering

The Tree-stack LSTM performed better on low-resource languages

When the training dataset size increases, the tree-stack LSTM loses its advantage

Future Research Direction

End-to-End Training

Systems jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Önder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, pages 673-682. Association for Computational Linguistics.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Thank you for your attention

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 3: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

1 Introduction

Omer Kırnap (Koc University) MSc Thesis September 27 2018 3 123

Introduction

What is dependency parsing

Dependency parsing aims to detect word relations by finding the treestructure of a sentence inspired by dependency grammar

Figure Dependency annotations for a sentence ldquo Economic news had little effecton financial marketsrdquo

1

1Figure from S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Omer Kırnap (Koc University) MSc Thesis September 27 2018 4 123

Introduction

Why do we need dependency parsing

Dependencies resolve ambiguity

Useful for some down-stream tasks in NLP

2

2Figure from httpwwwphontroncomslidesnlp-programming-en-11-dependpdfOmer Kırnap (Koc University) MSc Thesis September 27 2018 5 123

Introduction

Dependency Parsing Categorization

Grammar BasedRelying on a formal grammardefining a formal languageasking whether a given inputsentence is in the languagedefined by the grammar or not

Data-drivenMaking essential use of machinelearning from linguistic data in orderto parse new sentences

3

3From S Kbler R McDonald and J Nivre 2009 Dependency parsing Morgan ampClaypool US

Omer Kırnap (Koc University) MSc Thesis September 27 2018 6 123

Introduction

Data-driven Dependency Parsing

Graph Based Algorithms

Using maximum spanning tree algorithms from graph theory

Transition Based Algorithms

Capitalizing on greedy stack based algorithms to build dependency treewith incremental steps in linear time

4

4From S Kbler R McDonald and J Nivre 2009 Dependency parsing Morgan ampClaypool US

Omer Kırnap (Koc University) MSc Thesis September 27 2018 7 123

Introduction

Transition Based Dependency Parsing

Transition System Abstract machine with a set of configurations(states) and transitions We use the ArcHybrid transition system[Kuhlmann et al 2011]

Configurations (σ β A)bull σ Stack of tree fragments initially emptybull β Buffer of words initially containing the whole sentencebull A Set of dependency arcs (head relation modifier) initially empty

Transitionsbull shift(σ b|βA) = (σ|b βA)bull leftd(σ|s b|βA) = (σ b|βA cup (b d s))bull rightd(σ|s|t βA) = (σ|s βA cup (s d t))

Omer Kırnap (Koc University) MSc Thesis September 27 2018 8 123

An example parsing of a sentence

Omer Kırnap (Koc University) MSc Thesis September 27 2018 9 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 10 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 11 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 12 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 13 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 14 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 15 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 16 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 17 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 18 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 19 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 20 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 21 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 22 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 23 123

Problem Definition

Find a model learning to decide correct transition from current state

Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123

2 Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123

Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearity. However, they are impractical for high-dimensional inputs: they scale linearly in the input dimension (in both time and space, assuming a fixed number of hidden units).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution: use dense embeddings for input features.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123

Overview

1 Introduction
• Overview of Dependency Parsing
• Transition Based Dependency Parsing

2 Related Work
• Linear Models and their Drawbacks
• Neural Network Models

3 Model
• Language Model
• MLP Parser
• Tree-stack LSTM Parser

4 Results
• MLP vs Tree-stack LSTM
• Morphological Feature Embeddings
• Static vs Dynamic Oracle Training
• Transfer Learning

5 Conclusion

6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123


a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

The LM is used to obtain context and word embeddings with two components:

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors
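The two LM components can be sketched as follows. Plain (Elman-style) RNN cells stand in for the thesis's LSTMs, and all sizes, weights, and the toy vocabulary are illustrative assumptions, not the actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # embedding/hidden size (assumption)

def rnn(inputs, Wx, Wh):
    # Run a simple recurrent cell over a sequence; return the final state.
    h = np.zeros(Wh.shape[0])
    for x in inputs:
        h = np.tanh(Wx @ x + Wh @ h)
    return h

chars = {c: rng.normal(size=D) for c in "abcdefghijklmnopqrstuvwxyz"}
Wx, Wh = rng.normal(size=(D, D)) / D, rng.normal(size=(D, D)) / D

def word_vector(word):
    # Character-based RNN: the final hidden state summarizes the spelling.
    return rnn([chars[c] for c in word], Wx, Wh)

def context_vectors(sentence):
    # Word-based bidirectional RNN over word vectors: each position gets a
    # forward state (left context) and a backward state (right context).
    vs = [word_vector(w) for w in sentence]
    fwd = [rnn(vs[: i + 1], Wx, Wh) for i in range(len(vs))]
    bwd = [rnn(vs[i:][::-1], Wx, Wh) for i in range(len(vs))]
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

ctx = context_vectors(["economic", "news", "had", "effect"])
```

Each word thus receives a spelling-derived word vector plus a sentence-dependent context vector, which is the split the following slides rely on.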

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character-based LSTM generates word vectors.

Figure: Character LSTM, from Kırnap et al. 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word-based BiLSTM generates context vectors.

Figure: Word BiLSTM, from Kırnap et al. 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

The MLP Parser consists of 4 components:

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition
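The decision module can be sketched as a small feed-forward network mapping a state-feature vector to a distribution over transitions. Sizes, weights, and the feature vector are illustrative assumptions, not the thesis's trained model.

```python
import numpy as np

rng = np.random.default_rng(1)
TRANSITIONS = ["shift", "left", "right"]
F, H = 12, 16  # feature and hidden layer sizes (assumptions)
W1, b1 = rng.normal(size=(H, F)), np.zeros(H)
W2, b2 = rng.normal(size=(len(TRANSITIONS), H)), np.zeros(len(TRANSITIONS))

def decide(features):
    h = np.tanh(W1 @ features + b1)   # hidden layer
    scores = W2 @ h + b2              # one score per transition
    p = np.exp(scores - scores.max())
    p /= p.sum()                      # softmax over transitions
    return TRANSITIONS[int(np.argmax(p))], p

move, probs = decide(rng.normal(size=F))
```

At parse time the highest-probability transition (subject to validity in the current configuration) is executed greedily.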

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure: Kırnap et al. 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments & Dataset (MLP): CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS): the percentage of words assigned both the correct syntactic head and the correct dependency label.

Example on "Economic news had":
• Gold tree (labels ATT, SBJ): LAS = 1
• Pred 1 (labels OBJ, PRED, both words wrong): LAS = 0
• Pred 2 (labels ATT, OBJ, one of two words correct): LAS = (1/2) × 100 = 50%
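The metric is easy to compute directly; this sketch mirrors the toy trees above (the dict-based encoding is my own, not a standard format).

```python
# LAS: fraction of words whose predicted (head, label) pair both match gold.

def las(gold, pred):
    # gold/pred: dicts mapping word -> (head, label)
    correct = sum(pred[w] == hl for w, hl in gold.items())
    return 100.0 * correct / len(gold)

gold = {"Economic": ("news", "ATT"), "news": ("had", "SBJ")}
pred1 = {"Economic": ("news", "OBJ"), "news": ("had", "PRED")}  # both wrong
pred2 = {"Economic": ("news", "ATT"), "news": ("had", "OBJ")}   # one correct
```

`las(gold, pred1)` gives 0 and `las(gold, pred2)` gives 50, matching the slide.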

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5Source: CoNLL17 official results page.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63
c       72.2        76          63.5
v-c     76          79          67.6
p-c     78          82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings


Context vectors provide an independent contribution on top of POS tags.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings


Our BiLSTM language model word vectors perform better than the Facebook (fb) vectors.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings


Both POS tags and context vectors make significant contributions on top of word vectors.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing the correct features of the parser state still remains critical.

We are unable to represent the whole parsing history with feature extraction.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history, as well as the word sequences in the buffer and stack.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure: Stack LSTM [Dyer et al., 2015]

Represent each component (σ, β, A) with an LSTM; modify the head word's embedding with the dependent's embedding.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings.

Hidden states of the LSTMs are not updated except on reduce transitions.

Actions are not explicitly represented.

They only used word2vec embeddings [Mikolov et al., 2013].

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

Figure: Tree-stack LSTM overview. The σ-LSTM, β-LSTM, and Action-LSTM outputs are concatenated and fed to an MLP; the t-RNN composes the head word, dependent word, and dependency relation.

We propose the Tree-stack LSTM model with 4 components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector.

Every dependency relation is represented with a continuous vector.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize each word representation by concatenating:

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs (FEATS of the word "It")

Figure: Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
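One plausible reading of the figure is sketched below: each `Key=Value` pair from the FEATS string gets its own embedding, and the word's morph-feat vector combines them (summation here is my assumption; the exact composition is a design choice, and all sizes and weights are illustrative).

```python
import numpy as np

rng = np.random.default_rng(2)
D = 8        # morph-feat embedding size (assumption)
table = {}   # lazily grown embedding table for Key=Value pairs

def feat_vector(pair):
    if pair not in table:
        table[pair] = rng.normal(size=D)
    return table[pair]

def morph_feat_embedding(feats):
    # Split the CoNLL-U FEATS string into Key=Value pairs and sum their vectors.
    pairs = feats.split("|")
    return np.sum([feat_vector(p) for p in pairs], axis=0)

v = morph_feat_embedding("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
```

Because the table is shared, words with overlapping feature bundles get related morph-feat vectors.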

Tree-stack LSTM

Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

(Architecture figure with the β-LSTM component highlighted.)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

Figure: Buffer's β-LSTM, running over the buffer words wi, wi+1, wi+2.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

(Architecture figure with the σ-LSTM component highlighted.)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

Figure: Stack's σ-LSTM, running over the stack items si, si+1, si+2.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

(Architecture figure with the Action-LSTM component highlighted.)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

Figure: Action-LSTM, running over the transition history.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of the tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

Figure: t-RNN, combining the head word, dependent word, and dependency relation.

whead_new = tanh(Wrnn · [whead_old ; dl ; wdep] + brnn)    (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
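Eq. (1) is a single tanh layer over the concatenation of old head, relation, and dependent vectors; a direct sketch, with all sizes and weights illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
D, R = 8, 4  # word and relation embedding sizes (assumptions)
W_rnn = rng.normal(size=(D, 2 * D + R)) / np.sqrt(2 * D + R)
b_rnn = np.zeros(D)

def t_rnn(w_head, d_rel, w_dep):
    # New head embedding = tanh(W · [head_old ; relation ; dependent] + b)
    x = np.concatenate([w_head, d_rel, w_dep])
    return np.tanh(W_rnn @ x + b_rnn)

new_head = t_rnn(rng.normal(size=D), rng.normal(size=R), rng.normal(size=D))
```

The output has the same size as a word embedding, so it can replace the head word's vector in the stack or buffer after a reduce.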

Tree-RNN with:
1. Left Transition
2. Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initiated by concatenating POS, language, and morph-feat embeddings.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden states based on the new input.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: The tree-stack LSTM is ready to decide the next transition.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initiated by concatenating POS, language, and morph-feat embeddings.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden states from the new input.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: The tree-stack LSTM is ready to decide the next transition.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

(Figure: the full Tree-stack LSTM. The σ-LSTM, β-LSTM, and Action-LSTM outputs are concatenated and fed to the MLP, while the t-RNN composes head word, dependent word, and dependency relation.)

Overview

1 Introduction
• Overview of Dependency Parsing
• Transition Based Dependency Parsing

2 Related Work
• Linear Models and their Drawbacks
• Neural Network Models

3 Model
• Language Model
• MLP Parser
• Tree-stack LSTM Parser

4 Results
• MLP vs Tree-stack LSTM
• Morphological Feature Embeddings
• Static vs Dynamic Oracle Training
• Transfer Learning

5 Conclusion

6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split change 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the treebank has been improved, the older parser is handicapped.

2 If the training-test split has changed and old training data are now in the test data, the old parser is favored undeservedly.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang code        MLP     Tree-stack
ru taiga (10k)   58.89   60.55
hu szeged (20k)  66.21   68.18
tr imst (50k)    56.78   58.75
ar padt (120k)   67.83   68.14
en ewt (205k)    74.87   75.77
cs cac (473k)    83.39   83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang code   MLP     Only Action   Only-β   Only-σ
hu szeged   66.21   66.87         66.94    67.03
sv lines    71.12   72.05         72.17    72.45
tr imst     57.12   56.87         57.02    57.12
ar padt     67.83   66.67         66.89    66.92
cs cac      83.89   82.23         83.13    83.17
en ewt      75.54   75.43         75.56    75.67

Table: Comparison between MLP and "Only" models.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

(Architecture figure with the t-RNN component highlighted.)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang code            without t-RNN   with t-RNN
no nynorsklia (3k)   51.78           53.33
ru taiga (11k)       59.13           60.55
gl treegal (15k)     69.76           70.45
hu szeged (20k)      66.12           68.18
sv lines (49k)       74.04           75.46
tr imst (50k)        58.12           58.75
ar padt (120k)       68.04           68.14
en ewt (204k)        74.87           75.77
cs cac (473k)        82.89           83.57
cs pdt (1M)          81.17           81.164

t-RNN provides a comparative advantage for low-resource languages.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang        MLP     Only-A   Only-β   Only-σ   w/o t-RNN   all
hu szeged   66.21   66.87    66.94    67.03    66.12       68.18
sv lines    71.12   72.05    72.17    74.04    72.17       75.46
tr imst     57.12   56.87    57.02    57.12    58.12       58.75
ar padt     67.83   66.67    66.89    66.92    68.04       68.14
cs cac      83.89   82.23    83.13    83.17    82.89       83.57
en ewt      75.54   75.43    75.56    75.67    74.87       75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases.

σ-LSTM provides more useful information, independent of dataset size.

Interconnecting the model's components with the t-RNN makes the tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD v2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k and less than 50k tokens

Languages having more than 50k and less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang code       Morph-Feats   no Morph-Feats   # of tokens
no nynorsklia   51.13         53.33            3583
ru taiga        58.32         60.55            10479
sme giella      52.78         53.39            16385
la perseus      49.93         51.6             18184
ug udt          52.78         53.39            19262
sl sst          46.72         48.77            19473
hu szeged       66.23         68.18            20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code       Morph-Feats   no Morph-Feats   # of tokens
sv lines        72.18         74.81            48325
fr sequoia      84.36         82.17            50543
en gum          76.44         75.34            53686
ko gsd          73.74         72.54            56687
eu bdt          74.55         73.32            72974
nl lassysmall   76.7          75.8             75134
gl ctg          79.02         79.018           79327
lv lvtb         72.33         72.24            80666
id gsd          75.76         73.97            97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code   Morph-Feats   no Morph-Feats   # of tokens
fa seraji   81.18         81.12            121064
bg btb      84.53         84.55            124336
en ewt      75.77         75.682           204585
ar padt     68.02         68.14            223881
de gsd      71.59         71.32            263804
ca ancora   85.89         85.874           417587
es ancora   84.99         84.78            444617
cs cac      83.57         83.63            472608
cs pdt      81.43         82.12            1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: training transitions follow the gold moves. Dynamic oracle: training transitions follow the predicted moves.

In both cases, the log-probability of the gold moves is maximized.

(Tree-stack LSTM architecture figure.)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
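The difference between the two regimes fits in a few lines. Both score the gold move; they differ only in which move the parser follows to the next state. Everything here (the toy model, the loss stand-in, the fixed gold sequence) is illustrative; in a full dynamic oracle the gold move is recomputed for the current, possibly erroneous state.

```python
import random

def train_step(state, gold_move, predict, dynamic, losses):
    pred_move = predict(state)
    losses.append(0.0 if pred_move == gold_move else 1.0)  # stand-in for -log p(gold)
    followed = pred_move if dynamic else gold_move          # the one difference
    return state + [followed]  # toy "state": the history of moves taken

random.seed(0)
predict = lambda state: random.choice(["shift", "left", "right"])
gold_sequence = ["shift", "shift", "left", "right"]

histories = {}
for dynamic in (False, True):
    state, losses = [], []
    for gold_move in gold_sequence:
        state = train_step(state, gold_move, predict, dynamic, losses)
    histories[dynamic] = state
```

With the static oracle the visited states are exactly the gold states; with the dynamic oracle the model is trained on states its own (possibly wrong) predictions reach.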

Static vs Dynamic Oracle Training

Figure: Results are very close for languages with less than 20k training tokens.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure: Results are very close for languages with between 20k and 50k training tokens.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure: Results are very close for languages with more than 50k training tokens.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al., 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language        (1)            (2)     (3)     (4)
af afribooms    not provided   75.46   77.43   78.12
kk ktb          20.19          22.31   21.96   23.86
bxr bdt         7.64           9.76    9.93    8.98
kmr mg          20.12          22.57   22.78   23.39

Table: LAS values for strategies (1), (2), (3), and (4).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial.

Training the LM from scratch on very limited data does not yield useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al., 2017].

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition based parsers can only build projective trees.6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
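Projectivity can be checked directly from the head indices: a tree is projective iff no two arcs cross. This is a generic sketch (the encoding with heads as a list and 0 for the root is a common convention, not taken from the thesis):

```python
def is_projective(heads):
    # heads[i] = index of word (i+1)'s head; 0 denotes the root,
    # words are numbered from 1 in sentence order.
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for i, j in arcs:
        for k, l in arcs:
            if i < k < j < l:  # arc (k, l) crosses arc (i, j)
                return False
    return True
```

For "news had effect" with heads [2, 0, 2] the tree is projective; making word 1 depend on word 3 across the root-attached word 2 (heads [3, 0, 2]) introduces a crossing arc.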

Projective vs Non-projective

We compared our model with the best model at different projectivity ratios:

Language      Projectivity   Best (LAS)   Our (LAS)
grc perseus   90.7           79.39        55.03 (20)
eu bdt        95.13          84.22        74.13 (17)
hu szeged     97.8           82.66        68.18 (14)
da ddt        98.26          86.28        76.40 (17)
en gum        99.6           85.05        76.44 (15)
gl treegal    100            74.25        70.45 (10)
gl ctg        100            82.12        79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases.

7From the official results page and our projectivity table.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion: we introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition based dependency parsing.

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

When the training dataset size increases, tree-stack LSTM loses its advantage.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between the σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Ömer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Ömer Kırnap, Berkay Furkan Önder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

Introduction

Dependency Parsing Categorization

Grammar based: relying on a formal grammar defining a formal language; asking whether a given input sentence is in the language defined by the grammar or not.

Data-driven: making essential use of machine learning from linguistic data in order to parse new sentences.

3From S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 6 123

Introduction

Data-driven Dependency Parsing

Graph based algorithms: using maximum spanning tree algorithms from graph theory.

Transition based algorithms: capitalizing on greedy stack based algorithms to build the dependency tree with incremental steps in linear time.

4From S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 7 123

Introduction

Transition Based Dependency Parsing

Transition System Abstract machine with a set of configurations(states) and transitions We use the ArcHybrid transition system[Kuhlmann et al 2011]

Configurations (σ β A)bull σ Stack of tree fragments initially emptybull β Buffer of words initially containing the whole sentencebull A Set of dependency arcs (head relation modifier) initially empty

Transitionsbull shift(σ b|βA) = (σ|b βA)bull leftd(σ|s b|βA) = (σ b|βA cup (b d s))bull rightd(σ|s|t βA) = (σ|s βA cup (s d t))

Omer Kırnap (Koc University) MSc Thesis September 27 2018 8 123

An example parsing of a sentence

Omer Kırnap (Koc University) MSc Thesis September 27 2018 9 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 10 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 11 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 12 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 13 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 14 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 15 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 16 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 17 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 18 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 19 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 20 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 21 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 22 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 23 123

Problem Definition

Find a model learning to decide correct transition from current state

Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123

2 Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123

Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearityHoweverImpractical for high dimensional inputs they scale linearly in inputdimensions (in both time and space assuming fixed number of hiddenunits)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution Using dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18bull KParse team with Tree-stack LSTM Parser using Context and

Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using ContextEmbeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser using Context and Morph-featEmbeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123

a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

LM is used to obtain Context and Word embeddings with twocomponents

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure Kırnap et al 2017Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition
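The decision module can be sketched in a few lines of numpy. This is a minimal illustration, not the thesis implementation: the dimensions, the single hidden layer, and the three-way transition inventory are toy assumptions.

```python
import numpy as np

def mlp_decide(features, W1, b1, W2, b2):
    """Score candidate transitions (shift, left_d, right_d) for the
    current parser state, given its extracted feature vector."""
    hidden = np.tanh(W1 @ features + b1)   # single hidden layer
    logits = W2 @ hidden + b2              # one score per transition
    exp = np.exp(logits - logits.max())    # numerically stable softmax
    return exp / exp.sum()

# Toy, hypothetical dimensions; the real model is much larger.
rng = np.random.default_rng(0)
feat_dim, hid_dim, n_transitions = 8, 16, 3
features = rng.normal(size=feat_dim)
W1, b1 = rng.normal(size=(hid_dim, feat_dim)), np.zeros(hid_dim)
W2, b2 = rng.normal(size=(n_transitions, hid_dim)), np.zeros(n_transitions)

probs = mlp_decide(features, W1, b1, W2, b2)
best = int(np.argmax(probs))  # index of the transition the parser takes
```

At parse time, the highest-scoring legal transition is applied and the state is re-featurized for the next decision.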


Experiments & Dataset (MLP): CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations


Experiments - Evaluation Metric

Labeled Attachment Score (LAS): the percentage of words assigned both the correct syntactic head and the correct dependency label.

Example for "Economic news had":
Gold tree (arcs SBJ, ATT): LAS = 100%
Prediction 1 (arcs PRED, OBJ, both wrong): LAS = 0%
Prediction 2 (arcs OBJ, ATT, one of two correct): LAS = (1/2) × 100 = 50%
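The LAS computation in the example can be sketched directly. This is a small self-contained sketch; representing each tree as a dict from word to its (head, label) pair is a simplification of the real CoNLL evaluation script.

```python
def las(gold, pred):
    """Labeled Attachment Score: the percentage of words whose predicted
    (head, label) pair matches the gold annotation exactly."""
    correct = sum(1 for word in gold if pred.get(word) == gold[word])
    return 100.0 * correct / len(gold)

# The example above, for the fragment "Economic news had":
gold  = {"Economic": ("news", "ATT"), "news": ("had", "SBJ")}
pred1 = {"Economic": ("news", "OBJ"), "news": ("had", "PRED")}
pred2 = {"Economic": ("news", "ATT"), "news": ("had", "OBJ")}

print(las(gold, gold))   # 100.0
print(las(gold, pred1))  # 0.0
print(las(gold, pred2))  # 50.0
```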


Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition-based parsers.

Source: CoNLL17 official results page

Contributions in CoNLL17


Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), context vector (c):

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63
c       72.2       76         63.5
v-c     76         79         67.6
p-c     78         82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2


Context and Word Embeddings


Context vectors provide an independent contribution on top of POS tags.


Context and Word embeddings


Our BiLSTM language model word vectors perform better than FB vectors.


Context and Word embeddings


Both POS tags and context vectors make significant contributions on top of word vectors.


Issues with MLP

However

Choosing the correct representation of the parser state remains critical.

We are unable to represent the whole parsing history with feature extraction.


Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack.



c Tree-stack LSTM Parser (CoNLL18)


Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM.
Modify the head word's embedding with the dependent's embedding.


Problems with Stack LSTM

They only modify the stack's word embeddings.

Hidden states of the LSTMs are not updated except on reduce transitions.

Actions are not explicitly represented.

They only used word2vec embeddings [Mikolov et al 2013].


Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy


Tree-stack LSTM Overview

Figure: Tree-stack LSTM overview. The t-RNN composes the head word, dependent word, and dependency relation; the β-LSTM, σ-LSTM, and Action-LSTM outputs are concatenated and fed into an MLP.

We propose the Tree-stack LSTM model with 4 components:

1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN


Tree-stack LSTM

Input Representation


Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector.

Every dependency relation is represented with a continuous vector.


Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors


Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings
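One way to realize morph-feat vectors is to split the UD feature string on `|` and combine one learned vector per `Key=Value` pair. Summing the per-feature vectors is an assumption made for this sketch (the thesis may combine them differently), as are the toy dimensions and the randomly initialized lookup table.

```python
import numpy as np

DIM = 4                 # toy embedding size, a hypothetical choice
rng = np.random.default_rng(1)
feat_table = {}         # one vector per "Key=Value" morphological feature

def feat_vec(feat):
    """Look up (or lazily create, in this toy sketch) a feature vector."""
    if feat not in feat_table:
        feat_table[feat] = rng.normal(size=DIM)
    return feat_table[feat]

def morph_feat_vector(feats_str):
    """Embed a UD feature string such as 'Case=Nom|Gender=Neut|...'
    by summing per-feature vectors (the summing is an assumption)."""
    return np.sum([feat_vec(f) for f in feats_str.split("|")], axis=0)

m = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")

# A word's input representation concatenates the four vector types:
word_v    = rng.normal(size=DIM)  # character-LSTM word vector
context_v = rng.normal(size=DIM)  # BiLSTM context vector
pos_v     = rng.normal(size=DIM)  # POS embedding
word_input = np.concatenate([word_v, context_v, pos_v, m])
```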


Tree-stack LSTM

Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN



β-LSTM

Figure: Buffer's β-LSTM, an LSTM running over the buffer words w_i, w_i+1, w_i+2.



σ-LSTM

Figure: Stack's σ-LSTM, an LSTM running over the stack items s_i, s_i+1, s_i+2.



Action-LSTM

Figure: Action-LSTM, an LSTM running over the sequence of previous actions.


How are the components of the tree-stack LSTM connected?


Tree-RNN


Tree-RNN (t-RNN)

Figure: t-RNN composing the dependent word, dependency relation, and head word.

w_head_new = tanh(W_rnn · [w_head_old ; d_l ; w_dep] + b_rnn)    (1)
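Equation (1) is straightforward to write down in numpy. The dimensions and weight values below are toy assumptions for illustration.

```python
import numpy as np

def t_rnn(w_head, d_label, w_dep, W_rnn, b_rnn):
    """Equation (1): compose the old head embedding, the dependency-label
    embedding, and the dependent embedding into a new head embedding."""
    x = np.concatenate([w_head, d_label, w_dep])
    return np.tanh(W_rnn @ x + b_rnn)

# Toy, hypothetical dimensions:
rng = np.random.default_rng(2)
word_dim, label_dim = 5, 3
W_rnn = rng.normal(size=(word_dim, 2 * word_dim + label_dim))
b_rnn = np.zeros(word_dim)

new_head = t_rnn(rng.normal(size=word_dim), rng.normal(size=label_dim),
                 rng.normal(size=word_dim), W_rnn, b_rnn)
```

The tanh keeps the composed head embedding in the same range as the original word embeddings, so it can be fed back into the stack.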


Tree-RNN with

1. Left Transition
2. Right Transition


Left Transition


Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

1. Each embedding is initialized by concatenating POS, language, and morph-feat embeddings.
2. The stack's top LSTM is reduced.
3. The t-RNN calculates the new head embedding.
4. The β-LSTM recalculates its hidden state based on the new input.
5. The tree-stack LSTM is ready for the next transition.

Right Transition


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

1. Each embedding is initialized by concatenating POS, language, and morph-feat embeddings.
2. The stack's top LSTM is reduced.
3. The t-RNN calculates the new head embedding.
4. The σ-LSTM recalculates its hidden state from the new input.
5. The tree-stack LSTM is ready for the next transition.
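The left and right transitions, together with shift, can be sketched as a toy ArcHybrid executor. This is a simplified sketch: words are integers, legality checks are omitted, and a left transition with an empty buffer attaches to the artificial root, an assumption made to keep the example short.

```python
def parse(n_words, transitions):
    """Run an ArcHybrid derivation. Words are 1..n_words (0 is the root);
    arcs are (head, label, dependent) triples as in the slides."""
    stack, buffer, arcs = [], list(range(1, n_words + 1)), set()
    for t in transitions:
        if t == "shift":              # (σ, b|β, A) -> (σ|b, β, A)
            stack.append(buffer.pop(0))
        elif t[0] == "left":          # add (b, d, s): buffer front heads stack top
            s = stack.pop()
            head = buffer[0] if buffer else 0   # root attachment if buffer is empty
            arcs.add((head, t[1], s))
        elif t[0] == "right":         # add (s, d, t): next stack item heads stack top
            dep = stack.pop()
            arcs.add((stack[-1], t[1], dep))
    return arcs

# Parse a 2-word sentence where word 2 heads word 1:
arcs = parse(2, ["shift", ("left", "att"), "shift", ("left", "root")])
print(arcs == {(2, "att", 1), (0, "root", 2)})  # True
```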

Final overview of Tree-stack LSTM

Figure: The complete tree-stack LSTM. The t-RNN composes head word, dependent word, and dependency relation; the LSTM outputs are concatenated and fed into an MLP.



4 Results & Comparisons


Results & Comparisons

CoNLL17 Dataset:
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 7th out of 33 participants (1st among transition-based parsers)

CoNLL18 Dataset:
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 16th out of 30 participants (2nd among transition-based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split  2. Annotation


MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets.


MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1. If the annotation of the treebank has improved, the older parser is handicapped.

2. If the training-test split has changed and old training data are now in the test data, the old parser is favored undeservedly.


MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP


Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM


MLP Parser

Figure: Initial model (MLP)


Only Action LSTM

Figure: Only action LSTM


Only β-LSTM

Figure: Only β-LSTM


Only σ-LSTM

Figure: Only σ-LSTM


Ablation Analysis Results

Lang Code  MLP    Only Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models.



Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.164

t-RNN provides a comparative advantage for low-resource languages.


Ablation Analysis

Overall results of ablation analysis

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations


Ablation Analysis

Conclusions of Ablation Experiments

The t-RNN's performance contribution increases as the training size decreases.

The σ-LSTM provides more useful information independent of dataset size.

Interconnecting the model's components with the t-RNN makes the tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers).


What does Morphological Feature Embedding provide?


Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD 2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k and less than 50k tokens

Languages having more than 50k and less than 100k tokens

Languages having 100k tokens or more


Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.6            18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166

Not useful for languages having less than 20k training tokens


Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k tokens:

Lang code    Morph-Feats  no Morph-Feats  # of tokens
sv lines     72.18        74.81           48325
fr sequoia   84.36        82.17           50543
en gum       76.44        75.34           53686
ko gsd       73.74        72.54           56687
eu bdt       74.55        73.32           72974
nl lassymal  76.7         75.8            75134
gl ctg       79.02        79.018          79327
lv lvtb      72.33        72.24           80666
id gsd       75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens


Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121064
bg btb     84.53        84.55           124336
en ewt     75.77        75.682          204585
ar padt    68.02        68.14           223881
de gsd     71.59        71.32           263804
ca ancora  85.89        85.874          417587
es ancora  84.99        84.78           444617
cs cac     83.57        83.63           472608
cs pdt     81.43        82.12           1173282

Neutral for languages having more than 100k training tokens


Static vs Dynamic Oracle Training

Static oracle: transitions follow the gold moves.
Dynamic oracle: transitions follow the predicted moves.

In both cases, the log probability of the gold moves is maximized.
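The difference between the two training regimes can be sketched with a stubbed scoring step. This is toy numpy code: the scores array stands in for the real model's transition scores, and state handling is omitted.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def training_step(scores, gold_move, dynamic):
    """One oracle-training step: the loss always maximizes log p of the
    gold move; the oracles differ only in which move the parser follows."""
    loss = -np.log(softmax(scores)[gold_move])   # -log p(gold) in both cases
    follow = int(np.argmax(scores)) if dynamic else gold_move
    return loss, follow

scores = np.array([0.1, 2.0, -1.0])   # stub model scores; move 1 is preferred
loss_s, follow_s = training_step(scores, gold_move=2, dynamic=False)
loss_d, follow_d = training_step(scores, gold_move=2, dynamic=True)
# Same loss either way, but the static oracle follows the gold move (2)
# while the dynamic oracle follows the model's own prediction (1).
```

Following the model's own predictions exposes training to states the static oracle never visits, which is the motivation for dynamic oracles.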


Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k


Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k


Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k


How about languages with less than 20k training tokens?


Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors, trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4).


Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial.

From-scratch LM training does not yield useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017].


Projectivity

Transition-based parsers can only build projective trees.

Figure from http://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
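Projectivity can be checked by testing whether any two dependency arcs cross. A small self-contained sketch: heads are given as a dict from dependent to head (1-indexed words, 0 for the artificial root), an illustrative representation.

```python
def is_projective(heads):
    """heads maps each word (1-indexed) to its head (0 = root).
    A dependency tree is projective iff no two arcs cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in heads.items()]
    for l1, r1 in arcs:
        for l2, r2 in arcs:
            if l1 < l2 < r1 < r2:   # the arcs interleave, i.e. cross
                return False
    return True

print(is_projective({1: 2, 2: 0, 3: 2}))        # True: no crossing arcs
print(is_projective({1: 3, 2: 4, 3: 0, 4: 3}))  # False: arc 3->1 crosses arc 4->2
```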


Projective vs Non-projective

We compared our model with the best model for different projectivity ratios.

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases.

From the official results page and our projectivity table.

Conclusions


Conclusion

In conclusion, we introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition-based dependency parsing.

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering.

The Tree-stack LSTM performed better on low-resource languages.

When the training dataset size increases, the tree-stack LSTM loses its advantage.


Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between the σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.


Publications

Ömer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Ömer Kırnap, Berkay Furkan Önder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.


References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv:1301.3781.

Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5:135-146.


Thank you for your attention


Questions


Page 5: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Introduction

Why do we need dependency parsing

Dependencies resolve ambiguity

Useful for some down-stream tasks in NLP

2

2Figure from httpwwwphontroncomslidesnlp-programming-en-11-dependpdfOmer Kırnap (Koc University) MSc Thesis September 27 2018 5 123

Introduction

Dependency Parsing Categorization

Grammar BasedRelying on a formal grammardefining a formal languageasking whether a given inputsentence is in the languagedefined by the grammar or not

Data-drivenMaking essential use of machinelearning from linguistic data in orderto parse new sentences

3

3From S Kbler R McDonald and J Nivre 2009 Dependency parsing Morgan ampClaypool US

Omer Kırnap (Koc University) MSc Thesis September 27 2018 6 123

Introduction

Data-driven Dependency Parsing

Graph Based Algorithms

Using maximum spanning tree algorithms from graph theory

Transition Based Algorithms

Capitalizing on greedy stack based algorithms to build dependency treewith incremental steps in linear time

4

4From S Kbler R McDonald and J Nivre 2009 Dependency parsing Morgan ampClaypool US

Omer Kırnap (Koc University) MSc Thesis September 27 2018 7 123

Introduction

Transition Based Dependency Parsing

Transition System Abstract machine with a set of configurations(states) and transitions We use the ArcHybrid transition system[Kuhlmann et al 2011]

Configurations (σ β A)bull σ Stack of tree fragments initially emptybull β Buffer of words initially containing the whole sentencebull A Set of dependency arcs (head relation modifier) initially empty

Transitionsbull shift(σ b|βA) = (σ|b βA)bull leftd(σ|s b|βA) = (σ b|βA cup (b d s))bull rightd(σ|s|t βA) = (σ|s βA cup (s d t))

Omer Kırnap (Koc University) MSc Thesis September 27 2018 8 123

An example parsing of a sentence

Omer Kırnap (Koc University) MSc Thesis September 27 2018 9 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 10 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 11 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 12 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 13 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 14 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 15 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 16 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 17 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 18 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 19 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 20 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 21 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 22 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 23 123

Problem Definition

Find a model learning to decide correct transition from current state

Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123

2 Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123

Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearityHoweverImpractical for high dimensional inputs they scale linearly in inputdimensions (in both time and space assuming fixed number of hiddenunits)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution Using dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18bull KParse team with Tree-stack LSTM Parser using Context and

Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using ContextEmbeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser using Context and Morph-featEmbeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123

a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

LM is used to obtain Context and Word embeddings with twocomponents

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure Kırnap et al 2017Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label

Economic news hadGold Tree LAS 1

SBJATT

Economic news had

PREDOBJPred 1 LAS 0

Economic news had

OBJATTPred 2 LAS (frac12)100

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Context vectors provide independent contribution on top ofPOS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677p-fb 747 797 663

p-v-c 793 832 742

Our BiLSTM language model word vectors perform betterthan FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Both POS tags and context vectors have significantcontributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing correct state of parser still remains critical

We are unable to represent whole parsing history with featureextracting

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM.

Modify the head word's embedding with the dependent's embedding.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings.

Hidden states of the LSTMs are not updated unless a reduce occurs.

Actions are not explicitly represented.

They only used word2vec embeddings [Mikolov et al. 2013].

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

[Figure: Tree-stack LSTM — σ-, β-, and Action-LSTMs linked by the t-RNN; their outputs are concatenated and fed to an MLP]

We propose the Tree-stack LSTM model with 4 components:

β-LSTM, σ-LSTM, Action-LSTM, Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector.

Every dependency relation is represented with a continuous vector.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
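As a sketch, the concatenation above might look like this (the vector sizes here are illustrative assumptions, not the thesis' actual dimensions):

```python
# Illustrative sketch: vectors as plain Python lists (sizes are assumptions).
WORD, CTX, POS, MORPH = 100, 200, 16, 16

def word_input(word_vec, ctx_vec, pos_vec, feat_vec):
    """Build the parser input for one word by concatenating the
    character-LSTM word vector, BiLSTM context vector, POS vector,
    and morph-feat vector."""
    return word_vec + ctx_vec + pos_vec + feat_vec

x = word_input([0.0] * WORD, [0.0] * CTX, [0.0] * POS, [0.0] * MORPH)
```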

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
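A minimal sketch of how such a feature string could be embedded — here each `Key=Value` pair gets its own vector and the pairs are averaged; the thesis may combine the per-feature vectors differently:

```python
import random

random.seed(0)
DIM = 8                 # embedding size (illustrative assumption)
feat_table = {}         # one embedding per "Key=Value" pair, created on demand

def morph_feat_vector(feats):
    """Embed a UD morphological feature string such as
    'Case=Nom|Number=Sing' by averaging per-feature embeddings."""
    pairs = feats.split("|") if feats not in ("", "_") else []
    if not pairs:
        return [0.0] * DIM
    for p in pairs:
        if p not in feat_table:
            feat_table[p] = [random.gauss(0, 1) for _ in range(DIM)]
    return [sum(feat_table[p][i] for p in pairs) / len(pairs)
            for i in range(DIM)]

v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
```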

Tree-stack LSTM

Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

[Figure: Tree-stack LSTM — σ-, β-, and Action-LSTMs linked by the t-RNN; their outputs are concatenated and fed to an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

[Figure: Buffer's β-LSTM — an LSTM reading the buffer words w_i, w_i+1, w_i+2]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

[Figure: Tree-stack LSTM — σ-, β-, and Action-LSTMs linked by the t-RNN; their outputs are concatenated and fed to an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

[Figure: Stack's σ-LSTM — an LSTM reading the stack items s_i, s_i+1, s_i+2]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

[Figure: Tree-stack LSTM — σ-, β-, and Action-LSTMs linked by the t-RNN; their outputs are concatenated and fed to an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

[Figure: Action-LSTM — an LSTM over the sequence of past transitions]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of the tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

w_head_new = tanh(W_rnn * [w_head_old; d_l; w_dep] + b_rnn)    (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
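Equation (1) can be sketched directly; the dimensions and weight initialization below are illustrative assumptions:

```python
import math
import random

random.seed(1)
D = 4  # embedding size (illustrative)
# W_rnn maps the concatenation [w_head; d_l; w_dep] (length 3*D) back to D.
W_rnn = [[random.gauss(0, 0.5) for _ in range(3 * D)] for _ in range(D)]
b_rnn = [0.0] * D

def trnn_update(w_head, d_rel, w_dep):
    """Eq. (1): fold a dependent and its relation label into the head's
    embedding, producing the new head representation."""
    x = w_head + d_rel + w_dep  # concatenation
    return [math.tanh(sum(W_rnn[i][j] * x[j] for j in range(3 * D)) + b_rnn[i])
            for i in range(D)]

new_head = trnn_update([1.0] * D, [0.0] * D, [1.0] * D)
```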

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

Head    Dependent

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure: Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure: β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure: Tree-stack LSTM is ready for the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure: Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure: σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure: Tree-stack LSTM is ready for the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
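The left and right transitions above, together with shift, can be sketched as pure functions on the configuration (σ, β, A); this is a toy illustration with integer word positions, not the thesis implementation:

```python
def shift(stack, buffer, arcs):
    """shift(σ, b|β, A) = (σ|b, β, A)"""
    return stack + [buffer[0]], buffer[1:], arcs

def left(stack, buffer, arcs, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    the buffer front b becomes the head of the stack top s."""
    s, b = stack[-1], buffer[0]
    return stack[:-1], buffer, arcs | {(b, d, s)}

def right(stack, buffer, arcs, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    s becomes the head of t, which is popped from the stack."""
    s, t = stack[-2], stack[-1]
    return stack[:-1], buffer, arcs | {(s, d, t)}

# Toy run on words 1..3, where word 2 heads word 1 and word 3 heads word 2.
state = ([], [1, 2, 3], set())
state = shift(*state)
state = left(*state, "amod")
state = shift(*state)
state = left(*state, "nsubj")
```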

Final overview of Tree-stack LSTM

[Figure: Tree-stack LSTM — σ-, β-, and Action-LSTMs linked by the t-RNN; their outputs are concatenated and fed to an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1. Introduction
   Overview of Dependency Parsing
   Transition Based Dependency Parsing
2. Related Work
   Linear Models and their Drawbacks
   Neural Network Models
3. Model
   Language Model
   MLP Parser
   Tree-stack LSTM Parser
4. Results
   MLP vs Tree-stack LSTM
   Morphological Feature Embeddings
   Static vs Dynamic Oracle Training
   Transfer Learning
5. Conclusion
6. Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

CoNLL17:
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split change 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1. If the annotation of the treebank has improved, the older parser is handicapped.

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP     Tree-stack
ru taiga (10k)   58.89   60.55
hu szeged (20k)  66.21   68.18
tr imst (50k)    56.78   58.75
ar padt (120k)   67.83   68.14
en ewt (205k)    74.87   75.77
cs cac (473k)    83.39   83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code   MLP     Only Action   Only-β   Only-σ
hu szeged   66.21   66.87         66.94    67.03
sv lines    71.12   72.05         72.17    72.45
tr imst     57.12   56.87         57.02    57.12
ar padt     67.83   66.67         66.89    66.92
cs cac      83.89   82.23         83.13    83.17
en ewt      75.54   75.43         75.56    75.67

Table: Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

[Figure: Tree-stack LSTM — σ-, β-, and Action-LSTMs linked by the t-RNN; their outputs are concatenated and fed to an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code            without t-RNN   with t-RNN
no nynorsklia (3k)   51.78           53.33
ru taiga (11k)       59.13           60.55
gl treegal (15k)     69.76           70.45
hu szeged (20k)      66.12           68.18
sv lines (49k)       74.04           75.46
tr imst (50k)        58.12           58.75
ar padt (120k)       68.04           68.14
en ewt (204k)        74.87           75.77
cs cac (473k)        82.89           83.57
cs pdt (1M)          81.17           81.16

t-RNN provides a comparative advantage for low-resource languages.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang        MLP     Only A   Only-β   Only-σ   w/o t-RNN   all
hu szeged   66.21   66.87    66.94    67.03    66.12       68.18
sv lines    71.12   72.05    72.17    74.04    72.17       75.46
tr imst     57.12   56.87    57.02    57.12    58.12       58.75
ar padt     67.83   66.67    66.89    66.92    68.04       68.14
cs cac      83.89   82.23    83.13    83.17    82.89       83.57
en ewt      75.54   75.43    75.56    75.67    74.87       75.77

Tree-stack LSTM beats the other model variations.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of the Ablation Experiments

t-RNN's performance contribution increases as the training size decreases.

σ-LSTM provides more useful information, independent of dataset size.

Interconnecting the model's components with t-RNN makes the tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings
We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang code       Morph-Feats   no Morph-Feats   # of tokens
no nynorsklia   51.13         53.33            3583
ru taiga        58.32         60.55            10479
sme giella      52.78         53.39            16385
la perseus      49.93         51.60            18184
ug udt          52.78         53.39            19262
sl sst          46.72         48.77            19473
hu szeged       66.23         68.18            20166

Not useful for languages having less than 20k training tokens.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code     Morph-Feats   no Morph-Feats   # of tokens
sv lines      72.18         74.81            48325
fr sequoia    84.36         82.17            50543
en gum        76.44         75.34            53686
ko gsd        73.74         72.54            56687
eu bdt        74.55         73.32            72974
nl lassymal   76.70         75.80            75134
gl ctg        79.02         79.018           79327
lv lvtb       72.33         72.24            80666
id gsd        75.76         73.97            97531

Beneficial for languages with 50k-100k training tokens.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code   Morph-Feats   no Morph-Feats   # of tokens
fa seraji   81.18         81.12            121064
bg btb      84.53         84.55            124336
en ewt      75.77         75.682           204585
ar padt     68.02         68.14            223881
de gsd      71.59         71.32            263804
ca ancora   85.89         85.874           417587
es ancora   84.99         84.78            444617
cs cac      83.57         83.63            472608
cs pdt      81.43         82.12            1173282

Neutral for languages having more than 100k training tokens.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow the gold moves.
Dynamic oracle: transitions follow the predicted moves.

In both cases, the log-probability of the gold moves is maximized.

[Figure: Tree-stack LSTM — σ-, β-, and Action-LSTMs linked by the t-RNN; their outputs are concatenated and fed to an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors, trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language       (1)            (2)     (3)     (4)
af afribooms   not provided   75.46   77.43   78.12
kk ktb         20.19          22.31   21.96   23.86
bxr bdt        7.64           9.76    9.93    8.98
kmr mg         20.12          22.57   22.78   23.39

Table: LAS values for strategies (1), (2), (3), and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial.

Training the LM from scratch does not yield useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017].

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition based parsers can only build projective trees.

Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
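A minimal projectivity check, using one common formulation (a tree is projective iff no two dependency arcs cross); this is an illustrative sketch, not taken from the slides:

```python
def is_projective(heads):
    """heads[i] is the head of word i+1 (words are 1-indexed, 0 = root).
    Returns True iff no two dependency arcs cross."""
    arcs = [(h, i + 1) for i, h in enumerate(heads)]
    spans = [tuple(sorted(a)) for a in arcs]
    for i, (a1, b1) in enumerate(spans):
        for a2, b2 in spans[i + 1:]:
            # Two spans cross when one starts strictly inside the other
            # and ends strictly outside it.
            if a1 < a2 < b1 < b2 or a2 < a1 < b2 < b1:
                return False
    return True
```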

Projective vs Non-projective

We compared our model with the best model at different projectivity ratios:

Language      Projectivity   Best (LAS)   Our (LAS)
grc perseus   90.7           79.39        55.03 (20)
eu bdt        95.13          84.22        74.13 (17)
hu szeged     97.8           82.66        68.18 (14)
da ddt        98.26          86.28        76.40 (17)
en gum        99.6           85.05        76.44 (15)
gl treegal    100            74.25        70.45 (10)
gl ctg        100            82.12        79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases.

From the official results page and our projectivity table.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:
We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition based dependency parsing.

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

When the training dataset size increases, the tree-stack LSTM loses its advantage.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 6: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Introduction

Dependency Parsing Categorization

Grammar BasedRelying on a formal grammardefining a formal languageasking whether a given inputsentence is in the languagedefined by the grammar or not

Data-drivenMaking essential use of machinelearning from linguistic data in orderto parse new sentences

3

3From S Kbler R McDonald and J Nivre 2009 Dependency parsing Morgan ampClaypool US

Omer Kırnap (Koc University) MSc Thesis September 27 2018 6 123

Introduction

Data-driven Dependency Parsing

Graph Based Algorithms

Using maximum spanning tree algorithms from graph theory

Transition Based Algorithms

Capitalizing on greedy stack based algorithms to build dependency treewith incremental steps in linear time

4

4From S Kbler R McDonald and J Nivre 2009 Dependency parsing Morgan ampClaypool US

Omer Kırnap (Koc University) MSc Thesis September 27 2018 7 123

Introduction

Transition Based Dependency Parsing

Transition System Abstract machine with a set of configurations(states) and transitions We use the ArcHybrid transition system[Kuhlmann et al 2011]

Configurations (σ β A)bull σ Stack of tree fragments initially emptybull β Buffer of words initially containing the whole sentencebull A Set of dependency arcs (head relation modifier) initially empty

Transitionsbull shift(σ b|βA) = (σ|b βA)bull leftd(σ|s b|βA) = (σ b|βA cup (b d s))bull rightd(σ|s|t βA) = (σ|s βA cup (s d t))

Omer Kırnap (Koc University) MSc Thesis September 27 2018 8 123

An example parsing of a sentence

Omer Kırnap (Koc University) MSc Thesis September 27 2018 9 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 10 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 11 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 12 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 13 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 14 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 15 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 16 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 17 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 18 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 19 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 20 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 21 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 22 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 23 123

Problem Definition

Find a model learning to decide correct transition from current state

Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123

2 Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123

Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearityHoweverImpractical for high dimensional inputs they scale linearly in inputdimensions (in both time and space assuming fixed number of hiddenunits)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution Using dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18bull KParse team with Tree-stack LSTM Parser using Context and

Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using ContextEmbeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser using Context and Morph-featEmbeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123

a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

LM is used to obtain Context and Word embeddings with twocomponents

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure Kırnap et al 2017Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label

Economic news hadGold Tree LAS 1

SBJATT

Economic news had

PREDOBJPred 1 LAS 0

Economic news had

OBJATTPred 2 LAS (frac12)100

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Context vectors provide independent contribution on top ofPOS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677p-fb 747 797 663

p-v-c 793 832 742

Our BiLSTM language model word vectors perform betterthan FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Both POS tags and context vectors have significantcontributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing correct state of parser still remains critical

We are unable to represent whole parsing history with featureextracting

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify stackrsquos word embeddings

Hidden states of LSTMS are not updated unless reduce

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose Tree-stack LSTM model with 4 components

β-LSTMσ-LSTMAction-LSTMTree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include explicit feature extractor We initiated wordrepresentation by concatenating

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morp-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure Bufferrsquos β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stackrsquos σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How do components of tree-stack LSTM are connected

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
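Equation (1) can be implemented directly; the weight values and the size D below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64  # embedding size (illustrative)

# t-RNN parameters: the input is [head ; relation ; dependent], 3*D wide.
W_rnn = rng.normal(scale=0.1, size=(D, 3 * D))
b_rnn = np.zeros(D)

def trnn_compose(w_head_old, d_l, w_dep):
    """Eq. (1): fold a dependent and its relation label into the head embedding."""
    x = np.concatenate([w_head_old, d_l, w_dep])
    return np.tanh(W_rnn @ x + b_rnn)

new_head = trnn_compose(rng.normal(size=D), rng.normal(size=D), rng.normal(size=D))
assert new_head.shape == (D,)
```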

Tree-RNN with

1 Left Transition 2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure Each embedding is initialized by concatenating POS, language and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure Each embedding is initialized by concatenating POS, language and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
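The left_d and right_d rules on these slides, together with shift from the arc-hybrid system definition, can be sketched as plain list operations. This is an illustrative sketch: the token indices, dependency labels, and the ROOT-at-front convention are assumptions, not details from the slides.

```python
# A minimal arc-hybrid transition system following the left_d / right_d rules.
def shift(stack, buffer, arcs):
    stack.append(buffer.pop(0))

def left(stack, buffer, arcs, d):
    # left_d: buffer front b becomes head of stack top s with relation d
    s, b = stack.pop(), buffer[0]
    arcs.add((b, d, s))

def right(stack, buffer, arcs, d):
    # right_d: second stack item s becomes head of stack top t with relation d
    t, s = stack.pop(), stack[-1]
    arcs.add((s, d, t))

# Parse "economic news had": 0=ROOT 1=economic 2=news 3=had
stack, buffer, arcs = [], [0, 1, 2, 3], set()
shift(stack, buffer, arcs)          # push ROOT
shift(stack, buffer, arcs)          # push economic
left(stack, buffer, arcs, "amod")   # news <- economic
shift(stack, buffer, arcs)          # push news
left(stack, buffer, arcs, "nsubj")  # had <- news
shift(stack, buffer, arcs)          # push had
right(stack, buffer, arcs, "root")  # ROOT -> had
assert arcs == {(2, "amod", 1), (3, "nsubj", 2), (0, "root", 3)}
```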

Final overview of Tree-stack LSTM

[Figure: Tree-stack LSTM overview: β-LSTM, σ-LSTM and Action-LSTM hidden states are concatenated with the t-RNN head embeddings and fed to an MLP that predicts the next transition]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
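The concatenation feeding the MLP in this overview can be sketched as a small scorer. The sizes, the ReLU nonlinearity, and the unlabeled three-way action set below are illustrative assumptions, not details from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
FEAT, H, N_ACTIONS = 364, 128, 3   # sizes are illustrative

# One hidden-layer MLP scoring {shift, left, right} (dependency labels omitted)
W1 = rng.normal(scale=0.1, size=(H, FEAT)); b1 = np.zeros(H)
W2 = rng.normal(scale=0.1, size=(N_ACTIONS, H)); b2 = np.zeros(N_ACTIONS)

def decide(features):
    """Pick the next transition from the concatenated parser-state features."""
    h = np.maximum(0.0, W1 @ features + b1)    # ReLU hidden layer
    return int(np.argmax(W2 @ h + b2))

a = decide(rng.normal(size=FEAT))
assert a in (0, 1, 2)
```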

Overview

1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2 Related Work: Linear Models and their Drawbacks; Neural Network Models

3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5 Conclusion

6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17:

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations

Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:

Dependency parsing of 82 treebanks in 57 languages

All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations

Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split change 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1 If the annotation of the treebank is improved, the older parser is handicapped

2 If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser


Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM


Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM


Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM


Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code   MLP    Only Action  Only-β  Only-σ
hu szeged   66.21  66.87        66.94   67.03
sv lines    71.12  72.05        72.17   72.45
tr imst     57.12  56.87        57.02   57.12
ar padt     67.83  66.67        66.89   66.92
cs cac      83.89  82.23        83.13   83.17
en ewt      75.54  75.43        75.56   75.67

Table Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN


Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code            without t-RNN  with t-RNN
no nynorsklia (3k)   51.78          53.33
ru taiga (11k)       59.13          60.55
gl treegal (15k)     69.76          70.45
hu szeged (20k)      66.12          68.18
sv lines (49k)       74.04          75.46
tr imst (50k)        58.12          58.75
ar padt (120k)       68.04          68.14
en ewt (204k)        74.87          75.77
cs cac (473k)        82.89          83.57
cs pdt (1M)          81.17          81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang        MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged   66.21  66.87   66.94   67.03   66.12      68.18
sv lines    71.12  72.05   72.17   74.04   72.17      75.46
tr imst     57.12  56.87   57.02   57.12   58.12      58.75
ar padt     67.83  66.67   66.89   66.92   68.04      68.14
cs cac      83.89  82.23   83.13   83.17   82.89      83.57
en ewt      75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases

σ-LSTM provides more useful information independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.6            18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48325
fr sequoia     84.36        82.17           50543
en gum         76.44        75.34           53686
ko gsd         73.74        72.54           56687
eu bdt         74.55        73.32           72974
nl lassysmall  76.7         75.8            75134
gl ctg         79.02        79.018          79327
lv lvtb        72.33        72.24           80666
id gsd         75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121064
bg btb     84.53        84.55           124336
en ewt     75.77        75.682          204585
ar padt    68.02        68.14           223881
de gsd     71.59        71.32           263804
ca ancora  85.89        85.874          417587
es ancora  84.99        84.78           444617
cs cac     83.57        83.63           472608
cs pdt     81.43        82.12           1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions using gold moves
Dynamic oracle: transitions using predicted moves

In both cases the log-probability of gold moves is maximized


Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
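A minimal sketch of the difference between the two training regimes: the static oracle always follows gold moves, while the dynamic oracle sometimes follows the model's own predictions so the parser learns to recover from its mistakes. p_explore is an assumed hyperparameter, not a value from the slides.

```python
import random

def choose_training_move(gold_move, model_move, dynamic, p_explore=0.9):
    """Pick the transition actually applied while training.

    Static oracle: always follow the gold move.
    Dynamic oracle: with probability p_explore follow the model's own
    (possibly wrong) prediction; the gold move's log-probability is
    still what gets maximized in the loss.
    """
    if dynamic and random.random() < p_explore:
        return model_move
    return gold_move

assert choose_training_move("shift", "left", dynamic=False) == "shift"
```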

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1 Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch

2 Using Facebook's word vectors to train a parser [Bojanowski et al., 2017]

3 Using my own word and context vectors trained on a different language from the same language family

4 Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table LAS values for strategies (1), (2), (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

Training an LM from scratch on very limited data does not yield useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al., 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition-based parsers can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
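Projectivity can be checked by testing whether any two dependency arcs cross; a small sketch, where heads is a list such that heads[i] gives the head of token i+1 (tokens are 1-based, 0 is the root):

```python
def is_projective(heads):
    """Check projectivity: no two dependency arcs may cross.

    heads[i] is the head of token i+1 (1-based tokens, 0 = root).
    """
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for (l1, r1) in arcs:
        for (l2, r2) in arcs:
            if l1 < l2 < r1 < r2:   # the two arcs cross
                return False
    return True

assert is_projective([2, 0, 2])          # simple projective tree
assert not is_projective([3, 4, 0, 3])   # arcs 1<-3 and 2<-4 cross
```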

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table Our model's performance gap decreases as the projectivity ratio increases

7

7 From official results page and our projectivity table

Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion: We introduced "Context, Word and Morph-feat" embeddings and showed their contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-crafted feature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement

Morphological Features

Finding different ways to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Önder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

Page 7: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Introduction

Data-driven Dependency Parsing

Graph Based Algorithms

Using maximum spanning tree algorithms from graph theory

Transition Based Algorithms

Capitalizing on greedy stack based algorithms to build dependency treewith incremental steps in linear time

4

4From S Kbler R McDonald and J Nivre 2009 Dependency parsing Morgan ampClaypool US

Omer Kırnap (Koc University) MSc Thesis September 27 2018 7 123

Introduction

Transition Based Dependency Parsing

Transition System Abstract machine with a set of configurations(states) and transitions We use the ArcHybrid transition system[Kuhlmann et al 2011]

Configurations (σ β A)bull σ Stack of tree fragments initially emptybull β Buffer of words initially containing the whole sentencebull A Set of dependency arcs (head relation modifier) initially empty

Transitionsbull shift(σ b|βA) = (σ|b βA)bull leftd(σ|s b|βA) = (σ b|βA cup (b d s))bull rightd(σ|s|t βA) = (σ|s βA cup (s d t))

Omer Kırnap (Koc University) MSc Thesis September 27 2018 8 123

An example parsing of a sentence

Omer Kırnap (Koc University) MSc Thesis September 27 2018 9 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 10 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 11 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 12 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 13 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 14 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 15 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 16 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 17 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 18 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 19 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 20 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 21 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 22 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 23 123

Problem Definition

Find a model learning to decide correct transition from current state

Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123

2 Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123

Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearityHoweverImpractical for high dimensional inputs they scale linearly in inputdimensions (in both time and space assuming fixed number of hiddenunits)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution Using dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18bull KParse team with Tree-stack LSTM Parser using Context and

Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using ContextEmbeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser using Context and Morph-featEmbeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123

a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

LM is used to obtain Context and Word embeddings with twocomponents

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure Kırnap et al 2017Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label

Economic news hadGold Tree LAS 1

SBJATT

Economic news had

PREDOBJPred 1 LAS 0

Economic news had

OBJATTPred 2 LAS (frac12)100

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Context vectors provide independent contribution on top ofPOS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677p-fb 747 797 663

p-v-c 793 832 742

Our BiLSTM language model word vectors perform betterthan FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Both POS tags and context vectors have significantcontributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing correct state of parser still remains critical

We are unable to represent whole parsing history with featureextracting

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify stackrsquos word embeddings

Hidden states of LSTMS are not updated unless reduce

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose Tree-stack LSTM model with 4 components

β-LSTMσ-LSTMAction-LSTMTree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include explicit feature extractor We initiated wordrepresentation by concatenating

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morp-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure Bufferrsquos β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stackrsquos σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How do components of tree-stack LSTM are connected

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17:
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes between CoNLL17 and CoNLL18: 1. Train/test split change 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

Two possible problems of the official comparison:

1. If the annotation of the treebank is improved, the older parser is handicapped.

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code (tokens)   MLP     Tree-stack
ru taiga (10k)       58.89   60.55
hu szeged (20k)      66.21   68.18
tr imst (50k)        56.78   58.75
ar padt (120k)       67.83   68.14
en ewt (205k)        74.87   75.77
cs cac (473k)        83.39   83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code   MLP     Only Action   Only-β   Only-σ
hu szeged   66.21   66.87         66.94    67.03
sv lines    71.12   72.05         72.17    72.45
tr imst     57.12   56.87         57.02    57.12
ar padt     67.83   66.67         66.89    66.92
cs cac      83.89   82.23         83.13    83.17
en ewt      75.54   75.43         75.56    75.67

Table: Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code            without t-RNN   with t-RNN
no nynorsklia (3k)   51.78           53.33
ru taiga (11k)       59.13           60.55
gl treegal (15k)     69.76           70.45
hu szeged (20k)      66.12           68.18
sv lines (49k)       74.04           75.46
tr imst (50k)        58.12           58.75
ar padt (120k)       68.04           68.14
en ewt (204k)        74.87           75.77
cs cac (473k)        82.89           83.57
cs pdt (1M)          81.17           81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang        MLP     Only A   Only-β   Only-σ   w/o t-RNN   all
hu szeged   66.21   66.87    66.94    67.03    66.12       68.18
sv lines    71.12   72.05    72.17    74.04    72.17       75.46
tr imst     57.12   56.87    57.02    57.12    58.12       58.75
ar padt     67.83   66.67    66.89    66.92    68.04       68.14
cs cac      83.89   82.23    83.13    83.17    82.89       83.57
en ewt      75.54   75.43    75.56    75.67    74.87       75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases

σ-LSTM provides more useful information independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD dataset (v2.2) into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang code       Morph-Feats   no Morph-Feats   # of tokens
no nynorsklia   51.13         53.33            3,583
ru taiga        58.32         60.55            10,479
sme giella      52.78         53.39            16,385
la perseus      49.93         51.6             18,184
ug udt          52.78         53.39            19,262
sl sst          46.72         48.77            19,473
hu szeged       66.23         68.18            20,166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code       Morph-Feats   no Morph-Feats   # of tokens
sv lines        72.18         74.81            48,325
fr sequoia      84.36         82.17            50,543
en gum          76.44         75.34            53,686
ko gsd          73.74         72.54            56,687
eu bdt          74.55         73.32            72,974
nl lassysmall   76.7          75.8             75,134
gl ctg          79.02         79.018           79,327
lv lvtb         72.33         72.24            80,666
id gsd          75.76         73.97            97,531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code   Morph-Feats   no Morph-Feats   # of tokens
fa seraji   81.18         81.12            121,064
bg btb      84.53         84.55            124,336
en ewt      75.77         75.682           204,585
ar padt     68.02         68.14            223,881
de gsd      71.59         71.32            263,804
ca ancora   85.89         85.874           417,587
es ancora   84.99         84.78            444,617
cs cac      83.57         83.63            472,608
cs pdt      81.43         82.12            1,173,282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow gold moves.
Dynamic oracle: transitions follow predicted moves.

In both cases, the log-probability of the gold moves is maximized.

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
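The contrast can be made concrete in a few lines: in both regimes the loss targets the gold move, but the dynamic oracle advances the parser with its own prediction. This is an illustrative sketch, not the thesis code; `predict` is a stand-in for the trained model.

```python
def train_step(state, gold_move, predict, dynamic):
    # The loss is always computed on the gold move...
    loss_target = gold_move
    # ...but the next parser state follows either the gold move
    # (static oracle) or the model's own prediction (dynamic oracle).
    executed = predict(state) if dynamic else gold_move
    return loss_target, executed
```

Under the static oracle the parser only ever sees gold-derived states during training; the dynamic oracle exposes it to states reached by its own (possibly wrong) moves.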

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

What about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language       (1)            (2)     (3)     (4)
af afribooms   not provided   75.46   77.43   78.12
kk ktb         20.19          22.31   21.96   23.86
bxr bdt        7.64           9.76    9.93    8.98
kmr mg         20.12          22.57   22.78   23.39

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

From-scratch LM training does not yield useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition based parsers can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model at different projectivity ratios:

Language      Projectivity   Best (LAS)   Our (LAS)
grc perseus   90.7           79.39        55.03 (20)
eu bdt        95.13          84.22        74.13 (17)
hu szeged     97.8           82.66        68.18 (14)
da ddt        98.26          86.28        76.40 (17)
en gum        99.6           85.05        76.44 (15)
gl treegal    100            74.25        70.45 (10)
gl ctg        100            82.12        79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion: We introduced "Context, Word and Morph-feat" embeddings and showed their contribution in transition based dependency parsing.

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering.

Tree-stack LSTM performed better with low resource languages.

When the training dataset size increases, tree-stack LSTM loses its advantage.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring performance improvements.

Morphological Features

Finding different ways to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function; other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123


Introduction

Transition Based Dependency Parsing

Transition System: Abstract machine with a set of configurations (states) and transitions. We use the arc-hybrid transition system [Kuhlmann et al. 2011].

Configurations (σ, β, A):
• σ: Stack of tree fragments, initially empty
• β: Buffer of words, initially containing the whole sentence
• A: Set of dependency arcs (head, relation, modifier), initially empty

Transitions:
• shift(σ, b|β, A) = (σ|b, β, A)
• left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
• right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Omer Kırnap (Koc University) MSc Thesis September 27 2018 8 123
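The three arc-hybrid transitions above can be sketched directly as list and set operations. This is a minimal illustration of the transition system, not the thesis implementation; stack, buffer, and arc set are plain Python values here.

```python
def shift(stack, buffer, arcs):
    # shift(σ, b|β, A) = (σ|b, β, A): move the front of the buffer onto the stack
    return stack + [buffer[0]], buffer[1:], arcs

def left(stack, buffer, arcs, d):
    # left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    # pop s from the stack, attach it to the buffer front b with relation d
    s, b = stack[-1], buffer[0]
    return stack[:-1], buffer, arcs | {(b, d, s)}

def right(stack, buffer, arcs, d):
    # right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    # pop t from the stack, attach it to the element s below it with relation d
    s, t = stack[-2], stack[-1]
    return stack[:-1], buffer, arcs | {(s, d, t)}
```

For example, shifting "had" and "news" and then applying a right transition yields the arc ("had", "obj", "news") with "had" left on the stack.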

An example parsing of a sentence

Omer Kırnap (Koc University) MSc Thesis September 27 2018 9 123


Problem Definition

Find a model that learns to decide the correct transition from the current state

Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123

2 Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123


Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearity. However, they are impractical for high-dimensional inputs: they scale linearly in input dimensions (in both time and space, assuming a fixed number of hidden units).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution: Using dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123


3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123


a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

LM is used to obtain Context and Word embeddings, with two components:

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123
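The two LM components above can be sketched as follows. A plain tanh recurrence stands in for the LSTMs; all dimensions, weight initializations, and the toy character encoding are illustrative assumptions, not the thesis setup.

```python
import numpy as np

rng = np.random.default_rng(0)
CDIM, WDIM = 8, 8
Wc = rng.normal(size=(WDIM, CDIM))   # character -> hidden
Uc = rng.normal(size=(WDIM, WDIM))   # hidden -> hidden

def word_vector(word):
    # Character-based recurrence: read the word one character at a time;
    # the final hidden state is the word vector.
    h = np.zeros(WDIM)
    for ch in word:
        x = np.zeros(CDIM)
        x[ord(ch) % CDIM] = 1.0      # toy one-hot character "embedding"
        h = np.tanh(Wc @ x + Uc @ h)
    return h

Wf = rng.normal(size=(WDIM, WDIM))
Wb = rng.normal(size=(WDIM, WDIM))

def context_vectors(words):
    # Word-based bidirectional recurrence over the word vectors; each
    # context vector concatenates the forward and backward hidden states.
    vs = [word_vector(w) for w in words]
    fwd, h = [], np.zeros(WDIM)
    for v in vs:
        h = np.tanh(Wf @ v + h)
        fwd.append(h)
    bwd, h = [], np.zeros(WDIM)
    for v in reversed(vs):
        h = np.tanh(Wb @ v + h)
        bwd.append(h)
    bwd.reverse()
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]
```

The word vector depends only on the word's characters, while the context vector of the same word changes with the surrounding sentence.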

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure: Kırnap et al. 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS): the percentage of words correctly assigned both the correct syntactic head and the correct dependency label.

Example: "Economic news had ..."
Gold tree (labels ATT, SBJ): LAS = 100
Pred 1 (labels PRED, OBJ): LAS = 0
Pred 2 (labels OBJ, ATT): LAS = (1/2) × 100 = 50

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
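The LAS definition above reduces to a simple count. A minimal sketch, assuming `gold` and `pred` each map a token to its (head, label) pair — a hypothetical representation chosen for illustration.

```python
def las(gold, pred):
    # A token counts as correct only if both its predicted head
    # and its dependency label match the gold annotation.
    correct = sum(1 for tok, head_label in gold.items()
                  if pred.get(tok) == head_label)
    return 100.0 * correct / len(gold)
```

On the slide's example, a prediction that gets both labels wrong scores 0, and one that recovers one of the two gold arcs scores 50.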

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5 Source: CoNLL17 official results page
Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63.0
c       72.2        76.0        63.5
v-c     76.0        79.0        67.6
p-c     78.0        82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63.0
c       72.2        76.0        63.5
v-c     76.0        79.0        67.6
p-c     78.0        82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2

Context vectors provide an independent contribution on top of POS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63.0
c       72.2        76.0        63.5
v-c     76.0        79.0        67.6
p-c     78.0        82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2

Our BiLSTM language model word vectors perform better than the Facebook (fb) vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63.0
c       72.2        76.0        63.5
v-c     76.0        79.0        67.6
p-c     78.0        82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2

Both POS tags and context vectors have significant contributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing the correct state representation of the parser remains critical

We are unable to represent the whole parsing history with feature extraction

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17

• Koc-University team with MLP Parser using Context Embeddings

CoNLL18

• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM. Modify the head word's embedding with the dependent's embedding.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

• They only modify the stack's word embeddings

• Hidden states of the LSTMs are not updated unless a reduce occurs

• Actions are not explicitly represented

• They only used word2vec embeddings [Mikolov et al. 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose the Tree-stack LSTM model with 4 components:

β-LSTM, σ-LSTM, Action-LSTM, Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector

Every dependency relation is represented with a continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
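The initialization above is a plain concatenation of the four vectors. A sketch with arbitrary illustrative dimensions (the real dimensions are not specified on this slide):

```python
import numpy as np

def word_representation(word_vec, context_vec, pos_vec, morph_vec):
    # Concatenate char-LSTM word vector, BiLSTM context vector,
    # POS embedding, and morph-feat embedding into one input vector.
    return np.concatenate([word_vec, context_vec, pos_vec, morph_vec])
```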

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
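One way to picture a morph-feat embedding is to split the UD feature string on `|` and pool per-feature vectors. The hash-seeded lookup below is an illustrative assumption for a self-contained sketch, not the thesis method (which learns these embeddings).

```python
import hashlib
import numpy as np

DIM = 16  # illustrative embedding size

def feat_embedding(feat):
    # Deterministic stand-in for a learned embedding table:
    # seed a generator from the feature string, e.g. "Case=Nom".
    seed = int(hashlib.md5(feat.encode()).hexdigest(), 16) % (2**32)
    return np.random.default_rng(seed).normal(size=DIM)

def morph_feat_vector(feats):
    # Average the embeddings of the individual Key=Value features.
    parts = feats.split("|")
    return np.mean([feat_embedding(p) for p in parts], axis=0)
```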

Tree-stack LSTM

Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

w_i  w_{i+1}  w_{i+2}

Figure: Buffer's β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

s_i  s_{i+1}  s_{i+2}

Figure: Stack's σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

w_head^new = tanh(W_rnn ∗ [w_head^old ; d_l ; w_dep] + b_rnn)    (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
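Equation (1) transcribed directly: the new head embedding is a tanh of an affine map over the concatenated [head; relation; dependent] vectors. The dimension and random initialization below are illustrative assumptions.

```python
import numpy as np

DIM = 8  # illustrative; all three inputs share this size here
rng = np.random.default_rng(0)
W_rnn = rng.normal(size=(DIM, 3 * DIM))
b_rnn = np.zeros(DIM)

def t_rnn(w_head, d_rel, w_dep):
    # w_head^new = tanh(W_rnn * [w_head^old ; d_l ; w_dep] + b_rnn)
    x = np.concatenate([w_head, d_rel, w_dep])
    return np.tanh(W_rnn @ x + b_rnn)
```

After each left/right transition, the output replaces the head's embedding, so a head accumulates information from all of its dependents.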

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure: Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123


Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 9: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

An example parsing of a sentence


Problem Definition

Find a model that learns to decide the correct transition from the current state.

2 Related Work



Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearity. However, they are impractical for high-dimensional inputs: they scale linearly in the input dimension (in both time and space, assuming a fixed number of hidden units).

Related Work

Solution: use dense embeddings for input features.


3 Model


Model Overview

2 shared tasks on Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17
• Koç University team with the MLP Parser using Context Embeddings

CoNLL18
• KParse team with the Tree-stack LSTM Parser using Context and Morph-feat Embeddings


a Language Model


Language Model (LM)

The LM is used to obtain Context and Word embeddings with two components:

Character-based LSTM extracts word vectors

Word-based BiLSTM extracts context vectors
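As a rough illustration of the two-stage pipeline (not the thesis implementation), plain tanh RNN cells can stand in for the LSTMs: a character-level RNN turns each word into a word vector, and a word-level bidirectional RNN over those word vectors produces context vectors. All sizes and the one-hot character encoding are toy assumptions.

```python
import numpy as np

# Toy two-stage language model: char RNN -> word vectors,
# word-level BiRNN -> context vectors. tanh cells stand in for LSTMs.
rng = np.random.default_rng(0)
H = 6  # hidden size (toy assumption)

def rnn(inputs, Wx, Wh):
    h = np.zeros(H)
    for x in inputs:
        h = np.tanh(Wx @ x + Wh @ h)
    return h

# Character-level parameters: 256 one-hot character inputs.
Wx_c, Wh_c = rng.normal(size=(H, 256)) * 0.1, rng.normal(size=(H, H)) * 0.1

def word_vector(word):
    chars = [np.eye(256)[min(ord(c), 255)] for c in word]  # one-hot chars
    return rnn(chars, Wx_c, Wh_c)

# Word-level forward and backward parameters.
Wx_f, Wh_f = rng.normal(size=(H, H)) * 0.1, rng.normal(size=(H, H)) * 0.1
Wx_b, Wh_b = rng.normal(size=(H, H)) * 0.1, rng.normal(size=(H, H)) * 0.1

def context_vectors(sentence):
    wv = [word_vector(w) for w in sentence]
    # Re-running the RNN per position is O(n^2) but keeps the sketch short.
    fwd = [rnn(wv[: i + 1], Wx_f, Wh_f) for i in range(len(wv))]
    bwd = [rnn(wv[i:][::-1], Wx_b, Wh_b) for i in range(len(wv))]
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

ctx = context_vectors(["Economic", "news", "had", "little", "effect"])
print(len(ctx), ctx[0].shape)  # 5 (12,)
```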

Language Model - Word vectors

Character-based LSTM generates word vectors

Figure Character LSTM from Kırnap et al 2017


Language Model - Context Vectors

Word-based BiLSTM generates context vectors

Figure Word BiLSTM from Kırnap et al 2017


b MLP Parser (CoNLL17)


MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition


MLP Parser - Feature Extraction

Feature extractor describes current state

Figure: Kırnap et al. 2017

MLP Parser - Decision Module

Decision module (MLP) decides the next transition


Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations

Experiments - Evaluation Metric

Labeled Attachment Score (LAS)
The percentage of words correctly assigned both the correct syntactic head and the correct dependency label.

Example: "Economic news had"
Gold tree (arcs: ATT, SBJ): LAS = 1
Pred 1 (arcs: OBJ, PRED): LAS = 0
Pred 2 (arcs: ATT, OBJ): LAS = (1/2) · 100
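The metric itself fits in a few lines; a minimal sketch with a toy two-word example (head indices and label strings are illustrative):

```python
# Minimal sketch of Labeled Attachment Score (LAS): the percentage of
# words whose predicted head AND dependency label both match the gold
# annotation. heads[i] is the head index of word i; labels[i] its relation.
def las(gold_heads, gold_labels, pred_heads, pred_labels):
    assert len(gold_heads) == len(pred_heads)
    correct = sum(
        gh == ph and gl == pl
        for gh, gl, ph, pl in zip(gold_heads, gold_labels, pred_heads, pred_labels)
    )
    return 100.0 * correct / len(gold_heads)

# Two scored words: both heads right, one label wrong -> LAS = 50.0
print(las([2, 3], ["ATT", "SBJ"], [2, 3], ["ATT", "OBJ"]))  # 50.0
```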

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5. Source: CoNLL17 official results page

Contributions in CoNLL17


Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), context vector (c):

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63
c       72.2       76         63.5
v-c     76         79         67.6
p-c     78         82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Context and Word Embeddings

Context vectors provide an independent contribution on top of POS tags.

Context and Word embeddings

Our BiLSTM language model word vectors perform better than FB vectors.

Context and Word embeddings

Both POS tags and context vectors have significant contributions on top of word vectors.

Issues with MLP

However:

Choosing the correct state representation of the parser still remains critical

We are unable to represent the whole parsing history with feature extraction

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack.


c Tree-stack LSTM Parser (CoNLL18)


Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM
Modify the head word's embedding with the dependent's embedding

Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of the LSTMs are not updated unless a reduce occurs

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al. 2013]

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy


Tree-stack LSTM Overview

Figure: Tree-stack LSTM architecture (β-, σ-, and Action-LSTM outputs are concatenated and fed to an MLP; the t-RNN composes the head word, dependent word, and dependency relation).

We propose the Tree-stack LSTM model with 4 components:
β-LSTM
σ-LSTM
Action-LSTM
Tree-RNN

Tree-stack LSTM

Input Representation


Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector

Every dependency relation is represented with a continuous vector

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character-based LSTM's word vectors

Word-based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure: Morph-feat Embeddings
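One plausible reading of the figure, sketched in code: each key=value feature gets its own vector, and a word's morph-feat embedding combines its feature vectors. The dimension and the choice of summing are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

# Toy morph-feat embedding: one vector per key=value feature, summed.
rng = np.random.default_rng(0)
DIM = 8                 # embedding size (toy assumption)
feat_table = {}

def feat_vec(feat):
    if feat not in feat_table:          # lazily allocate a vector per feature
        feat_table[feat] = rng.normal(size=DIM)
    return feat_table[feat]

def morph_embed(feat_string):
    feats = feat_string.split("|")      # e.g. "Case=Nom|Gender=Neut|..."
    return sum(feat_vec(f) for f in feats)

v = morph_embed("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
print(v.shape)  # (8,)
```

Words sharing a feature (e.g. `Case=Nom`) then share that feature's vector, so morphological information is pooled across the vocabulary.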

Tree-stack LSTM

Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

β-LSTM

Figure: Tree-stack LSTM architecture (β-LSTM component).

β-LSTM

Figure: Buffer's β-LSTM running over buffer words w_i, w_{i+1}, w_{i+2}.

σ-LSTM

Figure: Tree-stack LSTM architecture (σ-LSTM component).

σ-LSTM

Figure: Stack's σ-LSTM running over stack words s_i, s_{i+1}, s_{i+2}.

Action-LSTM

Figure: Tree-stack LSTM architecture (Action-LSTM component).

Action-LSTM

Figure: Action-LSTM running over the transition history.

How are the components of tree-stack LSTM connected?

Tree-RNN


Tree-RNN (t-RNN)

The t-RNN combines the head word, the dependent word, and the dependency relation:

w_head_new = tanh(W_rnn · [w_head_old; d_l; w_dep] + b_rnn)    (1)

Figure: t-RNN
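Eq. (1) can be sketched directly; the dimension here is a toy assumption, and the weights are random stand-ins for trained parameters:

```python
import numpy as np

# Sketch of the t-RNN composition in Eq. (1): the head word's new
# embedding is a nonlinear function of the old head embedding, the
# dependency-label embedding d_l, and the dependent's embedding.
rng = np.random.default_rng(0)
DIM = 4                                    # embedding size (toy assumption)
W_rnn = rng.normal(size=(DIM, 3 * DIM))    # maps the concatenation back to DIM
b_rnn = np.zeros(DIM)

def t_rnn(w_head_old, d_l, w_dep):
    x = np.concatenate([w_head_old, d_l, w_dep])
    return np.tanh(W_rnn @ x + b_rnn)

new_head = t_rnn(rng.normal(size=DIM), rng.normal(size=DIM), rng.normal(size=DIM))
print(new_head.shape)  # (4,)
```

Because the output has the same dimension as a word embedding, the composed head can be pushed back onto the stack and composed again as further dependents attach.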

Tree-RNN with

1. Left Transition
2. Right Transition

Left Transition


Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings.

Transitions - Left

Figure: Stack's top LSTM is reduced.

Transitions - Left

Figure: t-RNN calculates the new head embedding.

Transitions - Left

Figure: β-LSTM recalculates its hidden state based on the new input.

Transitions - Left

Figure: Tree-stack LSTM is ready for the next transition.

Right Transition


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings.

Transitions - Right

Figure: Stack's top LSTM is reduced.

Transitions - Right

Figure: t-RNN calculates the new head embedding.

Transitions - Right

Figure: σ-LSTM recalculates its hidden state from the new input.

Transitions - Right

Figure: Tree-stack LSTM is ready for the next transition.
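Putting the two reduce transitions side by side, here is a hedged sketch of the state updates in the slide notation, with arcs stored as (head, label, dependent) triples; the embedding and LSTM updates are deliberately omitted:

```python
# left_d attaches the stack top s as a dependent of the buffer front b;
# right_d attaches the stack top t as a dependent of the element s below it.
def left_arc(stack, buffer, arcs, d):
    s = stack.pop()          # dependent leaves the stack
    b = buffer[0]            # head stays at the front of the buffer
    arcs.add((b, d, s))

def right_arc(stack, buffer, arcs, d):
    t = stack.pop()          # dependent leaves the stack
    s = stack[-1]            # head stays on the stack
    arcs.add((s, d, t))

def shift(stack, buffer):
    stack.append(buffer.pop(0))

stack, buffer, arcs = [0], [1, 2, 3], set()
shift(stack, buffer)                 # stack [0, 1], buffer [2, 3]
left_arc(stack, buffer, arcs, "nsubj")
print(stack, buffer, sorted(arcs))   # [0] [2, 3] [(2, 'nsubj', 1)]
```

In the full model, each of these updates also triggers the t-RNN composition of the new head embedding and a recomputation of the affected LSTM hidden states.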

Final overview of Tree-stack LSTM

Figure: Tree-stack LSTM architecture with all components connected (σ-, β-, and Action-LSTM outputs concatenated and fed to an MLP).


4 Results & Comparisons

Results amp Comparisons

Dataset

CoNLL17:
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koç University ranked 7th out of 33 participants (1st among transition-based parsers)

CoNLL18:
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koç University ranked 16th out of 30 participants (2nd among transition-based parsers)

Changes from CoNLL17 to CoNLL18: 1. train/test split change, 2. annotation

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets.

MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1. If the annotation of the treebank has improved, the older parser is handicapped.

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models:

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM


MLP Parser

Figure: Initial model (MLP only).

Only Action LSTM

Figure: Only Action-LSTM.

Only β-LSTM

Figure: Only β-LSTM.

Only σ-LSTM

Figure: Only σ-LSTM.

Ablation Analysis Results

Lang Code   MLP    Only Action  Only-β  Only-σ
hu szeged   66.21  66.87        66.94   67.03
sv lines    71.12  72.05        72.17   72.45
tr imst     57.12  56.87        57.02   57.12
ar padt     67.83  66.67        66.89   66.92
cs cac      83.89  82.23        83.13   83.17
en ewt      75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models

Ablation of t-RNN

Figure: Tree-stack LSTM architecture (t-RNN component).

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN:

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.164

t-RNN provides a comparative advantage for low-resource languages

Ablation Analysis

Overall results of the ablation analysis:

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats the other model variations

Ablation Analysis

Conclusions of the Ablation Experiments

t-RNN's performance contribution increases as the training size decreases

σ-LSTM provides more useful information independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers)

What does Morphological Feature Embedding provide?

Contribution of Morph-feat Embeddings

Experimental Settings
We divide the CoNLL18 UD dataset (v2.2) into 4 parts based on the number of training tokens per language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k, less than 50k tokens

Languages having more than 50k, less than 100k tokens

Languages having 100k tokens or more

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3,583
ru taiga       58.32        60.55           10,479
sme giella     52.78        53.39           16,385
la perseus     49.93        51.6            18,184
ug udt         52.78        53.39           19,262
sl sst         46.72        48.77           19,473
hu szeged      66.23        68.18           20,166

Not useful for languages having less than 20k training tokens

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48,325
fr sequoia     84.36        82.17           50,543
en gum         76.44        75.34           53,686
ko gsd         73.74        72.54           56,687
eu bdt         74.55        73.32           72,974
nl lassysmall  76.7         75.8            75,134
gl ctg         79.02        79.018          79,327
lv lvtb        72.33        72.24           80,666
id gsd         75.76        73.97           97,531

Beneficial for languages with 50k-100k training tokens

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121,064
bg btb     84.53        84.55           124,336
en ewt     75.77        75.682          204,585
ar padt    68.02        68.14           223,881
de gsd     71.59        71.32           263,804
ca ancora  85.89        85.874          417,587
es ancora  84.99        84.78           444,617
cs cac     83.57        83.63           472,608
cs pdt     81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens

Static vs Dynamic Oracle Training

Static oracle: transitions follow the gold moves
Dynamic oracle: transitions follow the predicted moves

In both cases, the log-probability of the gold moves is maximized.

Figure: Tree-stack LSTM architecture.
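The distinction can be sketched in a toy training loop: the loss always targets the gold move, and only the move that is *followed* differs. All callbacks here (maximize_logp, predict, and friends) are hypothetical stand-ins, not the thesis API.

```python
import random

# Static vs. dynamic oracle training: same training signal (the gold
# move), different state trajectory.
def train_sentence(maximize_logp, predict, gold_oracle, state,
                   apply_move, is_final, dynamic=False, explore=0.1):
    while not is_final(state):
        gold = gold_oracle(state)
        maximize_logp(state, gold)       # loss is always on the gold move
        if dynamic and random.random() < explore:
            move = predict(state)        # dynamic oracle: follow the model
        else:
            move = gold                  # static oracle: follow the gold move
        state = apply_move(state, move)
    return state

# Toy run: the "state" is a counter and parsing ends after 3 moves.
final = train_sentence(
    maximize_logp=lambda s, m: None,
    predict=lambda s: "shift",
    gold_oracle=lambda s: "shift",
    state=0,
    apply_move=lambda s, m: s + 1,
    is_final=lambda s: s >= 3,
    dynamic=True,
)
print(final)  # 3
```

Following predicted moves exposes the model to states it will actually reach at test time, which is the usual motivation for dynamic oracles.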

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k


Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k


Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k


How about languages with less than 20k training tokens?

Transfer Learning

There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors, trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4)

Transfer Learning

Conclusions of the Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

Training the LM from scratch does not yield useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]

Projectivity

A transition-based parser can only build projective trees.6

6. Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
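Projectivity can be tested by checking whether any two dependency arcs cross; a minimal sketch:

```python
# A tree is projective iff no two dependency arcs cross. heads[i] gives
# the head of word i+1 (words are 1-indexed; 0 denotes the artificial root).
def is_projective(heads):
    arcs = [(min(h, d), max(h, d)) for d, h in zip(range(1, len(heads) + 1), heads)]
    for l1, r1 in arcs:
        for l2, r2 in arcs:
            if l1 < l2 < r1 < r2:   # the two arcs cross
                return False
    return True

print(is_projective([2, 0, 2]))     # True: word 2 is the root of a simple tree
print(is_projective([3, 4, 0, 3]))  # False: arcs (1,3) and (2,4) cross
```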

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios:

Language     Projectivity  Best (LAS)  Ours (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases.7

7. From the official results page and our projectivity table

Conclusions


Conclusion

In conclusion:
We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition-based dependency parsing.

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

As the training dataset size increases, tree-stack LSTM loses its advantage.

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 10: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Omer Kırnap (Koc University) MSc Thesis September 27 2018 10 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 11 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 12 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 13 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 14 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 15 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 16 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 17 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 18 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 19 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 20 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 21 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 22 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 23 123

Problem Definition

Find a model that learns to decide the correct transition from the current state

Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123

2 Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123

Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearity. However, they are impractical for high-dimensional inputs: they scale linearly in input dimensions (in both time and space, assuming a fixed number of hidden units)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution: Using dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123
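The scaling argument above is the motivation for dense inputs: instead of a one-hot vector whose length is the sum of all vocabulary sizes, each feature indexes a small embedding table and the results are concatenated. A minimal sketch with toy sizes (not the thesis settings):

```python
import random

# Toy vocabulary sizes for illustration only; real vocabularies are far larger.
N_WORDS, N_POS, DIM = 1000, 17, 8
random.seed(0)
word_emb = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_WORDS)]
pos_emb = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_POS)]

def featurize(word_id, pos_id):
    # Concatenated dense vectors replace a (N_WORDS + N_POS)-dim one-hot input.
    return word_emb[word_id] + pos_emb[pos_id]

x = featurize(123, 5)
assert len(x) == 2 * DIM  # 16 dimensions instead of 1017
```

Feature conjunctions no longer need to be enumerated by hand: the hidden layer can learn them from the concatenated dense input.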

Overview

1 Introduction
   Overview of Dependency Parsing
   Transition Based Dependency Parsing

2 Related Work
   Linear Models and their Drawbacks
   Neural Network Models

3 Model
   Language Model
   MLP Parser
   Tree-stack LSTM Parser

4 Results
   MLP vs Tree-stack LSTM
   Morphological Feature Embeddings
   Static vs Dynamic Oracle Training
   Transfer Learning

5 Conclusion

6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123

a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

The LM is used to obtain Context and Word embeddings with two components:

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure: Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation
• 17 universal part-of-speech tags
• 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS): the percentage of words assigned both the correct syntactic head and the correct dependency label

[Figure: gold tree for "Economic news had" and two predicted trees — Pred 1: LAS = 0; Pred 2: LAS = (1/2) · 100 = 50]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
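LAS can be computed directly from (head, label) pairs. A small sketch with a toy three-word example (the trees are illustrative, not the exact ones from the figure):

```python
def las(gold, pred):
    """Labeled Attachment Score: % of words with correct head AND label."""
    assert len(gold) == len(pred)
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return 100.0 * correct / len(gold)

# Toy gold tree for "Economic news had": each entry is (head_index, label).
gold  = [(2, "ATT"), (3, "SBJ"), (0, "PRED")]
pred1 = [(3, "OBJ"), (1, "SBJ"), (1, "PRED")]   # every arc wrong
pred2 = [(2, "ATT"), (3, "OBJ"), (0, "PRED")]   # one label wrong

assert las(gold, pred1) == 0.0
assert round(las(gold, pred2), 1) == 66.7
```

Note that a correct head with a wrong label counts as an error, which is what distinguishes LAS from the unlabeled attachment score (UAS).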

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5 Source: CoNLL17 official results page
Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), context vector (c)

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63.0
c      72.2       76.0       63.5
v-c    76.0       79.0       67.6
p-c    78.0       82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), context vector (c)

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63.0
c      72.2       76.0       63.5
v-c    76.0       79.0       67.6
p-c    78.0       82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Context vectors provide an independent contribution on top of POS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63.0
c      72.2       76.0       63.5
v-c    76.0       79.0       67.6
p-c    78.0       82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Our BiLSTM language model word vectors perform better than FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63.0
c      72.2       76.0       63.5
v-c    76.0       79.0       67.6
p-c    78.0       82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Both POS tags and context vectors have significant contributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing the correct state representation of the parser remains critical

We are unable to represent the whole parsing history with feature extraction

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM. Modify the head word's embedding with the dependent's embedding.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of the LSTMs are not updated except on reduce

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose Tree-stack LSTM model with 4 components

1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character Based LSTM's word vectors

Word Based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
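The four-way concatenation above can be sketched directly; the dimensions below are illustrative, not the thesis hyperparameters:

```python
def word_representation(word_vec, context_vec, pos_vec, feat_vec):
    # Parser input for one word: char-LSTM word vector, BiLSTM context vector,
    # POS embedding, and morph-feat embedding, concatenated end to end.
    return word_vec + context_vec + pos_vec + feat_vec

w = word_representation([0.1] * 4, [0.2] * 6, [0.3] * 2, [0.4] * 2)
assert len(w) == 4 + 6 + 2 + 2
```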

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
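One way to realize a morph-feat embedding from a UD FEATS string is to give every Feature=Value pair its own vector and combine them; the summation below is an illustrative choice, not necessarily the thesis's exact scheme:

```python
import random

DIM = 8
random.seed(1)
feat_vocab = {}  # "Feature=Value" -> embedding vector, grown on first sight

def feat_embedding(pair):
    if pair not in feat_vocab:
        feat_vocab[pair] = [random.gauss(0, 1) for _ in range(DIM)]
    return feat_vocab[pair]

def morph_feat_vector(feats):
    # "_" marks an empty FEATS column in CoNLL-U.
    pairs = feats.split("|") if feats != "_" else []
    vec = [0.0] * DIM
    for p in pairs:
        vec = [a + b for a, b in zip(vec, feat_embedding(p))]
    return vec

v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
assert len(v) == DIM and len(feat_vocab) == 5
```

Because each Feature=Value pair has its own vector, rare feature combinations still map to a meaningful representation built from their parts.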

Tree-stack LSTM

Model Components
1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure: Buffer's β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure: Stack's σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

w_head_new = tanh(W_rnn · [w_head_old ; d_l ; w_dep] + b_rnn)   (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
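Equation (1) in pure Python, with illustrative sizes; W and b stand in for the trainable t-RNN parameters:

```python
import math
import random

random.seed(2)
D = 4  # embedding size (illustrative)
W = [[random.gauss(0, 0.1) for _ in range(3 * D)] for _ in range(D)]
b = [0.0] * D

def t_rnn(head, rel, dep):
    # new head = tanh(W · [w_head_old ; d_l ; w_dep] + b)
    x = head + rel + dep
    return [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

new_head = t_rnn([0.5] * D, [0.1] * D, [-0.2] * D)
assert len(new_head) == D and all(-1.0 <= v <= 1.0 for v in new_head)
```

The composed vector replaces the head's old embedding, so later transitions see a head representation that already encodes its collected dependents.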

Tree-RNN with

1 Left Transition
2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure: Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure: β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure: Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure: Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure: σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure: Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
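The two transitions above can be written as state updates on (stack σ, buffer β, arc set A). Word ids below are toy values, with 0 standing for ROOT:

```python
def left(stack, buffer, arcs, d):
    # left_d: (σ|s, b|β, A) -> (σ, b|β, A ∪ {(b, d, s)})
    s = stack.pop()               # dependent: top of the stack
    arcs.add((buffer[0], d, s))   # head: front of the buffer

def right(stack, buffer, arcs, d):
    # right_d: (σ|s|t, β, A) -> (σ|s, β, A ∪ {(s, d, t)})
    t = stack.pop()               # dependent: top of the stack
    arcs.add((stack[-1], d, t))   # head: the new stack top

stack, buffer, arcs = [0, 1, 2], [3, 4], set()
left(stack, buffer, arcs, "amod")     # word 3 becomes head of word 2
right(stack, buffer, arcs, "nsubj")   # ROOT (0) becomes head of word 1
assert arcs == {(3, "amod", 2), (0, "nsubj", 1)} and stack == [0]
```

In the full parser each transition also triggers the t-RNN head update and the corresponding σ-LSTM and β-LSTM recalculations shown on the preceding slides.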

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 Introduction
   Overview of Dependency Parsing
   Transition Based Dependency Parsing

2 Related Work
   Linear Models and their Drawbacks
   Neural Network Models

3 Model
   Language Model
   MLP Parser
   Tree-stack LSTM Parser

4 Results
   MLP vs Tree-stack LSTM
   Morphological Feature Embeddings
   Static vs Dynamic Oracle Training
   Transfer Learning

5 Conclusion

6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Differences between CoNLL17 and CoNLL18: 1. Train/test split change 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems with the official comparison:

1 If the annotation of the treebank has improved, the older parser is handicapped

2 If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code  MLP    Only Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases

σ-LSTM provides more useful information independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings
We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k, less than 50k tokens

Languages having more than 50k, less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.6            18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k tokens

Lang code    Morph-Feats  no Morph-Feats  # of tokens
sv lines     72.18        74.81           48325
fr sequoia   84.36        82.17           50543
en gum       76.44        75.34           53686
ko gsd       73.74        72.54           56687
eu bdt       74.55        73.32           72974
nl lassymal  76.7         75.8            75134
gl ctg       79.02        79.018          79327
lv lvtb      72.33        72.24           80666
id gsd       75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121064
bg btb     84.53        84.55           124336
en ewt     75.77        75.682          204585
ar padt    68.02        68.14           223881
de gsd     71.59        71.32           263804
ca ancora  85.89        85.874          417587
es ancora  84.99        84.78           444617
cs cac     83.57        83.63           472608
cs pdt     81.43        82.12           1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow gold moves
Dynamic oracle: transitions follow predicted moves

In both cases, log p of the gold moves is maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
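The distinction can be sketched as a single training step; `model` with `logp`/`predict` is a stand-in interface for illustration, not the thesis implementation:

```python
import math
import random

def train_step(state, model, gold_transition, dynamic, explore=0.9):
    gold = gold_transition(state)
    loss = -model.logp(state, gold)        # maximize log p(gold | state)
    if dynamic and random.random() < explore:
        follow = model.predict(state)      # dynamic oracle: follow the model
    else:
        follow = gold                      # static oracle: follow the gold move
    return loss, follow

class DummyModel:                          # stand-in for the real parser
    def logp(self, state, action):
        return math.log(0.5)
    def predict(self, state):
        return "shift"

loss, follow = train_step(None, DummyModel(), lambda s: "left", dynamic=False)
assert follow == "left" and loss > 0
```

Both variants backpropagate the same gold-move loss; they differ only in which transition is actually executed, so the dynamic oracle exposes the model to states it will reach at test time.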

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1 Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch

2 Using Facebook's word vectors to train a parser [Bojanowski et al 2017]

3 Using my own word and context vectors trained with a different language from the same language family

4 Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

From-scratch LM training does not yield useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

A transition based parser can only build projective trees 6

6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
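A tree is projective iff no two dependency arcs cross. A minimal check over a head list (with 0 reserved for ROOT), illustrative rather than the thesis code:

```python
def is_projective(heads):
    """heads[i] is the head of word i+1; 0 denotes ROOT. True iff no arcs cross."""
    arcs = [(min(h, i + 1), max(h, i + 1)) for i, h in enumerate(heads)]
    for a, b in arcs:
        for c, d in arcs:
            if a < c < b < d:   # arc (c, d) crosses arc (a, b)
                return False
    return True

assert is_projective([2, 0, 2])          # simple projective tree
assert not is_projective([3, 4, 0, 3])   # arcs (1,3) and (2,4) cross
```

Running this check over a treebank gives the projectivity ratios used to compare parsers on the next slide.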

Projective vs Non-projective

We compared our model with the best model at different projectivity ratios

Language     Projectivity %  Best (LAS)  Our (LAS)
grc perseus  90.7            79.39       55.03 (20)
eu bdt       95.13           84.22       74.13 (17)
hu szeged    97.8            82.66       68.18 (14)
da ddt       98.26           86.28       76.40 (17)
en gum       99.6            85.05       76.44 (15)
gl treegal   100             74.25       70.45 (10)
gl ctg       100             82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases

7 From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:
We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-crafted feature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement

Morphological Features

Finding a different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Önder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673–682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 11: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Omer Kırnap (Koc University) MSc Thesis September 27 2018 11 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 12 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 13 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 14 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 15 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 16 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 17 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 18 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 19 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 20 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 21 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 22 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 23 123

Problem Definition

Find a model learning to decide correct transition from current state

Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123

2 Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123

Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearityHoweverImpractical for high dimensional inputs they scale linearly in inputdimensions (in both time and space assuming fixed number of hiddenunits)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution Using dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings


a Language Model

Language Model (LM)

The LM is used to obtain Context and Word embeddings, with two components:

Character Based LSTM extracts word vectors.

Word Based BiLSTM extracts context vectors.

Language Model - Word Vectors

Character based LSTM generates word vectors.

Figure: Character LSTM, from Kırnap et al. 2017

Language Model - Context Vectors

Word based BiLSTM generates context vectors.

Figure: Word BiLSTM, from Kırnap et al. 2017
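As a rough sketch of how a character-based recurrence yields a vector for any word, including unseen ones (a plain tanh recurrence stands in for the LSTM; the character inventory and all sizes are made-up assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
hidden = 32
# Hypothetical character inventory; a real model uses the training charset.
chars = {c: i for i, c in enumerate("abcdefghijklmnopqrstuvwxyz")}
E = rng.normal(scale=0.1, size=(len(chars), 16))   # character embeddings
Wx = rng.normal(scale=0.1, size=(hidden, 16))
Wh = rng.normal(scale=0.1, size=(hidden, hidden))

def word_vector(word):
    """Run a simple tanh recurrence over the characters; the final hidden
    state serves as the word's vector, so no word is out-of-vocabulary."""
    h = np.zeros(hidden)
    for c in word:
        h = np.tanh(Wx @ E[chars[c]] + Wh @ h)
    return h

print(word_vector("economic").shape)  # (32,)
```

The word-level BiLSTM then runs over these word vectors to produce a context vector per token.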

b MLP Parser (CoNLL17)

MLP Parser

The MLP Parser consists of 4 components:

Character Based LSTM extracts word vectors.

Word Based BiLSTM extracts context vectors.

Feature extractor describes the current state.

Decision module (MLP) decides the next transition.

MLP Parser - Feature Extraction

Feature extractor describes the current state.

Figure: Kırnap et al. 2017

MLP Parser - Decision Module

Decision module (MLP) decides the next transition.
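The decision module can be sketched as a one-hidden-layer network scoring transitions from the extracted state features; the dimensions and the three-way transition set below are illustrative assumptions, not the parser's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(2)
feat_dim, hidden, n_transitions = 120, 64, 3  # illustrative sizes

W1 = rng.normal(scale=0.1, size=(hidden, feat_dim))
b1 = np.zeros(hidden)
W2 = rng.normal(scale=0.1, size=(n_transitions, hidden))
b2 = np.zeros(n_transitions)

TRANSITIONS = ["shift", "left-arc", "right-arc"]

def next_transition(state_features):
    """One hidden layer with tanh, then a softmax over parser transitions."""
    h = np.tanh(W1 @ state_features + b1)
    scores = W2 @ h + b2
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return TRANSITIONS[int(np.argmax(probs))]

print(next_transition(rng.normal(size=feat_dim)))
```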

Experiments & Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages.

All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations

Experiments - Evaluation Metric

Labeled Attachment Score (LAS): the percentage of words correctly assigned both the correct syntactic head and the correct dependency label.

Figure: for "Economic news had", the gold tree (arcs ATT, SBJ) has LAS 1; a prediction with arcs OBJ and PRED has LAS 0; a prediction with arcs ATT and OBJ has LAS (1/2) · 100 = 50.
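The metric can be stated directly in code; the `las` helper and the toy (head, label) annotations below are illustrative:

```python
def las(gold, pred):
    """Labeled Attachment Score: percentage of tokens whose predicted
    (head, label) pair exactly matches the gold pair."""
    assert len(gold) == len(pred)
    correct = sum(g == p for g, p in zip(gold, pred))
    return 100.0 * correct / len(gold)

# (head index, label) per dependent token for "Economic news had":
gold  = [(2, "ATT"), (3, "SBJ")]
pred2 = [(2, "ATT"), (3, "OBJ")]   # heads right, one label wrong
print(las(gold, pred2))  # 50.0
```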

Experiments (MLP)

CoNLL 2017 Results (all treebanks, LAS)

Ranked 1st among transition based parsers.5

5 Source: CoNLL17 official results page

Contributions in CoNLL17

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), context vector (c):

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63
c       72.2        76          63.5
v-c     76          79          67.6
p-c     78          82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2

Context and Word Embeddings

(Same table as above.)

Context vectors provide an independent contribution on top of POS tags.

Context and Word Embeddings

(Same table as above.)

Our BiLSTM language model word vectors perform better than the FB vectors (compare p-v and p-fb).

Context and Word Embeddings

(Same table as above.)

Both POS tags and context vectors have significant contributions on top of word vectors.

Issues with MLP

However:

Choosing the correct state representation of the parser remains critical.

We are unable to represent the whole parsing history with feature extraction.

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack.


c Tree-stack LSTM Parser (CoNLL18)

Related Work - Stack LSTM

Figure: Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM; the head word's embedding is modified with the dependent's embedding.

Problems with Stack LSTM

They only modify the stack's word embeddings.

Hidden states of the LSTMs are not updated unless a reduce occurs.

Actions are not explicitly represented.

They only used word2vec embeddings [Mikolov et al 2013].

Our solution

We propose:

Context embeddings should improve parsing accuracy.

Dependency relations should be explicitly represented.

Morphological features of a word may enhance parsing accuracy.

Tree-stack LSTM Overview

Figure: Tree-stack LSTM architecture: β-LSTM, σ-LSTM, and Action-LSTM outputs are concatenated and fed to an MLP, while a t-RNN combines head word, dependent word, and dependency relation embeddings.

We propose the Tree-stack LSTM model with 4 components:

β-LSTM
σ-LSTM
Action-LSTM
Tree-RNN

Tree-stack LSTM

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector.

Every dependency relation is represented with a continuous vector.

Input Representation

We do not include an explicit feature extractor. We initialize each word representation by concatenating:

Character Based LSTM's word vectors

Word Based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors
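A sketch of the concatenation, with made-up dimensions standing in for the actual hyperparameters:

```python
import numpy as np

# Illustrative dimensions, not the thesis's actual hyperparameters.
word_vec    = np.zeros(350)  # from the character LSTM
context_vec = np.zeros(300)  # from the word BiLSTM
pos_vec     = np.zeros(50)   # learned POS embedding
feat_vec    = np.zeros(50)   # morph-feat embedding

# One dense vector per token, no hand-crafted feature templates.
token_repr = np.concatenate([word_vec, context_vec, pos_vec, feat_vec])
print(token_repr.shape)  # (750,)
```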

Input Representation

Morph-feat Vectors

Figure: Morph-feat embeddings, e.g. "It" with Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs

Tree-stack LSTM

Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

β-LSTM

Figure: the β-LSTM highlighted within the Tree-stack LSTM architecture.

Figure: Buffer's β-LSTM, an LSTM running over the buffer words wi, wi+1, wi+2.

σ-LSTM

Figure: the σ-LSTM highlighted within the Tree-stack LSTM architecture.

Figure: Stack's σ-LSTM, an LSTM running over the stack words si, si+1, si+2.

Action-LSTM

Figure: the Action-LSTM highlighted within the Tree-stack LSTM architecture.

Figure: Action-LSTM, an LSTM running over the transition history.

How are the components of the tree-stack LSTM connected?

Tree-RNN

Tree-RNN (t-RNN)

Figure: t-RNN combining the head word, dependent word, and dependency relation.

w_head_new = tanh(W_rnn · [w_head_old ; d_l ; w_dep] + b_rnn)   (1)
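Equation (1) can be sketched directly; `W_rnn`, `b_rnn`, and the dimensions below are illustrative stand-ins for the learned parameters:

```python
import numpy as np

rng = np.random.default_rng(3)
dim, rel_dim = 64, 16  # illustrative sizes
W_rnn = rng.normal(scale=0.1, size=(dim, 2 * dim + rel_dim))
b_rnn = np.zeros(dim)

def trnn(head, rel, dep):
    """Eq. (1): fold a dependent and its relation into the head's embedding."""
    return np.tanh(W_rnn @ np.concatenate([head, rel, dep]) + b_rnn)

new_head = trnn(rng.normal(size=dim), rng.normal(size=rel_dim), rng.normal(size=dim))
print(new_head.shape)  # (64,)
```

Each attachment thus updates the head's representation in place, so the stack and buffer LSTMs see the subtree, not just the head word.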

Tree-RNN with:

1. Left Transition
2. Right Transition

Left Transition

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initiated by concatenating POS, language, and morph-feat embeddings.

Figure: Stack's top LSTM is reduced.

Figure: t-RNN calculates the new head embedding.

Figure: β-LSTM recalculates its hidden state based on the new input.

Figure: Tree-stack LSTM is ready to give the new transition.

Right Transition

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initiated by concatenating POS, language, and morph-feat embeddings.

Figure: Stack's top LSTM is reduced.

Figure: t-RNN calculates the new head embedding.

Figure: σ-LSTM recalculates its hidden state from the new input.

Figure: Tree-stack LSTM is ready to give the new transition.
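The two transition rules above can be sketched as pure functions on a (stack, buffer, arcs) state, following the formulas (the token strings are illustrative):

```python
def left_arc(stack, buffer, arcs, d):
    """left_d: pop stack top s; it becomes a d-dependent of buffer front b."""
    s, b = stack[-1], buffer[0]
    return stack[:-1], buffer, arcs | {(b, d, s)}

def right_arc(stack, buffer, arcs, d):
    """right_d: pop stack top t; it becomes a d-dependent of s below it."""
    s, t = stack[-2], stack[-1]
    return stack[:-1], buffer, arcs | {(s, d, t)}

stack, buffer, arcs = ["news", "had"], ["effect"], set()
stack, buffer, arcs = right_arc(stack, buffer, arcs, "SBJ")
print(stack, arcs)  # ['news'] {('news', 'SBJ', 'had')}
```

Each arc added here is the point where the t-RNN would recompute the head's embedding.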

Final overview of Tree-stack LSTM

Figure: the complete Tree-stack LSTM: β-LSTM, σ-LSTM, and Action-LSTM outputs are concatenated and fed to an MLP, with the t-RNN composing head word, dependent word, and dependency relation embeddings.

Overview

1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2. Related Work: Linear Models and their Drawbacks; Neural Network Models

3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5. Conclusion

6. Future Work & Discussions

4 Results & Comparisons

Results & Comparisons

Dataset

CoNLL17:
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags and 37 universal dependency relations
• Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags and 37 universal dependency relations
• Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1. train/test split changes; 2. annotation changes.

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets.

Two possible problems with the official comparison:

1. If the annotation of the treebank has been improved, the older parser is handicapped.

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.

Experiments with the same train-test datasets to compare models:

Lang Code        MLP     Tree-stack
ru taiga (10k)   58.89   60.55
hu szeged (20k)  66.21   68.18
tr imst (50k)    56.78   58.75
ar padt (120k)   67.83   68.14
en ewt (205k)    74.87   75.77
cs cac (473k)    83.39   83.57

Tree-stack LSTM outperforms MLP.

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM:

MLP Parser. Figure: the initial model.

Only Action LSTM. Figure: only the action LSTM.

Only β-LSTM. Figure: only the β-LSTM.

Only σ-LSTM. Figure: only the σ-LSTM.

Ablation Analysis Results

Lang Code   MLP     Only Action   Only-β   Only-σ
hu szeged   66.21   66.87         66.94    67.03
sv lines    71.12   72.05         72.17    72.45
tr imst     57.12   56.87         57.02    57.12
ar padt     67.83   66.67         66.89    66.92
cs cac      83.89   82.23         83.13    83.17
en ewt      75.54   75.43         75.56    75.67

Table: Comparison between MLP and "Only" models.

Ablation of t-RNN

Figure: the t-RNN within the Tree-stack LSTM architecture.

Comparison of stack-LSTMs with and without t-RNN:

Lang Code            without t-RNN   with t-RNN
no nynorsklia (3k)   51.78           53.33
ru taiga (11k)       59.13           60.55
gl treegal (15k)     69.76           70.45
hu szeged (20k)      66.12           68.18
sv lines (49k)       74.04           75.46
tr imst (50k)        58.12           58.75
ar padt (120k)       68.04           68.14
en ewt (204k)        74.87           75.77
cs cac (473k)        82.89           83.57
cs pdt (1M)          81.17           81.164

t-RNN provides a comparative advantage for low-resource languages.

Ablation Analysis

Overall results of ablation analysis:

Lang       MLP     Only A   Only-β   Only-σ   w/o t-RNN   all
hu szeged  66.21   66.87    66.94    67.03    66.12       68.18
sv lines   71.12   72.05    72.17    74.04    72.17       75.46
tr imst    57.12   56.87    57.02    57.12    58.12       58.75
ar padt    67.83   66.67    66.89    66.92    68.04       68.14
cs cac     83.89   82.23    83.13    83.17    82.89       83.57
en ewt     75.54   75.43    75.56    75.67    74.87       75.77

Tree-stack LSTM beats the other model variations.

Conclusions of Ablation Experiments:

t-RNN's performance contribution increases as the training size decreases.

σ-LSTM provides more useful information, independent of dataset size.

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers).

What does Morphological Feature Embedding provide?

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD dataset (v2.2) into 4 parts based on the number of training tokens per language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k but less than 50k tokens

Languages having more than 50k but less than 100k tokens

Languages having 100k tokens or more

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang code      Morph-Feats   no Morph-Feats   # of tokens
no nynorsklia  51.13         53.33            3583
ru taiga       58.32         60.55            10479
sme giella     52.78         53.39            16385
la perseus     49.93         51.6             18184
ug udt         52.78         53.39            19262
sl sst         46.72         48.77            19473
hu szeged      66.23         68.18            20166

Not useful for languages having less than 20k training tokens.

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code      Morph-Feats   no Morph-Feats   # of tokens
sv lines       72.18         74.81            48325
fr sequoia     84.36         82.17            50543
en gum         76.44         75.34            53686
ko gsd         73.74         72.54            56687
eu bdt         74.55         73.32            72974
nl lassysmall  76.7          75.8             75134
gl ctg         79.02         79.018           79327
lv lvtb        72.33         72.24            80666
id gsd         75.76         73.97            97531

Beneficial for languages with 50k-100k training tokens.

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code   Morph-Feats   no Morph-Feats   # of tokens
fa seraji   81.18         81.12            121064
bg btb      84.53         84.55            124336
en ewt      75.77         75.682           204585
ar padt     68.02         68.14            223881
de gsd      71.59         71.32            263804
ca ancora   85.89         85.874           417587
es ancora   84.99         84.78            444617
cs cac      83.57         83.63            472608
cs pdt      81.43         82.12            1173282

Neutral for languages having more than 100k training tokens.

Static vs Dynamic Oracle Training

Static oracle: transitions follow the gold moves.
Dynamic oracle: transitions follow the predicted moves.

In both cases the log probability of the gold moves is maximized.

Figure: the Tree-stack LSTM architecture trained under both regimes.
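The difference between the two regimes can be sketched as a single training step; the toy scorer and random predictor below stand in for the actual network:

```python
import math
import random

random.seed(0)
MOVES = ["shift", "left-arc", "right-arc"]

def log_prob(state, move):          # toy scorer standing in for the model
    return math.log(0.5 if move == state["gold"] else 0.25)

def apply(state, move):
    return {**state, "history": state["history"] + [move]}

def train_step(state, gold_move, dynamic):
    """Both oracles maximize log p(gold move); they differ only in which
    move advances the parser state during training."""
    loss = -log_prob(state, gold_move)
    predicted = random.choice(MOVES)    # stand-in for the model's argmax
    next_move = predicted if dynamic else gold_move
    return loss, apply(state, next_move)

s = {"gold": "shift", "history": []}
loss, s2 = train_step(s, "shift", dynamic=False)
print(round(loss, 3), s2["history"])  # 0.693 ['shift']
```

With `dynamic=True` the parser explores states its own mistakes produce, while the loss still targets the gold move.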

Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens less than 20k.

Figure: Results are very close for training tokens between 20k and 50k.

Figure: Results are very close for training tokens more than 50k.

How about languages with less than 20k training tokens?

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)            (2)     (3)     (4)
af afribooms  not provided   75.46   77.43   78.12
kk ktb        20.19          22.31   21.96   23.86
bxr bdt       7.64           9.76    9.93    8.98
kmr mg        20.12          22.57   22.78   23.39

Table: LAS values for strategies (1), (2), (3) and (4).

Transfer Learning

Conclusions of Transfer Learning Experiments:

Applying transfer learning with a pre-trained parser is the most beneficial.

Training the LM from scratch on limited data does not yield useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017].

Projectivity

Transition based parsers can only build projective trees.6

6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios:

Language      Projectivity   Best (LAS)   Our (LAS)
grc perseus   90.7           79.39        55.03 (20)
eu bdt        95.13          84.22        74.13 (17)
hu szeged     97.8           82.66        68.18 (14)
da ddt        98.26          86.28        76.40 (17)
en gum        99.6           85.05        76.44 (15)
gl treegal    100            74.25        70.45 (10)
gl ctg        100            82.12        79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases.7

7 From the official results page and our projectivity table.

Conclusion

In conclusion, we introduced "Context, Word and Morph-feat" embeddings and showed their contribution in transition based dependency parsing.

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

As the training dataset size increases, tree-stack LSTM loses its advantage.

Future Research Directions

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM states, β-LSTM states, or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function; other losses (e.g. a CRF loss) may solve this problem.

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, pages 673-682. Association for Computational Linguistics.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Thank you for your attention!

Questions?

Page 12: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Omer Kırnap (Koc University) MSc Thesis September 27 2018 12 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 13 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 14 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 15 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 16 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 17 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 18 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 19 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 20 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 21 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 22 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 23 123

Problem Definition

Find a model learning to decide correct transition from current state

Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123

2 Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123

Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearityHoweverImpractical for high dimensional inputs they scale linearly in inputdimensions (in both time and space assuming fixed number of hiddenunits)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution Using dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18bull KParse team with Tree-stack LSTM Parser using Context and

Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using ContextEmbeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser using Context and Morph-featEmbeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123

a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

LM is used to obtain Context and Word embeddings with twocomponents

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure Kırnap et al 2017Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label

Economic news hadGold Tree LAS 1

SBJATT

Economic news had

PREDOBJPred 1 LAS 0

Economic news had

OBJATTPred 2 LAS (frac12)100

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Context vectors provide independent contribution on top ofPOS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677p-fb 747 797 663

p-v-c 793 832 742

Our BiLSTM language model word vectors perform betterthan FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Both POS tags and context vectors have significantcontributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing correct state of parser still remains critical

We are unable to represent whole parsing history with featureextracting

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM
Modify the head word's embedding with the dependent's embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of the LSTMs are not updated unless a reduce occurs

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose Tree-stack LSTM model with 4 components

β-LSTM
σ-LSTM
Action-LSTM
Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector

Every dependency relation is represented with a continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character-based LSTM's word vectors

Word-based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
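As a concrete sketch of the concatenation above (dimensions are made-up toy sizes, not the ones used in the thesis):

```python
import numpy as np

# Toy dimensions, chosen only for illustration.
WORD_DIM, CTX_DIM, POS_DIM, FEAT_DIM = 4, 6, 3, 3

def input_representation(word_vec, context_vec, pos_vec, feat_vec):
    # One input embedding per token: char-LSTM word vector, BiLSTM
    # context vector, POS vector, and morph-feat vector, concatenated
    # into a single dense representation.
    return np.concatenate([word_vec, context_vec, pos_vec, feat_vec])

rep = input_representation(np.zeros(WORD_DIM), np.zeros(CTX_DIM),
                           np.zeros(POS_DIM), np.zeros(FEAT_DIM))
```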

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
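One plausible reading of the figure is that each Feature=Value pair from the CoNLL-U FEATS string gets its own embedding and the pairs are combined into one vector; the sketch below sums them (the exact combination used in the thesis may differ, and the dimension is a toy size):

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8
# Hypothetical embedding table keyed by "Feature=Value" strings;
# in training these would be learned parameters.
feat_embeddings = {}

def morph_feat_vector(feats):
    # Turn a FEATS string like "Case=Nom|Gender=Neut|Number=Sing"
    # into a single vector by summing one embedding per pair.
    if feats == "_":
        return np.zeros(DIM)
    vecs = []
    for pair in feats.split("|"):
        if pair not in feat_embeddings:
            feat_embeddings[pair] = rng.normal(size=DIM)
        vecs.append(feat_embeddings[pair])
    return np.sum(vecs, axis=0)

v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
```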

Tree-stack LSTM

Model Components
1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure Buffer's β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stack's σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

w_head_new = tanh(W_rnn · [w_head_old ; d_l ; w_dep] + b_rnn)    (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
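A minimal numerical sketch of Eq. (1); random weights stand in for the learned W_rnn and b_rnn, and the dimensions are illustrative:

```python
import numpy as np

D, R = 6, 3                               # toy word / relation embedding sizes
rng = np.random.default_rng(1)
W_rnn = rng.normal(size=(D, 2 * D + R))   # learned in practice; random here
b_rnn = np.zeros(D)

def t_rnn(w_head_old, d_l, w_dep):
    # Eq. (1): compose head, relation and dependent embeddings into a
    # new head embedding with the same size as a word embedding.
    x = np.concatenate([w_head_old, d_l, w_dep])
    return np.tanh(W_rnn @ x + b_rnn)

new_head = t_rnn(np.ones(D), np.ones(R), np.ones(D))
```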

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
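The two reduce transitions can be written directly against a (stack, buffer, arcs) parser state. This is a schematic sketch of the formal definitions above, storing each arc as a (head, label, dependent) triple:

```python
def left_arc(stack, buffer, arcs, d):
    # left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    # the stack top s becomes a d-dependent of the buffer front b.
    s = stack.pop()
    arcs.add((buffer[0], d, s))

def right_arc(stack, buffer, arcs, d):
    # right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    # the stack top t becomes a d-dependent of s below it.
    t = stack.pop()
    arcs.add((stack[-1], d, t))

stack, buffer, arcs = ["news"], ["had"], set()
left_arc(stack, buffer, arcs, "SBJ")
# now stack == [] and arcs == {("had", "SBJ", "news")}
```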

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 Introduction
  Overview of Dependency Parsing
  Transition Based Dependency Parsing

2 Related Work
  Linear Models and their Drawbacks
  Neural Network Models

3 Model
  Language Model
  MLP Parser
  Tree-stack LSTM Parser

4 Results
  MLP vs Tree-stack LSTM
  Morphological Feature Embeddings
  Static vs Dynamic Oracle Training
  Transfer Learning

5 Conclusion
6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17
  Dependency parsing of 81 treebanks in 49 languages
  All treebanks use standardized annotation
    17 universal part-of-speech tags
    37 universal dependency relations
  Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18
  Dependency parsing of 82 treebanks in 57 languages
  All treebanks use standardized annotation
    17 universal part-of-speech tags
    37 universal dependency relations
  Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Differences between CoNLL17 and CoNLL18: 1 Train/test split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1 If the annotation of the treebank has been improved, the older parser is handicapped

2 If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code   MLP    Only Action  Only-β  Only-σ
hu szeged   66.21  66.87        66.94   67.03
sv lines    71.12  72.05        72.17   72.45
tr imst     57.12  56.87        57.02   57.12
ar padt     67.83  66.67        66.89   66.92

cs cac      83.89  82.23        83.13   83.17

en ewt      75.54  75.43        75.56   75.67

Table Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code            without t-RNN  with t-RNN
no nynorsklia (3k)   51.78          53.33
ru taiga (11k)       59.13          60.55
gl treegal (15k)     69.76          70.45
hu szeged (20k)      66.12          68.18
sv lines (49k)       74.04          75.46
tr imst (50k)        58.12          58.75
ar padt (120k)       68.04          68.14
en ewt (204k)        74.87          75.77
cs cac (473k)        82.89          83.57
cs pdt (1M)          81.17          81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases

σ-LSTM provides more useful information independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings
We divide the CoNLL18 UD 2.2 dataset into 4 parts based on the number of training tokens for each language to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k and less than 50k tokens

Languages having more than 50k and less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3,583
ru taiga       58.32        60.55           10,479
sme giella     52.78        53.39           16,385
la perseus     49.93        51.60           18,184
ug udt         52.78        53.39           19,262
sl sst         46.72        48.77           19,473
hu szeged      66.23        68.18           20,166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48,325
fr sequoia     84.36        82.17           50,543
en gum         76.44        75.34           53,686
ko gsd         73.74        72.54           56,687
eu bdt         74.55        73.32           72,974
nl lassysmall  76.70        75.80           75,134
gl ctg         79.02        79.018          79,327
lv lvtb        72.33        72.24           80,666
id gsd         75.76        73.97           97,531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121,064
bg btb     84.53        84.55           124,336
en ewt     75.77        75.682          204,585
ar padt    68.02        68.14           223,881
de gsd     71.59        71.32           263,804
ca ancora  85.89        85.874          417,587
es ancora  84.99        84.78           444,617
cs cac     83.57        83.63           472,608
cs pdt     81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions using gold moves
Dynamic oracle: transitions using predicted moves

In both cases the log-probability of gold moves is maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
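The difference between the two regimes can be sketched as a single decision per step: the loss always maximizes log p of the gold move, but the move used to advance the parser differs. This is a schematic sketch only (the `scores` dict and `explore` rate are stand-ins, not the thesis's exact schedule):

```python
import random

def next_move(gold_move, scores, dynamic, explore=0.9):
    # `scores` stands in for the MLP's output over transitions.
    # Static oracle: always advance with the gold move.
    # Dynamic oracle: sometimes advance with the model's prediction,
    # while the loss still targets the gold move.
    if dynamic and random.random() < explore:
        return max(scores, key=scores.get)   # follow the model's prediction
    return gold_move                         # follow the gold derivation

random.seed(0)
static_move = next_move("shift", {"shift": 0.1, "left": 0.9}, dynamic=False)
dynamic_move = next_move("shift", {"shift": 0.1, "left": 0.9}, dynamic=True, explore=1.0)
```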

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:
1 Using very limited data to train an LM for word and context vectors, then using them to train a parser from scratch
2 Using Facebook's word vectors to train a parser [Bojanowski et al 2017]
3 Using my own word and context vectors trained on a different language from the same language family
4 Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table LAS values for strategies (1), (2), (3), and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

From-scratch LM training does not yield useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

A transition-based parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
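Projectivity can be checked directly from the head indices: a tree is projective iff no two arcs cross. A small sketch (the `heads` encoding, with index 0 as an unused root placeholder, is an assumption for illustration):

```python
def is_projective(heads):
    # heads[i] is the head index of token i; tokens are 1..n and
    # heads[0] is an unused placeholder for the artificial root.
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads[1:], start=1)]
    for (l1, r1) in arcs:
        for (l2, r2) in arcs:
            if l1 < l2 < r1 < r2:   # strictly interleaved spans cross
                return False
    return True
```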

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table Our model's performance gap decreases as the projectivity ratio increases

7

7 From official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:
We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering

Tree-stack LSTM performed better on low-resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings

Attention Mechanism

Applying attention across the σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement

Morphological Features

Finding a different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673–682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123


Problem Definition

Find a model that learns to decide the correct transition from the current state

Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123

2 Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123

Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearity.
However, they are impractical for high-dimensional inputs: they scale linearly in input dimensions (in both time and space, assuming a fixed number of hidden units)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution: Using dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123
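The contrast can be sketched in a few lines: a one-hot feature vector grows with the vocabulary, while a dense embedding lookup stays a fixed small size. The tag set and dimension below are toy stand-ins:

```python
import numpy as np

POS_TAGS = {"NOUN": 0, "VERB": 1, "ADJ": 2}   # tiny toy vocabulary
rng = np.random.default_rng(2)
E = rng.normal(size=(len(POS_TAGS), 4))       # 4-dim dense embeddings (learned in practice)

def one_hot(tag):
    # Sparse representation: dimension grows with the vocabulary.
    v = np.zeros(len(POS_TAGS))
    v[POS_TAGS[tag]] = 1.0
    return v

def dense(tag):
    # Dense representation: fixed small size, independent of vocabulary.
    return E[POS_TAGS[tag]]
```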

Overview

1 Introduction
  Overview of Dependency Parsing
  Transition Based Dependency Parsing

2 Related Work
  Linear Models and their Drawbacks
  Neural Network Models

3 Model
  Language Model
  MLP Parser
  Tree-stack LSTM Parser

4 Results
  MLP vs Tree-stack LSTM
  Morphological Feature Embeddings
  Static vs Dynamic Oracle Training
  Transfer Learning

5 Conclusion
6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123


a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

The LM is used to obtain context and word embeddings, with two components:

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character-based LSTM generates word vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word-based BiLSTM generates context vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation
• 17 universal part-of-speech tags
• 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS): the percentage of words assigned both the correct syntactic head and the correct dependency label

Example for "Economic news had":
Gold tree (arcs SBJ, ATT): LAS 1
Pred 1 (arcs PRED, OBJ): LAS 0
Pred 2 (arcs OBJ, ATT): LAS (1/2) · 100 = 50

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
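The metric is a per-word comparison of (head, label) pairs; a small sketch reproducing the slide's numbers (the arc labels are the slide's, the encoding is an illustrative assumption):

```python
def las(gold, pred):
    # gold/pred hold one (head, label) pair per word.
    correct = sum(g == p for g, p in zip(gold, pred))
    return 100.0 * correct / len(gold)

gold = [("had", "SBJ"), ("news", "ATT")]    # news→had (SBJ), Economic→news (ATT)
pred1 = [("news", "PRED"), ("had", "OBJ")]  # both attachments wrong
pred2 = [("had", "SBJ"), ("news", "OBJ")]   # head and label right for one word
# las(gold, pred1) → 0.0, las(gold, pred2) → 50.0
```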

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5 Source: CoNLL17 official results page
Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Context vectors provide independent contribution on top ofPOS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677p-fb 747 797 663

p-v-c 793 832 742

Our BiLSTM language model word vectors perform betterthan FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Both POS tags and context vectors have significantcontributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing correct state of parser still remains critical

We are unable to represent whole parsing history with featureextracting

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify stackrsquos word embeddings

Hidden states of LSTMS are not updated unless reduce

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose Tree-stack LSTM model with 4 components

β-LSTMσ-LSTMAction-LSTMTree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include explicit feature extractor We initiated wordrepresentation by concatenating

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morp-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure Bufferrsquos β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stackrsquos σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How do components of tree-stack LSTM are connected

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
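Symmetrically, the right-arc update right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}) pops the stack top and attaches it to the element below it. A sketch with token indices, embedding updates again omitted:

```python
def right_arc(stack, buffer, arcs, d):
    # right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    # the stack top t becomes a dependent of the element s below it; t is popped.
    t = stack.pop()      # dependent, removed from the stack
    s = stack[-1]        # head, stays on the stack
    arcs.add((s, d, t))  # arc (head, label, dependent)
    return stack, buffer, arcs

stack, buffer, arcs = [0, 1, 4], [5], set()
stack, buffer, arcs = right_arc(stack, buffer, arcs, "obj")
```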

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
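A minimal sketch of the final scoring step in the overview above: the σ-, β- and action-LSTM summaries are concatenated and fed to an MLP that outputs a distribution over candidate transitions. All sizes below are assumptions for illustration.

```python
import numpy as np

def score_transitions(h_sigma, h_beta, h_action, params):
    # concatenate the per-component summaries, then a one-hidden-layer MLP
    x = np.concatenate([h_sigma, h_beta, h_action])
    W1, b1, W2, b2 = params
    h = np.tanh(W1 @ x + b1)
    logits = W2 @ h + b2
    e = np.exp(logits - logits.max())  # stable softmax
    return e / e.sum()                 # probability over candidate transitions

rng = np.random.default_rng(1)
params = (rng.normal(size=(8, 12)), np.zeros(8),   # 12 = three summaries of size 4
          rng.normal(size=(3, 8)), np.zeros(3))    # 3 candidate transitions
p = score_transitions(rng.normal(size=4), rng.normal(size=4),
                      rng.normal(size=4), params)
```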

Overview

1 Introduction
  Overview of Dependency Parsing
  Transition Based Dependency Parsing

2 Related Work
  Linear Models and their Drawbacks
  Neural Network Models

3 Model
  Language Model
  MLP Parser
  Tree-stack LSTM Parser

4 Results
  MLP vs Tree-stack LSTM
  Morphological Feature Embeddings
  Static vs Dynamic Oracle Training
  Transfer Learning

5 Conclusion
6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17:
  Dependency parsing of 81 treebanks in 49 languages
  All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
  Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
  Dependency parsing of 82 treebanks in 57 languages
  All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
  Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Differences between CoNLL17 and CoNLL18:
1 Train/test split change
2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1 If the annotation of the treebank is improved, the older parser is handicapped

2 If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
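The scores in these tables are Labeled Attachment Scores (LAS): the percentage of words that receive both the correct syntactic head and the correct dependency label. A sketch of the computation:

```python
def las(gold, pred):
    # Labeled Attachment Score: fraction of words whose predicted
    # (head, label) pair exactly matches the gold one, as a percentage.
    assert len(gold) == len(pred)
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return 100.0 * correct / len(gold)

# (head, label) per word: one of three words gets a wrong label
gold = [(2, "amod"), (3, "nsubj"), (0, "root")]
pred = [(2, "amod"), (3, "obj"),   (0, "root")]
score = las(gold, pred)   # 2 of 3 words fully correct
```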

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code   MLP    Only Action  Only-β  Only-σ
hu szeged   66.21  66.87        66.94   67.03
sv lines    71.12  72.05        72.17   72.45
tr imst     57.12  56.87        57.02   57.12
ar padt     67.83  66.67        66.89   66.92
cs cac      83.89  82.23        83.13   83.17
en ewt      75.54  75.43        75.56   75.67

Table Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang        MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged   66.21  66.87   66.94   67.03   66.12      68.18
sv lines    71.12  72.05   72.17   74.04   72.17      75.46
tr imst     57.12  56.87   57.02   57.12   58.12      58.75
ar padt     67.83  66.67   66.89   66.92   68.04      68.14
cs cac      83.89  82.23   83.13   83.17   82.89      83.57
en ewt      75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases

σ-LSTM provides more useful information, independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
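As a concrete reminder, a UD FEATS string such as Case=Nom|Gender=Neut|Number=Sing is mapped to a fixed-size morph-feat vector. One simple scheme (a sketch only; not necessarily the exact composition used in the thesis) sums one learned embedding per key=value pair:

```python
import numpy as np

def morph_feat_vector(feats, table, dim=4):
    # Sum one embedding per key=value pair of a UD FEATS string.
    # `table` maps feature strings to vectors; unseen features get zeros.
    vec = np.zeros(dim)
    if feats == "_":                     # UD marks "no features" with "_"
        return vec
    for pair in feats.split("|"):
        vec += table.get(pair, np.zeros(dim))
    return vec

# toy embedding table (an assumption for illustration)
table = {"Case=Nom": np.ones(4), "Number=Sing": 2 * np.ones(4)}
v = morph_feat_vector("Case=Nom|Number=Sing", table)
```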

Contribution of Morph-feat Embeddings

Experimental Settings

We divide the CoNLL18 UD v2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3,583
ru taiga       58.32        60.55           10,479
sme giella     52.78        53.39           16,385
la perseus     49.93        51.6            18,184
ug udt         52.78        53.39           19,262
sl sst         46.72        48.77           19,473
hu szeged      66.23        68.18           20,166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48,325
fr sequoia     84.36        82.17           50,543
en gum         76.44        75.34           53,686
ko gsd         73.74        72.54           56,687
eu bdt         74.55        73.32           72,974
nl lassysmall  76.7         75.8            75,134
gl ctg         79.02        79.018          79,327
lv lvtb        72.33        72.24           80,666
id gsd         75.76        73.97           97,531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121,064
bg btb     84.53        84.55           124,336
en ewt     75.77        75.682          204,585
ar padt    68.02        68.14           223,881
de gsd     71.59        71.32           263,804
ca ancora  85.89        85.874          417,587
es ancora  84.99        84.78           444,617
cs cac     83.57        83.63           472,608
cs pdt     81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions using gold moves
Dynamic oracle: transitions using predicted moves

In both cases, log p of gold moves is maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
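The two regimes can be contrasted in one schematic training step (assumed interfaces, not the thesis's actual training code): both maximize log p of the gold move, but the dynamic oracle sometimes executes the model's own prediction to generate the next training configuration.

```python
import math
import random

def train_step(gold_move, probs, dynamic, explore=0.1):
    # Both oracles maximize log p(gold move); they differ only in which
    # move is *executed* to reach the next training configuration.
    loss = -math.log(probs[gold_move])
    if dynamic and random.random() < explore:
        executed = max(probs, key=probs.get)  # follow the model (dynamic oracle)
    else:
        executed = gold_move                  # follow the gold derivation (static)
    return loss, executed

probs = {"shift": 0.7, "left-nsubj": 0.2, "right-obj": 0.1}
loss, move = train_step("shift", probs, dynamic=False)
```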

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1 Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2 Using Facebook's word vectors to train a parser [Bojanowski et al 2017]
3 Using my own word and context vectors trained on a different language from the same language family
4 Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table LAS values for strategies (1), (2), (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

Training an LM from scratch on very limited data does not yield useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

A transition based parser can only build projective trees 6

6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
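Projectivity can be checked directly: a tree is projective iff no two arcs cross when drawn above the sentence. A sketch, with heads[i] giving the head of 1-based word i and 0 denoting the artificial root:

```python
def is_projective(heads):
    # heads[i] = head of word i+1 (words are 1-based, 0 is the root).
    # A dependency tree is projective iff no two arcs cross.
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for l1, r1 in arcs:
        for l2, r2 in arcs:
            if l1 < l2 < r1 < r2:   # strictly interleaved endpoints: crossing
                return False
    return True

projective = is_projective([2, 0, 2])      # simple tree, no crossing arcs
crossing = is_projective([3, 4, 0, 3])     # arcs (1,3) and (2,4) cross
```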

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table Our model's performance gap decreases as the projectivity ratio increases

7

7 From official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:

We introduced "Context, Word and Morph-feat" embeddings and showed their contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering

Tree-stack LSTM performed better with low-resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673–682.

S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations


Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases

σ-LSTM provides more useful information independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)


What does Morphological Feature Embedding provide


Contribution of Morph-feat Embeddings

Experimental Settings: we divide the CoNLL18 UD 2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more


Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang code | Morph-Feats | no Morph-Feats | # of tokens
no nynorsklia | 51.13 | 53.33 | 3,583
ru taiga | 58.32 | 60.55 | 10,479
sme giella | 52.78 | 53.39 | 16,385
la perseus | 49.93 | 51.6 | 18,184
ug udt | 52.78 | 53.39 | 19,262
sl sst | 46.72 | 48.77 | 19,473
hu szeged | 66.23 | 68.18 | 20,166

Not useful for languages having less than 20k training tokens


Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code | Morph-Feats | no Morph-Feats | # of tokens
sv lines | 72.18 | 74.81 | 48,325
fr sequoia | 84.36 | 82.17 | 50,543
en gum | 76.44 | 75.34 | 53,686
ko gsd | 73.74 | 72.54 | 56,687
eu bdt | 74.55 | 73.32 | 72,974
nl lassysmall | 76.7 | 75.8 | 75,134
gl ctg | 79.02 | 79.018 | 79,327
lv lvtb | 72.33 | 72.24 | 80,666
id gsd | 75.76 | 73.97 | 97,531

Beneficial for languages with 50k-100k training tokens


Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code | Morph-Feats | no Morph-Feats | # of tokens
fa seraji | 81.18 | 81.12 | 121,064
bg btb | 84.53 | 84.55 | 124,336
en ewt | 75.77 | 75.682 | 204,585
ar padt | 68.02 | 68.14 | 223,881
de gsd | 71.59 | 71.32 | 263,804
ca ancora | 85.89 | 85.874 | 417,587
es ancora | 84.99 | 84.78 | 444,617
cs cac | 83.57 | 83.63 | 472,608
cs pdt | 81.43 | 82.12 | 1,173,282

Neutral for languages having more than 100k training tokens


Static vs Dynamic Oracle Training

Static oracle: transitions follow the gold moves
Dynamic oracle: transitions follow the predicted moves

In both cases, the log-probability of the gold moves is maximized
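The difference between the two regimes is only in which move is followed during training; the loss is the negative log-probability of the gold move in both. A schematic sketch with a stub scoring model (the model and states are placeholders, not the thesis implementation):

```python
import math
import random

MOVES = ["shift", "left", "right"]

def move_probs(history):
    # Stub for the parser's scoring model: a deterministic softmax over
    # pseudo-scores that depend only on the transition history so far.
    rnd = random.Random("|".join(history))
    scores = [rnd.uniform(-1, 1) for _ in MOVES]
    z = sum(math.exp(s) for s in scores)
    return {m: math.exp(s) / z for m, s in zip(MOVES, scores)}

def sequence_loss(gold_moves, oracle):
    """Sum of -log p(gold move) along a derivation.
    oracle='static'  -> the followed move is always the gold one;
    oracle='dynamic' -> the followed move is the model's argmax prediction."""
    history, loss = [], 0.0
    for gold in gold_moves:
        p = move_probs(history)
        loss += -math.log(p[gold])            # gold move maximized either way
        followed = gold if oracle == "static" else max(p, key=p.get)
        history.append(followed)              # the two regimes diverge here
    return loss

gold = ["shift", "shift", "left", "right"]
static_loss = sequence_loss(gold, "static")
dynamic_loss = sequence_loss(gold, "dynamic")
```

The dynamic-oracle pass visits states produced by the model's own (possibly wrong) moves, so at test time the parser has seen states resembling its own errors.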

[Figure: Tree-stack LSTM architecture used in both training regimes]


Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k


Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k


Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k


How about languages with less than 20k training tokens?


Transfer Learning

There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language | (1) | (2) | (3) | (4)
af afribooms | not provided | 75.46 | 77.43 | 78.12
kk ktb | 20.19 | 22.31 | 21.96 | 23.86
bxr bdt | 7.64 | 9.76 | 9.93 | 8.98
kmr mg | 20.12 | 22.57 | 22.78 | 23.39

Table: LAS values for strategies (1), (2), (3) and (4)


Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From-scratch LM training does not yield useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]


Projectivity

Transition-based parsers can only build projective trees.6

6. Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf


Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language | Projectivity | Best (LAS) | Ours (LAS)
grc perseus | 90.7 | 79.39 | 55.03 (20)
eu bdt | 95.13 | 84.22 | 74.13 (17)
hu szeged | 97.8 | 82.66 | 68.18 (14)
da ddt | 98.26 | 86.28 | 76.40 (17)
en gum | 99.6 | 85.05 | 76.44 (15)
gl treegal | 100 | 74.25 | 70.45 (10)
gl ctg | 100 | 82.12 | 79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases 7

7. From the official results page and our projectivity table

Conclusions


Conclusion

In conclusion:
We introduced "Context, Word and Morph-feat" embeddings and showed their contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-crafted feature engineering

Tree-stack LSTM performed better with low-resource languages

As the training dataset size increases, tree-stack LSTM loses its advantage


Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding different ways to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.


Publications

Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.


References

Marco Kuhlmann, Carlos Gómez-Rodríguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.


Thank you for your attention


Questions




Problem Definition

Find a model that learns to decide the correct transition from the current state


2 Related Work


Related Work


Related Work


Related Work


Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearity. However, they are impractical for high-dimensional inputs: they scale linearly in input dimensions (in both time and space, assuming a fixed number of hidden units).


Related Work

Solution: use dense embeddings for input features
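The scaling argument can be made concrete: a one-hot conjunction of two categorical features grows the input dimension multiplicatively, while dense embeddings keep it additive. A minimal illustration (all sizes below are made up for the example, not from the thesis):

```python
# Sketch: input dimensionality of one-hot feature conjunctions vs dense embeddings.
# Vocabulary and embedding sizes are illustrative assumptions.
n_words, n_pos = 50_000, 17

# One-hot (word, pos) conjunction: one indicator per pair -> multiplicative blow-up.
onehot_conjunction_dim = n_words * n_pos

# Dense embeddings: look up each feature separately and concatenate -> additive.
word_dim, pos_dim = 100, 16
dense_input_dim = word_dim + pos_dim

print(onehot_conjunction_dim)  # 850000
print(dense_input_dim)         # 116
```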


Overview

1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2. Related Work: Linear Models and their Drawbacks; Neural Network Models
3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5. Conclusion
6. Future Work & Discussions


3 Model


Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17:
- Koc-University team with MLP Parser using Context Embeddings

CoNLL18:
- KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings


Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17:
- Koc-University team with MLP Parser using Context Embeddings

CoNLL18:
- KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings


a Language Model


Language Model (LM)

The LM is used to obtain Context and Word embeddings, with two components:

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors
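A minimal sketch of these two components, with toy dimensions and a simplified recurrence standing in for the LSTM cells (deterministic per-character embeddings are an assumption for illustration): a character-level pass yields a word vector, and a word-level forward/backward pass yields context vectors.

```python
import math
import random

DIM = 4  # toy hidden size

def step(h, x):
    # One simplified recurrent step (a stand-in for an LSTM cell):
    # the new hidden state mixes the previous hidden state and the input.
    return [math.tanh(0.5 * hi + 0.5 * xi) for hi, xi in zip(h, x)]

def char_vec(ch):
    rnd = random.Random(ord(ch))  # deterministic per-character embedding
    return [rnd.uniform(-1, 1) for _ in range(DIM)]

def word_vector(word):
    # Character-based recurrence: the final hidden state is the word vector.
    h = [0.0] * DIM
    for ch in word:
        h = step(h, char_vec(ch))
    return h

def context_vectors(words):
    # Word-based bidirectional pass: concatenate forward and backward states.
    vecs = [word_vector(w) for w in words]
    fwd, h = [], [0.0] * DIM
    for v in vecs:
        h = step(h, v)
        fwd.append(h)
    bwd, h = [], [0.0] * DIM
    for v in reversed(vecs):
        h = step(h, v)
        bwd.append(h)
    bwd.reverse()
    return [f + b for f, b in zip(fwd, bwd)]  # list concat = vector concat

ctx = context_vectors(["Economic", "news", "had", "little", "effect"])
```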


Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017


Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017


b MLP Parser (CoNLL17)


MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition


MLP Parser - Feature Extraction

Feature extractor describes current state

Figure: Kırnap et al. 2017

MLP Parser - Decision Module

Decision module (MLP) decides the next transition


Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation:
- 17 universal part-of-speech tags
- 37 universal dependency relations


Experiments - Evaluation Metric

Labeled Attachment Score (LAS): the percentage of words correctly assigned both the correct syntactic head and the correct dependency label

[Example: for the gold tree of "Economic news had" (arcs SBJ, ATT), a prediction with both arcs wrong (PRED, OBJ) gets LAS 0, and a prediction with one of the two arcs correct (OBJ, ATT) gets LAS (1/2) x 100 = 50]
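The metric can be sketched directly (head index and label per scored word; a hypothetical minimal scorer, not the official CoNLL evaluation script):

```python
def las(gold, pred):
    """Labeled Attachment Score: percentage of words whose predicted head
    AND dependency label both match the gold annotation."""
    assert len(gold) == len(pred)
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return 100.0 * correct / len(gold)

# (head, label) for the two dependents in "Economic news had"
# (word 1=Economic, 2=news, 3=had; Economic attaches to news, news to had):
gold  = [(2, "ATT"), (3, "SBJ")]
pred1 = [(2, "OBJ"), (3, "PRED")]  # both labels wrong -> LAS 0
pred2 = [(2, "ATT"), (3, "OBJ")]   # one of two words fully correct -> LAS 50
```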


Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5. Source: CoNLL17 official results page

Contributions in CoNLL17


Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), context vector (c):

Feats | Hungarian | En-ParTUT | Latvian
p | 63.6 | 76.6 | 55.9
v | 73.5 | 75.9 | 63
c | 72.2 | 76 | 63.5
v-c | 76 | 79 | 67.6
p-c | 78 | 82.5 | 70.6
p-v | 76.6 | 80.8 | 67.7
p-fb | 74.7 | 79.7 | 66.3
p-v-c | 79.3 | 83.2 | 74.2


Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), context vector (c):

Feats | Hungarian | En-ParTUT | Latvian
p | 63.6 | 76.6 | 55.9
v | 73.5 | 75.9 | 63
c | 72.2 | 76 | 63.5
v-c | 76 | 79 | 67.6
p-c | 78 | 82.5 | 70.6
p-v | 76.6 | 80.8 | 67.7
p-fb | 74.7 | 79.7 | 66.3
p-v-c | 79.3 | 83.2 | 74.2

Context vectors provide an independent contribution on top of POS tags


Context and Word embeddings

Feats | Hungarian | En-ParTUT | Latvian
p | 63.6 | 76.6 | 55.9
v | 73.5 | 75.9 | 63
c | 72.2 | 76 | 63.5
v-c | 76 | 79 | 67.6
p-c | 78 | 82.5 | 70.6
p-v | 76.6 | 80.8 | 67.7
p-fb | 74.7 | 79.7 | 66.3
p-v-c | 79.3 | 83.2 | 74.2

Our BiLSTM language model word vectors perform better than FB (Facebook) vectors


Context and Word embeddings

Feats | Hungarian | En-ParTUT | Latvian
p | 63.6 | 76.6 | 55.9
v | 73.5 | 75.9 | 63
c | 72.2 | 76 | 63.5
v-c | 76 | 79 | 67.6
p-c | 78 | 82.5 | 70.6
p-v | 76.6 | 80.8 | 67.7
p-fb | 74.7 | 79.7 | 66.3
p-v-c | 79.3 | 83.2 | 74.2

Both POS tags and context vectors have significant contributions on top of word vectors


Issues with MLP

However

Choosing the correct parser state representation still remains critical

We are unable to represent the whole parsing history with feature extraction


Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack


Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17:
- Koc-University team with MLP Parser using Context Embeddings

CoNLL18:
- KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)


Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM
Modify the head word's embedding with the dependent's embedding


Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of the LSTMs are not updated unless a reduce occurs

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]


Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy


Tree-stack LSTM Overview

[Figure: Tree-stack LSTM architecture: σ-LSTM, β-LSTM and Action-LSTM states, combined with the t-RNN output in a concat layer feeding an MLP]

We propose Tree-stack LSTM model with 4 components

- β-LSTM
- σ-LSTM
- Action-LSTM
- Tree-RNN


Tree-stack LSTM

Input Representation


Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector


Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors


Input Representation

Morp-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs (the FEATS string of the word "It")

Figure Morph-feat Embeddings
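A sketch of how a morphological feature string like the one above could be mapped to a single vector, here by averaging per-feature embeddings (the deterministic embeddings and the averaging scheme are assumptions for illustration, not the thesis implementation):

```python
import random

DIM = 8
_feat_emb = {}

def feat_embedding(feat):
    # One deterministic pseudo-random vector per key=value feature.
    if feat not in _feat_emb:
        rnd = random.Random(feat)
        _feat_emb[feat] = [rnd.uniform(-1, 1) for _ in range(DIM)]
    return _feat_emb[feat]

def morph_feat_vector(feats_string):
    """Map a UD FEATS string, e.g. 'Case=Nom|Number=Sing', to one vector
    by averaging the embeddings of its individual key=value features."""
    if feats_string == "_":          # UD convention for "no features"
        return [0.0] * DIM
    vecs = [feat_embedding(f) for f in feats_string.split("|")]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
```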


Tree-stack LSTM

Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN


β-LSTM

[Figure: Tree-stack LSTM architecture, β-LSTM component highlighted]


β-LSTM

[Figure: Buffer's β-LSTM running over the upcoming words w_i, w_i+1, w_i+2]


σ-LSTM

[Figure: Tree-stack LSTM architecture, σ-LSTM component highlighted]


σ-LSTM

[Figure: Stack's σ-LSTM running over the stack words s_i, s_i+1, s_i+2]


Action-LSTM

[Figure: Tree-stack LSTM architecture, Action-LSTM component highlighted]


Action-LSTM

[Figure: Action-LSTM running over the sequence of previous transitions]


How are the components of tree-stack LSTM connected?


Tree-RNN


Tree-RNN (t-RNN)

[Figure: t-RNN combining the head word, dependency relation and dependent word]

w_head_new = tanh(W_rnn · [w_head_old ; d_l ; w_dep] + b_rnn)   (1)
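Equation (1) can be written out directly; a minimal sketch with toy dimensions and random weights (the sizes and initialization are assumptions for illustration):

```python
import math
import random

random.seed(1)
D = 4  # toy embedding size for words and dependency labels

# W_rnn maps the concatenation [head; label; dependent] (length 3*D) back to D.
W_rnn = [[random.uniform(-0.5, 0.5) for _ in range(3 * D)] for _ in range(D)]
b_rnn = [0.0] * D

def t_rnn(w_head, d_label, w_dep):
    """Eq. (1): new head embedding = tanh(W_rnn * [head; label; dep] + b)."""
    x = w_head + d_label + w_dep          # list concatenation = vector concat
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W_rnn, b_rnn)]

head = [0.1] * D
dep = [0.2] * D
label = [0.3] * D   # embedding of the dependency relation
new_head = t_rnn(head, label, dep)
```

Because the output has the same dimension as a word embedding, the composed head can be fed back into further t-RNN compositions as the tree grows.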


Tree-RNN with

1. Left Transition
2. Right Transition


Left Transition


Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initialized by concatenating POS, language and morph-feat embeddings


Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding


Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input


Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready for the next transition


Right Transition


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language and morph-feat embeddings


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready for the next transition
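The left_d and right_d definitions above can be sketched on a plain stack/buffer/arc-set state, with word IDs standing in for embeddings (the t-RNN head update is elided; a sketch, not the thesis code):

```python
def left_arc(state, d):
    """left_d: top of stack s becomes a dependent of buffer front b;
    s is popped and the arc (b, d, s) is added."""
    stack, buffer, arcs = state
    s, b = stack[-1], buffer[0]
    return stack[:-1], buffer, arcs | {(b, d, s)}

def right_arc(state, d):
    """right_d: top of stack t becomes a dependent of the word s below it;
    t is popped and the arc (s, d, t) is added."""
    stack, buffer, arcs = state
    s, t = stack[-2], stack[-1]
    return stack[:-1], buffer, arcs | {(s, d, t)}

def shift(state):
    stack, buffer, arcs = state
    return stack + (buffer[0],), buffer[1:], arcs

# "Economic news had": 1=Economic, 2=news, 3=had (0 = root)
state = ((0,), (1, 2, 3), frozenset())
state = shift(state)            # push Economic
state = left_arc(state, "ATT")  # Economic <- news
state = shift(state)            # push news
state = left_arc(state, "SBJ")  # news <- had
state = shift(state)            # push had
state = right_arc(state, "ROOT")  # root -> had
```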


Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP
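The final decision step can be sketched as: concatenate the component representations and let an MLP-style layer score the transitions (toy dimensions and random weights; a sketch of the idea, not the thesis implementation):

```python
import math
import random

random.seed(3)
D = 4  # toy per-component hidden size
MOVES = ["shift", "left", "right"]

# One weight row per move over the concatenated [sigma; beta; action; t-rnn].
W = [[random.uniform(-0.5, 0.5) for _ in range(4 * D)] for _ in MOVES]

def decide(h_sigma, h_beta, h_action, h_trnn):
    """Score each transition from the concatenated component states
    and return the argmax move with its softmax probabilities."""
    x = h_sigma + h_beta + h_action + h_trnn   # concat of the 4 components
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    z = sum(math.exp(s) for s in scores)
    probs = {m: math.exp(s) / z for m, s in zip(MOVES, scores)}
    return max(probs, key=probs.get), probs

move, probs = decide([0.1] * D, [0.2] * D, [0.3] * D, [0.4] * D)
```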


Overview

1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2. Related Work: Linear Models and their Drawbacks; Neural Network Models
3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5. Conclusion
6. Future Work & Discussions


4 Results & Comparisons


Results & Comparisons

Dataset

CoNLL17:
- Dependency parsing of 81 treebanks in 49 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
- Dependency parsing of 82 treebanks in 57 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Differences between CoNLL17 and CoNLL18: 1. Train/test split change, 2. Annotation


MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets


MLP vs Tree-stack LSTM

2 possible problems of the official comparison:

1. If the annotation of the treebank has improved, the older parser is handicapped

2. If the training-test split has changed and old training data are now in the test data, the old parser is favored undeservedly


MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code | MLP | Tree-stack
ru taiga (10k) | 58.89 | 60.55
hu szeged (20k) | 66.21 | 68.18
tr imst (50k) | 56.78 | 58.75
ar padt (120k) | 67.83 | 68.14
en ewt (205k) | 74.87 | 75.77
cs cac (473k) | 83.39 | 83.57

Tree-stack LSTM outperforms MLP


Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:

We introduced "Context, Word and Morph-feat" embeddings and showed their contribution in transition-based dependency parsing.

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering.

The Tree-stack LSTM performed better on low-resource languages.

When the training dataset size increases, the tree-stack LSTM loses its advantage.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between the σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding different ways to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Önder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

• Introduction
  • Overview of Dependency Parsing
  • Transition Based Dependency Parsing
• Related Work
  • Linear Models and their Drawbacks
  • Neural Network Models
• Model
  • Language Model
  • MLP Parser
  • Tree-stack LSTM Parser
• Results
  • MLP vs Tree-stack LSTM
  • Morphological Feature Embeddings
  • Static vs Dynamic Oracle Training
  • Transfer Learning
• Conclusion
• Future Work & Discussions

Problem Definition

Find a model that learns to decide the correct transition from the current state.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123

2 Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123

Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearity. However, they are impractical for high-dimensional inputs: they scale linearly in the input dimension (in both time and space, assuming a fixed number of hidden units).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution: use dense embeddings for input features.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123

Overview

1. Introduction
   • Overview of Dependency Parsing
   • Transition Based Dependency Parsing

2. Related Work
   • Linear Models and their Drawbacks
   • Neural Network Models

3. Model
   • Language Model
   • MLP Parser
   • Tree-stack LSTM Parser

4. Results
   • MLP vs Tree-stack LSTM
   • Morphological Feature Embeddings
   • Static vs Dynamic Oracle Training
   • Transfer Learning

5. Conclusion

6. Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17:
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18:
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123


a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

The LM is used to obtain Context and Word embeddings, with two components:

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP): decides the next transition
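The decision module can be sketched as a one-hidden-layer network scoring the possible transitions from the extracted state features. The sizes and the three-move inventory below are toy assumptions, not the thesis configuration:

```python
import math
import random

random.seed(2)
FEATS, HIDDEN = 6, 5
MOVES = ["shift", "left", "right"]  # toy transition inventory

# Randomly initialized (in practice, trained) weight matrices.
W1 = [[random.uniform(-0.5, 0.5) for _ in range(FEATS)] for _ in range(HIDDEN)]
W2 = [[random.uniform(-0.5, 0.5) for _ in range(HIDDEN)] for _ in range(len(MOVES))]

def mlp_decide(features):
    """Score each transition from the state features; return the argmax."""
    h = [math.tanh(sum(w * x for w, x in zip(row, features))) for row in W1]
    scores = [sum(w * x for w, x in zip(row, h)) for row in W2]
    return MOVES[max(range(len(MOVES)), key=scores.__getitem__)]

print(mlp_decide([0.1, -0.2, 0.3, 0.0, 0.5, -0.1]) in MOVES)  # True
```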

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure: Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS): the percentage of words correctly assigned both the correct syntactic head and the correct dependency label.

Example with the phrase "Economic news had":

Gold tree (arcs ATT, SBJ): LAS = 1

Pred 1 (arcs PRED, OBJ; both wrong): LAS = 0

Pred 2 (arcs ATT, OBJ; one of two correct): LAS = (1/2) × 100 = 50%
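The metric on this slide can be computed per word from (head, label) pairs; a minimal sketch:

```python
def las(gold, pred):
    """Labeled Attachment Score: the percentage of words whose predicted
    (head, label) pair exactly matches the gold annotation."""
    assert len(gold) == len(pred)
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return 100.0 * correct / len(gold)

# Arcs from the "Economic news" example: (head position, label) per word.
gold = [(2, "ATT"), (3, "SBJ")]
pred = [(2, "ATT"), (3, "OBJ")]  # second arc has the right head, wrong label
print(las(gold, pred))  # 50.0
```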

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition-based parsers. 5

5 Source: CoNLL17 official results page

Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63
c      72.2       76         63.5
v-c    76         79         67.6
p-c    78         82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Context and Word Embeddings


Context vectors provide an independent contribution on top of POS tags.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings


Our BiLSTM language model word vectors perform better than the Facebook (fb) vectors.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings


Both POS tags and context vectors make significant contributions on top of word vectors.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing the correct parser state features still remains critical.

We are unable to represent the whole parsing history with hand-crafted feature extraction.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and the stack.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17:
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18:
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represents each component (σ, β, A) with an LSTM. Modifies the head word's embedding with the dependent's embedding.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings.

The hidden states of the LSTMs are not updated unless a reduce occurs.

Actions are not explicitly represented.

They only use word2vec embeddings [Mikolov et al 2013].

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

[Figure: Tree-stack LSTM. The t-RNN combines the head word, dependent word and dependency relation; the outputs of the σ-LSTM, β-LSTM and Action-LSTM are concatenated and fed to an MLP.]

We propose the Tree-stack LSTM model with 4 components:

1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings
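A morph-feat string like the one in the figure can be embedded by splitting the UD FEATS field on "|" and pooling one vector per key=value pair. Whether the thesis sums or concatenates the per-feature vectors is not shown on this slide, so treat the pooling choice (and the random initialization) as assumptions:

```python
import random

random.seed(0)
DIM = 8
_table = {}

def feat_vector(feat):
    # One (in practice trainable) vector per key=value pair.
    if feat not in _table:
        _table[feat] = [random.uniform(-0.1, 0.1) for _ in range(DIM)]
    return _table[feat]

def morph_feat_embedding(feats):
    """Embed a UD FEATS string by summing the vectors of its key=value pairs."""
    total = [0.0] * DIM
    for feat in feats.split("|"):
        total = [t + f for t, f in zip(total, feat_vector(feat))]
    return total

v = morph_feat_embedding("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
print(len(v))  # 8
```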

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM


Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

[Figure: Buffer's β-LSTM running over the buffer words wi, wi+1, wi+2.]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM


Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

[Figure: Stack's σ-LSTM running over the stack words si, si+1, si+2.]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM


Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

[Figure: Action-LSTM running over the transition history.]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of the tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

[Figure: t-RNN combining the dependent word and the dependency relation into the head word.]

w_head_new = tanh(W_rnn · [w_head_old; d_l; w_dep] + b_rnn)    (1)
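Equation (1) can be sketched directly; the toy dimensions and random weights below are placeholders for the trained parameters:

```python
import math
import random

random.seed(1)
DIM = 4  # toy embedding size; the concatenated input is 3 * DIM

W_rnn = [[random.uniform(-0.5, 0.5) for _ in range(3 * DIM)] for _ in range(DIM)]
b_rnn = [0.0] * DIM

def trnn_compose(w_head, d_label, w_dep):
    """Eq. (1): w_head_new = tanh(W_rnn . [w_head; d_label; w_dep] + b_rnn)."""
    x = w_head + d_label + w_dep  # list concatenation = vector concatenation
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W_rnn, b_rnn)]

new_head = trnn_compose([0.1] * DIM, [0.3] * DIM, [0.2] * DIM)
print(len(new_head))  # 4
```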

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1. Left Transition
2. Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

[Figure sequence: each embedding is initiated by concatenating POS, language and morph-feat embeddings; the stack's top LSTM is reduced; the t-RNN calculates the new head embedding; the β-LSTM recalculates its hidden state from the new input; the tree-stack LSTM is ready to give the next transition.]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

[Figure sequence: each embedding is initiated by concatenating POS, language and morph-feat embeddings; the stack's top LSTM is reduced; the t-RNN calculates the new head embedding; the σ-LSTM recalculates its hidden state from the new input; the tree-stack LSTM is ready to give the next transition.]
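The two transitions can be sketched as operations on (stack, buffer, arcs) following the equations on these slides; the toy driver parses the word positions of the "Economic news had" example with left arcs only:

```python
def shift(stack, buffer, arcs):
    stack.append(buffer.pop(0))

def left_arc(stack, buffer, arcs, d):
    # left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    # the stack top s is popped and attached to the buffer front b.
    s = stack.pop()
    arcs.add((buffer[0], d, s))

def right_arc(stack, buffer, arcs, d):
    # right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    # the stack top t is popped and attached to the new stack top s.
    t = stack.pop()
    arcs.add((stack[-1], d, t))

# Toy run over word positions 1..3 ("Economic news had"):
stack, buffer, arcs = [], [1, 2, 3], set()
shift(stack, buffer, arcs)            # σ=[1], β=[2,3]
left_arc(stack, buffer, arcs, "ATT")  # news -ATT-> Economic
shift(stack, buffer, arcs)            # σ=[2], β=[3]
left_arc(stack, buffer, arcs, "SBJ")  # had -SBJ-> news
print(sorted(arcs))  # [(2, 'ATT', 1), (3, 'SBJ', 2)]
```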

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

[Figure: Tree-stack LSTM. The t-RNN combines the head word, dependent word and dependency relation; the outputs of the σ-LSTM, β-LSTM and Action-LSTM are concatenated and fed to an MLP.]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1. Introduction
   • Overview of Dependency Parsing
   • Transition Based Dependency Parsing

2. Related Work
   • Linear Models and their Drawbacks
   • Neural Network Models

3. Model
   • Language Model
   • MLP Parser
   • Tree-stack LSTM Parser

4. Results
   • MLP vs Tree-stack LSTM
   • Morphological Feature Embeddings
   • Static vs Dynamic Oracle Training
   • Transfer Learning

5. Conclusion

6. Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

CoNLL17:
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 7th out of 33 participants (1st among transition-based parsers)

CoNLL18:
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 16th out of 30 participants (2nd among transition-based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split change 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems with the official comparison:

1. If the annotation of the treebank was improved, the older parser is handicapped.
2. If the training-test split has changed and old training data are now in the test data, the old parser is favored undeservedly.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM


Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM


Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM


Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code  MLP    Only Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67

Table: Comparison between the MLP and the "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN


Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.164

The t-RNN provides a comparative advantage for low-resource languages.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of the Ablation Experiments:

The t-RNN's performance contribution increases as the training size decreases.

The σ-LSTM provides more useful information independent of dataset size.

Interconnecting the model's components with the t-RNN makes the tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.6            18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166

Morph-feats are not useful for languages having less than 20k training tokens.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48325
fr sequoia     84.36        82.17           50543
en gum         76.44        75.34           53686
ko gsd         73.74        72.54           56687
eu bdt         74.55        73.32           72974
nl lassysmall  76.7         75.8            75134
gl ctg         79.02        79.018          79327
lv lvtb        72.33        72.24           80666
id gsd         75.76        73.97           97531

Morph-feats are beneficial for languages with 50k-100k training tokens.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121064
bg btb     84.53        84.55           124336
en ewt     75.77        75.682          204585
ar padt    68.02        68.14           223881
de gsd     71.59        71.32           263804
ca ancora  85.89        85.874          417587
es ancora  84.99        84.78           444617
cs cac     83.57        83.63           472608
cs pdt     81.43        82.12           1173282

Morph-feats are neutral for languages having more than 100k training tokens.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow the gold moves.
Dynamic oracle: transitions follow the predicted moves.

In both cases, the log-probability of the gold moves is maximized.
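The distinction can be sketched as follows. The random stand-in classifier and the 0/1 surrogate loss are toy assumptions, and a real dynamic oracle would also recompute the best reachable gold move for each visited state rather than reuse the original gold sequence:

```python
import random

random.seed(0)
MOVES = ["shift", "left", "right"]

def model_predict(state):
    # Stand-in for the trained classifier's argmax over transitions.
    return random.choice(MOVES)

def training_losses(gold_sequence, dynamic_oracle):
    """The loss always targets the gold move; the parser state advances
    with the gold move (static) or with the predicted move (dynamic)."""
    state, losses = (), []
    for gold in gold_sequence:
        predicted = model_predict(state)
        losses.append(0.0 if predicted == gold else 1.0)  # surrogate for -log p(gold|state)
        state = state + ((gold if not dynamic_oracle else predicted),)
    return losses

losses = training_losses(["shift", "shift", "left", "right"], dynamic_oracle=True)
print(len(losses))  # 4
```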


Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 17: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Omer Kırnap (Koc University) MSc Thesis September 27 2018 17 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 18 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 19 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 20 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 21 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 22 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 23 123

Problem Definition

Find a model learning to decide correct transition from current state

Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123

2 Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123

Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearityHoweverImpractical for high dimensional inputs they scale linearly in inputdimensions (in both time and space assuming fixed number of hiddenunits)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution Using dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123

a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

The LM is used to obtain Context and Word embeddings, with two components:

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors
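The two components can be sketched with a toy scalar recurrence standing in for the LSTM cells. The function names and the tanh step are illustrative, not the thesis implementation; the point is that each word's context vector pairs a forward and a backward state.

```python
import math

def rnn_states(seq, step):
    """Run a simple recurrence over seq, returning the state at each step."""
    h, states = 0.0, []
    for x in seq:
        h = step(h, x)
        states.append(h)
    return states

def context_vectors(word_vals):
    """Toy BiLSTM-style context extractor: each word's context vector is
    the pair (forward state, backward state) at that position."""
    step = lambda h, x: math.tanh(0.5 * h + x)   # stand-in for an LSTM cell
    fwd = rnn_states(word_vals, step)
    bwd = rnn_states(word_vals[::-1], step)[::-1]
    return list(zip(fwd, bwd))

ctx = context_vectors([0.1, 0.4, -0.2])
```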

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure: Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation
• 17 universal part-of-speech tags
• 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS): the percentage of words assigned both the correct syntactic head and the correct dependency label

Example: "Economic news had …"

Gold tree (arcs ATT, SBJ): LAS = 1
Pred 1 (arcs PRED, OBJ): LAS = 0
Pred 2 (arcs ATT, OBJ): LAS = (1/2) · 100 = 50%
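The metric is straightforward to compute; a minimal sketch, assuming each word is annotated with a (head, label) pair (the tuple encoding is illustrative):

```python
def las(gold, pred):
    """Labeled Attachment Score: fraction of words whose predicted
    (head, label) pair exactly matches the gold annotation."""
    assert len(gold) == len(pred)
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return correct / len(gold)

# Toy example: each word annotated with (head_index, relation).
gold  = [(2, "ATT"), (3, "SBJ"), (0, "ROOT")]
pred1 = [(2, "ATT"), (3, "OBJ"), (0, "ROOT")]  # one wrong label
score = las(gold, pred1)                        # 2 of 3 words correct
```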

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), context vector (c):

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), context vector (c):

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Context vectors provide an independent contribution on top of POS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Our BiLSTM language model word vectors perform better than FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Both POS tags and context vectors make significant contributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing the correct state representation for the parser still remains critical

We are unable to represent the whole parsing history with feature extraction

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM
Modify the head word's embedding with the dependent's embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of the LSTMs are not updated unless a reduce occurs

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

[Figure: Tree-stack LSTM: the β-LSTM (buffer), σ-LSTM (stack), and Action-LSTM states are concatenated with t-RNN head embeddings and fed to an MLP]

We propose the Tree-stack LSTM model with 4 components:
1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector

Every dependency relation is represented with a continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character Based LSTM's word vectors

Word Based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings
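One way to realize such an embedding is to sum per-feature embeddings over the `|`-separated feature=value pairs of the UD FEATS string. The table of random vectors below is illustrative, and the thesis may compose the features differently; this is a sketch of the idea.

```python
import random

random.seed(0)
DIM = 8
# Hypothetical embedding table for feature=value pairs seen in training.
feat_emb = {f: [random.uniform(-1, 1) for _ in range(DIM)]
            for f in ["Case=Nom", "Gender=Neut", "Number=Sing",
                      "Person=3", "PronType=Prs"]}

def morph_feat_vector(feats):
    """Embed a UD feature string like 'Case=Nom|Gender=Neut|...' by
    summing the embeddings of its individual feature=value pairs."""
    vec = [0.0] * DIM
    for f in feats.split("|"):
        if f in feat_emb:                # unknown features are skipped
            vec = [a + b for a, b in zip(vec, feat_emb[f])]
    return vec

v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
```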

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components
1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM


Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

w_i, w_{i+1}, w_{i+2}

Figure: Buffer's β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM


Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

s_i, s_{i+1}, s_{i+2}

Figure: Stack's σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM


Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of the tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

Figure: t-RNN takes the dependent word, dependency relation, and head word embeddings as inputs

w_head^new = tanh(W_rnn · [w_head^old ; d_l ; w_dep] + b_rnn)    (1)
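Equation (1) can be sketched directly: the new head embedding is a tanh layer applied to the concatenation of the old head, the dependency-relation vector, and the dependent. The dimension and random weights below are illustrative.

```python
import math
import random

random.seed(1)
DIM = 4
# Illustrative t-RNN parameters: W_rnn maps a 3*DIM concatenation to DIM.
W = [[random.uniform(-0.1, 0.1) for _ in range(3 * DIM)] for _ in range(DIM)]
b = [0.0] * DIM

def trnn(head, deprel, dep):
    """t-RNN composition (Eq. 1): update the head embedding from the
    concatenation [head_old; deprel; dep] via a tanh layer."""
    x = head + deprel + dep                       # concatenation
    return [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

new_head = trnn([0.1] * DIM, [0.2] * DIM, [0.3] * DIM)
```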

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition
2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready to predict the next transition
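The two transitions can be sketched on plain lists, with σ, β, and A as a stack, buffer, and arc set. The function names and the (head, label, dependent) arc encoding are assumptions for illustration; the definitions mirror left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}) and right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}).

```python
def left_arc(stack, buffer, arcs, d):
    """Pop s from the stack; attach it as dependent of the buffer front b."""
    s = stack.pop()
    b = buffer[0]
    arcs.add((b, d, s))          # (head, label, dependent)

def right_arc(stack, buffer, arcs, d):
    """Pop t from the stack; attach it as dependent of the new stack top s."""
    t = stack.pop()
    s = stack[-1]
    arcs.add((s, d, t))

stack, buffer, arcs = [0, 1], [2, 3], set()
left_arc(stack, buffer, arcs, "obj")     # word 1 attached to word 2

stack2, arcs2 = [0, 1, 2], set()
right_arc(stack2, [], arcs2, "nmod")     # word 2 attached to word 1
```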

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

[Figure: Full Tree-stack LSTM: β-LSTM, σ-LSTM, Action-LSTM, and t-RNN outputs are concatenated and fed to an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 Introduction
• Overview of Dependency Parsing
• Transition Based Dependency Parsing

2 Related Work
• Linear Models and their Drawbacks
• Neural Network Models

3 Model
• Language Model
• MLP Parser
• Tree-stack LSTM Parser

4 Results
• MLP vs Tree-stack LSTM
• Morphological Feature Embeddings
• Static vs Dynamic Oracle Training
• Transfer Learning

5 Conclusion

6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

CoNLL17
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18:
1 Train/test split change
2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the treebank is improved, the older parser is handicapped

2 If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code  MLP    Only Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.16

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang       MLP    Only A  Only-β  Only-σ  wo t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12     68.18
sv lines   71.12  72.05   72.17   74.04   72.17     75.46
tr imst    57.12  56.87   57.02   57.12   58.12     58.75
ar padt    67.83  66.67   66.89   66.92   68.04     68.14
cs cac     83.89  82.23   83.13   83.17   82.89     83.57
en ewt     75.54  75.43   75.56   75.67   74.87     75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases

σ-LSTM provides more useful information, independent of dataset size

Interconnecting the model's components with t-RNN makes the tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings: we divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.60           18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code     Morph-Feats  no Morph-Feats  # of tokens
sv lines      72.18        74.81           48325
fr sequoia    84.36        82.17           50543
en gum        76.44        75.34           53686
ko gsd        73.74        72.54           56687
eu bdt        74.55        73.32           72974
nl lassysmall 76.70        75.80           75134
gl ctg        79.02        79.02           79327
lv lvtb       72.33        72.24           80666
id gsd        75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121064
bg btb     84.53        84.55           124336
en ewt     75.77        75.68           204585
ar padt    68.02        68.14           223881
de gsd     71.59        71.32           263804
ca ancora  85.89        85.87           417587
es ancora  84.99        84.78           444617
cs cac     83.57        83.63           472608
cs pdt     81.43        82.12           1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow gold moves
Dynamic oracle: transitions follow predicted moves

In both cases, log p of the gold moves is maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP
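The difference between the two training regimes can be sketched as a single decision rule. The move names and scores below are illustrative; in both regimes the training loss maximizes log p of the gold move, but only the dynamic oracle lets the parser follow its own (possibly wrong) prediction and learn to recover from it.

```python
def choose_training_move(scores, gold_move, dynamic):
    """Return the transition the parser actually executes during training.

    scores: dict mapping move name -> model score.
    Static oracle (dynamic=False): always follow the gold move.
    Dynamic oracle (dynamic=True): follow the model's own prediction.
    """
    if dynamic:
        return max(scores, key=scores.get)   # model's predicted move
    return gold_move                         # oracle's gold move

scores = {"shift": 0.7, "left-arc": 0.2, "right-arc": 0.1}
```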

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:
1 Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2 Using Facebook's word vectors to train a parser [Bojanowski et al 2017]
3 Using my own word and context vectors trained on a different language from the same language family
4 Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

Training an LM from scratch on very limited data does not yield useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

A transition based parser can only build projective trees 6

6 Figure from http://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
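Projectivity itself is easy to test: a tree is projective iff no two arcs cross when drawn above the sentence. A simple O(n²) sketch, assuming `heads[i]` gives the head index of 1-based word i+1, with 0 denoting the root (this encoding is an assumption for illustration):

```python
def is_projective(heads):
    """heads[i] = head index of word i+1 (words are 1-based, 0 = root).
    A dependency tree is projective iff no two arcs cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for (l1, r1) in arcs:
        for (l2, r2) in arcs:
            if l1 < l2 < r1 < r2:        # strictly interleaved spans cross
                return False
    return True

# heads = [2, 0, 2, 3]: w1->w2, w2->root, w3->w2, w4->w3 (projective)
# heads = [3, 0, 2]: the arc w1-w3 crosses the root arc of w2
```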

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases

7 From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion: we introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases, the tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement

Morphological Features

Finding a different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gomez-Rodriguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 18: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Omer Kırnap (Koc University) MSc Thesis September 27 2018 18 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 19 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 20 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 21 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 22 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 23 123

Problem Definition

Find a model learning to decide correct transition from current state

Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123

2 Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123

Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearityHoweverImpractical for high dimensional inputs they scale linearly in inputdimensions (in both time and space assuming fixed number of hiddenunits)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution Using dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18bull KParse team with Tree-stack LSTM Parser using Context and

Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using ContextEmbeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser using Context and Morph-featEmbeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123

a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

LM is used to obtain Context and Word embeddings with twocomponents

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure Kırnap et al 2017Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label

Economic news hadGold Tree LAS 1

SBJATT

Economic news had

PREDOBJPred 1 LAS 0

Economic news had

OBJATTPred 2 LAS (frac12)100

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Context vectors provide independent contribution on top ofPOS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677p-fb 747 797 663

p-v-c 793 832 742

Our BiLSTM language model word vectors perform betterthan FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Both POS tags and context vectors have significantcontributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing correct state of parser still remains critical

We are unable to represent whole parsing history with featureextracting

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify stackrsquos word embeddings

Hidden states of LSTMS are not updated unless reduce

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

Figure: Tree-stack LSTM overview (t-RNN over head word, dependent word, and dependency relation; buffer, stack, and action LSTMs; outputs concatenated and fed to an MLP)

We propose the Tree-stack LSTM model with 4 components:
1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector

Every dependency relation is represented with a continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character-based LSTM's word vectors

Word-based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
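This concatenation can be sketched with plain Python lists standing in for the four embedding sources; the dimensions below are illustrative, not the thesis's actual hyperparameters:

```python
def word_representation(word_vec, ctx_vec, pos_vec, feat_vec):
    """Concatenate the four embedding sources into one input vector.

    word_vec: character-based LSTM word vector
    ctx_vec:  word-based BiLSTM context vector
    pos_vec:  POS-tag embedding
    feat_vec: morphological feature embedding
    """
    return word_vec + ctx_vec + pos_vec + feat_vec

# Toy vectors with made-up sizes; real sizes are model hyperparameters.
x = word_representation([0.1] * 4, [0.2] * 6, [0.3] * 2, [0.4] * 2)
```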

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure: Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
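One way to read the figure above: the CoNLL-U FEATS string is split into attribute=value pairs, each pair is looked up in a learned embedding table, and the pair vectors are combined into a single morph-feat vector. The sketch below assumes summation and a hand-set toy table; the actual combination and dimensions in the thesis may differ:

```python
def parse_feats(feats):
    """Split a CoNLL-U FEATS string into (attribute, value) pairs."""
    if feats == "_":          # CoNLL-U uses "_" for "no features"
        return []
    return [tuple(kv.split("=", 1)) for kv in feats.split("|")]

# Toy table; real pair embeddings are learned during training.
EMB = {("Case", "Nom"): [1.0, 0.0], ("Number", "Sing"): [0.0, 1.0]}

def morph_feat_vector(feats, dim=2):
    """Sum the vectors of all known pairs; unknown pairs contribute zeros."""
    vec = [0.0] * dim
    for pair in parse_feats(feats):
        for i, v in enumerate(EMB.get(pair, [0.0] * dim)):
            vec[i] += v
    return vec
```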

Tree-stack LSTM

Model Components:
1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

Figure: Tree-stack LSTM overview with the β-LSTM highlighted

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

Figure: Buffer's β-LSTM, reading word representations w_i, w_i+1, w_i+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

Figure: Tree-stack LSTM overview with the σ-LSTM highlighted

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

Figure: Stack's σ-LSTM, reading stack items s_i, s_i+1, s_i+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

Figure: Tree-stack LSTM overview with the Action-LSTM highlighted

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

Figure: Action-LSTM, reading past transitions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of the tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

Figure: t-RNN, combining the head word, dependency relation, and dependent word embeddings

w_head_new = tanh(W_rnn · [w_head_old; d_l; w_dep] + b_rnn)    (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
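Equation (1) in plain Python: the old head vector, the relation vector, and the dependent vector are concatenated, passed through a linear map, and squashed with tanh. The weights W and b below are toy values, not learned parameters:

```python
import math

def trnn_update(w_head, d_rel, w_dep, W, b):
    """w_head_new = tanh(W_rnn . [w_head_old; d_l; w_dep] + b_rnn)  -- Eq. (1)."""
    x = w_head + d_rel + w_dep                      # concatenation
    return [math.tanh(sum(W[i][j] * x[j] for j in range(len(x))) + b[i])
            for i in range(len(b))]

# 1-dimensional toy example: W is 1x3, b has length 1.
new_head = trnn_update([0.5], [0.0], [0.5], [[1.0, 1.0, 1.0]], [0.0])
```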

Tree-RNN with:
1 Left Transition
2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initiated by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready for the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initiated by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready for the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
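The two transition rules above can be sketched directly on a (stack, buffer, arcs) triple, where an arc (h, d, m) records that word m attaches to head h with label d. This is a sketch of the transition system itself, not the thesis implementation:

```python
def left_arc(stack, buffer, arcs, d):
    """left_d(sigma|s, b|beta, A) = (sigma, b|beta, A + {(b, d, s)}):
    pop stack top s and attach it to buffer front b."""
    s, b = stack[-1], buffer[0]
    return stack[:-1], buffer, arcs | {(b, d, s)}

def right_arc(stack, buffer, arcs, d):
    """right_d(sigma|s|t, beta, A) = (sigma|s, beta, A + {(s, d, t)}):
    pop stack top t and attach it to the element s below it."""
    s, t = stack[-2], stack[-1]
    return stack[:-1], buffer, arcs | {(s, d, t)}
```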

Final overview of Tree-stack LSTM

Figure: Final Tree-stack LSTM (t-RNN, β-, σ-, and Action-LSTMs; outputs concatenated and fed to an MLP)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2 Related Work: Linear Models and their Drawbacks; Neural Network Models

3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5 Conclusion
6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17:
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1 Train/test split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of the official comparison:

1 If the annotation of the treebank is improved, the older parser is handicapped

2 If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

Figure: Initial model (MLP only)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

Figure: Only Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

Figure: Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

Figure: Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code  MLP    Only Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

Figure: Tree-stack LSTM overview with the t-RNN highlighted

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases

σ-LSTM provides more useful information, independent of dataset size

Interconnecting the model's components with the t-RNN makes the tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings
We divide the CoNLL18 UD 2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k, less than 50k tokens

Languages having more than 50k, less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.6            18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48325
fr sequoia     84.36        82.17           50543
en gum         76.44        75.34           53686
ko gsd         73.74        72.54           56687
eu bdt         74.55        73.32           72974
nl lassysmall  76.7         75.8            75134
gl ctg         79.02        79.018          79327
lv lvtb        72.33        72.24           80666
id gsd         75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121064
bg btb     84.53        84.55           124336
en ewt     75.77        75.682          204585
ar padt    68.02        68.14           223881
de gsd     71.59        71.32           263804
ca ancora  85.89        85.874          417587
es ancora  84.99        84.78           444617
cs cac     83.57        83.63           472608
cs pdt     81.43        82.12           1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions using gold moves
Dynamic oracle: transitions using predicted moves

In both cases, log p of the gold moves is maximized

Figure: Tree-stack LSTM overview

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
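The difference between the two training regimes fits in a few lines. In this sketch, `scores` is a hypothetical move-to-score mapping produced by the model; only the move used to advance the parser differs, while the loss maximizes log p of the gold move in both cases:

```python
def next_training_move(scores, gold_move, dynamic):
    """Pick the move used to advance the parser during training.

    Static oracle: always follow the gold move.
    Dynamic oracle: follow the model's best-scoring move, exposing
    training to the parser's own mistakes.
    """
    predicted = max(scores, key=scores.get)
    return predicted if dynamic else gold_move
```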

Static vs Dynamic Oracle Training

Figure: Results are very close for languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure: Results are very close for languages with between 20k and 50k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure: Results are very close for languages with more than 50k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1 Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2 Using Facebook's word vectors to train a parser [Bojanowski et al., 2017]
3 Using my own word and context vectors trained on a different language from the same language family
4 Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

Training the LM from scratch on very limited data does not yield useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al., 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

A transition based parser can only build projective trees 6

6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
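Projectivity can be checked by looking for crossing arcs. A toy checker, where `heads[i-1]` gives the head of 1-based word i (0 meaning root), written O(n²) for clarity:

```python
def is_projective(heads):
    """Return True iff no two dependency arcs cross."""
    spans = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for l1, r1 in spans:
        for l2, r2 in spans:
            if l1 < l2 < r1 < r2:   # arcs (l1, r1) and (l2, r2) cross
                return False
    return True
```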

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases

7 From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:
We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering

Tree-stack LSTM performed better on low-resource languages

As the training dataset size increases, the tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention over the σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement

Morphological Features

Finding a different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Ömer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Ömer Kırnap, Berkay Furkan Önder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

Page 19: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Omer Kırnap (Koc University) MSc Thesis September 27 2018 19 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 20 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 21 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 22 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 23 123

Problem Definition

Find a model that learns to decide the correct transition from the current state

Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123

2 Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123

Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearity. However, they are impractical for high-dimensional inputs: they scale linearly in input dimensions (in both time and space, assuming a fixed number of hidden units).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution: Using dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123

Overview

1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2 Related Work: Linear Models and their Drawbacks; Neural Network Models

3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5 Conclusion
6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17

• Koc-University team with MLP Parser using Context Embeddings

CoNLL18

• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123

a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

The LM is used to obtain Context and Word embeddings, with two components:

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character-based LSTM generates word vectors

Figure: Character LSTM, from Kırnap et al., 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word-based BiLSTM generates context vectors

Figure: Word BiLSTM, from Kırnap et al., 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123
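The BiLSTM idea in miniature: run a recurrence over the word vectors left-to-right and right-to-left, then pair the two hidden states at each position as that word's context vector. Here a stand-in `step` function replaces the LSTM cell and scalar "vectors" keep the sketch short; the real model uses multi-dimensional states:

```python
def context_vectors(word_vecs, step, h0=0.0):
    """Toy bidirectional recurrence: (forward state, backward state) per word."""
    n = len(word_vecs)
    fwd, h = [], h0
    for v in word_vecs:                 # left-to-right pass
        h = step(h, v)
        fwd.append(h)
    bwd, h = [h0] * n, h0
    for i in range(n - 1, -1, -1):      # right-to-left pass
        h = step(h, word_vecs[i])
        bwd[i] = h
    return list(zip(fwd, bwd))
```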

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes the current state

Figure: Kırnap et al., 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments & Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS)
The percentage of words correctly assigned both the correct syntactic head and the correct dependency label

Example: "Economic news had"
Gold tree (arcs ATT, SBJ): LAS = 100
Pred 1 (arcs OBJ, PRED): LAS = 0
Pred 2 (arcs ATT, OBJ): LAS = (1/2) · 100 = 50

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
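The metric as code, scoring (head, label) pairs per word; the gold pairs below encode the slide's "Economic news had" example:

```python
def las(gold, pred):
    """Labeled Attachment Score: % of words whose predicted head AND
    dependency label both match the gold tree."""
    assert len(gold) == len(pred) and gold
    correct = sum(g == p for g, p in zip(gold, pred))
    return 100.0 * correct / len(gold)

# (head index, label) for the two dependents of "Economic news had".
gold = [(2, "ATT"), (3, "SBJ")]
```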

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c)

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63
c      72.2       76         63.5
v-c    76         79         67.6
p-c    78         82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c)

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63
c      72.2       76         63.5
v-c    76         79         67.6
p-c    78         82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Context vectors provide an independent contribution on top of POS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677p-fb 747 797 663

p-v-c 793 832 742

Our BiLSTM language model word vectors perform betterthan FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Both POS tags and context vectors have significantcontributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing correct state of parser still remains critical

We are unable to represent whole parsing history with featureextracting

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify stackrsquos word embeddings

Hidden states of LSTMS are not updated unless reduce

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose Tree-stack LSTM model with 4 components

β-LSTMσ-LSTMAction-LSTMTree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include explicit feature extractor We initiated wordrepresentation by concatenating

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morp-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure Bufferrsquos β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stackrsquos σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How do components of tree-stack LSTM are connected

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready to predict the next transition.
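The left and right transitions above can be sketched as operations on a parser state. This is an illustrative sketch, not the thesis code: the function names and the (stack, buffer, arcs) representation are assumptions.

```python
# Illustrative sketch of the transitions above (not the thesis code).
# State: stack (list, top at the end), buffer (list, front at index 0),
# arcs (set of (head, deprel, dependent) triples).

def left_arc(stack, buffer, arcs, d):
    # left_d(sigma|s, b|beta, A) => (sigma, b|beta, A ∪ {(b, d, s)}):
    # pop stack top s, attach it as a d-dependent of buffer front b.
    s = stack.pop()
    b = buffer[0]
    arcs.add((b, d, s))

def right_arc(stack, arcs, d):
    # right_d(sigma|s|t, beta, A) => (sigma|s, beta, A ∪ {(s, d, t)}):
    # pop stack top t, attach it as a d-dependent of the new top s.
    t = stack.pop()
    s = stack[-1]
    arcs.add((s, d, t))

def shift(stack, buffer):
    # Move the buffer front onto the stack.
    stack.append(buffer.pop(0))
```

For example, with stack [1] and buffer [2, 3], a left arc attaches token 1 to token 2 and pops the stack.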

Final overview of Tree-stack LSTM

Figure: Full model. The σ-LSTM, β-LSTM, and Action-LSTM outputs are concatenated and fed to an MLP; the t-RNN composes the head word, dependent word, and dependency relation embeddings.


4 Results & Comparisons

Results & Comparisons

Dataset

CoNLL17: Dependency parsing of 81 treebanks in 49 languages. All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations. Koc-University ranked 7th out of 33 participants (1st among transition based parsers).

CoNLL18: Dependency parsing of 82 treebanks in 57 languages. All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations. Koc-University ranked 16th out of 30 participants (2nd among transition based parsers).

Differences between the two editions: 1. Train/test split changes 2. Annotation changes

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets.

MLP vs Tree-stack LSTM

2 possible problems with the official comparison:

1 If the annotation of the treebank has been improved, the older parser is handicapped.

2 If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models:

Lang Code        MLP    Tree-stack
ru taiga (10k)   5889   6055
hu szeged (20k)  6621   6818
tr imst (50k)    5678   5875
ar padt (120k)   6783   6814
en ewt (205k)    7487   7577
cs cac (473k)    8339   8357

Tree-stack LSTM outperforms MLP.

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

MLP Parser

Figure: Initial model.

Only Action LSTM

Figure: Only action LSTM.

Only β-LSTM

Figure: Only β-LSTM.

Only σ-LSTM

Figure: Only σ-LSTM.

Ablation Analysis Results

Lang Code   MLP    Only Action  Only-β  Only-σ
hu szeged   6621   6687         6694    6703
sv lines    7112   7205         7217    7245
tr imst     5712   5687         5702    5712
ar padt     6783   6667         6689    6692
cs cac      8389   8223         8313    8317
en ewt      7554   7543         7556    7567

Table: Comparison between MLP and "Only" models.

Ablation of t-RNN

Figure: Tree-stack LSTM architecture (the t-RNN component under ablation).

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN:

Lang Code            without t-RNN  with t-RNN
no nynorsklia (3k)   5178           5333
ru taiga (11k)       5913           6055
gl treegal (15k)     6976           7045
hu szeged (20k)      6612           6818
sv lines (49k)       7404           7546
tr imst (50k)        5812           5875
ar padt (120k)       6804           6814
en ewt (204k)        7487           7577
cs cac (473k)        8289           8357
cs pdt (1M)          8117           81164

t-RNN provides a comparative advantage for low-resource languages.

Ablation Analysis

Overall results of ablation analysis:

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  6621   6687    6694    6703    6612       6818
sv lines   7112   7205    7217    7404    7217       7546
tr imst    5712   5687    5702    5712    5812       5875
ar padt    6783   6667    6689    6692    6804       6814
cs cac     8389   8223    8313    8317    8289       8357
en ewt     7554   7543    7556    7567    7487       7577

Tree-stack LSTM beats the other model variations.

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases.

σ-LSTM provides more useful information independent of dataset size.

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers).

What does Morphological Feature Embedding provide?

Contribution of Morph-feat Embeddings

Experimental Settings
We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k, less than 50k tokens

Languages having more than 50k, less than 100k tokens

Languages having 100k tokens or more

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang code       Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia   5113         5333            3583
ru taiga        5832         6055            10479
sme giella      5278         5339            16385
la perseus      4993         516             18184
ug udt          5278         5339            19262
sl sst          4672         4877            19473
hu szeged       6623         6818            20166

Not useful for languages having less than 20k training tokens.

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       7218         7481            48325
fr sequoia     8436         8217            50543
en gum         7644         7534            53686
ko gsd         7374         7254            56687
eu bdt         7455         7332            72974
nl lassysmall  767          758             75134
gl ctg         7902         79018           79327
lv lvtb        7233         7224            80666
id gsd         7576         7397            97531

Beneficial for languages with 50k-100k training tokens.

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code   Morph-Feats  no Morph-Feats  # of tokens
fa seraji   8118         8112            121064
bg btb      8453         8455            124336
en ewt      7577         75682           204585
ar padt     6802         6814            223881
de gsd      7159         7132            263804
ca ancora   8589         85874           417587
es ancora   8499         8478            444617
cs cac      8357         8363            472608
cs pdt      8143         8212            1173282

Neutral for languages having more than 100k training tokens.

Static vs Dynamic Oracle Training

Static oracle: transitions follow the gold moves.
Dynamic oracle: transitions follow the predicted moves.

In both cases, the log probability of the gold moves is maximized.
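The difference between the two regimes can be sketched as a single training step. This is a toy sketch under the assumption that the model exposes a probability per move; the loss is -log p(gold) in both cases, and only the executed move differs.

```python
import math

def oracle_step(probs, gold_move, oracle="static"):
    """One training step (illustrative, not the thesis code).
    probs: dict move -> model probability at the current state.
    Returns the loss (-log p of the gold move) and the move actually
    executed to reach the next state."""
    loss = -math.log(probs[gold_move])
    if oracle == "static":
        executed = gold_move                   # follow the gold move
    else:                                      # "dynamic"
        executed = max(probs, key=probs.get)   # follow the model's prediction
    return loss, executed
```

With probs = {"shift": 0.7, "left": 0.2, "right": 0.1} and gold move "left", both oracles incur the same loss, but the static oracle executes "left" while the dynamic oracle executes the model's preferred "shift".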

Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens less than 20k.

Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens in between 20k and 50k.

Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens more than 50k.

How about languages with less than 20k training tokens?

Transfer Learning

There are 4 possible types of transfer learning:

1 Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch

2 Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]

3 Using my own word and context vectors, trained on a different language from the same language family

4 Applying transfer learning with a pre-trained parser

Language       (1)           (2)    (3)    (4)
af afribooms   not provided  7546   7743   7812
kk ktb         2019          2231   2196   2386
bxr bdt        764           976    993    898
kmr mg         2012          2257   2278   2339

Table: LAS values for strategies (1), (2), (3), and (4).

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial.

Training an LM from scratch does not yield useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017].

Projectivity

Transition based parsers can only build projective trees.

Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
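A tree is projective exactly when no two of its arcs cross, which gives a quick check. This is a sketch; the head-array convention (1-based tokens, head 0 for the root) is an assumption.

```python
def is_projective(heads):
    """heads[i-1] is the head of token i (tokens 1..n, head 0 = root).
    A dependency tree is projective iff no two arcs cross."""
    # Represent each arc as an interval (min, max) over token positions.
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for a1, b1 in arcs:
        for a2, b2 in arcs:
            if a1 < a2 < b1 < b2:   # arcs (a1,b1) and (a2,b2) cross
                return False
    return True
```

For example, heads = [2, 0, 1, 3] (token 3 attaches to token 1 across token 2, whose head lies outside that span) is non-projective.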

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios:

Language      Projectivity  Best (LAS)  Our (LAS)
grc perseus   907           7939        5503 (20)
eu bdt        9513          8422        7413 (17)
hu szeged     978           8266        6818 (14)
da ddt        9826          8628        7640 (17)
en gum        996           8505        7644 (15)
gl treegal    100           7425        7045 (10)
gl ctg        100           8212        7945 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases.

From the official results page and our projectivity table.

Conclusions

Conclusion

In conclusion:
We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition based dependency parsing.

Our Tree-stack LSTM outperformed MLP by removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

As the training dataset size increases, tree-stack LSTM loses its advantage.

Future Research Direction

End-to-End Training

Systems that were jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly trained a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM states, or within the β-LSTM or Action-LSTM, may bring a performance improvement.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (CRF) may solve this problem.

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Thank you for your attention

Questions



Problem Definition

Find a model that learns to decide the correct transition from the current state.

2 Related Work


Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearity. However, they are impractical for high dimensional inputs: they scale linearly in input dimensions (in both time and space, assuming a fixed number of hidden units).

Related Work

Solution: use dense embeddings for input features.
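The idea above can be sketched as a lookup table of small dense vectors per discrete feature; the MLP's hidden layer can then learn conjunctions instead of enumerating them. The feature names and dimensions below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy table of discrete features; each feature gets a small dense vector
# instead of a high-dimensional one-hot (or conjoined one-hot) encoding.
vocab = {"POS=NOUN": 0, "POS=VERB": 1, "w=news": 2, "w=had": 3}
E = rng.normal(size=(len(vocab), 8))        # embedding table, d = 8

def embed(features):
    # Concatenate the features' dense vectors; a downstream hidden layer
    # can learn feature conjunctions from this low-dimensional input.
    return np.concatenate([E[vocab[f]] for f in features])

x = embed(["POS=NOUN", "w=news"])           # 2 features -> 16-dim input
```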


3 Model

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17: Koc-University team with MLP Parser using Context Embeddings

CoNLL18: KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings


a Language Model

Language Model (LM)

The LM is used to obtain Context and Word embeddings, with two components:

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors
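The first component can be sketched in plain numpy: a character-level LSTM whose final hidden state serves as the word vector. Weight shapes, sizes, and the random initialization below are assumptions, and the word-level BiLSTM context component is omitted.

```python
import numpy as np

def lstm_step(x, h, c, W):
    # One LSTM step; W packs the input/forget/cell/output gate weights.
    z = W @ np.concatenate([x, h])
    i, f, g, o = np.split(z, 4)
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    c = sig(f) * c + sig(i) * np.tanh(g)
    return sig(o) * np.tanh(c), c

def word_vector(word, char_emb, W, d):
    # Run the character LSTM left to right over the word's characters;
    # the final hidden state is the word's vector.
    h = c = np.zeros(d)
    for ch in word:
        h, c = lstm_step(char_emb[ch], h, c, W)
    return h

rng = np.random.default_rng(1)
d = 4                                               # toy hidden size
char_emb = {ch: rng.normal(size=d) for ch in "abcdefghijklmnopqrstuvwxyz"}
W = 0.1 * rng.normal(size=(4 * d, 2 * d))           # 4 gates x [x; h]
v = word_vector("news", char_emb, W, d)
```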

Language Model - Word vectors

The character based LSTM generates word vectors.

Figure: Character LSTM, from Kırnap et al. 2017.

Language Model - Context Vectors

The word based BiLSTM generates context vectors.

Figure: Word BiLSTM, from Kırnap et al. 2017.

b MLP Parser (CoNLL17)

MLP Parser

MLP Parser consists of 4 components:

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes the current state

Decision module (MLP) decides the next transition

MLP Parser - Feature Extraction

The feature extractor describes the current state.

Figure: Kırnap et al. 2017.

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Experiments & Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation:
17 universal part-of-speech tags
37 universal dependency relations

Experiments - Evaluation Metric

Labeled Attachment Score (LAS)
The percentage of words correctly assigned both the correct syntactic head and the correct dependency label.

Example: "Economic news had"
Gold tree (arcs labeled ATT and SBJ): LAS = 1
Pred 1 (arcs labeled PRED and OBJ, both wrong): LAS = 0
Pred 2 (arcs labeled OBJ and ATT, one of the two correct): LAS = (1/2) · 100 = 50
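The metric above is a straightforward per-word count. This is a sketch; encoding each word's analysis as a (head, label) pair is an assumption.

```python
def las(gold, pred):
    """Labeled Attachment Score: percentage of words assigned both the
    correct head and the correct dependency label.
    gold, pred: lists of (head, label) pairs, one per word."""
    correct = sum(g == p for g, p in zip(gold, pred))
    return 100.0 * correct / len(gold)
```

On the slide's example, a prediction with both arcs wrong scores 0 and one with a single correct arc out of two scores 50.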

Experiments (MLP)

CoNLL 2017 Results (all treebanks, LAS)

Ranked 1st among transition based parsers.

Source: CoNLL17 official results page.

Contributions in CoNLL17

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats   Hungarian  En-ParTUT  Latvian
p       636        766        559
v       735        759        63
c       722        76         635
v-c     76         79         676
p-c     78         825        706
p-v     766        808        677
p-fb    747        797        663
p-v-c   793        832        742

Context and Word Embeddings

Context vectors provide an independent contribution on top of POS tags.

Context and Word embeddings

Our BiLSTM language model word vectors perform better than FB vectors.

Context and Word embeddings

Both POS tags and context vectors have significant contributions on top of word vectors.

Issues with MLP

However:

Choosing the correct parser state features still remains critical.

We are unable to represent the whole parsing history with feature extraction.

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack.


c Tree-stack LSTM Parser (CoNLL18)

Related Work - Stack LSTM

Figure: Stack LSTM [Dyer et al. 2015]

Represents each component (σ, β, A) with an LSTM.

Modifies the head word's embedding with the dependent's embedding.

Problems with Stack LSTM

They only modify the stack's word embeddings.

Hidden states of the LSTMs are not updated unless there is a reduce.

Actions are not explicitly represented.

They only used word2vec embeddings [Mikolov et al. 2013].

Our solution

We propose:

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological features of a word may enhance parsing accuracy

Tree-stack LSTM Overview

We propose the Tree-stack LSTM model with 4 components:

β-LSTM
σ-LSTM
Action-LSTM
Tree-RNN
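How the components' outputs feed the transition decision can be sketched as concatenation, one hidden layer, and a softmax over transitions. The dimensions and the exact set of inputs are assumptions for illustration, not the thesis configuration.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def decide(h_sigma, h_beta, h_action, W1, b1, W2, b2):
    # Concatenate the final hidden states of the σ-LSTM, β-LSTM, and
    # Action-LSTM; one tanh hidden layer, then softmax over transitions.
    x = np.concatenate([h_sigma, h_beta, h_action])
    h = np.tanh(W1 @ x + b1)
    return softmax(W2 @ h + b2)

rng = np.random.default_rng(0)
d, hidden, n_moves = 5, 8, 3            # toy sizes
W1, b1 = rng.normal(size=(hidden, 3 * d)), np.zeros(hidden)
W2, b2 = rng.normal(size=(n_moves, hidden)), np.zeros(n_moves)
p = decide(rng.normal(size=d), rng.normal(size=d), rng.normal(size=d),
           W1, b1, W2, b2)              # distribution over 3 transitions
```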

Tree-stack LSTM

Input Representation

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector

Every dependency relation is represented with a continuous vector

Input Representation

We do not include an explicit feature extractor. We initialize each word representation by concatenating:

Character Based LSTM's word vector

Word Based BiLSTM's context vector

Part-of-speech (POS) vector

Morph-feat vector

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs  (FEATS of the word "It")

Figure: Morph-feat Embeddings
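One way to turn a FEATS string like the one above into a single vector is per-feature embedding lookup plus pooling. This is a sketch: the summation, the dimension, and the unknown-feature handling are assumptions, and the thesis may compose the feature embeddings differently.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 6
# Toy embedding table for individual Feature=Value pairs.
table = {f: rng.normal(size=DIM) for f in
         ["Case=Nom", "Gender=Neut", "Number=Sing", "Person=3", "PronType=Prs"]}

def morph_feat_vector(feats_field):
    # Split a UD FEATS string ("Case=Nom|Number=Sing|...", "_" if empty)
    # and pool the per-feature embeddings (here: sum; unknown feats -> 0).
    feats = feats_field.split("|") if feats_field != "_" else []
    v = np.zeros(DIM)
    for f in feats:
        v += table.get(f, np.zeros(DIM))
    return v

v_it = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
```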

Tree-stack LSTM

Model Components:
1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN

β-LSTM

Figure: Tree-stack LSTM architecture with the β-LSTM highlighted.

β-LSTM

Figure: Buffer's β-LSTM, reading w_i, w_i+1, w_i+2.

σ-LSTM

Figure: Tree-stack LSTM architecture with the σ-LSTM highlighted.

σ-LSTM

Figure: Stack's σ-LSTM, reading s_i, s_i+1, s_i+2.

Action-LSTM

Figure: Tree-stack LSTM architecture with the Action-LSTM highlighted.

Action-LSTM

Figure: Action-LSTM.

How are the components of tree-stack LSTM connected?

Tree-RNN

Tree-RNN (t-RNN)

Figure: t-RNN composes the dependent word, dependency relation, and head word embeddings.

w_head_new = tanh(W_rnn · [w_head_old; d_l; w_dep] + b_rnn)    (1)
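Equation (1) can be written directly in numpy. This is a sketch with toy dimensions; the shapes of W_rnn and b_rnn and the random initialization are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, dl = 6, 3                                  # toy word / relation dims
W_rnn = 0.1 * rng.normal(size=(d, 2 * d + dl))
b_rnn = np.zeros(d)

def t_rnn(w_head_old, d_l, w_dep):
    # Eq. (1): w_head_new = tanh(W_rnn · [w_head_old; d_l; w_dep] + b_rnn)
    return np.tanh(W_rnn @ np.concatenate([w_head_old, d_l, w_dep]) + b_rnn)

w_new = t_rnn(rng.normal(size=d), rng.normal(size=dl), rng.normal(size=d))
```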

Tree-RNN with:

1 Left Transition
2 Right Transition

Left Transition

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initiated by concatenating POS, language, and morph-feat embeddings.

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced.

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code        Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia       51.13        53.33            3,583
ru taiga            58.32        60.55           10,479
sme giella          52.78        53.39           16,385
la perseus          49.93        51.6            18,184
ug udt              52.78        53.39           19,262
sl sst              46.72        48.77           19,473
hu szeged           66.23        68.18           20,166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines          72.18        74.81          48,325
fr sequoia        84.36        82.17          50,543
en gum            76.44        75.34          53,686
ko gsd            73.74        72.54          56,687
eu bdt            74.55        73.32          72,974
nl lassysmall     76.7         75.8           75,134
gl ctg            79.02        79.018         79,327
lv lvtb           72.33        72.24          80,666
id gsd            75.76        73.97          97,531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code     Morph-Feats  no Morph-Feats  # of tokens
fa seraji        81.18        81.12          121,064
bg btb           84.53        84.55          124,336
en ewt           75.77        75.682         204,585
ar padt          68.02        68.14          223,881
de gsd           71.59        71.32          263,804
ca ancora        85.89        85.874         417,587
es ancora        84.99        84.78          444,617
cs cac           83.57        83.63          472,608
cs pdt           81.43        82.12        1,173,282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow gold moves
Dynamic oracle: transitions follow predicted moves

In both cases, the log-probability of gold moves is maximized

[Figure: Tree-stack LSTM architecture: σ-, β-, and Action-LSTM states are concatenated and fed to an MLP; the t-RNN composes head word, dependent word, and dependency relation.]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
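The difference between the two regimes can be sketched with a toy integer state and a stand-in predictor; the state-transition rule and all names here are illustrative only, not the thesis implementation.

```python
def collect_training_pairs(gold_actions, predict, dynamic):
    """Collect (state, gold_action) pairs for one sentence.

    Static oracle: the parser always follows the gold action, so training
    only ever sees states on the gold derivation.
    Dynamic oracle: the parser follows the *predicted* action, so training
    also sees (and learns to recover from) off-gold states.
    In both cases the loss maximizes log p(gold_action | state).
    """
    state, pairs = 0, []
    for gold in gold_actions:
        pairs.append((state, gold))
        action = predict(state) if dynamic else gold
        state = state * 2 + action  # toy deterministic state-transition rule
    return pairs

gold = [1, 0, 1]
always_zero = lambda state: 0  # a stand-in model that always predicts action 0
static_pairs = collect_training_pairs(gold, always_zero, dynamic=False)
dynamic_pairs = collect_training_pairs(gold, always_zero, dynamic=True)
```

With a poor model, the dynamic oracle visits states the static oracle never sees, which is exactly the motivation for training on predicted moves.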

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language       (1)           (2)    (3)    (4)
af afribooms   not provided  75.46  77.43  78.12
kk ktb         20.19         22.31  21.96  23.86
bxr bdt         7.64          9.76   9.93   8.98
kmr mg         20.12         22.57  22.78  23.39

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
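Strategy (4) can be sketched as copying the shared components of a trained parser into a fresh target-language model before fine-tuning. The parameter names below are hypothetical, chosen only to mirror the model's components.

```python
def init_from_pretrained(pretrained, target, shared=("char_lstm", "word_bilstm")):
    """Strategy (4), sketched: copy the shared components' weights from a
    pre-trained parser into a fresh target-language parser; everything else
    keeps its fresh initialization and is fine-tuned on the target treebank."""
    params = dict(target)
    for name in shared:
        if name in pretrained:
            params[name] = pretrained[name]
    return params

source = {"char_lstm": "W_src", "word_bilstm": "U_src", "mlp": "M_src"}
fresh = {"char_lstm": "W_new", "word_bilstm": "U_new", "mlp": "M_new"}
params = init_from_pretrained(source, fresh)
```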

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

From-scratch LM training does not produce useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition-based parsers can only build projective trees 6

6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
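Projectivity can be checked by testing for crossing arcs; a small sketch with 1-based token indices and 0 for the root, written for clarity rather than speed.

```python
def is_projective(heads):
    """heads[i-1] is the head of token i (tokens 1-based, 0 = root).
    A tree is projective iff no two arcs cross; O(n^2) check for clarity."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for l1, r1 in arcs:
        for l2, r2 in arcs:
            if l1 < l2 < r1 < r2:  # arcs (l1, r1) and (l2, r2) cross
                return False
    return True
```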

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language      Projectivity  Best (LAS)  Our (LAS)
grc perseus      90.7          79.39     55.03 (20)
eu bdt           95.13         84.22     74.13 (17)
hu szeged        97.8          82.66     68.18 (14)
da ddt           98.26         86.28     76.40 (17)
en gum           99.6          85.05     76.44 (15)
gl treegal      100            74.25     70.45 (10)
gl ctg          100            82.12     79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases 7

7 From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:
We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition-based dependency parsing

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering

Tree-stack LSTM performed better on low-resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding different ways to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Önder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

• Introduction
  • Overview of Dependency Parsing
  • Transition Based Dependency Parsing
• Related Work
  • Linear Models and their Drawbacks
  • Neural Network Models
• Model
  • Language Model
  • MLP Parser
  • Tree-stack LSTM Parser
• Results
  • MLP vs Tree-stack LSTM
  • Morphological Feature Embeddings
  • Static vs Dynamic Oracle Training
  • Transfer Learning
• Conclusion
• Future Work & Discussions
Page 21: Transition Based Dependency Parsing with Deep Learning

Omer Kırnap (Koc University) MSc Thesis September 27 2018 21 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 22 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 23 123

Problem Definition

Find a model that learns to decide the correct transition from the current state

Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123

2 Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123

Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearity. However, they are impractical for high-dimensional inputs: they scale linearly in the input dimension (in both time and space, assuming a fixed number of hidden units).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution Using dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123

Overview

1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2. Related Work: Linear Models and their Drawbacks; Neural Network Models

3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5. Conclusion

6. Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123

a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

The LM is used to obtain Context and Word embeddings with two components:

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123
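The two components can be sketched with a toy one-dimensional recurrent cell standing in for the LSTMs; the weights and dimensions below are illustrative only.

```python
import math

def encode(seq, w=0.5, u=0.3):
    """Toy 1-dimensional recurrent cell standing in for an LSTM:
    h_t = tanh(w * x_t + u * h_{t-1})."""
    h = 0.0
    for x in seq:
        h = math.tanh(w * x + u * h)
    return h

def word_vector(word):
    """Character-based encoder: run the recurrent cell over character codes."""
    return encode(ord(ch) / 128 for ch in word)

def context_vectors(words):
    """BiLSTM-style context: pair each word's forward (left-to-right) state
    with its backward (right-to-left) state."""
    vecs = [word_vector(w) for w in words]
    fwd, h = [], 0.0
    for v in vecs:
        h = math.tanh(0.5 * v + 0.3 * h)
        fwd.append(h)
    bwd, h = [], 0.0
    for v in reversed(vecs):
        h = math.tanh(0.5 * v + 0.3 * h)
        bwd.append(h)
    bwd.reverse()
    return list(zip(fwd, bwd))
```

Note that the same word receives different context vectors in different positions, which is the point of the word-level BiLSTM.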

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure Kırnap et al 2017Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123
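A minimal sketch of such a decision module: a one-hidden-layer network scoring each transition from the extracted feature vector, with the argmax taken as the next move. The weights in the example are toy values, not trained parameters.

```python
import math

def mlp_decide(features, W1, b1, W2, b2):
    """One-hidden-layer scorer: hidden = tanh(W1 x + b1), scores = W2 hidden + b2.
    Returns the index of the highest-scoring transition."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(W1, b1)]
    scores = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(W2, b2)]
    return max(range(len(scores)), key=scores.__getitem__)
```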

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation
• 17 universal part-of-speech tags
• 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS): the percentage of words correctly assigned both the correct syntactic head and the correct dependency label.

[Example over the gold tree of "Economic news had" (Economic -ATT-> news, news -SBJ-> had): a prediction that gets both arcs wrong has LAS 0; a prediction that gets one of the two arcs right has LAS (1/2) × 100 = 50.]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
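The metric on this slide can be written directly. For illustration the words are keyed by surface form, which assumes no repeated words in the sentence.

```python
def las(gold, pred):
    """Labeled Attachment Score: percentage of words assigned both the
    correct head and the correct dependency label.
    gold and pred map each word to a (head, label) pair."""
    correct = sum(1 for word, arc in gold.items() if pred.get(word) == arc)
    return 100.0 * correct / len(gold)

# The slide's example, keyed by surface form
gold = {"Economic": ("news", "ATT"), "news": ("had", "SBJ")}
pred1 = {"Economic": ("news", "OBJ"), "news": ("had", "PRED")}  # both arcs wrong
pred2 = {"Economic": ("news", "ATT"), "news": ("had", "OBJ")}   # one arc fully right
```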

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition-based parsers 5

5 Source: CoNLL17 official results page
Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats   Hungarian  En-ParTUT  Latvian
p         63.6       76.6       55.9
v         73.5       75.9       63
c         72.2       76         63.5
v-c       76         79         67.6
p-c       78         82.5       70.6
p-v       76.6       80.8       67.7
p-fb      74.7       79.7       66.3
p-v-c     79.3       83.2       74.2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats   Hungarian  En-ParTUT  Latvian
p         63.6       76.6       55.9
v         73.5       75.9       63
c         72.2       76         63.5
v-c       76         79         67.6
p-c       78         82.5       70.6
p-v       76.6       80.8       67.7
p-fb      74.7       79.7       66.3
p-v-c     79.3       83.2       74.2

Context vectors provide independent contribution on top ofPOS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats   Hungarian  En-ParTUT  Latvian
p         63.6       76.6       55.9
v         73.5       75.9       63
c         72.2       76         63.5
v-c       76         79         67.6
p-c       78         82.5       70.6
p-v       76.6       80.8       67.7
p-fb      74.7       79.7       66.3
p-v-c     79.3       83.2       74.2

Our BiLSTM language model word vectors perform betterthan FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats   Hungarian  En-ParTUT  Latvian
p         63.6       76.6       55.9
v         73.5       75.9       63
c         72.2       76         63.5
v-c       76         79         67.6
p-c       78         82.5       70.6
p-v       76.6       80.8       67.7
p-fb      74.7       79.7       66.3
p-v-c     79.3       83.2       74.2

Both POS tags and context vectors have significantcontributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing correct state of parser still remains critical

We are unable to represent whole parsing history with featureextracting

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and the stack.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM; modify the head word's embedding with the dependent's embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of the LSTMs are not updated except on a reduce

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al. 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

[Figure: Tree-stack LSTM architecture: σ-, β-, and Action-LSTM states are concatenated and fed to an MLP; the t-RNN composes head word, dependent word, and dependency relation.]

We propose the Tree-stack LSTM model with 4 components:
• β-LSTM
• σ-LSTM
• Action-LSTM
• Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize each word representation by concatenating:

Character-based LSTM's word vectors

Word-based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morp-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
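A UD FEATS string like the one above splits into key=value pairs, each of which indexes a learned embedding; a sketch of the parsing step, with the embedding lookup itself omitted:

```python
def parse_feats(feats):
    """Split a UD FEATS string into key -> value pairs; the parser looks up
    one learned embedding per pair and combines them into the morph-feat vector."""
    if feats == "_":  # CoNLL-U uses "_" for 'no features'
        return {}
    return dict(kv.split("=", 1) for kv in feats.split("|"))

f = parse_feats("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
```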

Tree-stack LSTM

Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

[Figure: Tree-stack LSTM architecture with the β-LSTM highlighted.]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

Figure: The buffer's β-LSTM over the upcoming words w_i, w_i+1, w_i+2.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

[Figure: Tree-stack LSTM architecture with the σ-LSTM highlighted.]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

Figure: The stack's σ-LSTM over the stack items s_i, s_i+1, s_i+2.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

[Figure: Tree-stack LSTM architecture with the Action-LSTM highlighted.]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

Figure: The Action-LSTM over the sequence of past transitions.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

Figure: t-RNN combining the head word, dependent word, and dependency relation.

w_head_new = tanh(W_rnn * [w_head_old ; d_l ; w_dep] + b_rnn)   (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
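Equation (1) is a single tanh layer over the concatenation of the three vectors; a dependency-free sketch with plain Python lists, where the shapes are illustrative:

```python
import math

def t_rnn(w_head_old, d_label, w_dep, W_rnn, b_rnn):
    """Eq. (1): w_head_new = tanh(W_rnn * [w_head_old ; d_label ; w_dep] + b_rnn)."""
    x = w_head_old + d_label + w_dep  # vector concatenation
    return [math.tanh(sum(W_rnn[i][j] * x[j] for j in range(len(x))) + b_rnn[i])
            for i in range(len(b_rnn))]

# 1-dimensional head/label/dependent vectors, 1x3 weight matrix
new_head = t_rnn([1.0], [0.0], [0.0], [[1.0, 0.0, 0.0]], [0.0])
```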

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: The stack's top LSTM is reduced.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready to produce the next transition.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: The stack's top LSTM is reduced.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready to produce the next transition.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

[Figure: Tree-stack LSTM architecture: σ-, β-, and Action-LSTM states are concatenated and fed to an MLP; the t-RNN composes head word, dependent word, and dependency relation.]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2. Related Work: Linear Models and their Drawbacks; Neural Network Models

3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5. Conclusion

6. Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

CoNLL17
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 7th out of 33 participants (1st among transition-based parsers)

CoNLL18
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 16th out of 30 participants (2nd among transition-based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split change 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems with the official comparison:

1. If the annotation of the treebank has improved, the older parser is handicapped

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code         MLP    Tree-stack
ru taiga (10k)    58.89    60.55
hu szeged (20k)   66.21    68.18
tr imst (50k)     56.78    58.75
ar padt (120k)    67.83    68.14
en ewt (205k)     74.87    75.77
cs cac (473k)     83.39    83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

Figure: The initial MLP model.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

Figure: Only the Action-LSTM.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

Figure: Only the β-LSTM.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

Figure: Only the σ-LSTM.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code   MLP    Only-Action  Only-β  Only-σ
hu szeged   66.21     66.87      66.94   67.03
sv lines    71.12     72.05      72.17   72.45
tr imst     57.12     56.87      57.02   57.12
ar padt     67.83     66.67      66.89   66.92
cs cac      83.89     82.23      83.13   83.17
en ewt      75.54     75.43      75.56   75.67

Table: Comparison between the MLP and the "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

[Figure: Tree-stack LSTM architecture with the t-RNN highlighted.]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gomez-Rodriguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673–682.

S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

Page 22: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Omer Kırnap (Koc University) MSc Thesis September 27 2018 22 123

Omer Kırnap (Koc University) MSc Thesis September 27 2018 23 123

Problem Definition

Find a model learning to decide correct transition from current state

Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123

2 Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123

Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearityHoweverImpractical for high dimensional inputs they scale linearly in inputdimensions (in both time and space assuming fixed number of hiddenunits)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution Using dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18bull KParse team with Tree-stack LSTM Parser using Context and

Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using ContextEmbeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser using Context and Morph-featEmbeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123

a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

LM is used to obtain Context and Word embeddings with twocomponents

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure Kırnap et al 2017Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label

Economic news hadGold Tree LAS 1

SBJATT

Economic news had

PREDOBJPred 1 LAS 0

Economic news had

OBJATTPred 2 LAS (frac12)100

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Context vectors provide independent contribution on top ofPOS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677p-fb 747 797 663

p-v-c 793 832 742

Our BiLSTM language model word vectors perform betterthan FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Both POS tags and context vectors have significantcontributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing correct state of parser still remains critical

We are unable to represent whole parsing history with featureextracting

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify stackrsquos word embeddings

Hidden states of LSTMS are not updated unless reduce

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose Tree-stack LSTM model with 4 components

β-LSTMσ-LSTMAction-LSTMTree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include explicit feature extractor We initiated wordrepresentation by concatenating

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morp-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure Bufferrsquos β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stackrsquos σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How do components of tree-stack LSTM are connected

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 23: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Omer Kırnap (Koc University) MSc Thesis September 27 2018 23 123

Problem Definition

Find a model learning to decide correct transition from current state

Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123

2 Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123

Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearityHoweverImpractical for high dimensional inputs they scale linearly in inputdimensions (in both time and space assuming fixed number of hiddenunits)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution Using dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18bull KParse team with Tree-stack LSTM Parser using Context and

Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using ContextEmbeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser using Context and Morph-featEmbeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123

a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

LM is used to obtain Context and Word embeddings with twocomponents

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure Kırnap et al 2017Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label

Economic news hadGold Tree LAS 1

SBJATT

Economic news had

PREDOBJPred 1 LAS 0

Economic news had

OBJATTPred 2 LAS (frac12)100

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
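The metric can be sketched directly from per-word (head, label) pairs; the arcs below follow the toy example above.

```python
def las(gold, pred):
    """Labeled Attachment Score: percentage of words whose predicted
    (head, label) pair exactly matches the gold annotation."""
    assert len(gold) == len(pred)
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return 100.0 * correct / len(gold)

# "Economic news had": one (head_index, label) pair per dependent word
gold = [(2, "ATT"), (3, "SBJ")]          # Economic <- news, news <- had
pred_wrong = [(2, "OBJ"), (3, "PRED")]   # both labels wrong -> LAS 0
pred_half = [(2, "ATT"), (3, "OBJ")]     # one of two correct -> LAS 50
```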

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5 Source CoNLL17 official results page
Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63.0
c       72.2        76.0        63.5
v-c     76.0        79.0        67.6
p-c     78.0        82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63.0
c       72.2        76.0        63.5
v-c     76.0        79.0        67.6
p-c     78.0        82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2

Context vectors provide independent contribution on top ofPOS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63.0
c       72.2        76.0        63.5
v-c     76.0        79.0        67.6
p-c     78.0        82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2

Our BiLSTM language model word vectors perform betterthan FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63.0
c       72.2        76.0        63.5
v-c     76.0        79.0        67.6
p-c     78.0        82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2

Both POS tags and context vectors have significantcontributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing correct state of parser still remains critical

We are unable to represent the whole parsing history with feature extraction

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM
Modifying the head word's embedding with the dependent embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of LSTMs are not updated unless a reduce occurs

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

Figure Tree-stack LSTM: outputs of the β-LSTM, σ-LSTM, Action-LSTM, and t-RNN are concatenated and fed to an MLP

We propose the Tree-stack LSTM model with 4 components:
1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initiate the word representation by concatenating:

Character Based LSTM's word vectors

Word Based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
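A minimal sketch of one plausible scheme: split the UD FEATS string into feature=value pairs and sum one embedding per pair. The thesis's exact combination may differ; `DIM` and the random initialization are illustrative.

```python
import random

random.seed(1)
DIM = 4  # toy embedding size

def morph_feat_vector(feats, table):
    """Parse a UD FEATS string like 'Case=Nom|Number=Sing' and sum the
    embedding of each feature=value pair (new pairs get a fresh embedding)."""
    vec = [0.0] * DIM
    if feats == "_":                      # UD uses '_' for no features
        return vec
    for pair in feats.split("|"):
        emb = table.setdefault(pair, [random.uniform(-0.1, 0.1) for _ in range(DIM)])
        vec = [a + b for a, b in zip(vec, emb)]
    return vec

table = {}
v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs", table)
```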

Tree-stack LSTM

Model Components
1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

Figure Tree-stack LSTM overview (β-LSTM component)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

Figure Buffer's β-LSTM over buffer words w_i, w_i+1, w_i+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

Figure Tree-stack LSTM overview (σ-LSTM component)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

Figure Stack's σ-LSTM over stack items s_i, s_i+1, s_i+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

Figure Tree-stack LSTM overview (Action-LSTM component)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

Figure Action-LSTM over the transition history

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

Figure t-RNN with inputs: head word, dependent word, dependency relation

w_head_new = tanh(W_rnn ∗ [w_head_old ; d_l ; w_dep] + b_rnn)   (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
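Equation (1) is a single tanh layer over the concatenated head, relation, and dependent vectors. A pure-Python sketch with toy dimensions and random weights (the real W_rnn and b_rnn are learned):

```python
import math, random

random.seed(2)

def t_rnn(w_head, d_rel, w_dep, W, b):
    """New head embedding: tanh(W_rnn * [w_head; d_rel; w_dep] + b_rnn), eq. (1)."""
    x = w_head + d_rel + w_dep            # concatenation of the three inputs
    return [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

dim, rel_dim = 4, 2                       # toy word and relation dimensions
W = [[random.uniform(-0.1, 0.1) for _ in range(2 * dim + rel_dim)] for _ in range(dim)]
b = [0.0] * dim
new_head = t_rnn([0.1] * dim, [0.2] * rel_dim, [0.3] * dim, W, b)
```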

Tree-RNN with

1 Left Transition
2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure Each embedding is initiated by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure Tree-stack LSTM is ready for the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure Each embedding is initiated by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure Tree-stack LSTM is ready for the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
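The two transition rules above can be sketched as state updates on (stack, buffer, arcs); an arc (h, d, w) records head h, relation d, dependent w. This is a minimal sketch of the transition system only, and the LSTM state recalculation is omitted.

```python
def left_arc(stack, buffer, arcs, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    pop s; the buffer front b becomes its head with relation d."""
    s, b = stack.pop(), buffer[0]
    arcs.add((b, d, s))

def right_arc(stack, buffer, arcs, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    pop t; the next stack item s becomes its head with relation d."""
    t = stack.pop()
    s = stack[-1]
    arcs.add((s, d, t))

def shift(stack, buffer):
    stack.append(buffer.pop(0))

# toy run over word indices 1..3 (0 = root)
stack, buffer, arcs = [0], [1, 2, 3], set()
shift(stack, buffer)                   # stack [0, 1], buffer [2, 3]
left_arc(stack, buffer, arcs, "SBJ")   # adds arc (2, SBJ, 1)
shift(stack, buffer)                   # stack [0, 2], buffer [3]
shift(stack, buffer)                   # stack [0, 2, 3], buffer []
right_arc(stack, buffer, arcs, "OBJ")  # adds arc (2, OBJ, 3)
```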

Final overview of Tree-stack LSTM

Figure Tree-stack LSTM: outputs of the β-LSTM, σ-LSTM, Action-LSTM, and t-RNN are concatenated and fed to an MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 Introduction
  Overview of Dependency Parsing
  Transition Based Dependency Parsing
2 Related Work
  Linear Models and their Drawbacks
  Neural Network Models
3 Model
  Language Model
  MLP Parser
  Tree-stack LSTM Parser
4 Results
  MLP vs Tree-stack LSTM
  Morphological Feature Embeddings
  Static vs Dynamic Oracle Training
  Transfer Learning
5 Conclusion
6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

CoNLL17:
  Dependency parsing of 81 treebanks in 49 languages
  All treebanks use standardized annotation:
    17 universal part-of-speech tags
    37 universal dependency relations
  Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
  Dependency parsing of 82 treebanks in 57 languages
  All treebanks use standardized annotation:
    17 universal part-of-speech tags
    37 universal dependency relations
  Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Differences between the two tasks: 1 Train/test split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the treebank is improved, the older parser is handicapped

2 If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code   MLP    Only Action  Only-β  Only-σ
hu szeged   66.21  66.87        66.94   67.03
sv lines    71.12  72.05        72.17   72.45
tr imst     57.12  56.87        57.02   57.12
ar padt     67.83  66.67        66.89   66.92
cs cac      83.89  82.23        83.13   83.17
en ewt      75.54  75.43        75.56   75.67

Table Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

Figure Tree-stack LSTM: outputs of the β-LSTM, σ-LSTM, Action-LSTM, and t-RNN are concatenated and fed to an MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code            without t-RNN  with t-RNN
no nynorsklia (3k)   51.78          53.33
ru taiga (11k)       59.13          60.55
gl treegal (15k)     69.76          70.45
hu szeged (20k)      66.12          68.18
sv lines (49k)       74.04          75.46
tr imst (50k)        58.12          58.75
ar padt (120k)       68.04          68.14
en ewt (204k)        74.87          75.77
cs cac (473k)        82.89          83.57
cs pdt (1M)          81.17          81.164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases

σ-LSTM provides more useful information independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings
We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.60           18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48325
fr sequoia     84.36        82.17           50543
en gum         76.44        75.34           53686
ko gsd         73.74        72.54           56687
eu bdt         74.55        73.32           72974
nl lassysmall  76.70        75.80           75134
gl ctg         79.02        79.018          79327
lv lvtb        72.33        72.24           80666
id gsd         75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121064
bg btb     84.53        84.55           124336
en ewt     75.77        75.682          204585
ar padt    68.02        68.14           223881
de gsd     71.59        71.32           263804
ca ancora  85.89        85.874          417587
es ancora  84.99        84.78           444617
cs cac     83.57        83.63           472608
cs pdt     81.43        82.12           1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow gold moves
Dynamic oracle: transitions follow predicted moves

In both cases, log p of gold moves is maximized

Figure Tree-stack LSTM: outputs of the β-LSTM, σ-LSTM, Action-LSTM, and t-RNN are concatenated and fed to an MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
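A sketch of the difference during training: the static oracle always follows the gold move, while the dynamic oracle sometimes follows the model's own (possibly wrong) prediction, so training visits states the parser actually reaches at test time. The `explore_p` parameter and function names are hypothetical illustrations, not the thesis's exact training procedure.

```python
import random

def choose_transition(gold_move, model_move, oracle, explore_p=0.9, rng=random):
    """Pick the transition actually executed during training.
    log p(gold move) is maximized in both regimes; only the executed
    move (and hence the visited states) differs."""
    if oracle == "static":
        return gold_move
    # dynamic oracle: follow the model's move with probability explore_p
    return model_move if rng.random() < explore_p else gold_move

rng = random.Random(3)
static = [choose_transition("shift", "left-arc", "static", rng=rng) for _ in range(5)]
dynamic = [choose_transition("shift", "left-arc", "dynamic", rng=rng) for _ in range(100)]
```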

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1 Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2 Using Facebook's word vectors to train a parser [Bojanowski et al 2017]
3 Using my own word and context vectors trained on a different language from the same language family
4 Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table LAS values for strategies (1), (2), (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

A transition based parser can only build projective trees 6

6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
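Projectivity can be checked by testing whether any two dependency arcs cross. A small sketch over 1-based word ids with 0 as the root (the `heads` convention is an assumption for illustration):

```python
def is_projective(heads):
    """heads[i] is the head of word i+1 (1-based word ids; 0 is the root).
    A dependency tree is projective iff no two arcs cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for i, (a, b) in enumerate(arcs):
        for c, d in arcs[i + 1:]:
            if a < c < b < d or c < a < d < b:   # spans strictly interleave
                return False
    return True

nested = is_projective([2, 0, 2, 2])    # arcs nest: projective
crossed = is_projective([3, 0, 2, 2])   # arcs (1,3) and (0,2) cross: not projective
```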

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table Our model's performance gap decreases as the projectivity ratio increases

7 From official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion
We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-crafted feature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings

Attention Mechanism

Applying attention between σ-LSTM states, β-LSTM states, or Action-LSTM states may bring a performance improvement

Morphological Features

Finding a different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics 5:135-146.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

Page 24: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Problem Definition

Find a model learning to decide correct transition from current state

Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123

2 Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123

Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearityHoweverImpractical for high dimensional inputs they scale linearly in inputdimensions (in both time and space assuming fixed number of hiddenunits)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution Using dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18bull KParse team with Tree-stack LSTM Parser using Context and

Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using ContextEmbeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser using Context and Morph-featEmbeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123

a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

LM is used to obtain Context and Word embeddings with twocomponents

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure Kırnap et al 2017Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label

Economic news hadGold Tree LAS 1

SBJATT

Economic news had

PREDOBJPred 1 LAS 0

Economic news had

OBJATTPred 2 LAS (frac12)100

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Context vectors provide independent contribution on top ofPOS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677p-fb 747 797 663

p-v-c 793 832 742

Our BiLSTM language model word vectors perform betterthan FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Both POS tags and context vectors have significantcontributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing correct state of parser still remains critical

We are unable to represent whole parsing history with featureextracting

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify stackrsquos word embeddings

Hidden states of LSTMS are not updated unless reduce

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose Tree-stack LSTM model with 4 components

β-LSTMσ-LSTMAction-LSTMTree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include explicit feature extractor We initiated wordrepresentation by concatenating

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morp-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure Bufferrsquos β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stackrsquos σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How do components of tree-stack LSTM are connected

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1. Left Transition
2. Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
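The left_d and right_d transitions defined below can be sketched on a (stack, buffer, arc-set) state; this is a toy implementation with words as integer indices, not the thesis code:

```python
def left_arc(stack, buffer, arcs, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}): the stack top s
    becomes a d-dependent of the buffer front b and is popped."""
    s = stack.pop()
    arcs.add((buffer[0], d, s))

def right_arc(stack, buffer, arcs, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}): the stack top t
    becomes a d-dependent of the word s below it and is popped."""
    t = stack.pop()
    arcs.add((stack[-1], d, t))

# Toy run on word indices (0 = root): stack [0, 1, 2], buffer [3].
stack, buffer, arcs = [0, 1, 2], [3], set()
right_arc(stack, buffer, arcs, "obj")    # adds (1, "obj", 2)
left_arc(stack, buffer, arcs, "nsubj")   # adds (3, "nsubj", 1)
assert arcs == {(1, "obj", 2), (3, "nsubj", 1)} and stack == [0]
```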

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

Head  Dependent

Figure Each embedding is initialized by concatenating POS, language and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding is initialized by concatenating POS, language and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

Figure Tree-stack LSTM: the β-, σ- and Action-LSTM hidden states and the t-RNN output over head word, dependent word and dependency relation are concatenated and fed to an MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
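The decision step in the overview above can be sketched as: concatenate the component hidden states and score the candidate transitions with an MLP. The sizes and the single hidden layer are assumptions for illustration, not the exact thesis architecture:

```python
import numpy as np

H, HIDDEN, N_ACTIONS = 4, 8, 3         # illustrative sizes
rng = np.random.default_rng(2)
W1, b1 = rng.standard_normal((HIDDEN, 3 * H)), rng.standard_normal(HIDDEN)
W2, b2 = rng.standard_normal((N_ACTIONS, HIDDEN)), rng.standard_normal(N_ACTIONS)

def next_transition(h_buffer, h_stack, h_action):
    """Concatenate the β-, σ- and Action-LSTM hidden states and score
    the candidate transitions with a one-hidden-layer MLP."""
    h = np.concatenate([h_buffer, h_stack, h_action])
    scores = W2 @ np.tanh(W1 @ h + b1) + b2
    return int(np.argmax(scores))

a = next_transition(np.ones(H), np.ones(H), np.ones(H))
assert 0 <= a < N_ACTIONS
```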

Overview

1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2 Related Work: Linear Models and their Drawbacks; Neural Network Models

3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5 Conclusion

6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

CoNLL17
- Dependency parsing of 81 treebanks in 49 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18
- Dependency parsing of 82 treebanks in 57 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Differences between CoNLL17 and CoNLL18: 1. Train/test split change 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of the official comparison:

1. If the annotation of the treebank is improved, the older parser is handicapped.

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code  MLP    Only Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67

Table Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

Figure Tree-stack LSTM with the t-RNN component highlighted: its output over head word, dependent word and dependency relation is concatenated with the LSTM hidden states and fed to an MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases.

σ-LSTM provides more useful information independent of dataset size.

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th of all and 2nd among transition based parsers).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings
We divide the CoNLL18 UD v2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k and less than 50k tokens

Languages having more than 50k and less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3,583
ru taiga       58.32        60.55           10,479
sme giella     52.78        53.39           16,385
la perseus     49.93        51.6            18,184
ug udt         52.78        53.39           19,262
sl sst         46.72        48.77           19,473
hu szeged      66.23        68.18           20,166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48,325
fr sequoia     84.36        82.17           50,543
en gum         76.44        75.34           53,686
ko gsd         73.74        72.54           56,687
eu bdt         74.55        73.32           72,974
nl lassysmall  76.7         75.8            75,134
gl ctg         79.02        79.018          79,327
lv lvtb        72.33        72.24           80,666
id gsd         75.76        73.97           97,531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121,064
bg btb     84.53        84.55           124,336
en ewt     75.77        75.682          204,585
ar padt    68.02        68.14           223,881
de gsd     71.59        71.32           263,804
ca ancora  85.89        85.874          417,587
es ancora  84.99        84.78           444,617
cs cac     83.57        83.63           472,608
cs pdt     81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions using gold moves
Dynamic oracle: transitions using predicted moves

In both cases, the log-probability of the gold moves is maximized.

Figure Tree-stack LSTM architecture used in both training regimes: the β-, σ- and Action-LSTM hidden states and the t-RNN output are concatenated and fed to an MLP
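The difference between the two regimes can be sketched as follows: the loss targets the gold move in both, and only the move used to advance the parser state differs.

```python
def transitions_followed(steps, dynamic):
    """steps is a list of (gold_move, predicted_move) pairs. Both
    regimes maximize log p(gold_move); they differ only in which move
    is executed to advance the parser state."""
    return [pred if dynamic else gold for gold, pred in steps]

steps = [("SHIFT", "SHIFT"), ("LEFT", "RIGHT"), ("RIGHT", "RIGHT")]
assert transitions_followed(steps, dynamic=False) == ["SHIFT", "LEFT", "RIGHT"]
assert transitions_followed(steps, dynamic=True) == ["SHIFT", "RIGHT", "RIGHT"]
```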

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train a LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al 2017]
3. Using my own word and context vectors trained with a different language but from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial.

From-scratch LM training does not bring useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017].

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition based parsers can only build projective trees. 6

6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
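Projectivity can be checked with a simple crossing-arcs test; this is a sketch where heads[i-1] gives the head of word i and 0 denotes the root:

```python
def is_projective(heads):
    """heads[i-1] is the head of word i (words are 1-based, 0 = root).
    A dependency tree is projective iff no two arcs cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for a, b in arcs:
        for c, e in arcs:
            if a < c < b < e:          # arcs (a, b) and (c, e) cross
                return False
    return True

assert is_projective([2, 0, 2])         # 1 <- 2 -> 3: projective
assert not is_projective([0, 4, 1, 2])  # arcs (1,3) and (2,4) cross
```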

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios.

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table Our model's performance gap decreases as the projectivity ratio increases

7 From official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:
We introduced "Context, Word and Morph-feat" embeddings and showed their contribution in transition based dependency parsing.

Our Tree-stack LSTM outperformed MLP by removing hand-crafted feature engineering.

Tree-stack LSTM performed better with low-resource languages.

When the training dataset size increases, tree-stack LSTM loses its advantage.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention in between σ-LSTM states, β-LSTM or Action-LSTM may bring a performance improvement.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Ömer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Ömer Kırnap, Berkay Furkan Önder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald and J. Nivre. 2009. Dependency parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 25: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

2 Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123

Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearityHoweverImpractical for high dimensional inputs they scale linearly in inputdimensions (in both time and space assuming fixed number of hiddenunits)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution Using dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18bull KParse team with Tree-stack LSTM Parser using Context and

Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using ContextEmbeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser using Context and Morph-featEmbeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123

a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

LM is used to obtain Context and Word embeddings with twocomponents

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure Kırnap et al 2017Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label

Economic news hadGold Tree LAS 1

SBJATT

Economic news had

PREDOBJPred 1 LAS 0

Economic news had

OBJATTPred 2 LAS (frac12)100

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Context vectors provide independent contribution on top ofPOS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677p-fb 747 797 663

p-v-c 793 832 742

Our BiLSTM language model word vectors perform betterthan FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Both POS tags and context vectors have significantcontributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing correct state of parser still remains critical

We are unable to represent whole parsing history with featureextracting

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify stackrsquos word embeddings

Hidden states of LSTMS are not updated unless reduce

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose Tree-stack LSTM model with 4 components

β-LSTMσ-LSTMAction-LSTMTree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include explicit feature extractor We initiated wordrepresentation by concatenating

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morp-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure Bufferrsquos β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stackrsquos σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How do components of tree-stack LSTM are connected

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123


Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123

Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearity. However, they are impractical for high-dimensional inputs: they scale linearly in input dimensions (in both time and space, assuming a fixed number of hidden units)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution: use dense embeddings for input features
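The scaling argument can be made concrete: with dense embeddings the network reads an H-dimensional table row instead of multiplying a V-dimensional one-hot input. A minimal numpy sketch (sizes are made up, not from the thesis):

```python
import numpy as np

# An embedding lookup is just a row selection: equivalent to multiplying
# a one-hot vector with the embedding matrix, but without ever
# materializing the V-dimensional input.
V, H = 10000, 64                      # vocabulary size, embedding size
rng = np.random.default_rng(0)
E = rng.standard_normal((V, H))       # embedding table

word_id = 42
one_hot = np.zeros(V)
one_hot[word_id] = 1.0

dense = E[word_id]                    # O(H) lookup
via_matmul = one_hot @ E              # O(V*H) multiply, same result

assert np.allclose(dense, via_matmul)
```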

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123

Overview

1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2 Related Work: Linear Models and their Drawbacks; Neural Network Models

3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5 Conclusion

6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17: Koc-University team with MLP Parser using Context Embeddings

CoNLL18: KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17: Koc-University team with MLP Parser using Context Embeddings

CoNLL18: KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123

a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

The LM is used to obtain context and word embeddings with two components:

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors
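The two components can be sketched with a tiny numpy LSTM (a toy forward pass with random, untrained weights; the thesis model's sizes and training procedure differ):

```python
import numpy as np

# Toy LM sketch: a character LSTM turns a word's characters into a word
# vector (last hidden state), and a word-level BiLSTM over the word
# vectors yields each word's context vector (forward + backward states).
rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W):
    # One LSTM cell step; W packs input/forget/output/candidate gates.
    z = W @ np.concatenate([x, h])
    H = h.size
    i, f, o = sigmoid(z[:H]), sigmoid(z[H:2*H]), sigmoid(z[2*H:3*H])
    g = np.tanh(z[3*H:])
    c = f * c + i * g
    return o * np.tanh(c), c

def run_lstm(xs, W, H):
    h, c, hs = np.zeros(H), np.zeros(H), []
    for x in xs:
        h, c = lstm_step(x, h, c, W)
        hs.append(h)
    return hs

C, H = 8, 16                               # char-embedding / hidden sizes
char_emb = rng.standard_normal((128, C))
W_char = rng.standard_normal((4*H, C + H)) * 0.1

def word_vector(word):
    # Last hidden state of the character LSTM is the word vector.
    return run_lstm([char_emb[ord(ch)] for ch in word], W_char, H)[-1]

W_fwd = rng.standard_normal((4*H, H + H)) * 0.1
W_bwd = rng.standard_normal((4*H, H + H)) * 0.1

def context_vectors(words):
    vs = [word_vector(w) for w in words]
    fwd = run_lstm(vs, W_fwd, H)
    bwd = run_lstm(vs[::-1], W_bwd, H)[::-1]
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

ctx = context_vectors(["economic", "news", "had", "little", "effect"])
assert len(ctx) == 5 and ctx[0].shape == (2*H,)
```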

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition
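A hedged sketch of such a decision module (the action set, feature size, and layer sizes here are illustrative, not the thesis configuration):

```python
import numpy as np

# A one-hidden-layer MLP maps the extracted feature vector of the
# current parser state to a distribution over transitions.
rng = np.random.default_rng(2)
ACTIONS = ["shift", "left-arc", "right-arc"]   # illustrative action set
F, H = 50, 32                                  # feature / hidden sizes
W1, b1 = rng.standard_normal((H, F)) * 0.1, np.zeros(H)
W2, b2 = rng.standard_normal((len(ACTIONS), H)) * 0.1, np.zeros(len(ACTIONS))

def decide(features):
    h = np.tanh(W1 @ features + b1)
    logits = W2 @ h + b2
    p = np.exp(logits - logits.max())          # stable softmax
    p /= p.sum()
    return ACTIONS[int(p.argmax())], p

action, probs = decide(rng.standard_normal(F))
assert action in ACTIONS and abs(probs.sum() - 1.0) < 1e-9
```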

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments & Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS): the percentage of words assigned both the correct syntactic head and the correct dependency label

[Figure: three dependency trees over "Economic news had ..."]
Gold tree: news attaches to had with SBJ; Economic attaches to news with ATT.
Prediction 1: both arcs wrong (PRED, OBJ) — LAS 0.
Prediction 2: one of the two arcs correct — LAS (1/2) × 100 = 50.
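The metric can be sketched in a few lines (a toy version of the slide's example, not the official CoNLL evaluator, which also handles tokenization mismatches):

```python
# LAS as defined on the slide: the percentage of words whose predicted
# head AND dependency label both match the gold tree.
def las(gold, pred):
    # gold/pred: {word: (head, label)} for a toy single-occurrence sentence
    correct = sum(1 for w, arc in gold.items() if pred.get(w) == arc)
    return 100.0 * correct / len(gold)

gold = {"Economic": ("news", "ATT"), "news": ("had", "SBJ")}
pred1 = {"Economic": ("had", "OBJ"), "news": ("had", "PRED")}  # both arcs wrong
pred2 = {"Economic": ("news", "OBJ"), "news": ("had", "SBJ")}  # one arc fully correct

print(las(gold, pred1), las(gold, pred2))  # 0.0 50.0
```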

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5 Source: CoNLL17 official results page

Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63
c       72.2       76         63.5
v-c     76         79         67.6
p-c     78         82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63
c       72.2       76         63.5
v-c     76         79         67.6
p-c     78         82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Context vectors provide an independent contribution on top of POS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63
c       72.2       76         63.5
v-c     76         79         67.6
p-c     78         82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Our BiLSTM language model word vectors perform better than FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63
c       72.2       76         63.5
v-c     76         79         67.6
p-c     78         82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Both POS tags and context vectors have significant contributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing the correct parser state representation still remains critical

We are unable to represent the whole parsing history with feature extraction

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17: Koc-University team with MLP Parser using Context Embeddings

CoNLL18: KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM; modify the head word's embedding with the dependent's embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of the LSTMs are not updated unless a reduce occurs

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose Tree-stack LSTM model with 4 components

β-LSTM
σ-LSTM
Action-LSTM
Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector

Every dependency relation is represented with a continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morp-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings
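One plausible way to embed such a feature string (an assumption about the exact scheme: each feature=value pair gets its own vector and the pairs are summed; the thesis may combine them differently):

```python
import numpy as np

# Turn a CoNLL-U FEATS string like "Case=Nom|Gender=Neut|..." into a
# single morph-feat vector by summing per-pair embeddings.
rng = np.random.default_rng(3)
D = 16
table = {}                               # lazily allocated embedding table

def morph_feat_vector(feats):
    vecs = []
    for pair in feats.split("|"):        # e.g. "Case=Nom"
        if pair not in table:
            table[pair] = rng.standard_normal(D) * 0.1
        vecs.append(table[pair])
    return np.sum(vecs, axis=0)

v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
assert v.shape == (D,) and len(table) == 5
```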

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components:
1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure Buffer's β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stack's σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

w_head_new = tanh(W_rnn * [w_head_old ; d_l ; w_dep] + b_rnn)    (1)
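Equation (1) can be written directly as a function (the sizes of W_rnn, b_rnn, and the embedding dimensions below are illustrative):

```python
import numpy as np

# t-RNN composition: the new head embedding is a tanh layer over the
# concatenation of the old head embedding, the dependency-relation
# embedding, and the dependent embedding.
rng = np.random.default_rng(4)
Dw, Dl = 32, 8                                  # word / relation sizes
W_rnn = rng.standard_normal((Dw, Dw + Dl + Dw)) * 0.1
b_rnn = np.zeros(Dw)

def t_rnn(w_head, d_l, w_dep):
    return np.tanh(W_rnn @ np.concatenate([w_head, d_l, w_dep]) + b_rnn)

new_head = t_rnn(rng.standard_normal(Dw),
                 rng.standard_normal(Dl),
                 rng.standard_normal(Dw))
assert new_head.shape == (Dw,)
```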

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition
2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
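The two transitions above can be sketched as pure functions on the parser state (σ, β, A), with words as plain strings; the real model additionally updates the LSTM hidden states and runs the t-RNN on the embeddings:

```python
# left_d(σ|s, b|β, A)  = (σ,   b|β, A ∪ {(b, d, s)})  -- buffer front heads stack top
# right_d(σ|s|t, β, A) = (σ|s, β,   A ∪ {(s, d, t)})  -- second-top heads stack top
def left(state, d):
    sigma, beta, arcs = state
    s, b = sigma[-1], beta[0]
    return sigma[:-1], beta, arcs | {(b, d, s)}

def right(state, d):
    sigma, beta, arcs = state
    s, t = sigma[-2], sigma[-1]
    return sigma[:-1], beta, arcs | {(s, d, t)}

state = (["ROOT", "news"], ["had", "little"], set())
state = left(state, "SBJ")            # "had" becomes the head of "news"
assert state == (["ROOT"], ["had", "little"], {("had", "SBJ", "news")})
```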

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2 Related Work: Linear Models and their Drawbacks; Neural Network Models

3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5 Conclusion

6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17:
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
Dependency parsing of 82 treebanks in 57 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split change, 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the treebank is improved, the older parser is handicapped

2 If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code         MLP    Tree-stack
ru taiga (10k)    58.89  60.55
hu szeged (20k)   66.21  68.18
tr imst (50k)     56.78  58.75
ar padt (120k)    67.83  68.14
en ewt (205k)     74.87  75.77
cs cac (473k)     83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code   MLP    Only Action  Only-β  Only-σ
hu szeged   66.21  66.87        66.94   67.03
sv lines    71.12  72.05        72.17   72.45
tr imst     57.12  56.87        57.02   57.12
ar padt     67.83  66.67        66.89   66.92
cs cac      83.89  82.23        83.13   83.17
en ewt      75.54  75.43        75.56   75.67

Table Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang        MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged   66.21  66.87   66.94   67.03   66.12      68.18
sv lines    71.12  72.05   72.17   74.04   72.17      75.46
tr imst     57.12  56.87   57.02   57.12   58.12      58.75
ar padt     67.83  66.67   66.89   66.92   68.04      68.14
cs cac      83.89  82.23   83.13   83.17   82.89      83.57
en ewt      75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases

σ-LSTM provides more useful information, independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD dataset (v2.2) into 4 parts based on the number of training tokens for each language, to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k, less than 50k tokens

Languages having more than 50k, less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3,583
ru taiga       58.32        60.55           10,479
sme giella     52.78        53.39           16,385
la perseus     49.93        51.6            18,184
ug udt         52.78        53.39           19,262
sl sst         46.72        48.77           19,473
hu szeged      66.23        68.18           20,166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48,325
fr sequoia     84.36        82.17           50,543
en gum         76.44        75.34           53,686
ko gsd         73.74        72.54           56,687
eu bdt         74.55        73.32           72,974
nl lassysmall  76.7         75.8            75,134
gl ctg         79.02        79.018          79,327
lv lvtb        72.33        72.24           80,666
id gsd         75.76        73.97           97,531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code   Morph-Feats  no Morph-Feats  # of tokens
fa seraji   81.18        81.12           121,064
bg btb      84.53        84.55           124,336
en ewt      75.77        75.682          204,585
ar padt     68.02        68.14           223,881
de gsd      71.59        71.32           263,804
ca ancora   85.89        85.874          417,587
es ancora   84.99        84.78           444,617
cs cac      83.57        83.63           472,608
cs pdt      81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow the gold moves. Dynamic oracle: transitions follow the model's predicted moves.

In both cases, the log-probability of the gold moves is maximized
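The difference can be sketched as follows (the model, oracle, and state update below are stand-ins, not the thesis components; both regimes would minimize -log p(gold move | state)):

```python
import random

# Static oracle: always FOLLOW the gold move during training.
# Dynamic oracle: follow the model's PREDICTED move, so the model also
# sees (and learns to recover from) states it reaches by its own errors.
random.seed(0)
MOVES = ["shift", "left", "right"]

def model_predict(state):          # stand-in for the parser network
    return random.choice(MOVES)

def oracle_gold(state):            # stand-in for the gold oracle
    return "shift"

def train_episode(dynamic, steps=5):
    state, followed = 0, []
    for _ in range(steps):
        gold = oracle_gold(state)
        # the loss term would be -log p(gold | state) in both regimes
        move = model_predict(state) if dynamic else gold
        followed.append(move)
        state += 1                 # stand-in state update
    return followed

assert set(train_episode(dynamic=False)) == {"shift"}
assert len(train_episode(dynamic=True)) == 5
```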

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:
1 Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2 Using Facebook's word vectors to train a parser [Bojanowski et al 2017]
3 Using my own word and context vectors trained on a different language from the same language family
4 Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From-scratch LM training does not produce useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition based parsers can only build projective trees 6
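Projectivity can be checked by testing whether any two dependency arcs cross; a minimal sketch with arcs given as (head index, dependent index) pairs:

```python
# A tree is projective iff no two dependency arcs cross, i.e. there is no
# pair of arcs spanning (l1, r1) and (l2, r2) with l1 < l2 < r1 < r2.
def is_projective(arcs):
    spans = [tuple(sorted(a)) for a in arcs]
    for (l1, r1) in spans:
        for (l2, r2) in spans:
            if l1 < l2 < r1 < r2:   # arcs cross
                return False
    return True

assert is_projective([(0, 2), (2, 1), (2, 3)])   # nested arcs: projective
assert not is_projective([(0, 2), (1, 3)])       # crossing arcs
```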

6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion: We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition-based dependency parsing.

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.

Tree-stack LSTM performed better with low-resource languages.

When the training dataset size increases, the tree-stack LSTM loses its advantage.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between the states of the σ-LSTM, β-LSTM, or Action-LSTM may bring a performance improvement.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
  • Related Work
    • Linear Models and their Drawbacks
    • Neural Network Models
  • Model
    • Language Model
    • MLP Parser
    • Tree-stack LSTM Parser
  • Results
    • MLP vs Tree-stack LSTM
    • Morphological Feature Embeddings
    • Static vs Dynamic Oracle Training
    • Transfer Learning
  • Conclusion
  • Future Work & Discussions
Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123

Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearity. However, they are impractical for high-dimensional inputs: they scale linearly in input dimensions (in both time and space, assuming a fixed number of hidden units).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution Using dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123
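The dense-embedding idea can be sketched in a few lines: each input feature indexes a small learned table, so the first layer no longer scales with the one-hot input dimension. The vocabulary and embedding size below are toy values.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"economic": 0, "news": 1, "had": 2}  # toy vocabulary
E = rng.normal(size=(len(vocab), 4))          # dense table: |V| x dim (dim = 4 here)

def embed(word):
    # An O(dim) table lookup replaces a |V|-dimensional one-hot input.
    return E[vocab[word]]
```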

Overview

1. Introduction
   • Overview of Dependency Parsing
   • Transition Based Dependency Parsing

2. Related Work
   • Linear Models and their Drawbacks
   • Neural Network Models

3. Model
   • Language Model
   • MLP Parser
   • Tree-stack LSTM Parser

4. Results
   • MLP vs Tree-stack LSTM
   • Morphological Feature Embeddings
   • Static vs Dynamic Oracle Training
   • Transfer Learning

5. Conclusion

6. Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

• CoNLL17: Koc-University team with MLP Parser using Context Embeddings
• CoNLL18: KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

• CoNLL17: Koc-University team with MLP Parser using Context Embeddings
• CoNLL18: KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123

a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

LM is used to obtain Context and Word embeddings with two components:

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123
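The two LM components can be sketched as nested recurrences: a character-level network summarizes each word's spelling into a word vector, and a word-level bidirectional network turns the word vectors into context vectors. A plain tanh RNN stands in for the LSTMs here; sizes and random initialization are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
H = 8  # hidden size (toy)

def rnn(inputs, Wx, Wh):
    """Plain tanh RNN as a stand-in for the LSTMs used in the thesis."""
    h, states = np.zeros(H), []
    for x in inputs:
        h = np.tanh(Wx @ x + Wh @ h)
        states.append(h)
    return states

def word_vector(word, char_emb, Wx, Wh):
    # Character-based model: the final state summarizes the word's spelling.
    return rnn([char_emb[c] for c in word], Wx, Wh)[-1]

def context_vectors(word_vecs, Wx, Wh):
    # Word-based "BiLSTM": concatenate forward and backward states.
    fwd = rnn(word_vecs, Wx, Wh)
    bwd = rnn(word_vecs[::-1], Wx, Wh)[::-1]
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

char_emb = {c: rng.normal(size=H) for c in "abcdefghijklmnopqrstuvwxyz"}
Wx, Wh = rng.normal(size=(H, H)), rng.normal(size=(H, H))
words = ["economic", "news", "had"]
wvecs = [word_vector(w, char_emb, Wx, Wh) for w in words]
cvecs = context_vectors(wvecs, Wx, Wh)
```

Each context vector is twice the hidden size because it joins the forward and backward summaries of the sentence around that word.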

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments & Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS): the percentage of words correctly assigned both the correct syntactic head and the correct dependency label.

Example for "Economic news had ...":

Gold tree (arcs SBJ, ATT): LAS = 1

Pred 1 (arcs PRED, OBJ): LAS = 0

Pred 2 (arcs OBJ, ATT): LAS = (1/2) · 100

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
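The metric follows directly from the definition above: a word counts as correct only if both its head and its label match. The head indices in this toy example are illustrative, mirroring the slide's "Pred 2" case where one of two words is fully correct.

```python
def las(gold, pred):
    """Labeled Attachment Score: percentage of words whose predicted
    (head, label) pair matches the gold annotation exactly."""
    correct = sum(g == p for g, p in zip(gold, pred))
    return 100.0 * correct / len(gold)

# Toy arcs for "Economic news had ...": one (head_index, label) per word.
gold  = [(3, "SBJ"), (3, "ATT")]
pred2 = [(3, "OBJ"), (3, "ATT")]   # Pred 2 from the slide: half correct
```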

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5Source: CoNLL17 official results page
Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63.0
c       72.2        76.0        63.5
v-c     76.0        79.0        67.6
p-c     78.0        82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63.0
c       72.2        76.0        63.5
v-c     76.0        79.0        67.6
p-c     78.0        82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2

Context vectors provide an independent contribution on top of POS tags.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63.0
c       72.2        76.0        63.5
v-c     76.0        79.0        67.6
p-c     78.0        82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2

Our BiLSTM language model word vectors perform better than FB vectors.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63.0
c       72.2        76.0        63.5
v-c     76.0        79.0        67.6
p-c     78.0        82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2

Both POS tags and context vectors have significant contributions on top of word vectors.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing the correct parser state representation remains critical.

We are unable to represent the whole parsing history with feature extraction.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

• CoNLL17: Koc-University team with MLP Parser using Context Embeddings
• CoNLL18: KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM. Modify the head word's embedding with the dependent's embedding.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings.

Hidden states of the LSTMs are not updated unless a reduce occurs.

Actions are not explicitly represented.

They only used word2vec embeddings [Mikolov et al. 2013].

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

[Figure: Tree-stack LSTM architecture — σ-LSTM, β-LSTM, Action-LSTM, and t-RNN (head word, dependent word, dependency relation) feeding a concat layer and an MLP]

We propose the Tree-stack LSTM model with 4 components:

1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector.

Every dependency relation is represented with a continuous vector.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character Based LSTM's word vectors

Word Based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
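A morph-feat vector can be built from the UD FEATS string by embedding each `Feature=Value` pair and composing the pieces. Summing is used here purely for illustration; the thesis may compose the per-feature embeddings differently, and the embedding size is a toy value.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 6                 # toy embedding size
feat_emb = {}           # one embedding per "Feature=Value" pair, created on demand

def morph_feat_vector(feats):
    """Embed a UD FEATS string such as 'Case=Nom|Gender=Neut|Number=Sing'
    by summing per-feature embeddings ('_' means no features in UD)."""
    vec = np.zeros(DIM)
    if feats == "_":
        return vec
    for fv in feats.split("|"):
        if fv not in feat_emb:
            feat_emb[fv] = rng.normal(size=DIM)
        vec += feat_emb[fv]
    return vec
```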

Tree-stack LSTM

Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

[Figure: Tree-stack LSTM architecture — σ-LSTM, β-LSTM, Action-LSTM, and t-RNN (head word, dependent word, dependency relation) feeding a concat layer and an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

Figure Buffer's β-LSTM (an LSTM over the buffer words w_i, w_i+1, w_i+2)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

[Figure: Tree-stack LSTM architecture — σ-LSTM, β-LSTM, Action-LSTM, and t-RNN (head word, dependent word, dependency relation) feeding a concat layer and an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

Figure Stack's σ-LSTM (an LSTM over the stack items s_i, s_i+1, s_i+2)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

[Figure: Tree-stack LSTM architecture — σ-LSTM, β-LSTM, Action-LSTM, and t-RNN (head word, dependent word, dependency relation) feeding a concat layer and an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

Figure Action-LSTM (an LSTM over the sequence of past transitions)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of the tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

Figure t-RNN (combines the head word, dependent word, and dependency relation embeddings)

w_head_new = tanh(W_rnn · [w_head_old; d_l; w_dep] + b_rnn)    (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
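Equation (1) is a single affine map over the concatenated head, relation, and dependent embeddings, squashed by tanh. A minimal sketch with toy dimensions and random weights:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4  # embedding size (toy)
W_rnn = rng.normal(size=(D, 3 * D))
b_rnn = np.zeros(D)

def t_rnn(w_head, d_label, w_dep):
    """Eq. (1): update the head embedding from the old head embedding,
    the dependency-relation embedding, and the dependent embedding."""
    x = np.concatenate([w_head, d_label, w_dep])
    return np.tanh(W_rnn @ x + b_rnn)

new_head = t_rnn(np.ones(D), np.ones(D), np.ones(D))
```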

Tree-RNN with

1. Left Transition
2. Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure Each embedding is initiated by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure Tree-stack LSTM is ready for the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure Each embedding is initiated by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure Tree-stack LSTM is ready for the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
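The two transition rules above translate directly into stack and buffer operations over word positions. A minimal sketch; the labels and positions in the demo are toy values, and shift is omitted.

```python
def left_arc(stack, buffer, arcs, d):
    # left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    # the stack top s becomes a dependent of the buffer front b.
    s, b = stack.pop(), buffer[0]
    arcs.append((b, d, s))

def right_arc(stack, arcs, d):
    # right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    # the stack top t becomes a dependent of the element s below it.
    t = stack.pop()
    arcs.append((stack[-1], d, t))

# Words as integer positions; 0 stands for the root.
stack, buffer, arcs = [0, 2], [3], []
left_arc(stack, buffer, arcs, "nsubj")   # arcs gains (3, "nsubj", 2)

stack2, arcs2 = [0, 3, 4], []
right_arc(stack2, arcs2, "obj")          # arcs2 gains (3, "obj", 4)
```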

Final overview of Tree-stack LSTM

[Figure: Tree-stack LSTM architecture — σ-LSTM, β-LSTM, Action-LSTM, and t-RNN (head word, dependent word, dependency relation) feeding a concat layer and an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
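The decision step of the overview can be sketched as: concatenate the three LSTM summaries and score the candidate transitions with an MLP. This only shows the concat-and-score step; the full model also feeds t-RNN-updated head and dependent embeddings, and all sizes and weights here are toy values.

```python
import numpy as np

rng = np.random.default_rng(0)
H, N_ACT = 8, 3  # hidden size and number of transitions (toy values)
W1, b1 = rng.normal(size=(16, 3 * H)), np.zeros(16)
W2, b2 = rng.normal(size=(N_ACT, 16)), np.zeros(N_ACT)

def next_transition(sigma_h, beta_h, action_h):
    """Concatenate the σ-, β-, and Action-LSTM states and let an MLP
    score the possible transitions (e.g. left, right, shift)."""
    x = np.concatenate([sigma_h, beta_h, action_h])
    hidden = np.tanh(W1 @ x + b1)
    scores = W2 @ hidden + b2
    return int(np.argmax(scores))

a = next_transition(np.ones(H), np.ones(H), np.ones(H))
```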

Overview

1. Introduction
   • Overview of Dependency Parsing
   • Transition Based Dependency Parsing

2. Related Work
   • Linear Models and their Drawbacks
   • Neural Network Models

3. Model
   • Language Model
   • MLP Parser
   • Tree-stack LSTM Parser

4. Results
   • MLP vs Tree-stack LSTM
   • Morphological Feature Embeddings
   • Static vs Dynamic Oracle Training
   • Transfer Learning

5. Conclusion

6. Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17:
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split change  2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of the official comparison:

1. If the annotation of the treebank is improved, the older parser is handicapped.

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models:

Lang Code          MLP     Tree-stack
ru_taiga (10k)     58.89   60.55
hu_szeged (20k)    66.21   68.18
tr_imst (50k)      56.78   58.75
ar_padt (120k)     67.83   68.14
en_ewt (205k)      74.87   75.77
cs_cac (473k)      83.39   83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

Figure Initial model (MLP)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code    MLP     Only Action   Only-β   Only-σ
hu_szeged    66.21   66.87         66.94    67.03
sv_lines     71.12   72.05         72.17    72.45
tr_imst      57.12   56.87         57.02    57.12
ar_padt      67.83   66.67         66.89    66.92
cs_cac       83.89   82.23         83.13    83.17
en_ewt       75.54   75.43         75.56    75.67

Table Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

[Figure: Tree-stack LSTM architecture — σ-LSTM, β-LSTM, Action-LSTM, and t-RNN (head word, dependent word, dependency relation) feeding a concat layer and an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN:

Lang Code            without t-RNN   with t-RNN
no_nynorsklia (3k)   51.78           53.33
ru_taiga (11k)       59.13           60.55
gl_treegal (15k)     69.76           70.45
hu_szeged (20k)      66.12           68.18
sv_lines (49k)       74.04           75.46
tr_imst (50k)        58.12           58.75
ar_padt (120k)       68.04           68.14
en_ewt (204k)        74.87           75.77
cs_cac (473k)        82.89           83.57
cs_pdt (1M)          81.17           81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of the ablation analysis:

Lang         MLP     Only A   Only-β   Only-σ   w/o t-RNN   all
hu_szeged    66.21   66.87    66.94    67.03    66.12       68.18
sv_lines     71.12   72.05    72.17    74.04    72.17       75.46
tr_imst      57.12   56.87    57.02    57.12    58.12       58.75
ar_padt      67.83   66.67    66.89    66.92    68.04       68.14
cs_cac       83.89   82.23    83.13    83.17    82.89       83.57
en_ewt       75.54   75.43    75.56    75.67    74.87       75.77

Tree-stack LSTM beats the other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of the Ablation Experiments

t-RNN's performance contribution increases as the training size decreases.

σ-LSTM provides more useful information independent of dataset size.

Interconnecting the model's components with the t-RNN makes the tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD 2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions.

Languages having less than 20k tokens

Languages having more than 20k, less than 50k tokens

Languages having more than 50k, less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang code       Morph-Feats   no Morph-Feats   # of tokens
no_nynorsklia   51.13         53.33            3583
ru_taiga        58.32         60.55            10479
sme_giella      52.78         53.39            16385
la_perseus      49.93         51.60            18184
ug_udt          52.78         53.39            19262
sl_sst          46.72         48.77            19473
hu_szeged       66.23         68.18            20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k tokens:

Lang code       Morph-Feats   no Morph-Feats   # of tokens
sv_lines        72.18         74.81            48325
fr_sequoia      84.36         82.17            50543
en_gum          76.44         75.34            53686
ko_gsd          73.74         72.54            56687
eu_bdt          74.55         73.32            72974
nl_lassysmall   76.70         75.80            75134
gl_ctg          79.02         79.018           79327
lv_lvtb         72.33         72.24            80666
id_gsd          75.76         73.97            97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code   Morph-Feats   no Morph-Feats   # of tokens
fa_seraji   81.18         81.12            121064
bg_btb      84.53         84.55            124336
en_ewt      75.77         75.682           204585
ar_padt     68.02         68.14            223881
de_gsd      71.59         71.32            263804
ca_ancora   85.89         85.874           417587
es_ancora   84.99         84.78            444617
cs_cac      83.57         83.63            472608
cs_pdt      81.43         82.12            1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow gold moves.
Dynamic oracle: transitions follow predicted moves.

In both cases, log p of the gold moves is maximized.

[Figure: Tree-stack LSTM architecture — σ-LSTM, β-LSTM, Action-LSTM, and t-RNN (head word, dependent word, dependency relation) feeding a concat layer and an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
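The two training regimes differ only in which move is applied to reach the next configuration; the loss always maximizes log p of the gold move in the visited state. A minimal sketch with a toy state and model standing in for the real parser and its APIs (`ToyState`, `ToyModel`, and `p_explore` are illustrative assumptions, not the thesis's implementation).

```python
import math
import random

class ToyState:
    """Counts remaining words; every move consumes one (parser-state stand-in)."""
    def __init__(self, n): self.n = n
    def is_final(self): return self.n == 0
    def apply(self, move): return ToyState(self.n - 1)

class ToyModel:
    def log_prob(self, state, move): return math.log(0.5)  # uniform over 2 moves
    def predict(self, state): return "shift"

def train_sentence(state, gold_oracle, model, dynamic=False, p_explore=0.1):
    # Static oracle: always follow the gold move.
    # Dynamic oracle: sometimes follow the model's own (possibly wrong)
    # prediction, but the loss still targets the gold move in that state.
    loss = 0.0
    while not state.is_final():
        gold = gold_oracle(state)
        loss -= model.log_prob(state, gold)
        move = model.predict(state) if dynamic and random.random() < p_explore else gold
        state = state.apply(move)
    return loss

loss = train_sentence(ToyState(4), lambda s: "shift", ToyModel())
```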

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 28: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Related Work

Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123

Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearityHoweverImpractical for high dimensional inputs they scale linearly in inputdimensions (in both time and space assuming fixed number of hiddenunits)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution Using dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18bull KParse team with Tree-stack LSTM Parser using Context and

Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using ContextEmbeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser using Context and Morph-featEmbeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123

a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

LM is used to obtain Context and Word embeddings with twocomponents

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition


MLP Parser - Feature Extraction

Feature extractor describes current state

Figure: Kırnap et al. 2017

MLP Parser - Decision Module

Decision module (MLP) decides the next transition
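For illustration, a minimal sketch of such a decision module in plain Python. The weights, feature dimension, and transition inventory here are illustrative assumptions, not the thesis configuration:

```python
import math

def mlp_next_transition(state_features, W1, b1, W2, b2, transitions):
    # One hidden tanh layer scores each candidate transition for the
    # current parser state; the argmax is the predicted next transition.
    hidden = [math.tanh(sum(w * x for w, x in zip(row, state_features)) + b)
              for row, b in zip(W1, b1)]
    scores = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(W2, b2)]
    return transitions[max(range(len(scores)), key=scores.__getitem__)]
```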


Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation:
  • 17 universal part-of-speech tags
  • 37 universal dependency relations


Experiments - Evaluation Metric

Labeled Attachment Score (LAS): the percentage of words assigned both the correct syntactic head and the correct dependency label.

Example for the sentence "Economic news had ...":
  Gold tree (arcs SBJ, ATT): LAS = 1
  Pred 1 (arcs PRED, OBJ; both arcs wrong): LAS = 0
  Pred 2 (arcs OBJ, ATT; one of two arcs correct): LAS = (1/2) * 100 = 50%
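The metric can be computed directly from per-word (head, label) pairs; a short sketch, with word indices and labels following the example above:

```python
def las(gold, pred):
    # Labeled Attachment Score: fraction of words whose predicted
    # (head, label) pair exactly matches the gold annotation.
    assert len(gold) == len(pred)
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

# Arcs as (head index, label) per dependent word, for "Economic news had":
gold  = [(2, "ATT"), (3, "SBJ")]   # Economic <- news, news <- had
pred1 = [(3, "PRED"), (3, "OBJ")]  # both arcs wrong
pred2 = [(2, "ATT"), (3, "OBJ")]   # one of two arcs correct
```

Here `las(gold, pred1)` gives 0.0 and `las(gold, pred2)` gives 0.5, matching the slide.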


Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition-based parsers (source: CoNLL17 official results page)

Contributions in CoNLL17


Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2


Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Context vectors provide an independent contribution on top of POS tags


Context and Word embeddings

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Our BiLSTM language model word vectors perform better than the Facebook (fb) vectors


Context and Word embeddings

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Both POS tags and context vectors have significant contributions on top of word vectors


Issues with MLP

However

Choosing the correct state representation of the parser remains critical

We are unable to represent the whole parsing history with feature extraction


Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack.


Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17
  • Koç-University team with MLP Parser using Context Embeddings

CoNLL18
  • KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings


c Tree-stack LSTM Parser (CoNLL18)


Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represents each component (σ, β, A) with an LSTM; modifies the head word's embedding with the dependent's embedding


Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of the LSTMs are not updated except on reduce transitions

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]


Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy


Tree-stack LSTM Overview

Figure: Tree-stack LSTM architecture (β-LSTM, σ-LSTM, and Action-LSTM states and the t-RNN head embeddings are concatenated and fed to an MLP)

We propose the Tree-stack LSTM model with 4 components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN


Tree-stack LSTM

Input Representation


Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector

Every dependency relation is represented with a continuous vector


Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character-based LSTM's word vectors

Word-based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors
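The concatenation step can be sketched as follows; plain lists stand in for the four embedding blocks and the dimensions are purely illustrative:

```python
def word_representation(word_vec, context_vec, pos_vec, morph_vec):
    # Input to the parser: one flat vector per word, formed by
    # concatenating the four embedding blocks.
    return word_vec + context_vec + pos_vec + morph_vec
```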


Input Representation

Morph-feat Vectors

Example (the pronoun "It"): Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs

Figure Morph-feat Embeddings
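A CoNLL-U FEATS string like the one above can be split into feature-value pairs before lookup in a morph-feat embedding table; a minimal sketch:

```python
def parse_feats(feats):
    # Split 'Case=Nom|Gender=Neut|...' into (feature, value) pairs;
    # in CoNLL-U an underscore means no morphological features.
    if feats == "_":
        return []
    return [tuple(kv.split("=", 1)) for kv in feats.split("|")]

pairs = parse_feats("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
```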


Tree-stack LSTM

Model Components
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN


β-LSTM

Figure: Tree-stack LSTM architecture (β-LSTM, σ-LSTM, and Action-LSTM states and the t-RNN head embeddings are concatenated and fed to an MLP)


β-LSTM

Figure: Buffer's β-LSTM, an LSTM over the buffer word embeddings w_i, w_i+1, w_i+2


σ-LSTM

Figure: Tree-stack LSTM architecture (β-LSTM, σ-LSTM, and Action-LSTM states and the t-RNN head embeddings are concatenated and fed to an MLP)


σ-LSTM

Figure: Stack's σ-LSTM, an LSTM over the stack word embeddings s_i, s_i+1, s_i+2


Action-LSTM

Figure: Tree-stack LSTM architecture (β-LSTM, σ-LSTM, and Action-LSTM states and the t-RNN head embeddings are concatenated and fed to an MLP)


Action-LSTM

Figure: Action-LSTM, an LSTM over the sequence of past transition embeddings


How are the components of the tree-stack LSTM connected?


Tree-RNN


Tree-RNN (t-RNN)

Figure: t-RNN combines the head word, dependency relation, and dependent word embeddings

w_head_new = tanh(W_rnn * [w_head_old; d_l; w_dep] + b_rnn)    (1)
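Equation (1) in plain Python, with vectors as lists and W_rnn as a list of rows; the sizes are illustrative:

```python
import math

def t_rnn(w_head_old, d_l, w_dep, W_rnn, b_rnn):
    # Eq. (1): new head embedding from the concatenation
    # [w_head_old; d_l; w_dep], an affine map, and a tanh.
    x = w_head_old + d_l + w_dep
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W_rnn, b_rnn)]
```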


Tree-RNN with

1. Left Transition
2. Right Transition


Left Transition


Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings


Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding


Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input


Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready for the next transition


Right Transition


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready for the next transition
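The two arc transitions and shift, as given by the formulas on these slides, can be sketched as operations on a stack, a buffer, and an arc set (word indices only; this sketch ignores the LSTM state updates):

```python
def shift(stack, buffer, arcs):
    stack.append(buffer.pop(0))

def left_arc(stack, buffer, arcs, d):
    # left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    # the buffer front b becomes head of the popped stack top s.
    s = stack.pop()
    arcs.add((buffer[0], d, s))

def right_arc(stack, buffer, arcs, d):
    # right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    # the second stack item s becomes head of the popped stack top t.
    t = stack.pop()
    arcs.add((stack[-1], d, t))
```

For example, starting from stack [0] (root) and buffer [1, 2], the sequence shift, left_arc("SBJ"), shift, right_arc("ROOT") yields the arcs {(2, "SBJ", 1), (0, "ROOT", 2)}.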


Final overview of Tree-stack LSTM

Figure: Tree-stack LSTM architecture (β-LSTM, σ-LSTM, and Action-LSTM states and the t-RNN head embeddings are concatenated and fed to an MLP)


Overview

1 Introduction
  • Overview of Dependency Parsing
  • Transition Based Dependency Parsing

2 Related Work
  • Linear Models and their Drawbacks
  • Neural Network Models

3 Model
  • Language Model
  • MLP Parser
  • Tree-stack LSTM Parser

4 Results
  • MLP vs Tree-stack LSTM
  • Morphological Feature Embeddings
  • Static vs Dynamic Oracle Training
  • Transfer Learning

5 Conclusion

6 Future Work & Discussions


4 Results amp Comparisons


Results amp Comparisons

Dataset

CoNLL17:
  • Dependency parsing of 81 treebanks in 49 languages
  • All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
  • Koç-University ranked 7th out of 33 participants (1st among transition-based parsers)

CoNLL18:
  • Dependency parsing of 82 treebanks in 57 languages
  • All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
  • Koç-University ranked 16th out of 30 participants (2nd among transition-based parsers)

Changes from CoNLL17 to CoNLL18: 1. train/test split, 2. annotation


MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets


MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1. If the annotation of the treebank has improved, the older parser is handicapped.

2. If the training-test split has changed and old training data are now in the test data, the old parser is favored undeservedly.


MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP


Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM


MLP Parser

Figure: Initial model (MLP only)


Only Action LSTM

Figure: Only Action-LSTM


Only β-LSTM

Figure: Only β-LSTM


Only σ-LSTM

Figure: Only σ-LSTM


Ablation Analysis Results

Lang Code   MLP    Only Action  Only-β  Only-σ
hu szeged   66.21  66.87        66.94   67.03
sv lines    71.12  72.05        72.17   72.45
tr imst     57.12  56.87        57.02   57.12
ar padt     67.83  66.67        66.89   66.92
cs cac      83.89  82.23        83.13   83.17
en ewt      75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models


Ablation of t-RNN

Figure: Tree-stack LSTM architecture (β-LSTM, σ-LSTM, and Action-LSTM states and the t-RNN head embeddings are concatenated and fed to an MLP)


Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code            without t-RNN  with t-RNN
no nynorsklia (3k)   51.78          53.33
ru taiga (11k)       59.13          60.55
gl treegal (15k)     69.76          70.45
hu szeged (20k)      66.12          68.18
sv lines (49k)       74.04          75.46
tr imst (50k)        58.12          58.75
ar padt (120k)       68.04          68.14
en ewt (204k)        74.87          75.77
cs cac (473k)        82.89          83.57
cs pdt (1M)          81.17          81.164

t-RNN provides a comparative advantage for low-resource languages


Ablation Analysis

Overall results of ablation analysis

Lang        MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged   66.21  66.87   66.94   67.03   66.12      68.18
sv lines    71.12  72.05   72.17   74.04   72.17      75.46
tr imst     57.12  56.87   57.02   57.12   58.12      58.75
ar padt     67.83  66.67   66.89   66.92   68.04      68.14
cs cac      83.89  82.23   83.13   83.17   82.89      83.57
en ewt      75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations


Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases

σ-LSTM provides more useful information independent of dataset size

Interconnecting the model's components with the t-RNN makes the tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers)


What does Morphological Feature Embedding provide


Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k and less than 50k tokens

Languages having more than 50k and less than 100k tokens

Languages having 100k tokens or more
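The grouping can be written as a simple bucketing function; the thresholds come from the list above, and the token counts used below are taken from the result tables:

```python
def size_bucket(n_train_tokens):
    # Assign a treebank to one of the four training-size groups.
    if n_train_tokens < 20_000:
        return "<20k"
    if n_train_tokens < 50_000:
        return "20k-50k"
    if n_train_tokens < 100_000:
        return "50k-100k"
    return ">=100k"
```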


Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33             3,583
ru taiga       58.32        60.55            10,479
sme giella     52.78        53.39            16,385
la perseus     49.93        51.60            18,184
ug udt         52.78        53.39            19,262
sl sst         46.72        48.77            19,473
hu szeged      66.23        68.18            20,166

Not useful for languages having less than 20k training tokens


Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81            48,325
fr sequoia     84.36        82.17            50,543
en gum         76.44        75.34            53,686
ko gsd         73.74        72.54            56,687
eu bdt         74.55        73.32            72,974
nl lassysmall  76.7         75.8             75,134
gl ctg         79.02        79.018           79,327
lv lvtb        72.33        72.24            80,666
id gsd         75.76        73.97            97,531

Beneficial for languages with 50k-100k training tokens


Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12             121,064
bg btb     84.53        84.55             124,336
en ewt     75.77        75.682            204,585
ar padt    68.02        68.14             223,881
de gsd     71.59        71.32             263,804
ca ancora  85.89        85.874            417,587
es ancora  84.99        84.78             444,617
cs cac     83.57        83.63             472,608
cs pdt     81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens


Static vs Dynamic Oracle Training

Static oracle: training transitions follow the gold moves
Dynamic oracle: training transitions follow the predicted moves

In both cases the log-probability of the gold moves is maximized
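A sketch of one training step under each regime; the softmax scorer and the exploration rate are illustrative assumptions, while in both regimes the loss is the negative log-probability of the gold move:

```python
import math, random

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def oracle_step(scores, gold_move, dynamic, rng, explore=1.0):
    # The loss always maximizes log p(gold move); what differs is the
    # move used to advance the parser: static follows the gold move,
    # dynamic (with some probability) follows the model's prediction.
    loss = -math.log(softmax(scores)[gold_move])
    predicted = max(range(len(scores)), key=scores.__getitem__)
    taken = predicted if dynamic and rng.random() < explore else gold_move
    return loss, taken
```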

Figure: Tree-stack LSTM architecture (β-LSTM, σ-LSTM, and Action-LSTM states and the t-RNN head embeddings are concatenated and fed to an MLP)


Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k


Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k


Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k


How about languages with less than 20k training tokens?


Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language       (1)           (2)    (3)    (4)
af afribooms   not provided  75.46  77.43  78.12
kk ktb         20.19         22.31  21.96  23.86
bxr bdt         7.64          9.76   9.93   8.98
kmr mg         20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4)


Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

Training an LM from scratch on very limited data does not yield useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]


Projectivity

Transition-based parsers can only build projective trees

Figure from http://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
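Projectivity can be checked by testing whether any two dependency arcs cross; a small sketch using 1-based word indices with 0 for the root:

```python
def is_projective(heads):
    # heads[i-1] is the head of word i; an arc spans (min, max) of its
    # two endpoints, and the tree is projective iff no two arcs cross.
    arcs = [tuple(sorted((i, h))) for i, h in enumerate(heads, start=1)]
    for a, b in arcs:
        for c, d in arcs:
            if a < c < b < d:  # arcs (a,b) and (c,d) cross
                return False
    return True
```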


Projective vs Non-projective

We compared our model with the best model at different projectivity ratios

Language      Projectivity  Best (LAS)  Ours (LAS)
grc perseus    90.7         79.39       55.03 (20)
eu bdt         95.13        84.22       74.13 (17)
hu szeged      97.8         82.66       68.18 (14)
da ddt         98.26        86.28       76.40 (17)
en gum         99.6         85.05       76.44 (15)
gl treegal    100           74.25       70.45 (10)
gl ctg        100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases (from the official results page and our projectivity table)

Conclusions


Conclusion

In conclusion: we introduced Context, Word, and Morph-feat embeddings and showed their contribution in transition-based dependency parsing

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering

Tree-stack LSTM performed better on low-resource languages

When the training dataset size increases, the tree-stack LSTM loses its advantage


Future Research Direction

End-to-End Training

Systems jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between the σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.


Publications

Ömer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Ömer Kırnap, Berkay Furkan Önder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.


References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.


Thank you for your attention


Questions


  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 29: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Related Work

Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearityHoweverImpractical for high dimensional inputs they scale linearly in inputdimensions (in both time and space assuming fixed number of hiddenunits)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123

Related Work

Solution Using dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18bull KParse team with Tree-stack LSTM Parser using Context and

Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using ContextEmbeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser using Context and Morph-featEmbeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123

a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

LM is used to obtain Context and Word embeddings with twocomponents

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure Kırnap et al 2017Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label

Economic news hadGold Tree LAS 1

SBJATT

Economic news had

PREDOBJPred 1 LAS 0

Economic news had

OBJATTPred 2 LAS (frac12)100

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Context vectors provide independent contribution on top ofPOS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677p-fb 747 797 663

p-v-c 793 832 742

Our BiLSTM language model word vectors perform betterthan FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Both POS tags and context vectors have significantcontributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing correct state of parser still remains critical

We are unable to represent whole parsing history with featureextracting

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify stackrsquos word embeddings

Hidden states of LSTMS are not updated unless reduce

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose Tree-stack LSTM model with 4 components

β-LSTMσ-LSTMAction-LSTMTree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include explicit feature extractor We initiated wordrepresentation by concatenating

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morp-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure Bufferrsquos β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stackrsquos σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

w_head^new = tanh(W_rnn · [w_head^old; d_l; w_dep] + b_rnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
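Eq. (1) can be written out directly; the weight values and the embedding size below are random placeholders, only the functional form follows the slide.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 6                                  # illustrative embedding size
W_rnn = rng.normal(size=(D, 3 * D))    # maps [head; relation; dependent] -> D
b_rnn = np.zeros(D)

def t_rnn(w_head_old, d_l, w_dep):
    # Eq. (1): compose the head, dependency relation, and dependent
    # embeddings into a new head embedding.
    x = np.concatenate([w_head_old, d_l, w_dep])
    return np.tanh(W_rnn @ x + b_rnn)

new_head = t_rnn(np.ones(D), np.zeros(D), np.ones(D))
```

The tanh keeps the composed head embedding in the same range as the original embeddings, so the update can be applied repeatedly as heads collect dependents.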

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
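The left and right transitions above can be sketched as pure functions on a (stack, buffer, arcs) state; the toy tokens below are hypothetical and the functions only mirror the set-notation definitions from the slides.

```python
def left_arc(stack, buffer, arcs, d):
    # left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    # pop stack top s and attach it to buffer front b with label d.
    s, b = stack[-1], buffer[0]
    return stack[:-1], buffer, arcs | {(b, d, s)}

def right_arc(stack, buffer, arcs, d):
    # right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    # pop stack top t and attach it to the next stack item s with label d.
    t, s = stack[-1], stack[-2]
    return stack[:-1], buffer, arcs | {(s, d, t)}

stack, buffer, arcs = ["ROOT", "news"], ["had"], set()
stack, buffer, arcs = left_arc(stack, buffer, arcs, "nsubj")
```

After the left transition the t-RNN would additionally fold the dependent's embedding into the head's embedding, as shown on the preceding slides.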

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2. Related Work: Linear Models and their Drawbacks; Neural Network Models
3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5. Conclusion
6. Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4. Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17:
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
Dependency parsing of 82 treebanks in 57 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split change 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of the official comparison:

1. If the annotation of the treebank is improved, the older parser is handicapped

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code | MLP | Tree-stack
ru taiga (10k) | 58.89 | 60.55
hu szeged (20k) | 66.21 | 68.18
tr imst (50k) | 56.78 | 58.75
ar padt (120k) | 67.83 | 68.14
en ewt (205k) | 74.87 | 75.77
cs cac (473k) | 83.39 | 83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code | MLP | Only Action | Only-β | Only-σ
hu szeged | 66.21 | 66.87 | 66.94 | 67.03
sv lines | 71.12 | 72.05 | 72.17 | 72.45
tr imst | 57.12 | 56.87 | 57.02 | 57.12
ar padt | 67.83 | 66.67 | 66.89 | 66.92
cs cac | 83.89 | 82.23 | 83.13 | 83.17
en ewt | 75.54 | 75.43 | 75.56 | 75.67

Table Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code | without t-RNN | with t-RNN
no nynorsklia (3k) | 51.78 | 53.33
ru taiga (11k) | 59.13 | 60.55
gl treegal (15k) | 69.76 | 70.45
hu szeged (20k) | 66.12 | 68.18
sv lines (49k) | 74.04 | 75.46
tr imst (50k) | 58.12 | 58.75
ar padt (120k) | 68.04 | 68.14
en ewt (204k) | 74.87 | 75.77
cs cac (473k) | 82.89 | 83.57
cs pdt (1M) | 81.17 | 81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang | MLP | Only A | Only-β | Only-σ | w/o t-RNN | all
hu szeged | 66.21 | 66.87 | 66.94 | 67.03 | 66.12 | 68.18
sv lines | 71.12 | 72.05 | 72.17 | 74.04 | 72.17 | 75.46
tr imst | 57.12 | 56.87 | 57.02 | 57.12 | 58.12 | 58.75
ar padt | 67.83 | 66.67 | 66.89 | 66.92 | 68.04 | 68.14
cs cac | 83.89 | 82.23 | 83.13 | 83.17 | 82.89 | 83.57
en ewt | 75.54 | 75.43 | 75.56 | 75.67 | 74.87 | 75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases

σ-LSTM provides more useful information independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD v2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k and less than 50k tokens

Languages having more than 50k and less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
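The four groups can be stated as a simple bucketing rule; the thresholds are taken from the list above, while the group labels are my own shorthand.

```python
def bucket(n_tokens):
    # Assign a language to one of the four training-size groups.
    if n_tokens < 20_000:
        return "<20k"
    if n_tokens < 50_000:
        return "20k-50k"
    if n_tokens < 100_000:
        return "50k-100k"
    return ">=100k"
```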

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang code | Morph-Feats | no Morph-Feats | # of tokens
no nynorsklia | 51.13 | 53.33 | 3583
ru taiga | 58.32 | 60.55 | 10479
sme giella | 52.78 | 53.39 | 16385
la perseus | 49.93 | 51.6 | 18184
ug udt | 52.78 | 53.39 | 19262
sl sst | 46.72 | 48.77 | 19473
hu szeged | 66.23 | 68.18 | 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code | Morph-Feats | no Morph-Feats | # of tokens
sv lines | 72.18 | 74.81 | 48325
fr sequoia | 84.36 | 82.17 | 50543
en gum | 76.44 | 75.34 | 53686
ko gsd | 73.74 | 72.54 | 56687
eu bdt | 74.55 | 73.32 | 72974
nl lassymal | 76.7 | 75.8 | 75134
gl ctg | 79.02 | 79.018 | 79327
lv lvtb | 72.33 | 72.24 | 80666
id gsd | 75.76 | 73.97 | 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code | Morph-Feats | no Morph-Feats | # of tokens
fa seraji | 81.18 | 81.12 | 121064
bg btb | 84.53 | 84.55 | 124336
en ewt | 75.77 | 75.682 | 204585
ar padt | 68.02 | 68.14 | 223881
de gsd | 71.59 | 71.32 | 263804
ca ancora | 85.89 | 85.874 | 417587
es ancora | 84.99 | 84.78 | 444617
cs cac | 83.57 | 83.63 | 472608
cs pdt | 81.43 | 82.12 | 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions using gold moves. Dynamic oracle: transitions using predicted moves.

In both cases, the log-probability of gold moves is maximized.

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
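The difference between the two regimes can be sketched with a toy state space; the "model" here is just a fixed probability of predicting the gold move, a stand-in for the real parser, and the integer states are purely illustrative.

```python
import math
import random

def train_step(n_transitions, p_gold, mode, rng):
    # Sum of -log p(gold move) over a transition sequence. In "static"
    # mode the parser follows gold moves; in "dynamic" mode it follows
    # its own predictions, so it also visits states the gold sequence
    # would never reach. The objective is identical in both modes.
    loss, state, visited = 0.0, 0, []
    for _ in range(n_transitions):
        loss += -math.log(p_gold)          # maximize log p of the gold move
        if mode == "static":
            move = +1                      # follow the gold transition
        else:
            move = +1 if rng.random() < p_gold else -1  # follow prediction
        state += move
        visited.append(state)
    return loss, visited

rng = random.Random(0)
loss_s, path_s = train_step(5, 0.8, "static", rng)
loss_d, path_d = train_step(5, 0.8, "dynamic", rng)
```

Both runs accumulate the same loss; only the visited states differ, which is exactly what exposes the dynamic-oracle parser to its own mistakes.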

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language | (1) | (2) | (3) | (4)
af afribooms | not provided | 75.46 | 77.43 | 78.12
kk ktb | 20.19 | 22.31 | 21.96 | 23.86
bxr bdt | 7.64 | 9.76 | 9.93 | 8.98
kmr mg | 20.12 | 22.57 | 22.78 | 23.39

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments:

Applying transfer learning with a pre-trained parser is the most beneficial

From-scratch LM training does not produce useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition based parsers can only build projective trees 6

6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
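A tree is projective when no two dependency arcs cross; a brute-force check is only a few lines. The `heads` encoding, with 0 as the artificial root, follows the usual CoNLL convention; the example sentences are toy inputs.

```python
def is_projective(heads):
    # heads[i-1] is the head index of word i; index 0 is the artificial root.
    # A tree is projective iff no two arcs cross when drawn above the words.
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for a, b in arcs:
        for c, d in arcs:
            if a < c < b < d:      # arcs (a, b) and (c, d) cross
                return False
    return True

# word 2 is the root; words 1 and 3 attach to it (projective chain)
projective = is_projective([2, 0, 2])
```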

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language | Projectivity % | Best (LAS) | Our (LAS)

grc perseus | 90.7 | 79.39 | 55.03 (20)
eu bdt | 95.13 | 84.22 | 74.13 (17)
hu szeged | 97.8 | 82.66 | 68.18 (14)
da ddt | 98.26 | 86.28 | 76.40 (17)
en gum | 99.6 | 85.05 | 76.44 (15)
gl treegal | 100 | 74.25 | 70.45 (10)
gl ctg | 100 | 82.12 | 79.45 (14)

Table Our model's performance gap decreases as the projectivity ratio increases

7

7 From the official results page and our projectivity table

Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion: We introduced "Context, Word and Morph-feat" embeddings and showed their contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-crafted feature engineering

Tree-stack LSTM performed better with low-resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement

Morphological Features

Finding a different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

Page 30: Transition Based Dependency Parsing with Deep Learning. Omer Kırnap, Koç University, [email protected]. September 27, 2018.

Related Work

Solution: Using dense embeddings for input features

Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123

Overview

1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2. Related Work: Linear Models and their Drawbacks; Neural Network Models
3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5. Conclusion
6. Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17

• Koc-University team with MLP Parser using Context Embeddings

CoNLL18

• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123

a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

The LM is used to obtain Context and Word embeddings with two components:

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123
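The two components can be sketched with plain recurrences; a simple tanh RNN cell stands in for the LSTM cells here, and all weights and sizes are random placeholders, not the trained language model.

```python
import numpy as np

rng = np.random.default_rng(2)
D = 8
Wc = rng.normal(scale=0.1, size=(D, D + 1))   # char cell: [state; char] -> state

def word_vector(word):
    # Character-level recurrence standing in for the char-LSTM.
    h = np.zeros(D)
    for ch in word:
        h = np.tanh(Wc @ np.concatenate([h, [ord(ch) / 128.0]]))
    return h

Wf = rng.normal(scale=0.1, size=(D, 2 * D))   # forward word-level cell
Wb = rng.normal(scale=0.1, size=(D, 2 * D))   # backward word-level cell

def context_vectors(words):
    # Word-level bidirectional recurrence standing in for the BiLSTM.
    vecs = [word_vector(w) for w in words]
    fwd, h = [], np.zeros(D)
    for v in vecs:
        h = np.tanh(Wf @ np.concatenate([h, v]))
        fwd.append(h)
    bwd, h = [], np.zeros(D)
    for v in reversed(vecs):
        h = np.tanh(Wb @ np.concatenate([h, v]))
        bwd.append(h)
    return [np.concatenate([f, b]) for f, b in zip(fwd, reversed(bwd))]

ctx = context_vectors(["Economic", "news", "had"])
```

Each word gets a vector from its characters, and each position gets a context vector from the forward and backward passes over the whole sentence.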

Language Model - Word vectors

A character based LSTM generates word vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

A word based BiLSTM generates context vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123
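The decision module can be sketched as a one-hidden-layer MLP with a softmax over transitions; the feature dimension, layer sizes, and weights below are illustrative placeholders, not the actual trained parameters.

```python
import numpy as np

rng = np.random.default_rng(3)
N_FEATURES, HIDDEN, N_TRANSITIONS = 12, 16, 4   # illustrative sizes

W1 = rng.normal(scale=0.1, size=(HIDDEN, N_FEATURES))
W2 = rng.normal(scale=0.1, size=(N_TRANSITIONS, HIDDEN))

def next_transition(features):
    # Score all transitions with a one-hidden-layer MLP and return the
    # argmax together with the softmax distribution.
    h = np.tanh(W1 @ features)
    scores = W2 @ h
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return int(np.argmax(probs)), probs

move, probs = next_transition(rng.normal(size=N_FEATURES))
```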

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS): the percentage of words correctly assigned both the correct syntactic head and the correct dependency label

Gold tree (Economic news had): arcs ATT, SBJ; LAS = 1
Pred 1: arcs PRED, OBJ; LAS = 0
Pred 2: arcs ATT, OBJ; LAS = (1/2) · 100 = 50

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
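The metric itself is a short function over (head, label) pairs; the toy trees reproduce the slide's example, where Pred 2 gets one of two words fully correct.

```python
def las(gold, pred):
    # Labeled Attachment Score: percentage of words whose predicted
    # (head, label) pair matches the gold tree exactly.
    # Trees are dicts: dependent -> (head, label).
    correct = sum(1 for w in gold if pred.get(w) == gold[w])
    return 100.0 * correct / len(gold)

gold  = {"Economic": ("news", "ATT"), "news": ("had", "SBJ")}
pred1 = {"Economic": ("news", "PRED"), "news": ("had", "OBJ")}
pred2 = {"Economic": ("news", "ATT"), "news": ("had", "OBJ")}
```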

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5 Source: CoNLL17 official results page

Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats | Hungarian | En-ParTUT | Latvian
p | 63.6 | 76.6 | 55.9
v | 73.5 | 75.9 | 63
c | 72.2 | 76 | 63.5
v-c | 76 | 79 | 67.6
p-c | 78 | 82.5 | 70.6
p-v | 76.6 | 80.8 | 67.7
p-fb | 74.7 | 79.7 | 66.3
p-v-c | 79.3 | 83.2 | 74.2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats | Hungarian | En-ParTUT | Latvian
p | 63.6 | 76.6 | 55.9
v | 73.5 | 75.9 | 63
c | 72.2 | 76 | 63.5
v-c | 76 | 79 | 67.6
p-c | 78 | 82.5 | 70.6
p-v | 76.6 | 80.8 | 67.7
p-fb | 74.7 | 79.7 | 66.3
p-v-c | 79.3 | 83.2 | 74.2

Context vectors provide an independent contribution on top of POS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats | Hungarian | En-ParTUT | Latvian
p | 63.6 | 76.6 | 55.9
v | 73.5 | 75.9 | 63
c | 72.2 | 76 | 63.5
v-c | 76 | 79 | 67.6
p-c | 78 | 82.5 | 70.6
p-v | 76.6 | 80.8 | 67.7
p-fb | 74.7 | 79.7 | 66.3
p-v-c | 79.3 | 83.2 | 74.2

Our BiLSTM language model word vectors perform better than FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats | Hungarian | En-ParTUT | Latvian
p | 63.6 | 76.6 | 55.9
v | 73.5 | 75.9 | 63
c | 72.2 | 76 | 63.5
v-c | 76 | 79 | 67.6
p-c | 78 | 82.5 | 70.6
p-v | 76.6 | 80.8 | 67.7
p-fb | 74.7 | 79.7 | 66.3
p-v-c | 79.3 | 83.2 | 74.2

Both POS tags and context vectors have significant contributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing the correct state representation of the parser remains critical

We are unable to represent the whole parsing history with feature extraction

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17

• Koc-University team with MLP Parser using Context Embeddings

CoNLL18

• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM
Modify the head word's embedding with the dependent's embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of the LSTMs are not updated unless a reduce occurs

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose:

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123


Overview

1 Introduction
   - Overview of Dependency Parsing
   - Transition Based Dependency Parsing

2 Related Work
   - Linear Models and their Drawbacks
   - Neural Network Models

3 Model
   - Language Model
   - MLP Parser
   - Tree-stack LSTM Parser

4 Results
   - MLP vs Tree-stack LSTM
   - Morphological Feature Embeddings
   - Static vs Dynamic Oracle Training
   - Transfer Learning

5 Conclusion

6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

- CoNLL17: Koc-University team with MLP Parser using Context Embeddings
- CoNLL18: KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

- CoNLL17: Koc-University team with MLP Parser using Context Embeddings
- CoNLL18: KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123

a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

The LM is used to obtain Context and Word embeddings with two components:

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure: Kırnap et al. 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation:
- 17 universal part-of-speech tags
- 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS)
The percentage of words correctly assigned both the correct syntactic head and the correct dependency label.

Example with "Economic news had" (two scored arcs):

- Gold tree (SBJ, ATT): LAS = 1
- Pred 1 (PRED, OBJ - both arcs wrong): LAS = 0
- Pred 2 (OBJ, ATT - 1 of 2 arcs correct): LAS = (1/2) · 100 = 50%

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
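The LAS metric above can be computed directly from head/label arcs. A minimal sketch (the `gold`/`pred` dictionaries mapping each dependent to its (head, label) arc are illustrative structures, not the official CoNLL evaluator):

```python
def las(gold, pred):
    """Labeled Attachment Score: percentage of dependents whose
    predicted head AND dependency label both match the gold arc."""
    correct = sum(1 for word, arc in gold.items() if pred.get(word) == arc)
    return 100.0 * correct / len(gold)

# Arcs for "Economic news had": dependent -> (head, label)
gold  = {"Economic": ("news", "ATT"), "news": ("had", "SBJ")}
pred1 = {"Economic": ("news", "OBJ"), "news": ("had", "PRED")}  # both arcs wrong
pred2 = {"Economic": ("news", "ATT"), "news": ("had", "OBJ")}   # 1 of 2 correct
```

Here `las(gold, pred1)` gives 0.0 and `las(gold, pred2)` gives 50.0, matching the slide's example.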

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers [5]

[5] Source: CoNLL17 official results page

Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Context vectors provide an independent contribution on top of POS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Our BiLSTM language model word vectors perform better than FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Both POS tags and context vectors have significant contributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing the correct state of the parser still remains critical

We are unable to represent the whole parsing history with feature extraction

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

- CoNLL17: Koc-University team with MLP Parser using Context Embeddings
- CoNLL18: KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM
Modify the head word's embedding with the dependent's embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of the LSTMs are not updated unless a reduce occurs

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al. 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

Figure: Tree-stack LSTM overview. The σ-LSTM, β-LSTM, and Action-LSTM hidden states are concatenated and fed to an MLP; the t-RNN combines the head word, dependent word, and dependency relation.

We propose the Tree-stack LSTM model with 4 components:

1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector

Every dependency relation is represented with a continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morp-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
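The morph-feat embedding above maps a UD feature string to a single vector. A minimal sketch: in the thesis these per-feature vectors are learned jointly with the parser, while here a deterministic hash-derived vector merely stands in for the learned lookup table so the example is self-contained:

```python
import hashlib

DIM = 8  # toy embedding size

def feat_vector(feat, dim=DIM):
    # Deterministic stand-in for a learned embedding of one
    # key=value morphological feature (hypothetical, for illustration).
    digest = hashlib.md5(feat.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]

def morph_feat_embedding(feats, dim=DIM):
    """Embed a UD feature string such as
    'Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs'
    by averaging the vectors of its key=value parts."""
    pairs = [] if feats == "_" else feats.split("|")
    if not pairs:
        return [0.0] * dim
    vectors = [feat_vector(p, dim) for p in pairs]
    return [sum(col) / len(vectors) for col in zip(*vectors)]
```

Averaging keeps the embedding size fixed no matter how many features a word carries; the `_` value (no features in CoNLL-U) maps to a zero vector.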

Tree-stack LSTM

Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

Figure: the β-LSTM within the tree-stack LSTM architecture.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

Figure: Buffer's β-LSTM running over the buffer words w_i, w_i+1, w_i+2.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

Figure: the σ-LSTM within the tree-stack LSTM architecture.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

Figure: Stack's σ-LSTM running over the stack words s_i, s_i+1, s_i+2.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

Figure: the Action-LSTM within the tree-stack LSTM architecture.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

Figure: Action-LSTM running over the transition history.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

Figure: t-RNN combining the dependent word, dependency relation, and head word.

w_head^new = tanh(W_rnn · [w_head^old; d_l; w_dep] + b_rnn)    (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
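Equation (1) above can be sketched numerically. The weight values here are random placeholders rather than trained parameters; only the shape of the computation matters:

```python
import math
import random

random.seed(0)
DIM = 4  # toy embedding size

# W_rnn maps the concatenation [head; relation; dependent] (3*DIM) back to DIM.
W_rnn = [[random.uniform(-0.1, 0.1) for _ in range(3 * DIM)] for _ in range(DIM)]
b_rnn = [0.0] * DIM

def t_rnn(w_head, d_l, w_dep):
    """Eq. (1): new head embedding = tanh(W_rnn * [head; rel; dep] + b_rnn)."""
    x = w_head + d_l + w_dep  # list concatenation = vector concatenation
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W_rnn, b_rnn)]
```

After a left or right transition the composed vector replaces the head word's embedding, so a stack entry comes to summarize its whole subtree.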

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initiated by concatenating POS, language, and morph-feat embeddings.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready for the next transition.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initiated by concatenating POS, language, and morph-feat embeddings.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready for the next transition.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
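The left and right transitions above can be sketched as pure functions on a parser state (stack σ, buffer β, arc set A). This toy version tracks words as strings and omits the LSTM updates; it only illustrates the state changes the formulas describe:

```python
def shift(state):
    stack, buffer, arcs = state
    return (stack + [buffer[0]], buffer[1:], arcs)

def left(state, d):
    # left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    # the buffer front b becomes the head of the stack top s.
    stack, buffer, arcs = state
    return (stack[:-1], buffer, arcs | {(buffer[0], d, stack[-1])})

def right(state, d):
    # right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    # the second stack item s becomes the head of the stack top t.
    stack, buffer, arcs = state
    return (stack[:-1], buffer, arcs | {(stack[-2], d, stack[-1])})

# Parse "Economic news had" with gold arcs news->Economic (ATT), had->news (SBJ).
state = ([], ["Economic", "news", "had"], set())
for op in [shift, lambda s: left(s, "ATT"),
           shift, lambda s: left(s, "SBJ"), shift]:
    state = op(state)
```

After this transition sequence the buffer is empty, only the root word "had" remains on the stack, and A holds the two gold arcs.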

Final overview of Tree-stack LSTM

Figure: Final overview of the Tree-stack LSTM architecture.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 Introduction
   - Overview of Dependency Parsing
   - Transition Based Dependency Parsing

2 Related Work
   - Linear Models and their Drawbacks
   - Neural Network Models

3 Model
   - Language Model
   - MLP Parser
   - Tree-stack LSTM Parser

4 Results
   - MLP vs Tree-stack LSTM
   - Morphological Feature Embeddings
   - Static vs Dynamic Oracle Training
   - Transfer Learning

5 Conclusion

6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17:
- Dependency parsing of 81 treebanks in 49 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags and 37 universal dependency relations
- Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
- Dependency parsing of 82 treebanks in 57 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags and 37 universal dependency relations
- Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Differences between CoNLL17 and CoNLL18: 1. Train/test split change, 2. Annotation changes

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1 If the annotation of the treebank has been improved, the older parser is handicapped

2 If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru_taiga (10k)   58.89  60.55
hu_szeged (20k)  66.21  68.18
tr_imst (50k)    56.78  58.75
ar_padt (120k)   67.83  68.14
en_ewt (205k)    74.87  75.77
cs_cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM


Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM


Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM


Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code  MLP    Only Action  Only-β  Only-σ
hu_szeged  66.21  66.87        66.94   67.03
sv_lines   71.12  72.05        72.17   72.45
tr_imst    57.12  56.87        57.02   57.12
ar_padt    67.83  66.67        66.89   66.92
cs_cac     83.89  82.23        83.13   83.17
en_ewt     75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

Figure: Tree-stack LSTM architecture, with the t-RNN connecting the head word, dependent word, and dependency relation.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN:

Lang Code           without t-RNN  with t-RNN
no_nynorsklia (3k)  51.78          53.33
ru_taiga (11k)      59.13          60.55
gl_treegal (15k)    69.76          70.45
hu_szeged (20k)     66.12          68.18
sv_lines (49k)      74.04          75.46
tr_imst (50k)       58.12          58.75
ar_padt (120k)      68.04          68.14
en_ewt (204k)       74.87          75.77
cs_cac (473k)       82.89          83.57
cs_pdt (1M)         81.17          81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis:

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu_szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv_lines   71.12  72.05   72.17   74.04   72.17      75.46
tr_imst    57.12  56.87   57.02   57.12   58.12      58.75
ar_padt    67.83  66.67   66.89   66.92   68.04      68.14
cs_cac     83.89  82.23   83.13   83.17   82.89      83.57
en_ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases

σ-LSTM provides more useful information, independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings
We divide the CoNLL18 UD (v2.2) dataset into 4 parts based on the number of training tokens for each language to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no_nynorsklia  51.13        53.33           3,583
ru_taiga       58.32        60.55           10,479
sme_giella     52.78        53.39           16,385
la_perseus     49.93        51.6            18,184
ug_udt         52.78        53.39           19,262
sl_sst         46.72        48.77           19,473
hu_szeged      66.23        68.18           20,166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv_lines       72.18        74.81           48,325
fr_sequoia     84.36        82.17           50,543
en_gum         76.44        75.34           53,686
ko_gsd         73.74        72.54           56,687
eu_bdt         74.55        73.32           72,974
nl_lassysmall  76.7         75.8            75,134
gl_ctg         79.02        79.018          79,327
lv_lvtb        72.33        72.24           80,666
id_gsd         75.76        73.97           97,531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa_seraji  81.18        81.12           121,064
bg_btb     84.53        84.55           124,336
en_ewt     75.77        75.682          204,585
ar_padt    68.02        68.14           223,881
de_gsd     71.59        71.32           263,804
ca_ancora  85.89        85.874          417,587
es_ancora  84.99        84.78           444,617
cs_cac     83.57        83.63           472,608
cs_pdt     81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow gold moves
Dynamic oracle: transitions follow predicted moves

In both cases, the log-probability of gold moves is maximized

Figure: Tree-stack LSTM architecture.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
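The static/dynamic distinction above is purely about which move advances the parser during training. A minimal control-flow sketch, where the constant `score` function and the integer state update are placeholders for the real tree-stack LSTM scorer and parser state:

```python
import math

def train_step(gold_moves, score, dynamic):
    """Accumulate -log p(gold move) along one sentence.
    Static oracle: the parser follows the gold move.
    Dynamic oracle: the parser follows its own best-scoring move,
    but the loss still targets the gold move."""
    state, loss = 0, 0.0
    for gold in gold_moves:
        probs = score(state)                    # distribution over moves
        loss += -math.log(probs[gold])          # always maximize gold log-prob
        move = max(probs, key=probs.get) if dynamic else gold
        state = hash((state, move)) % 97        # placeholder state update
    return loss

# Hypothetical constant scorer, for illustration only.
uniform = lambda state: {"SHIFT": 0.5, "LEFT": 0.3, "RIGHT": 0.2}
```

With a real model the two regimes visit different states, so dynamic training exposes the parser to its own mistakes; with this constant scorer both regimes produce the same loss.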

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af_afribooms  not provided  75.46  77.43  78.12
kk_ktb        20.19         22.31  21.96  23.86
bxr_bdt       7.64          9.76   9.93   8.98
kmr_mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

Training an LM from scratch does not bring useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition based parsers can only build projective trees [6]

[6] Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
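Projectivity can be checked by looking for crossing arcs. A small sketch using 1-based head indices, with 0 denoting the root:

```python
def is_projective(heads):
    """heads[i-1] is the head of word i (1-based); 0 denotes the root.
    A dependency tree is projective iff no two arcs cross when drawn
    above the sentence."""
    # Normalize each arc to an (left, right) interval over positions.
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for i, (l1, r1) in enumerate(arcs):
        for l2, r2 in arcs[i + 1:]:
            # Two arcs cross iff exactly one endpoint of one arc
            # lies strictly inside the other arc's interval.
            if l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1:
                return False
    return True
```

For example, heads [2, 3, 0] ("Economic news had") are projective, while a tree containing the crossing arcs (1,3) and (2,4) is not; the transition systems above cannot produce the latter.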

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios:

Language     Projectivity (%)  Best (LAS)  Our (LAS)
grc_perseus  90.7              79.39       55.03 (20)
eu_bdt       95.13             84.22       74.13 (17)
hu_szeged    97.8              82.66       68.18 (14)
da_ddt       98.26             86.28       76.40 (17)
en_gum       99.6              85.05       76.44 (15)
gl_treegal   100               74.25       70.45 (10)
gl_ctg       100               82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases [7]

[7] From the official results page and our projectivity table

Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:
We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition based dependency parsing.

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering

Tree-stack LSTM performed better on low resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly trained a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between the σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding different ways to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gomez-Rodriguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673–682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

Page 32: Transition Based Dependency Parsing with Deep Learning

3 Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123


a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

LM is used to obtain Context and Word embeddings with two components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123
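To make the character-based word-vector idea concrete, here is a tiny sketch with a plain RNN cell standing in for the LSTM, using random weights (all sizes and names are illustrative assumptions, not the thesis's actual model):

```python
import numpy as np

rng = np.random.default_rng(0)
CHARS = "abcdefghijklmnopqrstuvwxyz"
EMB, HID = 8, 16                           # illustrative sizes
E  = rng.normal(size=(len(CHARS), EMB))    # one embedding per character
Wx = rng.normal(size=(HID, EMB)) * 0.1
Wh = rng.normal(size=(HID, HID)) * 0.1
b  = np.zeros(HID)

def word_vector(word):
    """Run the RNN over the word's characters; the final hidden state
    is a fixed-size vector for the whole word."""
    h = np.zeros(HID)
    for ch in word:
        h = np.tanh(Wx @ E[CHARS.index(ch)] + Wh @ h + b)
    return h
```

Different character sequences yield different final states, which is what lets the parser handle unknown words.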

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure: Kırnap et al. 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 / 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation
• 17 universal part-of-speech tags
• 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS)
The percentage of words correctly assigned both the correct syntactic head and the correct dependency label

Example, "Economic news had":
Gold tree (arcs ATT, SBJ): reference
Pred 1 (arcs PRED, OBJ): LAS 0
Pred 2 (arcs ATT, OBJ): LAS (1/2) × 100 = 50

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
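The LAS computation above can be written directly; `gold`/`pred` pair each token with its (head, label), and the arc labels below just mirror the slide's example:

```python
def las(gold, pred):
    """LAS: percentage of words with both the correct head and the correct label."""
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return 100.0 * correct / len(gold)

# "Economic news had" with gold arcs ATT(Economic<-news), SBJ(news<-had)
gold  = [(2, "ATT"), (3, "SBJ")]
pred1 = [(3, "PRED"), (3, "OBJ")]   # both tokens wrong -> LAS 0
pred2 = [(2, "ATT"), (3, "OBJ")]    # one of two tokens fully correct -> LAS 50
```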

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers5

5 Source: CoNLL17 official results page

Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 / 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), context vector (c)

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63
c       72.2        76          63.5
v-c     76          79          67.6
p-c     78          82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), context vector (c):

Context vectors provide an independent contribution on top of POS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Our BiLSTM language model word vectors perform better than FB vectors (compare p-v and p-fb)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Both POS tags and context vectors have significant contributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing the correct parser state representation still remains critical

We are unable to represent the whole parsing history with feature extraction

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17

• Koc-University team with MLP Parser using Context Embeddings

CoNLL18

• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure: Stack LSTM [Dyer et al., 2015]

Represent each component (σ, β, A) with an LSTM
Modify the head word's embedding with the dependent's embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of the LSTMs are not updated unless a reduce occurs

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al., 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

Figure: Tree-stack LSTM architecture (σ-LSTM, β-LSTM, and Action-LSTM feed a concatenation layer and an MLP; t-RNN composes head word, dependent word, and dependency relation)

We propose the Tree-stack LSTM model with 4 components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector

Every dependency relation is represented with a continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character Based LSTM's word vectors

Word Based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
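The token representation is just a concatenation of the four vectors; a sketch with made-up dimensions (the thesis's actual sizes are not given on this slide):

```python
import numpy as np

word_vec    = np.zeros(350)   # character-LSTM word vector
context_vec = np.zeros(300)   # word-BiLSTM context vector
pos_vec     = np.zeros(128)   # POS embedding
morph_vec   = np.zeros(128)   # morph-feat embedding

# the parser sees one flat vector per token
token_repr = np.concatenate([word_vec, context_vec, pos_vec, morph_vec])
```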

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
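One plausible way to embed a FEATS string like the one above is to give each `Feature=Value` pair its own vector and sum them; this is a hedged sketch (the exact composition used in the thesis is not specified on this slide):

```python
import numpy as np

rng = np.random.default_rng(1)
DIM = 32
emb = {}   # one vector per "Feature=Value" pair, created on first use

def feat_embedding(pair):
    if pair not in emb:
        emb[pair] = rng.normal(size=DIM)
    return emb[pair]

def morph_feat_vector(feats):
    """'Case=Nom|Gender=Neut|...' -> sum of per-feature embeddings;
    '_' (no features) -> zero vector."""
    if feats == "_":
        return np.zeros(DIM)
    return sum(feat_embedding(p) for p in feats.split("|"))

v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
```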

Tree-stack LSTM

Model Components
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

Figure: Tree-stack LSTM architecture diagram (β-LSTM component)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

Figure: Buffer's β-LSTM over the buffer words w_i, w_i+1, w_i+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

Figure: Tree-stack LSTM architecture diagram (σ-LSTM component)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

Figure: Stack's σ-LSTM over the stack items s_i, s_i+1, s_i+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

Figure: Tree-stack LSTM architecture diagram (Action-LSTM component)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

Figure: Action-LSTM over the transition history

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

Figure: t-RNN combines the dependent word, dependency relation, and head word

w_head_new = tanh(W_rnn · [w_head_old ; d_l ; w_dep] + b_rnn)    (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
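Equation (1) maps the concatenation of the old head embedding, the dependency-relation embedding d_l, and the dependent embedding to a new head embedding. A direct NumPy rendering with illustrative sizes and random weights:

```python
import numpy as np

rng = np.random.default_rng(2)
D = 16                                       # embedding size (illustrative)
W_rnn = rng.normal(size=(D, 3 * D)) * 0.1    # maps [head; d_l; dep] -> new head
b_rnn = np.zeros(D)

def t_rnn(w_head, d_l, w_dep):
    """Eq. (1): compose head, relation, and dependent into the new head."""
    return np.tanh(W_rnn @ np.concatenate([w_head, d_l, w_dep]) + b_rnn)

new_head = t_rnn(rng.normal(size=D), rng.normal(size=D), rng.normal(size=D))
```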

Tree-RNN with

1. Left Transition
2. Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ (b, d, s))

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ (b, d, s))

Figure: Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 / 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ (b, d, s))

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ (b, d, s))

Figure: β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ (b, d, s))

Figure: Tree-stack LSTM is ready for the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ (s, d, t))

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ (s, d, t))

Figure: Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ (s, d, t))

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ (s, d, t))

Figure: σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ (s, d, t))

Figure: Tree-stack LSTM is ready for the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
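The two transition rules above can be sketched on plain Python lists (stack top is the last element; arc triples are (head, relation, dependent); all names are illustrative):

```python
def left_arc(sigma, beta, arcs, d):
    """left_d: (sigma|s, b|beta, A) -> (sigma, b|beta, A U {(b, d, s)})"""
    s = sigma.pop()          # dependent comes off the stack
    b = beta[0]              # its head is the front of the buffer
    arcs.append((b, d, s))
    return sigma, beta, arcs

def right_arc(sigma, beta, arcs, d):
    """right_d: (sigma|s|t, beta, A) -> (sigma|s, beta, A U {(s, d, t)})"""
    t = sigma.pop()          # dependent is the stack top
    s = sigma[-1]            # its head sits just below it on the stack
    arcs.append((s, d, t))
    return sigma, beta, arcs
```

In the full model, each transition also triggers the t-RNN update of the head embedding and a recomputation of the affected LSTM hidden states.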

Final overview of Tree-stack LSTM

Figure: Full Tree-stack LSTM architecture (σ-LSTM, β-LSTM, and Action-LSTM feed a concatenation layer and an MLP; t-RNN updates head embeddings)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 Introduction
   Overview of Dependency Parsing
   Transition Based Dependency Parsing

2 Related Work
   Linear Models and their Drawbacks
   Neural Network Models

3 Model
   Language Model
   MLP Parser
   Tree-stack LSTM Parser

4 Results
   MLP vs Tree-stack LSTM
   Morphological Feature Embeddings
   Static vs Dynamic Oracle Training
   Transfer Learning

5 Conclusion

6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17:
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split change 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of the official comparison:

1. If the annotation of the treebank is improved, the older parser is handicapped

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code         MLP     Tree-stack
ru taiga (10k)    58.89   60.55
hu szeged (20k)   66.21   68.18
tr imst (50k)     56.78   58.75
ar padt (120k)    67.83   68.14
en ewt (205k)     74.87   75.77
cs cac (473k)     83.39   83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code   MLP     Only Action   Only-β   Only-σ
hu szeged   66.21   66.87         66.94    67.03
sv lines    71.12   72.05         72.17    72.45
tr imst     57.12   56.87         57.02    57.12
ar padt     67.83   66.67         66.89    66.92
cs cac      83.89   82.23         83.13    83.17
en ewt      75.54   75.43         75.56    75.67

Table: Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

Figure: Tree-stack LSTM architecture diagram (t-RNN component)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code            without t-RNN   with t-RNN
no nynorsklia (3k)   51.78           53.33
ru taiga (11k)       59.13           60.55
gl treegal (15k)     69.76           70.45
hu szeged (20k)      66.12           68.18
sv lines (49k)       74.04           75.46
tr imst (50k)        58.12           58.75
ar padt (120k)       68.04           68.14
en ewt (204k)        74.87           75.77
cs cac (473k)        82.89           83.57
cs pdt (1M)          81.17           81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang        MLP     Only A   Only-β   Only-σ   w/o t-RNN   all
hu szeged   66.21   66.87    66.94    67.03    66.12       68.18
sv lines    71.12   72.05    72.17    74.04    72.17       75.46
tr imst     57.12   56.87    57.02    57.12    58.12       58.75
ar padt     67.83   66.67    66.89    66.92    68.04       68.14
cs cac      83.89   82.23    83.13    83.17    82.89       83.57
en ewt      75.54   75.43    75.56    75.67    74.87       75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases

σ-LSTM provides more useful information independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th of all and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings
We divide the CoNLL18 UD dataset (v2.2) into 4 parts based on the number of training tokens for each language, to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code       Morph-Feats   no Morph-Feats   # of tokens
no nynorsklia   51.13         53.33            3583
ru taiga        58.32         60.55            10479
sme giella      52.78         53.39            16385
la perseus      49.93         51.6             18184
ug udt          52.78         53.39            19262
sl sst          46.72         48.77            19473
hu szeged       66.23         68.18            20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code       Morph-Feats   no Morph-Feats   # of tokens
sv lines        72.18         74.81            48325
fr sequoia      84.36         82.17            50543
en gum          76.44         75.34            53686
ko gsd          73.74         72.54            56687
eu bdt          74.55         73.32            72974
nl lassysmall   76.7          75.8             75134
gl ctg          79.02         79.018           79327
lv lvtb         72.33         72.24            80666
id gsd          75.76         73.97            97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code   Morph-Feats   no Morph-Feats   # of tokens
fa seraji   81.18         81.12            121064
bg btb      84.53         84.55            124336
en ewt      75.77         75.682           204585
ar padt     68.02         68.14            223881
de gsd      71.59         71.32            263804
ca ancora   85.89         85.874           417587
es ancora   84.99         84.78            444617
cs cac      83.57         83.63            472608
cs pdt      81.43         82.12            1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions using gold moves
Dynamic oracle: transitions using predicted moves

In both cases, log p of the gold moves is maximized

Figure: Tree-stack LSTM architecture diagram

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
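The static/dynamic distinction only changes which move advances the parser; the loss always targets the gold move. A toy sketch (function and variable names are hypothetical):

```python
import math

def oracle_step(probs, gold_move, dynamic):
    """probs: transition -> model probability. The loss is -log p(gold) in
    both regimes; a static oracle advances with the gold move, a dynamic
    oracle advances with the model's own best-scoring move."""
    loss = -math.log(probs[gold_move])
    follow = max(probs, key=probs.get) if dynamic else gold_move
    return loss, follow

probs = {"shift": 0.5, "left": 0.2, "right": 0.3}
```

With a dynamic oracle the parser is exposed to its own mistakes during training, which is the motivation for trying it despite the static oracle's simplicity.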

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 33: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18bull KParse team with Tree-stack LSTM Parser using Context and

Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using ContextEmbeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser using Context and Morph-featEmbeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123

a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

LM is used to obtain Context and Word embeddings with twocomponents

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure Kırnap et al 2017Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label

Economic news hadGold Tree LAS 1

SBJATT

Economic news had

PREDOBJPred 1 LAS 0

Economic news had

OBJATTPred 2 LAS (frac12)100

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Context vectors provide independent contribution on top ofPOS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677p-fb 747 797 663

p-v-c 793 832 742

Our BiLSTM language model word vectors perform betterthan FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Both POS tags and context vectors have significantcontributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing correct state of parser still remains critical

We are unable to represent whole parsing history with featureextracting


Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack.
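One way to picture this solution is to summarize each sequence (stack, buffer, action history) with a recurrent network and concatenate the summaries. The sketch below uses a plain tanh RNN in place of the LSTMs the thesis uses; all dimensions and weights are made-up toy values:

```python
import numpy as np

def rnn_summary(seq, W, U, b):
    """Run a simple tanh RNN over a sequence of vectors and return the
    final hidden state as a fixed-size summary of the whole sequence."""
    h = np.zeros(b.shape)
    for x in seq:
        h = np.tanh(W @ x + U @ h + b)
    return h

rng = np.random.default_rng(0)
d = 8                                   # toy embedding size
W, U, b = rng.normal(size=(d, d)), rng.normal(size=(d, d)), np.zeros(d)

stack   = [rng.normal(size=d) for _ in range(3)]
buffer_ = [rng.normal(size=d) for _ in range(5)]
actions = [rng.normal(size=d) for _ in range(4)]

# Parser state = concatenation of the three summaries, fed to an MLP.
state = np.concatenate([rnn_summary(s, W, U, b)
                        for s in (stack, buffer_, actions)])
```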


Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17

• Koc-University team with MLP Parser using Context Embeddings

CoNLL18

• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings


c Tree-stack LSTM Parser (CoNLL18)


Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM. Modify the head word's embedding with the dependent's embedding.


Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of the LSTMs are not updated unless a reduce occurs

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al. 2013]


Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy


Tree-stack LSTM Overview

[Figure: Tree-stack LSTM architecture: σ-LSTM (stack), β-LSTM (buffer), and Action-LSTM outputs, composed with the t-RNN over head word, dependent word, and dependency relation, concatenated and fed to an MLP]

We propose the Tree-stack LSTM model with 4 components:

β-LSTM, σ-LSTM, Action-LSTM, Tree-RNN


Tree-stack LSTM

Input Representation


Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector


Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character-based LSTM's word vectors

Word-based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors
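The concatenation above can be sketched as follows (the dimensions are hypothetical placeholders, not the thesis's actual sizes):

```python
import numpy as np

rng = np.random.default_rng(0)
word_vec    = rng.normal(size=350)   # character-based LSTM output
context_vec = rng.normal(size=300)   # word-based BiLSTM output
pos_vec     = rng.normal(size=128)   # learned POS embedding
morph_vec   = rng.normal(size=128)   # morph-feat embedding

# The input representation of one token is simply the concatenation:
token_repr = np.concatenate([word_vec, context_vec, pos_vec, morph_vec])
```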


Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings
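One plausible way to embed such a feature string is to look up one vector per key=value pair and sum them. This is a sketch; the thesis may combine the pair embeddings differently:

```python
import numpy as np

def morph_feat_vector(feats, table, dim=64):
    """Embed a UD FEATS string such as
    'Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs' by looking
    up one embedding per key=value pair and summing them."""
    vec = np.zeros(dim)
    if feats == "_":            # CoNLL-U marks "no features" with "_"
        return vec
    for pair in feats.split("|"):
        if pair not in table:   # allocate a fresh random embedding
            table[pair] = np.random.default_rng(len(table)).normal(size=dim)
        vec += table[pair]
    return vec

table = {}
v = morph_feat_vector(
    "Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs", table)
```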


Tree-stack LSTM

Model Components: 1. β-LSTM 2. σ-LSTM 3. Action-LSTM 4. Tree-RNN


β-LSTM

[Figure: Tree-stack LSTM architecture: σ-LSTM, β-LSTM, Action-LSTM, and t-RNN outputs concatenated and fed to an MLP]


β-LSTM

[Figure: Buffer's β-LSTM, an LSTM running over the buffer words w_i, w_i+1, w_i+2]


σ-LSTM

[Figure: Tree-stack LSTM architecture: σ-LSTM, β-LSTM, Action-LSTM, and t-RNN outputs concatenated and fed to an MLP]


σ-LSTM

[Figure: Stack's σ-LSTM, an LSTM running over the stack words s_i, s_i+1, s_i+2]


Action-LSTM

[Figure: Tree-stack LSTM architecture: σ-LSTM, β-LSTM, Action-LSTM, and t-RNN outputs concatenated and fed to an MLP]


Action-LSTM

[Figure: Action-LSTM, an LSTM running over the transition history]


How are the components of the tree-stack LSTM connected?


Tree-RNN


Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

w_head_new = tanh(W_rnn * [w_head_old ; d_l ; w_dep] + b_rnn)    (1)

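Equation (1) can be written out directly (toy dimensions; W_rnn and b_rnn are randomly initialized stand-ins here, not trained weights):

```python
import numpy as np

d = 4                                  # toy embedding size
rng = np.random.default_rng(1)
W_rnn = rng.normal(size=(d, 3 * d))    # maps [head; deprel; dep] -> head
b_rnn = np.zeros(d)

def t_rnn(w_head, d_l, w_dep):
    # Eq. (1): compose head, relation, and dependent into a new head
    return np.tanh(W_rnn @ np.concatenate([w_head, d_l, w_dep]) + b_rnn)

new_head = t_rnn(rng.normal(size=d), rng.normal(size=d), rng.normal(size=d))
```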

Tree-RNN with

1. Left Transition 2. Right Transition


Left Transition


Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure: Each embedding is initiated by concatenating POS, language, and morph-feat embeddings


Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure: Stack's top LSTM is reduced

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding


Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure: β-LSTM recalculates its hidden state based on the new input


Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition


Right Transition


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure: Each embedding is initiated by concatenating POS, language, and morph-feat embeddings


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure: σ-LSTM recalculates its hidden state from the new input


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

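The two transitions can be sketched as operations on Python lists standing in for σ (stack) and β (buffer); the word ids and labels below are hypothetical:

```python
def left(sigma, beta, arcs, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    pop s from the stack and attach it as a dependent of the buffer
    front b with label d. The buffer is unchanged."""
    s = sigma.pop()
    b = beta[0]
    arcs.add((b, d, s))

def right(sigma, beta, arcs, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    pop t from the stack and attach it as a dependent of the new
    stack top s with label d."""
    t = sigma.pop()
    s = sigma[-1]
    arcs.add((s, d, t))

# Hypothetical word ids: stack [0, 1, 2], buffer front 3
sigma, beta, arcs = [0, 1, 2], [3], set()
left(sigma, beta, arcs, "nsubj")   # adds arc 3 -> 2
right(sigma, beta, arcs, "obj")    # adds arc 0 -> 1
```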

Final overview of Tree-stack LSTM

[Figure: Tree-stack LSTM architecture: σ-LSTM, β-LSTM, Action-LSTM, and t-RNN outputs concatenated and fed to an MLP]


Overview

1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2. Related Work: Linear Models and their Drawbacks; Neural Network Models

3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5. Conclusion

6. Future Work & Discussions


4. Results & Comparisons


Results & Comparisons

Dataset

CoNLL17:
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split change 2. Annotation


MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets.


MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1. If the annotation of the treebank is improved, the older parser is handicapped.

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.


MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP


Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM


MLP Parser


Figure Initial model


Only Action LSTM


Figure Only action LSTM


Only β-LSTM


Figure Only β-LSTM


Only σ-LSTM


Figure Only σ-LSTM


Ablation Analysis Results

Lang Code   MLP    Only Action  Only-β  Only-σ
hu szeged   66.21  66.87        66.94   67.03
sv lines    71.12  72.05        72.17   72.45
tr imst     57.12  56.87        57.02   57.12
ar padt     67.83  66.67        66.89   66.92
cs cac      83.89  82.23        83.13   83.17
en ewt      75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models


Ablation of t-RNN

[Figure: Tree-stack LSTM architecture: σ-LSTM, β-LSTM, Action-LSTM, and t-RNN outputs concatenated and fed to an MLP]


Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code            without t-RNN  with t-RNN
no nynorsklia (3k)   51.78          53.33
ru taiga (11k)       59.13          60.55
gl treegal (15k)     69.76          70.45
hu szeged (20k)      66.12          68.18
sv lines (49k)       74.04          75.46
tr imst (50k)        58.12          58.75
ar padt (120k)       68.04          68.14
en ewt (204k)        74.87          75.77
cs cac (473k)        82.89          83.57
cs pdt (1M)          81.17          81.164

t-RNN provides a comparative advantage for low-resource languages


Ablation Analysis

Overall results of ablation analysis

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations


Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases

σ-LSTM provides more useful information independent of dataset size

Interconnecting the model's components with t-RNN makes the tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)


What does Morphological Feature Embedding provide


Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD v2.2 dataset into 4 parts based on the number of training tokens per language, to better understand our contributions.

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more


Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.6            18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166

Not useful for languages having less than 20k training tokens


Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48325
fr sequoia     84.36        82.17           50543
en gum         76.44        75.34           53686
ko gsd         73.74        72.54           56687
eu bdt         74.55        73.32           72974
nl lassysmall  76.7         75.8            75134
gl ctg         79.02        79.018          79327
lv lvtb        72.33        72.24           80666
id gsd         75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens


Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code   Morph-Feats  no Morph-Feats  # of tokens
fa seraji   81.18        81.12           121064
bg btb      84.53        84.55           124336
en ewt      75.77        75.682          204585
ar padt     68.02        68.14           223881
de gsd      71.59        71.32           263804
ca ancora   85.89        85.874          417587
es ancora   84.99        84.78           444617
cs cac      83.57        83.63           472608
cs pdt      81.43        82.12           1173282

Neutral for languages having more than 100k training tokens


Static vs Dynamic Oracle Training

Static oracle: transitions follow gold moves. Dynamic oracle: transitions follow predicted moves.

In both cases, the log-probability of the gold moves is maximized.
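The difference between the two regimes can be sketched for a single decision. The scores below are hypothetical logits; the loss is cross-entropy on the gold move in both cases, and only the move that advances the parser state differs:

```python
import math
import random

def train_step(scores, gold, dynamic, rng):
    """One transition decision under a static or dynamic oracle.
    `scores` maps each candidate move to a logit. The loss maximizes
    log p(gold move) in BOTH regimes; the regimes differ only in
    which move advances the parser state."""
    log_z = math.log(sum(math.exp(s) for s in scores.values()))
    loss = log_z - scores[gold]               # -log p(gold move)
    if dynamic:
        taken = max(scores, key=scores.get)   # follow own prediction
        if rng.random() < 0.1:                # optional exploration
            taken = rng.choice(sorted(scores))
    else:
        taken = gold                          # always follow gold
    return loss, taken

loss, taken = train_step({"shift": 2.0, "left": 0.5}, "left",
                         dynamic=False, rng=random.Random(1))
```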

[Figure: Tree-stack LSTM architecture: σ-LSTM, β-LSTM, Action-LSTM, and t-RNN outputs concatenated and fed to an MLP]


Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k


Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k


Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k


How about languages with fewer than 20k training tokens?


Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch

2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]

3. Using my own word and context vectors trained on a different language from the same language family

4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4)


Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

Training an LM from scratch on very limited data does not yield useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]


Projectivity

A transition-based parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
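Projectivity can be checked directly from the head array: a tree is projective iff no two dependency arcs cross when drawn above the sentence (a minimal sketch):

```python
def is_projective(heads):
    """heads[i] is the head of word i; index 0 is the artificial root
    (heads[0] is ignored). A tree is projective iff no two arcs cross
    when drawn above the sentence."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads[1:], start=1)]
    for l1, r1 in arcs:
        for l2, r2 in arcs:
            if l1 < l2 < r1 < r2:   # strictly interleaved endpoints cross
                return False
    return True
```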


Projective vs Non-projective

We compared our model with the best model for different projectivity ratios.

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases

7 From the official results page and our projectivity table.

Conclusions


Conclusion

In conclusion, we introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition based dependency parsing.

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

When the training dataset size increases, the tree-stack LSTM loses its advantage.


Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention across the σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding different ways to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.


Publications

Ömer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Ömer Kırnap, Berkay Furkan Önder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.


References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, pages 673-682. Association for Computational Linguistics.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.


Thank you for your attention


Questions


Page 34: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using ContextEmbeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser using Context and Morph-featEmbeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123

a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

LM is used to obtain Context and Word embeddings with twocomponents

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure Kırnap et al 2017Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label

Economic news hadGold Tree LAS 1

SBJATT

Economic news had

PREDOBJPred 1 LAS 0

Economic news had

OBJATTPred 2 LAS (frac12)100

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Context vectors provide independent contribution on top ofPOS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677p-fb 747 797 663

p-v-c 793 832 742

Our BiLSTM language model word vectors perform betterthan FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Both POS tags and context vectors have significantcontributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing correct state of parser still remains critical

We are unable to represent whole parsing history with featureextracting

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify stackrsquos word embeddings

Hidden states of LSTMS are not updated unless reduce

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose Tree-stack LSTM model with 4 components

β-LSTMσ-LSTMAction-LSTMTree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include explicit feature extractor We initiated wordrepresentation by concatenating

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morp-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure Bufferrsquos β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stackrsquos σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How do components of tree-stack LSTM are connected

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

CoNLL17:
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koç University ranked 7th out of 33 participants (1st among transition-based parsers)

CoNLL18:
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koç University ranked 16th out of 30 participants (2nd among transition-based parsers)

Differences between the two tasks: 1. Train/test split change 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems with the official comparison:

1 If the annotation of the treebank is improved, the older parser is handicapped

2 If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code         MLP    Tree-stack
ru_taiga (10k)    58.89  60.55
hu_szeged (20k)   66.21  68.18
tr_imst (50k)     56.78  58.75
ar_padt (120k)    67.83  68.14
en_ewt (205k)     74.87  75.77
cs_cac (473k)     83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

Figure: Initial model (MLP only)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

Figure: Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

Figure: Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

Figure: Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code   MLP    Only Action  Only-β  Only-σ
hu_szeged   66.21  66.87        66.94   67.03
sv_lines    71.12  72.05        72.17   72.45
tr_imst     57.12  56.87        57.02   57.12
ar_padt     67.83  66.67        66.89   66.92
cs_cac      83.89  82.23        83.13   83.17
en_ewt      75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

Figure: Tree-stack LSTM architecture

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code            without t-RNN  with t-RNN
no_nynorsklia (3k)   51.78          53.33
ru_taiga (11k)       59.13          60.55
gl_treegal (15k)     69.76          70.45
hu_szeged (20k)      66.12          68.18
sv_lines (49k)       74.04          75.46
tr_imst (50k)        58.12          58.75
ar_padt (120k)       68.04          68.14
en_ewt (204k)        74.87          75.77
cs_cac (473k)        82.89          83.57
cs_pdt (1M)          81.17          81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu_szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv_lines   71.12  72.05   72.17   74.04   72.17      75.46
tr_imst    57.12  56.87   57.02   57.12   58.12      58.75
ar_padt    67.83  66.67   66.89   66.92   68.04      68.14
cs_cac     83.89  82.23   83.13   83.17   82.89      83.57
en_ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases

σ-LSTM provides more useful information independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD dataset (v2.2) into 4 parts based on the number of training tokens for each language, to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k and less than 50k tokens

Languages having more than 50k and less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no_nynorsklia  51.13        53.33           3,583
ru_taiga       58.32        60.55           10,479
sme_giella     52.78        53.39           16,385
la_perseus     49.93        51.60           18,184
ug_udt         52.78        53.39           19,262
sl_sst         46.72        48.77           19,473
hu_szeged      66.23        68.18           20,166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv_lines       72.18        74.81           48,325
fr_sequoia     84.36        82.17           50,543
en_gum         76.44        75.34           53,686
ko_gsd         73.74        72.54           56,687
eu_bdt         74.55        73.32           72,974
nl_lassysmall  76.7         75.8            75,134
gl_ctg         79.02        79.018          79,327
lv_lvtb        72.33        72.24           80,666
id_gsd         75.76        73.97           97,531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa_seraji  81.18        81.12           121,064
bg_btb     84.53        84.55           124,336
en_ewt     75.77        75.682          204,585
ar_padt    68.02        68.14           223,881
de_gsd     71.59        71.32           263,804
ca_ancora  85.89        85.874          417,587
es_ancora  84.99        84.78           444,617
cs_cac     83.57        83.63           472,608
cs_pdt     81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow the gold moves. Dynamic oracle: transitions follow the predicted moves.

In both cases, the log-probability of the gold moves is maximized

Figure: Tree-stack LSTM architecture
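A minimal sketch of the difference between the two regimes (illustrative only: `probs` stands in for the model's transition distribution, which is an assumption, not the thesis code):

```python
import math

def oracle_step(probs, gold, dynamic=False):
    """One training step at a parser state.
    probs: model probability per transition, e.g. {"shift": 0.7, ...}
    gold:  the gold transition at this state.
    The loss is -log p(gold) in BOTH regimes; only the move used to
    advance the parser differs (gold move vs. predicted move)."""
    loss = -math.log(probs[gold])
    move = max(probs, key=probs.get) if dynamic else gold
    return loss, move

probs = {"shift": 0.7, "left": 0.2, "right": 0.1}
print(oracle_step(probs, "left", dynamic=False))  # advances with the gold move "left"
print(oracle_step(probs, "left", dynamic=True))   # advances with the prediction "shift"
```

Either way the gradient pushes up p(gold); the dynamic oracle simply exposes the model to the states its own predictions reach.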

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1 Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch

2 Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]

3 Using my own word and context vectors, trained on a different language from the same language family

4 Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af_afribooms  not provided  75.46  77.43  78.12
kk_ktb        20.19         22.31  21.96  23.86
bxr_bdt       7.64          9.76   9.93   8.98
kmr_mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

From-scratch LM training does not yield useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition-based parsers can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
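Projectivity can be checked directly from the head indices; a small illustrative helper (`is_projective` is a hypothetical name, using the CoNLL-U convention that heads[i] is the head of word i+1 and 0 marks the root):

```python
def is_projective(heads):
    """heads[i] is the head of word i+1 (0 denotes the root).
    The tree is projective iff no two arcs cross when drawn
    above the sentence."""
    arcs = [(min(h, i + 1), max(h, i + 1)) for i, h in enumerate(heads)]
    return not any(l1 < l2 < r1 < r2
                   for (l1, r1) in arcs for (l2, r2) in arcs)

print(is_projective([2, 0, 2]))     # projective: no crossing arcs
print(is_projective([3, 4, 0, 3]))  # non-projective: arcs (1,3) and (2,4) cross
```

Sentences with crossing arcs (common in e.g. ancient Greek treebanks, see the projectivity table below in the deck) cannot be produced by this transition system.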

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language     Projectivity  Best (LAS)  Our (LAS)
grc_perseus  90.7          79.39       55.03 (20)
eu_bdt       95.13         84.22       74.13 (17)
hu_szeged    97.8          82.66       68.18 (14)
da_ddt       98.26         86.28       76.40 (17)
en_gum       99.6          85.05       76.44 (15)
gl_treegal   100           74.25       70.45 (10)
gl_ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases

7

7 From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion: We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition-based dependency parsing

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement

Morphological Features

Finding different ways to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Ömer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Ömer Kırnap, Berkay Furkan Önder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

• Introduction
  • Overview of Dependency Parsing
  • Transition Based Dependency Parsing
• Related Work
  • Linear Models and their Drawbacks
  • Neural Network Models
• Model
  • Language Model
  • MLP Parser
  • Tree-stack LSTM Parser
• Results
  • MLP vs Tree-stack LSTM
  • Morphological Feature Embeddings
  • Static vs Dynamic Oracle Training
  • Transfer Learning
• Conclusion
• Future Work & Discussions
Page 35: Transition Based Dependency Parsing with Deep Learning

a Language Model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123

Language Model (LM)

The LM is used to obtain Context and Word embeddings, with two components:

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure: Character LSTM, from Kırnap et al. 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure: Word BiLSTM, from Kırnap et al. 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure: Kırnap et al. 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation: 17 universal part-of-speech tags; 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS): the percentage of words assigned both the correct syntactic head and the correct dependency label

Gold tree for "Economic news had": arcs labeled ATT and SBJ (LAS 1)
Pred 1: arcs labeled PRED and OBJ, both wrong (LAS 0)
Pred 2: arcs labeled OBJ and ATT, one of two correct (LAS = (1/2) × 100 = 50)
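Computed on the two scored words of the example, LAS behaves as follows (a minimal sketch; `las` is an illustrative helper, not the shared-task evaluation script):

```python
def las(gold, pred):
    """Labeled Attachment Score: percentage of words whose predicted
    head AND dependency label both match the gold annotation.
    gold, pred: lists of (head_index, label) pairs, one per scored word."""
    correct = sum(g == p for g, p in zip(gold, pred))
    return 100.0 * correct / len(gold)

# "Economic news had": Economic -> news (ATT), news -> had (SBJ)
gold  = [(2, "ATT"), (3, "SBJ")]
pred1 = [(2, "PRED"), (3, "OBJ")]  # both labels wrong
pred2 = [(2, "ATT"), (3, "OBJ")]   # one of two correct
print(las(gold, pred1))  # 0.0
print(las(gold, pred2))  # 50.0
```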

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5 Source: CoNLL17 official results page
Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c)

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63.0
c      72.2       76.0       63.5
v-c    76.0       79.0       67.6
p-c    78.0       82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c)

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63.0
c      72.2       76.0       63.5
v-c    76.0       79.0       67.6
p-c    78.0       82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Context vectors provide an independent contribution on top of POS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63.0
c      72.2       76.0       63.5
v-c    76.0       79.0       67.6
p-c    78.0       82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Our BiLSTM language model word vectors perform better than FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63.0
c      72.2       76.0       63.5
v-c    76.0       79.0       67.6
p-c    78.0       82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Both POS tags and context vectors have significant contributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing the correct parser state representation remains critical

We are unable to represent the whole parsing history with feature extraction

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17

• Koç University team with MLP Parser using Context Embeddings

CoNLL18

• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure: Stack LSTM [Dyer et al. 2015]

Represent each component (σ, β, A) with an LSTM. Modify the head word's embedding with the dependent's embedding.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of the LSTMs are not updated except on reduce

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

Figure: Tree-stack LSTM architecture

We propose Tree-stack LSTM model with 4 components

β-LSTM
σ-LSTM
Action-LSTM
Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector

Every dependency relation is represented with a continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors
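As a sketch, the resulting input vector is just the concatenation of those four embeddings (toy dimensions; the real sizes are hyperparameters):

```python
import numpy as np

def word_representation(word_vec, context_vec, pos_vec, feat_vec):
    """Concatenate the char-LSTM word vector, BiLSTM context vector,
    POS embedding, and morph-feat embedding into one input vector."""
    return np.concatenate([word_vec, context_vec, pos_vec, feat_vec])

x = word_representation(np.zeros(4), np.ones(4), np.full(2, 2.0), np.full(2, 3.0))
print(x.shape)  # (12,)
```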

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure: Morph-feat Embeddings
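One plausible realization of the figure, assuming each Feature=Value pair has its own embedding and the pairs are summed (an illustrative sketch, not necessarily the thesis implementation):

```python
import numpy as np

def morph_feat_vector(feat_string, table):
    """Embed a UD morphological feature string such as
    'Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs'
    by summing one embedding per Feature=Value pair."""
    return sum(table[f] for f in feat_string.split("|"))

dim = 3
feats = ["Case=Nom", "Gender=Neut", "Number=Sing", "Person=3", "PronType=Prs"]
table = {f: np.full(dim, float(i)) for i, f in enumerate(feats)}  # toy embeddings
v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs", table)
print(v)  # [10. 10. 10.]  (0 + 1 + 2 + 3 + 4 in each dimension)
```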

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components:
1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

Figure: Tree-stack LSTM architecture

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

Figure: Buffer's β-LSTM over words w_i, w_{i+1}, w_{i+2}

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

Figure: Tree-stack LSTM architecture

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

Figure: Stack's σ-LSTM over stack items s_i, s_{i+1}, s_{i+2}

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

Figure: Tree-stack LSTM architecture

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

Figure: Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

Figure: t-RNN composing head word, dependent word, and dependency relation

w_head_new = tanh(W_rnn [w_head_old; d_l; w_dep] + b_rnn)    (1)
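Equation (1) as code, with toy dimensions (W_rnn and b_rnn are the learned t-RNN parameters; the actual embedding sizes are hyperparameters):

```python
import numpy as np

def trnn(w_head, d_rel, w_dep, W_rnn, b_rnn):
    """w_head_new = tanh(W_rnn [w_head; d_l; w_dep] + b_rnn):
    the head embedding is rewritten from the old head embedding,
    the dependency-relation embedding, and the dependent embedding."""
    x = np.concatenate([w_head, d_rel, w_dep])
    return np.tanh(W_rnn @ x + b_rnn)

dim, rel_dim = 4, 2
rng = np.random.default_rng(0)
W_rnn = rng.normal(size=(dim, 2 * dim + rel_dim))
b_rnn = np.zeros(dim)
new_head = trnn(rng.normal(size=dim), rng.normal(size=rel_dim),
                rng.normal(size=dim), W_rnn, b_rnn)
print(new_head.shape)  # (4,)
```

Because tanh keeps the output in the same range as the inputs, the new head embedding can replace the old one in the stack and be composed again by later transitions.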

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition
2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
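The left_d and right_d definitions above, together with shift, can be sketched as pure state updates over (stack σ, buffer β, arc set A); this is an illustrative reading of the formal definitions, with arcs stored as (head, label, dependent) triples:

```python
def shift(stack, buffer, arcs):
    """shift: (σ, b|β, A) -> (σ|b, β, A): move the buffer front onto the stack."""
    return stack + [buffer[0]], buffer[1:], arcs

def left_arc(stack, buffer, arcs, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    the buffer front b becomes the head of the stack top s."""
    s, b = stack[-1], buffer[0]
    return stack[:-1], buffer, arcs | {(b, d, s)}

def right_arc(stack, buffer, arcs, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    the second stack item s becomes the head of the stack top t."""
    s, t = stack[-2], stack[-1]
    return stack[:-1], buffer, arcs | {(s, d, t)}

# "economic news had": shift, left(ATT), shift, left(SBJ)
state = (["root"], ["economic", "news", "had"], frozenset())
state = shift(*state)              # stack: root, economic
state = left_arc(*state, d="ATT")  # news -ATT-> economic
state = shift(*state)              # stack: root, news
state = left_arc(*state, d="SBJ")  # had -SBJ-> news
print(sorted(state[2]))
```

Each transition only rearranges the stack/buffer and adds at most one arc, which is what lets the σ-, β-, and action-LSTMs update incrementally.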

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parsers can only build projective trees. 6

6Figure from https://stp.lingfil.uu.se/~sarakurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
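A tree is projective when no two dependency arcs cross. A minimal check, assuming a head array with 1-based word positions and 0 for the root (arcs from the artificial root are ignored in this sketch):

```python
def is_projective(heads):
    """heads[i] is the head (1-based position) of word i+1; 0 marks the root.

    Returns False if any two arcs cross, i.e. there exist sorted arcs
    (a, b) and (c, d) with a < c < b < d.
    """
    arcs = [tuple(sorted((i + 1, h))) for i, h in enumerate(heads) if h != 0]
    for a, b in arcs:
        for c, d in arcs:
            if a < c < b < d:
                return False
    return True
```

For example, `is_projective([2, 0, 2])` is True, while a tree containing the crossing arcs (1, 3) and (2, 4) is not.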

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios.

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases. 7

7From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:

We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition based dependency parsing.

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering.

Tree-stack LSTM performed better with low resource languages.

When the training dataset size increases, tree-stack LSTM loses its advantage.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between the σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding different ways to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics 5: 135-146.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

Page 36: Transition Based Dependency Parsing with Deep Learning

Language Model (LM)

The LM is used to obtain Context and Word embeddings with two components:

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b. MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure: Kırnap et al. 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments & Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS): the percentage of words correctly assigned both the correct syntactic head and the correct dependency label.

Example, for the fragment "Economic news had":

Gold tree (arcs labeled SBJ, ATT): LAS = 1
Pred 1 (arcs labeled PRED, OBJ): LAS = 0
Pred 2 (arcs labeled OBJ, ATT): LAS = (1/2) × 100 = 50

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
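The metric can be computed directly from per-word (head, label) pairs; a minimal sketch (the function name is illustrative):

```python
def las(gold, pred):
    """Labeled Attachment Score: percentage of words whose predicted
    (head, label) pair exactly matches the gold annotation."""
    assert len(gold) == len(pred)
    correct = sum(g == p for g, p in zip(gold, pred))
    return 100.0 * correct / len(gold)
```

For the slide's example, Pred 2 gets one of the two attachments fully right, so `las([(2, "ATT"), (3, "SBJ")], [(2, "ATT"), (3, "OBJ")])` returns 50.0.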

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5Source: CoNLL17 official results page
Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63
c       72.2       76         63.5
v-c     76         79         67.6
p-c     78         82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63
c       72.2       76         63.5
v-c     76         79         67.6
p-c     78         82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Context vectors provide an independent contribution on top of POS tags.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63
c       72.2       76         63.5
v-c     76         79         67.6
p-c     78         82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Our BiLSTM language model word vectors perform better than FB vectors.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63
c       72.2       76         63.5
v-c     76         79         67.6
p-c     78         82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Both POS tags and context vectors have significant contributions on top of word vectors.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing the correct state representation of the parser still remains critical.

We are unable to represent the whole parsing history with feature extraction.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c. Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM.
Modify the head word's embedding with the dependent's embedding.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings.

Hidden states of the LSTMs are not updated unless a reduce occurs.

Actions are not explicitly represented.

They only used word2vec embeddings [Mikolov et al. 2013].

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

Figure: Tree-stack LSTM architecture (σ-LSTM, β-LSTM, and Action-LSTM outputs concatenated and fed to an MLP; the t-RNN composes head word, dependent word, and dependency relation embeddings)

We propose the Tree-stack LSTM model with 4 components:

1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector.

Every dependency relation is represented with a continuous vector.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morph-feat Vectors

Example: the word "It" with FEATS Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs

Figure: Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
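A morph-feat embedding of this kind can be sketched as a lookup over the `Key=Value` pairs of a UD FEATS string. Here a hashed pseudo-random vector stands in for a learned embedding table; the hashing trick is purely illustrative, not the thesis implementation:

```python
import hashlib

def morph_feat_vector(feats, dim=8):
    """Sum of one vector per `Key=Value` pair in a UD FEATS string.

    A real model would look each pair up in a trained embedding table;
    here the vector is derived deterministically from an MD5 digest.
    """
    vec = [0.0] * dim
    if feats in ("", "_"):          # UD uses "_" for "no features"
        return vec
    for pair in feats.split("|"):
        digest = hashlib.md5(pair.encode("utf-8")).digest()
        for i in range(dim):
            vec[i] += digest[i] / 255.0 - 0.5
    return vec
```

Summing per-feature vectors keeps the representation a fixed size regardless of how many morphological features a word carries.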

Tree-stack LSTM

Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

Figure: Tree-stack LSTM architecture with the β-LSTM component in focus

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

Figure: Buffer's β-LSTM over words wi, wi+1, wi+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

Figure: Tree-stack LSTM architecture with the σ-LSTM component in focus

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

Figure: Stack's σ-LSTM over stack items si, si+1, si+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

Figure: Tree-stack LSTM architecture with the Action-LSTM component in focus

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

Figure: Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

Figure: t-RNN composing the head word, dependent word, and dependency relation embeddings

w_head_new = tanh(W_rnn ∗ [w_head_old; d_l; w_dep] + b_rnn)    (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
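Equation (1) is a single affine map over the concatenated embeddings followed by a tanh. A pure-Python sketch with toy dimensions (names are illustrative):

```python
import math

def trnn_compose(w_head, d_rel, w_dep, W, b):
    """Eq. (1): tanh(W_rnn * [w_head; d_rel; w_dep] + b_rnn).

    w_head, d_rel, w_dep are plain lists; W has shape
    len(b) x (len(w_head) + len(d_rel) + len(w_dep)).
    """
    x = w_head + d_rel + w_dep      # list concatenation = vector concat
    return [math.tanh(sum(W[i][j] * x[j] for j in range(len(x))) + b[i])
            for i in range(len(b))]
```

The output replaces the head word's old embedding, so the head carries information about its attached dependents into later transitions.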

Tree-RNN with

1. Left Transition
2. Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initiated by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready to make a new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initiated by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready to make a new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
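The two transitions above, plus shift, can be sketched as operations on a stack, a buffer, and an arc set (matching the arc-hybrid style of the formulas; variable names are illustrative):

```python
def shift(stack, buffer):
    """Move the buffer front onto the stack."""
    stack.append(buffer.pop(0))

def left(stack, buffer, arcs, d):
    """left_d(σ|s, b|β, A) => (σ, b|β, A ∪ {(b, d, s)}):
    the buffer front b becomes the head of the popped stack top s."""
    s = stack.pop()
    arcs.add((buffer[0], d, s))

def right(stack, buffer, arcs, d):
    """right_d(σ|s|t, β, A) => (σ|s, β, A ∪ {(s, d, t)}):
    the second stack item s becomes the head of the popped top t."""
    t = stack.pop()
    arcs.add((stack[-1], d, t))
```

For "Economic news had" (words 1-3, with news the head of Economic and had the head of news), the sequence shift, left(ATT), shift, left(SBJ), shift leaves the root word 3 alone on the stack with both gold arcs built.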

Final overview of Tree-stack LSTM

Figure: Final Tree-stack LSTM architecture (σ-LSTM, β-LSTM, and Action-LSTM outputs concatenated and fed to an MLP; the t-RNN composes head word, dependent word, and dependency relation embeddings)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 Introduction
  Overview of Dependency Parsing
  Transition Based Dependency Parsing

2 Related Work
  Linear Models and their Drawbacks
  Neural Network Models

3 Model
  Language Model
  MLP Parser
  Tree-stack LSTM Parser

4 Results
  MLP vs Tree-stack LSTM
  Morphological Feature Embeddings
  Static vs Dynamic Oracle Training
  Transfer Learning

5 Conclusion

6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4. Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17:
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation (17 universal part-of-speech tags, 37 universal dependency relations)
• Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation (17 universal part-of-speech tags, 37 universal dependency relations)
• Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split change 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of the official comparison:

1. If the annotation of the treebank has improved, the older parser is handicapped.

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

Figure: Initial model (MLP only)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

Figure: Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

Figure: Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

Figure: Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code  MLP    Only Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

Figure: Tree-stack LSTM architecture (σ-LSTM, β-LSTM, Action-LSTM, and t-RNN feeding a Concat layer and an MLP)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN:

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases.

σ-LSTM provides more useful information independent of dataset size.

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th of all and 2nd among transition based parsers).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k, less than 50k tokens

Languages having more than 50k, less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.6            18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48325
fr sequoia     84.36        82.17           50543
en gum         76.44        75.34           53686
ko gsd         73.74        72.54           56687
eu bdt         74.55        73.32           72974
nl lassysmall  76.7         75.8            75134
gl ctg         79.02        79.018          79327
lv lvtb        72.33        72.24           80666
id gsd         75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121064
bg btb     84.53        84.55           124336
en ewt     75.77        75.682          204585
ar padt    68.02        68.14           223881
de gsd     71.59        71.32           263804
ca ancora  85.89        85.874          417587
es ancora  84.99        84.78           444617
cs cac     83.57        83.63           472608
cs pdt     81.43        82.12           1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 37: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Language Model - Word vectors

Character based LSTM generates word Vectors

Figure Character LSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123

Language Model - Context Vectors

Word based BiLSTM generates Context Vectors

Figure Word BiLSTM from Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure Kırnap et al 2017Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label

Economic news hadGold Tree LAS 1

SBJATT

Economic news had

PREDOBJPred 1 LAS 0

Economic news had

OBJATTPred 2 LAS (frac12)100

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Context vectors provide independent contribution on top ofPOS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677p-fb 747 797 663

p-v-c 793 832 742

Our BiLSTM language model word vectors perform betterthan FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Both POS tags and context vectors have significantcontributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing correct state of parser still remains critical

We are unable to represent whole parsing history with featureextracting

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18

• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM. Modify the head word's embedding with the dependent's embedding.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of LSTMs are not updated unless a reduce occurs

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

[Figure: Tree-stack LSTM overview. The hidden states of the β-LSTM, σ-LSTM and Action-LSTM are concatenated and fed to an MLP; the t-RNN composes head word, dependent word and dependency relation.]

We propose the Tree-stack LSTM model with 4 components:

1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector

Every dependency relation is represented with a continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initiate the word representation by concatenating:

Character-based LSTM's word vectors

Word-based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
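The FEATS string in the figure can be split into Feature=Value tokens and mapped to a single morph-feat vector. A toy sketch under stated assumptions (random, untrained embeddings grown lazily; `MorphFeatEmbedder` is illustrative, not the thesis code):

```python
# Toy sketch: turn a UD FEATS string like
# "Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs"
# into one vector by averaging per-feature embeddings.
import random

random.seed(0)
DIM = 8  # illustrative embedding size

def parse_feats(feats):
    """Split 'Case=Nom|Gender=Neut|...' into Feature=Value tokens."""
    return feats.split("|") if feats and feats != "_" else []

class MorphFeatEmbedder:
    def __init__(self, dim=DIM):
        self.dim = dim
        self.table = {}  # Feature=Value -> vector, grown on first use

    def vector(self, fv):
        if fv not in self.table:
            self.table[fv] = [random.uniform(-0.1, 0.1) for _ in range(self.dim)]
        return self.table[fv]

    def embed(self, feats):
        toks = parse_feats(feats)
        if not toks:
            return [0.0] * self.dim  # e.g. FEATS column is "_"
        cols = zip(*(self.vector(t) for t in toks))
        return [sum(c) / len(toks) for c in cols]
```

Averaging is one simple choice; concatenation or an LSTM over the feature sequence are equally plausible designs.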

Tree-stack LSTM

Model Components
1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

[Figure: Tree-stack LSTM overview diagram]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

[Figure: The buffer's β-LSTM running over words w_i, w_i+1, w_i+2]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

[Figure: Tree-stack LSTM overview diagram]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

[Figure: The stack's σ-LSTM running over stack items s_i, s_i+1, s_i+2]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

[Figure: Tree-stack LSTM overview diagram]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

[Figure: The Action-LSTM running over past transition embeddings]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of the tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

[Figure: t-RNN composing dependent word, dependency relation and head word]

w_head_new = tanh(W_rnn * [w_head_old; d_l; w_dep] + b_rnn)   (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
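Eq. (1) can be sketched in plain Python; W_rnn and b_rnn below are illustrative random parameters standing in for trained weights, not the thesis implementation:

```python
# Sketch of the t-RNN composition: the new head embedding is
# tanh(W_rnn * [w_head; d_l; w_dep] + b_rnn), where [a; b; c] is
# vector concatenation.
import math
import random

random.seed(1)
D = 4  # per-vector size; the concatenated input has size 3*D

W_rnn = [[random.uniform(-0.5, 0.5) for _ in range(3 * D)] for _ in range(D)]
b_rnn = [0.0] * D

def t_rnn(w_head, d_label, w_dep):
    x = w_head + d_label + w_dep  # concatenation [w_head; d_l; w_dep]
    return [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + b)
            for row, b in zip(W_rnn, b_rnn)]
```

The output replaces the head word's embedding, so repeated attachments accumulate dependent information in the head.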

Tree-RNN with

1 Left Transition
2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

[Figure: Left transition. Each embedding is initiated by concatenating POS, language and morph-feat embeddings.]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

[Figure: Left transition. The stack's top LSTM is reduced.]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

[Figure: Left transition. The t-RNN calculates the new head embedding.]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

[Figure: Left transition. The β-LSTM recalculates its hidden state from the new input.]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

[Figure: Left transition. The tree-stack LSTM is ready for the next transition.]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

[Figure: Right transition. Each embedding is initiated by concatenating POS, language and morph-feat embeddings.]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

[Figure: Right transition. The stack's top LSTM is reduced.]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

[Figure: Right transition. The t-RNN calculates the new head embedding.]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

[Figure: Right transition. The σ-LSTM recalculates its hidden state from the new input.]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

[Figure: Right transition. The tree-stack LSTM is ready for the next transition.]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
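The two transitions defined on the preceding slides can be sketched as operations on a (stack, buffer, arcs) state. This is an illustrative toy, not the thesis code; arcs are stored as (head, label, dependent) triples:

```python
# Toy sketch of the left/right transitions:
# left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})   head = buffer front
# right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})  head = next stack item
def left_arc(stack, buffer, arcs, d):
    s = stack.pop()               # dependent: stack top
    arcs.add((buffer[0], d, s))   # head: buffer front (stays in place)

def right_arc(stack, buffer, arcs, d):
    t = stack.pop()               # dependent: stack top
    arcs.add((stack[-1], d, t))   # head: the new stack top
```

Each transition shrinks the stack by one and adds exactly one labeled arc, which is why a sentence of n words is parsed in a bounded number of moves.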

Final overview of Tree-stack LSTM

[Figure: Tree-stack LSTM overview diagram]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 Introduction
   Overview of Dependency Parsing
   Transition Based Dependency Parsing

2 Related Work
   Linear Models and their Drawbacks
   Neural Network Models

3 Model
   Language Model
   MLP Parser
   Tree-stack LSTM Parser

4 Results
   MLP vs Tree-stack LSTM
   Morphological Feature Embeddings
   Static vs Dynamic Oracle Training
   Transfer Learning

5 Conclusion
6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17
  Dependency parsing of 81 treebanks in 49 languages
  All treebanks use standardized annotation
    17 universal part-of-speech tags
    37 universal dependency relations
  Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18
  Dependency parsing of 82 treebanks in 57 languages
  All treebanks use standardized annotation
    17 universal part-of-speech tags
    37 universal dependency relations
  Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Differences from CoNLL17 to CoNLL18: 1 Train/test split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of the official comparison:

1 If the annotation of the treebank is improved, the older parser is handicapped

2 If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code         MLP     Tree-stack
ru_taiga (10k)    58.89   60.55
hu_szeged (20k)   66.21   68.18
tr_imst (50k)     56.78   58.75
ar_padt (120k)    67.83   68.14
en_ewt (205k)     74.87   75.77
cs_cac (473k)     83.39   83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

[Figure: Initial MLP model]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

[Figure: Only Action-LSTM model]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

[Figure: Only β-LSTM model]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

[Figure: Only σ-LSTM model]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code   MLP     Only-Action  Only-β  Only-σ
hu_szeged   66.21   66.87        66.94   67.03
sv_lines    71.12   72.05        72.17   72.45
tr_imst     57.12   56.87        57.02   57.12
ar_padt     67.83   66.67        66.89   66.92
cs_cac      83.89   82.23        83.13   83.17
en_ewt      75.54   75.43        75.56   75.67

Table: Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

[Figure: Tree-stack LSTM overview diagram]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of tree-stack LSTMs with and without t-RNN

Lang Code            without t-RNN   with t-RNN
no_nynorsklia (3k)   51.78           53.33
ru_taiga (11k)       59.13           60.55
gl_treegal (15k)     69.76           70.45
hu_szeged (20k)      66.12           68.18
sv_lines (49k)       74.04           75.46
tr_imst (50k)        58.12           58.75
ar_padt (120k)       68.04           68.14
en_ewt (204k)        74.87           75.77
cs_cac (473k)        82.89           83.57
cs_pdt (1M)          81.17           81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang        MLP     Only-A  Only-β  Only-σ  w/o t-RNN   all
hu_szeged   66.21   66.87   66.94   67.03   66.12       68.18
sv_lines    71.12   72.05   72.17   74.04   72.17       75.46
tr_imst     57.12   56.87   57.02   57.12   58.12       58.75
ar_padt     67.83   66.67   66.89   66.92   68.04       68.14
cs_cac      83.89   82.23   83.13   83.17   82.89       83.57
en_ewt      75.54   75.43   75.56   75.67   74.87       75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases

σ-LSTM provides more useful information independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does the Morphological Feature Embedding provide?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings
We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k and less than 50k tokens

Languages having more than 50k and less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code       Morph-Feats   no Morph-Feats   # of tokens
no_nynorsklia   51.13         53.33            3583
ru_taiga        58.32         60.55            10479
sme_giella      52.78         53.39            16385
la_perseus      49.93         51.60            18184
ug_udt          52.78         53.39            19262
sl_sst          46.72         48.77            19473
hu_szeged       66.23         68.18            20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code       Morph-Feats   no Morph-Feats   # of tokens
sv_lines        72.18         74.81            48325
fr_sequoia      84.36         82.17            50543
en_gum          76.44         75.34            53686
ko_gsd          73.74         72.54            56687
eu_bdt          74.55         73.32            72974
nl_lassysmall   76.7          75.8             75134
gl_ctg          79.02         79.018           79327
lv_lvtb         72.33         72.24            80666
id_gsd          75.76         73.97            97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code   Morph-Feats   no Morph-Feats   # of tokens
fa_seraji   81.18         81.12            121064
bg_btb      84.53         84.55            124336
en_ewt      75.77         75.682           204585
ar_padt     68.02         68.14            223881
de_gsd      71.59         71.32            263804
ca_ancora   85.89         85.874           417587
es_ancora   84.99         84.78            444617
cs_cac      83.57         83.63            472608
cs_pdt      81.43         82.12            1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow gold moves
Dynamic oracle: transitions follow predicted moves

In both cases, log p of the gold moves is maximized

[Figure: Tree-stack LSTM overview diagram]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
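The static/dynamic distinction can be sketched as a toy training loop. Here `predict` is a hypothetical stand-in for the model and the state update is trivially simplified; this is not the thesis code:

```python
# Toy sketch: both oracles accumulate -log p(gold move); they differ only
# in which move is *executed* to produce the next parser state.
import math

def train_pass(gold_moves, predict, dynamic):
    """Return (total loss, executed move sequence) for one sentence."""
    loss, executed, state = 0.0, [], 0
    for gold in gold_moves:
        probs = predict(state)                         # move -> probability
        loss += -math.log(probs[gold])                 # maximize log p(gold)
        move = max(probs, key=probs.get) if dynamic else gold
        executed.append(move)                          # this move drives the state
        state += 1                                     # toy state update
    return loss, executed
```

With a static oracle the executed sequence equals the gold sequence; with a dynamic oracle the parser follows its own predictions and so visits states it will actually encounter at test time.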

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:
1 Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2 Using Facebook's word vectors to train a parser [Bojanowski et al 2017]
3 Using my own word and context vectors trained on a different language from the same language family
4 Applying transfer learning with a pre-trained parser

Language       (1)            (2)     (3)     (4)
af_afribooms   not provided   75.46   77.43   78.12
kk_ktb         20.19          22.31   21.96   23.86
bxr_bdt        7.64           9.76    9.93    8.98
kmr_mg         20.12          22.57   22.78   23.39

Table: LAS values for strategies (1), (2), (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

From-scratch LM training does not produce useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition based parsers can only build projective trees 6

6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
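Projectivity can be checked directly from the head indices. A minimal sketch (words numbered from 1, head 0 = root, assuming a well-formed tree):

```python
# An arc (h, d) is projective iff every word strictly between h and d is a
# descendant of h; a tree is projective iff all its arcs are.
def is_projective(heads):
    """heads[i-1] is the head of word i (0 means root)."""
    n = len(heads)
    for d in range(1, n + 1):
        h = heads[d - 1]
        lo, hi = min(h, d), max(h, d)
        for k in range(lo + 1, hi):
            a = k
            while a != 0 and a != h:   # walk up from k toward the root
                a = heads[a - 1]
            if a != h:                 # k is not a descendant of h
                return False
    return True
```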

Projective vs Non-projective

We compared our model with the best model across different projectivity ratios

Language      Projectivity   Best (LAS)   Ours (LAS)
grc_perseus   90.7           79.39        55.03 (20)
eu_bdt        95.13          84.22        74.13 (17)
hu_szeged     97.8           82.66        68.18 (14)
da_ddt        98.26          86.28        76.40 (17)
en_gum        99.6           85.05        76.44 (15)
gl_treegal    100            74.25        70.45 (10)
gl_ctg        100            82.12        79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases

7 From the official results page and our projectivity table

Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:
We introduced "Context, Word and Morph-feat" embeddings and showed their contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering

Tree-stack LSTM performed better on low-resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM or Action-LSTM states may bring a performance improvement

Morphological Features

Finding different ways to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123


Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 39: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

b MLP Parser (CoNLL17)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123

MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123
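To make the decision module concrete, here is a minimal pure-Python sketch of an MLP that maps an extracted feature vector to one score per transition. The sizes and random weights are toy stand-ins (assumptions), not the thesis's trained parameters:

```python
import math
import random

random.seed(0)

# Toy sizes (assumptions, not the thesis's): 8 features, 16 hidden
# units, one score per transition (e.g. shift, left-arc, right-arc).
N_FEATS, HIDDEN, N_TRANS = 8, 16, 3

def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

W1 = rand_matrix(HIDDEN, N_FEATS)
W2 = rand_matrix(N_TRANS, HIDDEN)

def decide(feats):
    """One forward pass: tanh hidden layer, then one score per transition."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, feats))) for row in W1]
    scores = [sum(w * h for w, h in zip(row, hidden)) for row in W2]
    return scores.index(max(scores))  # index of the highest-scoring transition

features = [random.uniform(-1.0, 1.0) for _ in range(N_FEATS)]  # stand-in for the extractor
print(decide(features))
```

A trained parser would learn W1 and W2 by maximizing the log-probability of the oracle transitions; here they are random, so the choice is arbitrary but well-formed.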

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure: Kırnap et al 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation
• 17 universal part-of-speech tags
• 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS): the percentage of words assigned both the correct syntactic head and the correct dependency label.

Figure: LAS example on the fragment "Economic news had". The gold tree has the arcs SBJ and ATT, so predicting it exactly gives LAS 1. Pred 1 (arcs PRED and OBJ, both wrong) gives LAS 0. Pred 2 (arcs OBJ and ATT, one of two correct) gives LAS (1/2) * 100 = 50.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
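The LAS metric above is straightforward to compute. A small sketch, assuming each parse is given as a list of (head, label) pairs per word (heads 1-based, 0 = root); the toy trees echo the slide's example but the exact labels are illustrative:

```python
def las(gold, pred):
    """Labeled Attachment Score: percentage of words whose predicted
    (head, label) pair exactly matches the gold annotation."""
    assert len(gold) == len(pred)
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return 100.0 * correct / len(gold)

# "Economic news had" with 1-based heads (0 = root):
gold = [(2, "ATT"), (3, "SBJ"), (0, "ROOT")]
pred = [(3, "OBJ"), (3, "SBJ"), (0, "ROOT")]  # first word's head and label wrong

print(las(gold, gold))  # 100.0
print(las(gold, pred))  # about 66.7
```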

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5 Source: CoNLL17 official results page

Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63.0
c       72.2        76.0        63.5
v-c     76.0        79.0        67.6
p-c     78.0        82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63.0
c       72.2        76.0        63.5
v-c     76.0        79.0        67.6
p-c     78.0        82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2

Context vectors provide an independent contribution on top of POS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63.0
c       72.2        76.0        63.5
v-c     76.0        79.0        67.6
p-c     78.0        82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2

Our BiLSTM language model word vectors perform better than FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63.0
c       72.2        76.0        63.5
v-c     76.0        79.0        67.6
p-c     78.0        82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2

Both POS tags and context vectors have significant contributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing the correct parser state representation still remains critical

We are unable to represent the whole parsing history with feature extraction

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17

• Koc-University team with MLP Parser using Context Embeddings

CoNLL18

• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM

Modify the head word's embedding with the dependent's embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of the LSTMs are not updated unless a reduce occurs

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

[Figure: Tree-stack LSTM architecture. The σ-LSTM, β-LSTM, and Action-LSTM outputs are concatenated and fed to an MLP; the t-RNN combines head word, dependent word, and dependency relation embeddings]

We propose Tree-stack LSTM model with 4 components

1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector

Every dependency relation is represented with a continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character Based LSTM's word vectors

Word Based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
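The concatenation above can be sketched directly. The dimensions here are toy values (assumptions), far smaller than a real parser would use:

```python
# Toy dimensions (assumptions, not the thesis's actual sizes).
CHAR_DIM, CONTEXT_DIM, POS_DIM, FEAT_DIM = 4, 6, 3, 3

def word_input(char_vec, context_vec, pos_vec, feat_vec):
    """Concatenate the four embeddings into one word input vector."""
    assert (len(char_vec), len(context_vec), len(pos_vec), len(feat_vec)) == \
           (CHAR_DIM, CONTEXT_DIM, POS_DIM, FEAT_DIM)
    return char_vec + context_vec + pos_vec + feat_vec

v = word_input([0.1] * CHAR_DIM, [0.2] * CONTEXT_DIM,
               [0.3] * POS_DIM, [0.4] * FEAT_DIM)
print(len(v))  # 16
```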

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
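One simple way to realize a morph-feat embedding is sketched below, under the assumption that each key=value feature has its own small vector and that a word's feature vectors are summed (whether the thesis sums or concatenates per-feature vectors is an implementation detail; the table values are made up):

```python
DIM = 4  # toy embedding size (assumption)

# Hypothetical lookup table: one vector per key=value morphological feature.
emb = {
    "Case=Nom":     [0.1, 0.0, 0.0, 0.0],
    "Gender=Neut":  [0.0, 0.2, 0.0, 0.0],
    "Number=Sing":  [0.0, 0.0, 0.3, 0.0],
    "Person=3":     [0.0, 0.0, 0.0, 0.4],
    "PronType=Prs": [0.1, 0.1, 0.0, 0.0],
}
UNK = [0.0] * DIM  # fallback for features unseen in training

def morph_feat_vector(feats):
    """Sum the embeddings of every key=value pair in a CoNLL-U FEATS string."""
    total = [0.0] * DIM
    for f in feats.split("|"):
        for i, x in enumerate(emb.get(f, UNK)):
            total[i] += x
    return total

v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
print(v)
```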

Tree-stack LSTM

Model Components:
1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

[Figure: Tree-stack LSTM architecture diagram (σ-LSTM, β-LSTM, Action-LSTM, t-RNN, Concat, MLP)]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

Figure: Buffer's β-LSTM running over the upcoming words w_i, w_i+1, w_i+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

[Figure: Tree-stack LSTM architecture diagram (σ-LSTM, β-LSTM, Action-LSTM, t-RNN, Concat, MLP)]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

Figure: Stack's σ-LSTM running over the stack items s_i, s_i+1, s_i+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

[Figure: Tree-stack LSTM architecture diagram (σ-LSTM, β-LSTM, Action-LSTM, t-RNN, Concat, MLP)]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

Figure: Action-LSTM running over the transition history

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

Figure: t-RNN combines the dependent word, dependency relation, and head word embeddings

w_head_new = tanh(W_rnn * [w_head_old ; d_l ; w_dep] + b_rnn)   (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
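Equation (1) can be sketched in pure Python. The toy dimensions and fixed weights below are placeholders (assumptions) standing in for the trained W_rnn and b_rnn:

```python
import math

H, R, D = 3, 2, 3  # toy sizes: head dim, relation dim, dependent dim
IN = H + R + D

# Placeholder parameters standing in for the trained W_rnn and b_rnn.
W_rnn = [[0.1 * ((i + j) % 3 - 1) for j in range(IN)] for i in range(H)]
b_rnn = [0.0] * H

def t_rnn(head_old, rel, dep):
    """Equation (1): w_head_new = tanh(W_rnn * [head; rel; dep] + b_rnn)."""
    x = head_old + rel + dep  # concatenation [w_head_old ; d_l ; w_dep]
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W_rnn, b_rnn)]

new_head = t_rnn([0.5, -0.5, 0.1], [1.0, 0.0], [0.2, 0.0, -0.3])
print(new_head)
```

The returned vector replaces the head word's embedding, so repeated attachments progressively fold the subtree into the head representation.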

Tree-RNN with

1 Left Transition
2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure (left transition): each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure (left transition): the stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure (left transition): t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure (left transition): β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure (left transition): tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure (right transition): each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure (right transition): the stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure (right transition): t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure (right transition): σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure (right transition): tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
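The left_d and right_d transitions above can be sketched as pure functions on a parser state (σ, β, A). This toy version tracks words by index; shift is the standard third transition, assumed here since the slides do not show it:

```python
def shift(state):
    """Standard third transition (assumed; not shown on the slides)."""
    sigma, beta, arcs = state
    return sigma + (beta[0],), beta[1:], arcs

def left(state, d):
    """left_d: attach stack top s as dependent of buffer front b, pop s."""
    sigma, beta, arcs = state
    s, b = sigma[-1], beta[0]
    return sigma[:-1], beta, arcs | {(b, d, s)}  # arc = (head, label, dependent)

def right(state, d):
    """right_d: attach stack top t as dependent of the item s below it, pop t."""
    sigma, beta, arcs = state
    s, t = sigma[-2], sigma[-1]
    return sigma[:-1], beta, arcs | {(s, d, t)}

# Parse "news had" (word indices 1, 2) with ROOT = 0 pre-loaded on the stack.
state = ((0,), (1, 2), frozenset())
state = shift(state)          # sigma = (0, 1), beta = (2,)
state = left(state, "SBJ")    # adds arc (2, SBJ, 1), sigma = (0,)
state = shift(state)          # sigma = (0, 2), beta = ()
state = right(state, "ROOT")  # adds arc (0, ROOT, 2), sigma = (0,)
print(sorted(state[2]))
```

Parsing terminates when the buffer is empty and only the root remains on the stack, with A holding the predicted tree.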

Final overview of Tree-stack LSTM

[Figure: Tree-stack LSTM architecture diagram (σ-LSTM, β-LSTM, Action-LSTM, t-RNN, Concat, MLP)]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 Introduction
  • Overview of Dependency Parsing
  • Transition Based Dependency Parsing

2 Related Work
  • Linear Models and their Drawbacks
  • Neural Network Models

3 Model
  • Language Model
  • MLP Parser
  • Tree-stack LSTM Parser

4 Results
  • MLP vs Tree-stack LSTM
  • Morphological Feature Embeddings
  • Static vs Dynamic Oracle Training
  • Transfer Learning

5 Conclusion

6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of the official comparison:

1 If the annotation of the treebank is improved, the older parser is handicapped

2 If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP     Tree-stack
ru taiga (10k)   58.89   60.55
hu szeged (20k)  66.21   68.18
tr imst (50k)    56.78   58.75
ar padt (120k)   67.83   68.14
en ewt (205k)    74.87   75.77
cs cac (473k)    83.39   83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

Figure: Initial model (plain MLP)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

Figure: Only-Action-LSTM model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

Figure: Only β-LSTM model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

Figure: Only σ-LSTM model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code   MLP     Only Action   Only-β   Only-σ
hu szeged   66.21   66.87         66.94    67.03
sv lines    71.12   72.05         72.17    72.45
tr imst     57.12   56.87         57.02    57.12
ar padt     67.83   66.67         66.89    66.92
cs cac      83.89   82.23         83.13    83.17
en ewt      75.54   75.43         75.56    75.67

Table: Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

[Figure: Tree-stack LSTM architecture diagram (σ-LSTM, β-LSTM, Action-LSTM, t-RNN, Concat, MLP)]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code            without t-RNN   with t-RNN
no nynorsklia (3k)   51.78           53.33
ru taiga (11k)       59.13           60.55
gl treegal (15k)     69.76           70.45
hu szeged (20k)      66.12           68.18
sv lines (49k)       74.04           75.46
tr imst (50k)        58.12           58.75
ar padt (120k)       68.04           68.14
en ewt (204k)        74.87           75.77
cs cac (473k)        82.89           83.57
cs pdt (1M)          81.17           81.164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang        MLP     Only A   Only-β   Only-σ   w/o t-RNN   all
hu szeged   66.21   66.87    66.94    67.03    66.12       68.18
sv lines    71.12   72.05    72.17    74.04    72.17       75.46
tr imst     57.12   56.87    57.02    57.12    58.12       58.75
ar padt     67.83   66.67    66.89    66.92    68.04       68.14
cs cac      83.89   82.23    83.13    83.17    82.89       83.57
en ewt      75.54   75.43    75.56    75.67    74.87       75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases

σ-LSTM provides more useful information independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD dataset (v2.2) into 4 parts based on the number of training tokens for each language, to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code       Morph-Feats   no Morph-Feats   # of tokens
no nynorsklia   51.13         53.33            3583
ru taiga        58.32         60.55            10479
sme giella      52.78         53.39            16385
la perseus      49.93         51.6             18184
ug udt          52.78         53.39            19262
sl sst          46.72         48.77            19473
hu szeged       66.23         68.18            20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code       Morph-Feats   no Morph-Feats   # of tokens
sv lines        72.18         74.81            48325
fr sequoia      84.36         82.17            50543
en gum          76.44         75.34            53686
ko gsd          73.74         72.54            56687
eu bdt          74.55         73.32            72974
nl lassysmall   76.7          75.8             75134
gl ctg          79.02         79.018           79327
lv lvtb         72.33         72.24            80666
id gsd          75.76         73.97            97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code   Morph-Feats   no Morph-Feats   # of tokens
fa seraji   81.18         81.12            121064
bg btb      84.53         84.55            124336
en ewt      75.77         75.682           204585
ar padt     68.02         68.14            223881
de gsd      71.59         71.32            263804
ca ancora   85.89         85.874           417587
es ancora   84.99         84.78            444617
cs cac      83.57         83.63            472608
cs pdt      81.43         82.12            1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions use gold moves
Dynamic oracle: transitions use predicted moves

In both cases, log p of the gold moves is maximized

[Figure: Tree-stack LSTM architecture diagram (σ-LSTM, β-LSTM, Action-LSTM, t-RNN, Concat, MLP)]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
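The static/dynamic distinction above boils down to which move the parser follows during training, while the loss always targets the gold move. A toy sketch (the integer "state" merely stands in for a real parser configuration, and the random predictor for an untrained model):

```python
import random

random.seed(1)

def train_pairs(gold_moves, predict, dynamic):
    """Collect (state, gold_move) training pairs for one sentence.

    Static oracle: the parser follows the gold move at every step.
    Dynamic oracle: it follows its own (possibly wrong) prediction,
    but the training target, whose log p is maximized, is still gold.
    """
    state, pairs = 0, []  # integer state is a stand-in for a real configuration
    for gold in gold_moves:
        pairs.append((state, gold))              # loss target is always the gold move
        move = predict(state) if dynamic else gold
        state += 1 if move == gold else 2        # wrong moves lead to other states
    return pairs

noisy = lambda s: random.choice(["shift", "left", "right"])  # untrained model stand-in
gold = ["shift", "shift", "left", "right"]
print(train_pairs(gold, noisy, dynamic=False))
print(train_pairs(gold, noisy, dynamic=True))
```

The dynamic run visits states the gold path never reaches, which is exactly why dynamic oracles can teach the parser to recover from its own mistakes.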

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1 Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2 Using Facebook's word vectors to train a parser [Bojanowski et al 2017]
3 Using my own word and context vectors trained on a different language from the same language family
4 Applying transfer learning with a pre-trained parser

Language       (1)            (2)     (3)     (4)
af afribooms   not provided   75.46   77.43   78.12
kk ktb         20.19          22.31   21.96   23.86
bxr bdt        7.64           9.76    9.93    8.98
kmr mg         20.12          22.57   22.78   23.39

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From-scratch LM training does not produce useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

A transition based parser can only build projective trees (trees with no crossing arcs) 6

6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
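Projectivity can be checked by testing whether any two arcs cross. A small sketch, assuming the tree is given as a list of 1-based head indices (0 = root); the quadratic loop is fine for sentence-length inputs:

```python
def is_projective(heads):
    """heads[i] is the head of word i+1 (words are 1-based, 0 = root).

    A tree is projective iff no two arcs cross when drawn above the sentence.
    """
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for l1, r1 in arcs:
        for l2, r2 in arcs:
            if l1 < l2 < r1 < r2:  # the two arcs overlap without nesting
                return False
    return True

print(is_projective([2, 3, 0, 3]))  # True: a small projective tree
print(is_projective([3, 4, 0, 3]))  # False: the arcs spanning (1,3) and (2,4) cross
```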

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language      Projectivity   Best (LAS)   Our (LAS)
grc perseus   90.7           79.39        55.03 (20)
eu bdt        95.13          84.22        74.13 (17)
hu szeged     97.8           82.66        68.18 (14)
da ddt        98.26          86.28        76.40 (17)
en gum        99.6           85.05        76.44 (15)
gl treegal    100            74.25        70.45 (10)
gl ctg        100            82.12        79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases

7 From the official results page and our projectivity table

Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion, we introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-crafted feature engineering

Tree-stack LSTM performed better on low-resource languages

As the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM states, or within the β-LSTM or Action-LSTM, may bring a performance improvement

Morphological Features

Finding a different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Önder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123


MLP Parser

MLP Parser consists of 4 components

Character Based LSTM extracts word vectors

Word Based BiLSTM extracts context vectors

Feature extractor describes current state

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123
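The character-based component can be sketched with a plain tanh recurrence standing in for the character LSTM (all names, weights and dimensions here are illustrative; the point is that the final hidden state over a word's characters becomes its vector, so rare and unseen words still get a representation from spelling):

```python
import numpy as np

rng = np.random.default_rng(3)
dim = 16
Wh = rng.normal(scale=0.1, size=(dim, dim))  # recurrent weights (would be learned)
Wx = rng.normal(scale=0.1, size=(dim, dim))  # input weights (would be learned)
char_emb = {}

def word_vector(word):
    """Toy character-level recurrence: the final hidden state is the word vector."""
    h = np.zeros(dim)
    for ch in word:
        x = char_emb.setdefault(ch, rng.normal(scale=0.1, size=dim))
        h = np.tanh(Wh @ h + Wx @ x)
    return h

print(word_vector("parsing").shape)  # (16,)
```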

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure: Kırnap et al. 2017

Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123
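A decision module of this shape can be sketched as a one-hidden-layer scorer over the extracted state features (the sizes, parameter names and three-transition inventory below are illustrative, not the thesis configuration):

```python
import numpy as np

rng = np.random.default_rng(2)
n_feats, hidden, n_transitions = 50, 64, 3   # illustrative sizes

W1 = rng.normal(scale=0.1, size=(hidden, n_feats)); b1 = np.zeros(hidden)
W2 = rng.normal(scale=0.1, size=(n_transitions, hidden)); b2 = np.zeros(n_transitions)

def next_transition(features):
    """One-hidden-layer MLP scoring the transitions; highest score wins."""
    h = np.tanh(W1 @ features + b1)
    scores = W2 @ h + b2
    return ["shift", "left_arc", "right_arc"][int(np.argmax(scores))]

print(next_transition(rng.normal(size=n_feats)))
```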

Experiments & Dataset (MLP): CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS): the percentage of words correctly assigned both the correct syntactic head and the correct dependency label.

[Figure: LAS example on "Economic news had". Gold tree (ATT, SBJ): LAS 1. Pred 1 (OBJ, PRED): LAS 0. Pred 2 (ATT, OBJ): LAS (1/2) × 100 = 50.]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
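The LAS computation can be written out as a short function (a minimal sketch; the tuple encoding of arcs as (head index, label) per word is illustrative):

```python
def las(gold, pred):
    """Labeled Attachment Score: percentage of words whose predicted
    head AND dependency label both match the gold annotation."""
    assert len(gold) == len(pred)
    correct = sum(1 for (gh, gl), (ph, pl) in zip(gold, pred)
                  if gh == ph and gl == pl)
    return 100.0 * correct / len(gold)

# Arcs for "Economic news had": (head index, label) per word
gold  = [(1, "ATT"), (2, "SBJ")]
pred1 = [(1, "OBJ"), (2, "PRED")]   # heads right, both labels wrong
pred2 = [(1, "ATT"), (2, "OBJ")]    # one of two words fully correct
print(las(gold, pred1))  # 0.0
print(las(gold, pred2))  # 50.0
```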

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers.

5 Source: CoNLL17 official results page.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v) and context vector (c):

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63
c      72.2       76         63.5
v-c    76         79         67.6
p-c    78         82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings


Context vectors provide an independent contribution on top of POS tags.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings


Our BiLSTM language model word vectors perform better than FB vectors.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings


Both POS tags and context vectors have significant contributions on top of word vectors.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing the correct representation of the parser state remains critical.

We are unable to represent the whole parsing history with feature extraction.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17

• Koc-University team with MLP Parser using Context Embeddings

CoNLL18

• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM. Modify the head word's embedding with the dependent's embedding.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings.

Hidden states of the LSTMs are not updated unless a reduce occurs.

Actions are not explicitly represented.

They only used word2vec embeddings [Mikolov et al. 2013].

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

[Figure: Tree-stack LSTM overview — β-LSTM over the buffer, σ-LSTM over the stack, Action-LSTM (A) over past transitions, and a t-RNN composing head word, dependent word and dependency relation; the component states are concatenated and fed to an MLP.]

We propose Tree-stack LSTM model with 4 components

β-LSTM
σ-LSTM
Action-LSTM
Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector.

Every dependency relation is represented with a continuous vector.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character based LSTM's word vectors

Word based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
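The concatenation above can be sketched in a few lines (the dimensions below are illustrative placeholders, not the sizes used in the thesis):

```python
import numpy as np

# Illustrative dimensions only; the actual sizes in the thesis may differ.
word_vec    = np.zeros(350)   # character-based LSTM word vector
context_vec = np.zeros(300)   # word-based BiLSTM context vector
pos_vec     = np.zeros(128)   # part-of-speech embedding
morph_vec   = np.zeros(128)   # morph-feat embedding

# The token representation is the concatenation of the four parts.
token_repr = np.concatenate([word_vec, context_vec, pos_vec, morph_vec])
print(token_repr.shape)  # (906,)
```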

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
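One plausible way to turn a UD FEATS string like the one above into a vector is to embed each Feature=Value pair and pool them — a sketch only; the thesis may embed morph-feats differently, and the table/dimension here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 4
emb = {}  # hypothetical embedding table keyed by "Feature=Value" strings

def morph_feat_vector(feats):
    """Embed a UD FEATS string such as 'Case=Nom|Number=Sing' by
    averaging one embedding per Feature=Value pair."""
    pairs = feats.split("|") if feats != "_" else []
    if not pairs:
        return np.zeros(dim)          # '_' means no morphological features
    for p in pairs:
        emb.setdefault(p, rng.normal(size=dim))
    return np.mean([emb[p] for p in pairs], axis=0)

v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
print(v.shape)  # (4,)
```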

Tree-stack LSTM

Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

[Figure: Tree-stack LSTM overview (β-LSTM component).]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

[Figure: Buffer's β-LSTM running over buffer words w_i, w_i+1, w_i+2.]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

[Figure: Tree-stack LSTM overview (σ-LSTM component).]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

[Figure: Stack's σ-LSTM running over stack words s_i, s_i+1, s_i+2.]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

[Figure: Tree-stack LSTM overview (Action-LSTM component).]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

[Figure: Action-LSTM running over past transitions.]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

[Figure: t-RNN combining the head word, dependency relation and dependent word.]

w_head,new = tanh(W_rnn · [w_head,old ; d_l ; w_dep] + b_rnn)   (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
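The t-RNN composition above can be sketched numerically (the parameter names and dimensions are illustrative, not the thesis configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
dim, rel_dim = 8, 4                       # illustrative embedding sizes
W_rnn = rng.normal(size=(dim, 2 * dim + rel_dim))  # would be learned
b_rnn = np.zeros(dim)

def t_rnn(w_head, d_label, w_dep):
    """Compose a new head embedding from the old head embedding, the
    dependency relation embedding and the dependent embedding (Eq. 1)."""
    x = np.concatenate([w_head, d_label, w_dep])
    return np.tanh(W_rnn @ x + b_rnn)

new_head = t_rnn(rng.normal(size=dim), rng.normal(size=rel_dim),
                 rng.normal(size=dim))
print(new_head.shape)  # (8,)
```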

Tree-RNN with

1. Left Transition
2. Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initiated by concatenating POS, language and morph-feat embeddings.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready for the next transition.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initiated by concatenating POS, language and morph-feat embeddings.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready for the next transition.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
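The two transitions can be sketched directly from the set notation on the slides (a toy sketch; the list-based stack/buffer encoding and function names are illustrative):

```python
def left_arc(stack, buffer, arcs, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    stack top s is popped and becomes a d-dependent of buffer front b."""
    s = stack.pop()
    arcs.add((buffer[0], d, s))
    return stack, buffer, arcs

def right_arc(stack, buffer, arcs, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    stack top t is popped and becomes a d-dependent of s below it."""
    t = stack.pop()
    arcs.add((stack[-1], d, t))
    return stack, buffer, arcs

stack, buffer, arcs = [0, 2], [3], set()   # word indices
left_arc(stack, buffer, arcs, "nsubj")
print(stack, arcs)        # [0] {(3, 'nsubj', 2)}

stack, arcs = [0, 3, 5], set()
right_arc(stack, [], arcs, "obj")
print(stack, arcs)        # [0, 3] {(3, 'obj', 5)}
```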

Final overview of Tree-stack LSTM

[Figure: Tree-stack LSTM overview — all components connected.]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2. Related Work: Linear Models and their Drawbacks; Neural Network Models

3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5. Conclusion
6. Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4. Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17:
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Differences between CoNLL17 and CoNLL18: 1. Train/test split change 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1. If the annotation of the treebank is improved, the older parser is handicapped.

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

Figure: Initial model (MLP).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

Figure: Only action LSTM.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

Figure: Only β-LSTM.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

Figure: Only σ-LSTM.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code  MLP    Only Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

[Figure: Tree-stack LSTM overview (t-RNN component).]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN:

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.164

t-RNN provides a comparative advantage for low-resource languages.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis:

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases.

σ-LSTM provides more useful information independent of dataset size.

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD v2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions.

Languages having less than 20k tokens

Languages having more than 20k and less than 50k tokens

Languages having more than 50k and less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3,583
ru taiga       58.32        60.55           10,479
sme giella     52.78        53.39           16,385
la perseus     49.93        51.6            18,184
ug udt         52.78        53.39           19,262
sl sst         46.72        48.77           19,473
hu szeged      66.23        68.18           20,166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having tokens between 50k and 100k:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48,325
fr sequoia     84.36        82.17           50,543
en gum         76.44        75.34           53,686
ko gsd         73.74        72.54           56,687
eu bdt         74.55        73.32           72,974
nl lassysmall  76.7         75.8            75,134
gl ctg         79.02        79.018          79,327
lv lvtb        72.33        72.24           80,666
id gsd         75.76        73.97           97,531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121,064
bg btb     84.53        84.55           124,336
en ewt     75.77        75.682          204,585
ar padt    68.02        68.14           223,881
de gsd     71.59        71.32           263,804
ca ancora  85.89        85.874          417,587
es ancora  84.99        84.78           444,617
cs cac     83.57        83.63           472,608
cs pdt     81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow gold moves. Dynamic oracle: transitions follow predicted moves.

In both cases, the log-probability of the gold moves is maximized.

[Figure: Tree-stack LSTM overview.]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
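The static/dynamic distinction can be sketched as a toy training step (ToyModel and ToyState are stand-ins invented for illustration; the real model scores transitions from the parser state):

```python
import random

class ToyModel:
    """Stand-in scorer that always prefers 'shift' (illustrative only)."""
    def predict(self, state): return "shift"
    def log_prob(self, state, move): return 0.0 if move == "shift" else -1.0

class ToyState:
    def __init__(self): self.moves = []
    def apply(self, move):
        self.moves.append(move); return self

def train_step(state, gold_move, model, dynamic=False, explore=1.0):
    """Static oracle follows the gold move; a dynamic oracle may follow the
    model's own prediction — but the loss always targets the gold move."""
    loss = -model.log_prob(state, gold_move)   # log p of gold maximized either way
    if dynamic and random.random() < explore:
        next_move = model.predict(state)       # may be wrong: exposes training to errors
    else:
        next_move = gold_move                  # always correct: static oracle
    return loss, state.apply(next_move)

loss, st = train_step(ToyState(), "left_arc", ToyModel(), dynamic=True)
print(st.moves)  # ['shift'] — the model's move, while the loss targets 'left_arc'
```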

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:
1. Using very limited data to train a LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3) and (4).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial.

Training an LM from scratch on limited data does not yield useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017].

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 41: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

MLP Parser - Feature Extraction

Feature extractor describes current state

Figure Kırnap et al 2017Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label

Economic news hadGold Tree LAS 1

SBJATT

Economic news had

PREDOBJPred 1 LAS 0

Economic news had

OBJATTPred 2 LAS (frac12)100

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Context vectors provide independent contribution on top ofPOS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677p-fb 747 797 663

p-v-c 793 832 742

Our BiLSTM language model word vectors perform betterthan FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Both POS tags and context vectors have significantcontributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing correct state of parser still remains critical

We are unable to represent whole parsing history with featureextracting

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify stackrsquos word embeddings

Hidden states of LSTMS are not updated unless reduce

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose Tree-stack LSTM model with 4 components

β-LSTMσ-LSTMAction-LSTMTree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include explicit feature extractor We initiated wordrepresentation by concatenating

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morp-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure Bufferrsquos β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stackrsquos σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How do components of tree-stack LSTM are connected

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)

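Equation (1) can be sketched directly in code; the dimension D and the randomly initialized parameters are placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)
D = 8  # hypothetical embedding size

W_rnn = rng.normal(size=(D, 3 * D))  # maps [head; relation; dependent] -> new head
b_rnn = np.zeros(D)

def t_rnn(w_head, d_label, w_dep):
    """Equation (1): compose head, dependency-relation and dependent
    embeddings into a new head embedding."""
    x = np.concatenate([w_head, d_label, w_dep])
    return np.tanh(W_rnn @ x + b_rnn)

w_new = t_rnn(rng.normal(size=D), rng.normal(size=D), rng.normal(size=D))
print(w_new.shape)  # (8,)
```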

Tree-RNN with

1. Left Transition
2. Right Transition


Left Transition


Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings


Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding


Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input


Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready to predict the next transition


Right Transition


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready to predict the next transition

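A minimal sketch of the two transitions on a (stack, buffer, arcs) state, leaving out the LSTM and t-RNN updates; the word indices and labels in the toy run are hypothetical.

```python
def left(stack, buffer, arcs, d):
    """left_d(σ|s, b|β, A): stack top s becomes a dependent of the
    buffer front b with label d."""
    s = stack.pop()
    arcs.add((buffer[0], d, s))      # arcs hold (head, label, dependent)

def right(stack, buffer, arcs, d):
    """right_d(σ|s|t, β, A): stack top t becomes a dependent of the
    item s just below it."""
    t = stack.pop()
    arcs.add((stack[-1], d, t))

# Toy run on word indices; the labels are hypothetical.
stack, buffer, arcs = [0, 2], [3, 4], set()
left(stack, buffer, arcs, "nsubj")   # word 2 attaches to buffer front 3
stack.append(buffer.pop(0))          # shift word 3 onto the stack
right(stack, buffer, arcs, "obj")    # word 3 attaches to word 0 below it
print(sorted(arcs))  # [(0, 'obj', 3), (3, 'nsubj', 2)]
```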

Final overview of Tree-stack LSTM

Figure: Tree-stack LSTM overview. The β-LSTM, σ-LSTM, and Action-LSTM hidden states are concatenated and fed to an MLP; t-RNN connects the components.


Overview

1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2 Related Work: Linear Models and their Drawbacks; Neural Network Models

3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5 Conclusion

6 Future Work & Discussions


4 Results & Comparisons


Results & Comparisons

Dataset

CoNLL17:
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
Dependency parsing of 82 treebanks in 57 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split change 2. Annotation


MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets.


MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1. If the annotation of the treebank is improved, the older parser is handicapped.

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.


MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru_taiga (10k)   58.89  60.55
hu_szeged (20k)  66.21  68.18
tr_imst (50k)    56.78  58.75
ar_padt (120k)   67.83  68.14
en_ewt (205k)    74.87  75.77
cs_cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP


Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM


MLP Parser

Figure: Initial model (MLP)


Only Action LSTM

Figure: Only action LSTM


Only β-LSTM

Figure: Only β-LSTM


Only σ-LSTM

Figure: Only σ-LSTM


Ablation Analysis Results

Lang Code  MLP    Only Action  Only-β  Only-σ
hu_szeged  66.21  66.87        66.94   67.03
sv_lines   71.12  72.05        72.17   72.45
tr_imst    57.12  56.87        57.02   57.12
ar_padt    67.83  66.67        66.89   66.92
cs_cac     83.89  82.23        83.13   83.17
en_ewt     75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models


Ablation of t-RNN

Figure: Tree-stack LSTM overview, t-RNN highlighted


Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code           without t-RNN  with t-RNN
no_nynorsklia (3k)  51.78          53.33
ru_taiga (11k)      59.13          60.55
gl_treegal (15k)    69.76          70.45
hu_szeged (20k)     66.12          68.18
sv_lines (49k)      74.04          75.46
tr_imst (50k)       58.12          58.75
ar_padt (120k)      68.04          68.14
en_ewt (204k)       74.87          75.77
cs_cac (473k)       82.89          83.57
cs_pdt (1M)         81.17          81.16

t-RNN provides a comparative advantage for low-resource languages


Ablation Analysis

Overall results of ablation analysis

Lang       MLP    Only-A  Only-β  Only-σ  w/o t-RNN  all
hu_szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv_lines   71.12  72.05   72.17   74.04   72.17      75.46
tr_imst    57.12  56.87   57.02   57.12   58.12      58.75
ar_padt    67.83  66.67   66.89   66.92   68.04      68.14
cs_cac     83.89  82.23   83.13   83.17   82.89      83.57
en_ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations


Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases.

σ-LSTM provides more useful information, independent of dataset size.

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers).


What does Morphological Feature Embedding provide?


Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more


Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no_nynorsklia  51.13        53.33           3,583
ru_taiga       58.32        60.55           10,479
sme_giella     52.78        53.39           16,385
la_perseus     49.93        51.60           18,184
ug_udt         52.78        53.39           19,262
sl_sst         46.72        48.77           19,473
hu_szeged      66.23        68.18           20,166

Not useful for languages having less than 20k training tokens


Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv_lines       72.18        74.81           48,325
fr_sequoia     84.36        82.17           50,543
en_gum         76.44        75.34           53,686
ko_gsd         73.74        72.54           56,687
eu_bdt         74.55        73.32           72,974
nl_lassysmall  76.70        75.80           75,134
gl_ctg         79.02        79.018          79,327
lv_lvtb        72.33        72.24           80,666
id_gsd         75.76        73.97           97,531

Beneficial for languages with 50k-100k training tokens


Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa_seraji  81.18        81.12           121,064
bg_btb     84.53        84.55           124,336
en_ewt     75.77        75.682          204,585
ar_padt    68.02        68.14           223,881
de_gsd     71.59        71.32           263,804
ca_ancora  85.89        85.874          417,587
es_ancora  84.99        84.78           444,617
cs_cac     83.57        83.63           472,608
cs_pdt     81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens


Static vs Dynamic Oracle Training

Static oracle: transitions follow gold moves. Dynamic oracle: transitions follow predicted moves.

In both cases, the log-probability of gold moves is maximized.

Figure: Tree-stack LSTM overview

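The difference can be sketched as a single training step: both variants accumulate the same loss on the gold move, and only the followed transition differs. The scores and moves below are toy values; the real model's scoring is omitted.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    z = sum(es)
    return [e / z for e in es]

def oracle_step(scores, gold_move, dynamic):
    """Both oracles add -log p(gold) to the loss; they differ only in
    which transition the parser actually follows next."""
    p = softmax(scores)
    loss = -math.log(p[gold_move])
    predicted = max(range(len(p)), key=p.__getitem__)
    return loss, (predicted if dynamic else gold_move)

scores, gold = [2.0, 0.5, 1.0], 2        # toy transition scores
loss_s, next_s = oracle_step(scores, gold, dynamic=False)
loss_d, next_d = oracle_step(scores, gold, dynamic=True)
print(next_s, next_d)  # 2 0: static follows gold, dynamic follows argmax
```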

Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens less than 20k


Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens between 20k and 50k


Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens more than 50k


How about languages with less than 20k training tokens?


Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al., 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af_afribooms  not provided  75.46  77.43  78.12
kk_ktb        20.19         22.31  21.96  23.86
bxr_bdt       7.64          9.76   9.93   8.98
kmr_mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4)


Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial.

From-scratch LM training does not bring useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al., 2017].


Projectivity

Transition-based parsers can only build projective trees. 6

6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

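Projectivity can be checked by testing whether any two arcs cross; this head-array sketch is my own illustration, not code from the thesis.

```python
def is_projective(heads):
    """heads[i-1] is the head of word i (0 means root), words 1-indexed.
    A dependency tree is projective iff no two arcs cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for i, (l1, r1) in enumerate(arcs):
        for l2, r2 in arcs[i + 1:]:
            if l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1:
                return False
    return True

print(is_projective([2, 0, 2]))      # True: all arcs nest
print(is_projective([3, 4, 0, 3]))   # False: arcs (1,3) and (2,4) cross
```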

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios:

Language     Projectivity  Best (LAS)  Our (LAS)
grc_perseus  90.7%         79.39       55.03 (20)
eu_bdt       95.13%        84.22       74.13 (17)
hu_szeged    97.8%         82.66       68.18 (14)
da_ddt       98.26%        86.28       76.40 (17)
en_gum       99.6%         85.05       76.44 (15)
gl_treegal   100%          74.25       70.45 (10)
gl_ctg       100%          82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases. 7

7 From the official results page and our projectivity table

Conclusions


Conclusion

In conclusion: We introduced "Context Word and Morph-feat" embeddings and showed their contribution in transition based dependency parsing.

Our Tree-stack LSTM outperformed MLP by removing hand-crafted feature engineering.

Tree-stack LSTM performed better with low-resource languages.

When the training dataset size increases, tree-stack LSTM loses its advantage.


Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.


Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.


References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, pages 673-682. Association for Computational Linguistics.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.


Thank you for your attention


Questions?


  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 42: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

MLP Parser - Decision Module

Decision module (MLP) decides the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label

Economic news hadGold Tree LAS 1

SBJATT

Economic news had

PREDOBJPred 1 LAS 0

Economic news had

OBJATTPred 2 LAS (frac12)100

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Context vectors provide independent contribution on top ofPOS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677p-fb 747 797 663

p-v-c 793 832 742

Our BiLSTM language model word vectors perform betterthan FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Both POS tags and context vectors have significantcontributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing correct state of parser still remains critical

We are unable to represent whole parsing history with featureextracting

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify stackrsquos word embeddings

Hidden states of LSTMS are not updated unless reduce

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose Tree-stack LSTM model with 4 components

β-LSTMσ-LSTMAction-LSTMTree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include explicit feature extractor We initiated wordrepresentation by concatenating

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morp-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure Bufferrsquos β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stackrsquos σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How do components of tree-stack LSTM are connected

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code   Morph-Feats   no Morph-Feats   # of tokens
fa seraji   81.18         81.12            121064
bg btb      84.53         84.55            124336
en ewt      75.77         75.682           204585
ar padt     68.02         68.14            223881
de gsd      71.59         71.32            263804
ca ancora   85.89         85.874           417587
es ancora   84.99         84.78            444617
cs cac      83.57         83.63            472608
cs pdt      81.43         82.12            1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow gold moves
Dynamic oracle: transitions follow predicted moves

In both cases, the log-probability of the gold moves is maximized

[Figure: Tree-stack LSTM: σ-LSTM (stack), β-LSTM (buffer), and Action-LSTM outputs are concatenated into an MLP; t-RNN combines head word, dependent word, and dependency relation]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
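The distinction above can be sketched in a few lines. This is an illustrative sketch, not the thesis implementation: `train_step` and the move names are hypothetical, and the real parser scores moves with the network described earlier. Both regimes compute the same loss on the gold move; they differ only in which move is executed to reach the next parser state.

```python
# Sketch: static vs. dynamic oracle training. In both cases the loss
# maximizes log p of the gold move; they differ in the executed move.
import math

def train_step(gold_move, scores, dynamic=False):
    """scores: dict move -> probability (assumed already normalized)."""
    loss = -math.log(scores[gold_move])         # -log p(gold move)
    if dynamic:
        executed = max(scores, key=scores.get)  # follow the model's prediction
    else:
        executed = gold_move                    # follow the gold transition
    return loss, executed

scores = {"shift": 0.2, "left": 0.7, "right": 0.1}
loss_s, move_s = train_step("shift", scores, dynamic=False)
loss_d, move_d = train_step("shift", scores, dynamic=True)
```

With a dynamic oracle the parser visits states its own (possibly wrong) predictions lead to, which is why it can help on harder, low-resource settings.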

Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al., 2017]
3. Using my own word and context vectors, trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language       (1)            (2)      (3)      (4)
af afribooms   not provided   75.46    77.43    78.12
kk ktb         20.19          22.31    21.96    23.86
bxr bdt        7.64           9.76     9.93     8.98
kmr mg         20.12          22.57    22.78    23.39

Table: LAS values for strategies (1), (2), (3), and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

From-scratch LM training does not bring useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al., 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition based parsers can only build projective trees 6

6 Figure from http://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
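Projectivity can be checked directly from the head list: a tree is projective iff no two dependency arcs cross. A minimal checker is sketched below (an illustration, not the thesis code); `heads[i]` is the 1-based head of token i+1, with 0 denoting the root.

```python
# A dependency tree is projective iff no two arcs cross.
def is_projective(heads):
    # arcs as (left endpoint, right endpoint) position pairs
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for (l1, r1) in arcs:
        for (l2, r2) in arcs:
            if l1 < l2 < r1 < r2:   # arcs (l1,r1) and (l2,r2) cross
                return False
    return True

ok = is_projective([2, 0, 2])       # projective 3-token tree
bad = is_projective([3, 4, 0, 3])   # arcs (1,3) and (2,4) cross
```

A transition based parser as defined here can never produce a tree for which `is_projective` is False, which explains the performance gap on treebanks with low projectivity ratios.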

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language      Projectivity   Best (LAS)   Our (LAS)
grc perseus   90.7           79.39        55.03 (20)
eu bdt        95.13          84.22        74.13 (17)
hu szeged     97.8           82.66        68.18 (14)
da ddt        98.26          86.28        76.40 (17)
en gum        99.6           85.05        76.44 (15)
gl treegal    100            74.25        70.45 (10)
gl ctg        100            82.12        79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases 7

7 From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:
We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering

Tree-stack LSTM performed better on low-resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring performance improvements.

Morphological Features

Finding different ways to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition based dependency parsing with stack long-short term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Experiments amp Dataset (MLP) CoNLL17

CoNLL17 Dataset

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123

Experiments - Evaluation Metric

Labeled Attachment Score (LAS): the percentage of words correctly assigned both the correct syntactic head and the correct dependency label

[Figure: for the sentence "Economic news had", the gold tree has arcs ATT and SBJ; Pred 1 (PRED, OBJ) gets LAS 0; Pred 2 (ATT, OBJ) gets one of two arcs right, LAS = (1/2) * 100 = 50]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
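The metric above is easy to state in code. This is a minimal sketch of LAS over per-word (head, label) pairs, using the "Economic news had" example from the slide; the function name and data layout are illustrative, not the official evaluation script.

```python
# LAS: percentage of words whose predicted head AND label both match gold.
def las(gold, pred):
    """gold, pred: lists of (head_index, label) per word."""
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return 100.0 * correct / len(gold)

# "Economic news had": gold arcs ATT(news -> Economic), SBJ(had -> news)
gold = [(2, "ATT"), (3, "SBJ")]
pred1 = [(3, "OBJ"), (3, "PRED")]   # both arcs wrong
pred2 = [(2, "ATT"), (3, "OBJ")]    # one of two arcs correct
score1 = las(gold, pred1)
score2 = las(gold, pred2)
```

Unlabeled Attachment Score (UAS) is the same computation with only the head index compared.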

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5 Source: CoNLL17 official results page
Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63.0
c       72.2        76.0        63.5
v-c     76.0        79.0        67.6
p-c     78.0        82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63.0
c       72.2        76.0        63.5
v-c     76.0        79.0        67.6
p-c     78.0        82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2

Context vectors provide independent contribution on top ofPOS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63.0
c       72.2        76.0        63.5
v-c     76.0        79.0        67.6
p-c     78.0        82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2

Our BiLSTM language model word vectors perform betterthan FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63.0
c       72.2        76.0        63.5
v-c     76.0        79.0        67.6
p-c     78.0        82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2

Both POS tags and context vectors have significantcontributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing the correct state representation of the parser still remains critical

We are unable to represent the whole parsing history with feature extraction

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM
Modify the head word's embedding with the dependent's embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of the LSTMs are not updated unless a reduce occurs

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al., 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

[Figure: Tree-stack LSTM: σ-LSTM (stack), β-LSTM (buffer), and Action-LSTM outputs are concatenated into an MLP; t-RNN combines head word, dependent word, and dependency relation]

We propose the Tree-stack LSTM model with 4 components:

β-LSTM
σ-LSTM
Action-LSTM
Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector

Every dependency relation is represented with a continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character-based LSTM's word vectors

Word-based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
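The concatenation above can be sketched directly. The dimensions below are illustrative stand-ins, not the thesis hyperparameters, and the four input vectors would come from the trained char-LSTM, BiLSTM language model, and embedding tables.

```python
# Sketch: a word's input representation is the concatenation of its
# char-LSTM word vector, BiLSTM context vector, POS vector, and
# morph-feat vector. Dimensions here are illustrative only.
import numpy as np

def word_representation(char_vec, context_vec, pos_vec, feat_vec):
    return np.concatenate([char_vec, context_vec, pos_vec, feat_vec])

rng = np.random.default_rng(0)
x = word_representation(rng.normal(size=350),   # char-LSTM word vector
                        rng.normal(size=300),   # BiLSTM context vector
                        rng.normal(size=128),   # POS embedding
                        rng.normal(size=128))   # morph-feat embedding
```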

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs  (for the word "It")

Figure: Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
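One plausible reading of the figure above, as code: each `key=value` feature in the CoNLL-U FEATS string gets its own embedding, and the per-feature embeddings are combined into one morph-feat vector. Summing them is an assumption made here for illustration; the thesis may combine them differently.

```python
# Sketch: map a CoNLL-U FEATS string to a fixed-size morph-feat vector
# by summing per-feature embeddings (combination method is an assumption).
import numpy as np

DIM = 32
rng = np.random.default_rng(0)
feat_table = {}   # "key=value" string -> embedding, grown on demand

def feat_embedding(feats):
    """feats e.g. 'Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs'."""
    vec = np.zeros(DIM)
    for f in feats.split("|"):
        if f not in feat_table:
            feat_table[f] = rng.normal(size=DIM)
        vec += feat_table[f]
    return vec

v = feat_embedding("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
```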

Tree-stack LSTM

Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

[Figure: Tree-stack LSTM overview with the β-LSTM (buffer) component highlighted]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

Figure: Buffer's β-LSTM running over the buffer words w_i, w_i+1, w_i+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

[Figure: Tree-stack LSTM overview with the σ-LSTM (stack) component highlighted]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

Figure: Stack's σ-LSTM running over the stack words s_i, s_i+1, s_i+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

[Figure: Tree-stack LSTM overview with the Action-LSTM component highlighted]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

Figure: Action-LSTM running over the sequence of past transitions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of the tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

[Figure: t-RNN combines the dependent word, the dependency relation, and the head word]

w_head_new = tanh(W_rnn · [w_head_old ; d_l ; w_dep] + b_rnn)    (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
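Equation (1) in code form: the head word's vector is updated from the old head vector, the dependency-relation embedding d_l, and the dependent's vector. The weights below are randomly initialized stand-ins and the dimension is illustrative.

```python
# Sketch of the t-RNN head update, equation (1):
#   w_head_new = tanh(W_rnn . [w_head_old ; d_l ; w_dep] + b_rnn)
import numpy as np

def t_rnn(w_head, d_l, w_dep, W, b):
    x = np.concatenate([w_head, d_l, w_dep])  # [w_head_old ; d_l ; w_dep]
    return np.tanh(W @ x + b)

d = 4                                         # illustrative dimension
rng = np.random.default_rng(0)
W = rng.normal(size=(d, 3 * d))               # W_rnn
b = np.zeros(d)                               # b_rnn
new_head = t_rnn(np.ones(d), np.zeros(d), np.ones(d), W, b)
```

The tanh keeps the new head vector in the same range as the word embeddings, so it can re-enter the stack or buffer LSTMs after a reduce.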

Tree-RNN with:
1. Left Transition
2. Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure: Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure: Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

[Figure: Tree-stack LSTM: σ-LSTM, β-LSTM, and Action-LSTM outputs are concatenated into an MLP; t-RNN combines head word, dependent word, and dependency relation]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
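The left/right transitions shown earlier can be sketched as destructive updates on a (stack, buffer, arcs) state. This is an illustrative sketch, not the thesis code: arcs are stored as (head, label, dependent) triples, following the (b, d, s) and (s, d, t) notation of the formulas, and the label strings are made up for the example.

```python
# Sketch of the left/right transitions on a parser state:
#   left_d:  (sigma|s, b|beta, A) -> (sigma, b|beta, A u {(b, d, s)})
#   right_d: (sigma|s|t, beta, A) -> (sigma|s, beta, A u {(s, d, t)})

def left(stack, buffer, arcs, d):
    s = stack.pop()                 # stack top becomes a dependent...
    arcs.add((buffer[0], d, s))     # ...of the buffer front b

def right(stack, buffer, arcs, d):
    t = stack.pop()                 # stack top becomes a dependent...
    arcs.add((stack[-1], d, t))     # ...of the next-to-top s

stack, buffer, arcs = [0, 1], [2, 3], set()
right(stack, buffer, arcs, "obj")   # adds arc (0, "obj", 1), pops 1
left(stack, buffer, arcs, "nsubj")  # adds arc (2, "nsubj", 0), pops 0
```

In the full model, each transition also triggers the t-RNN head update and a recomputation of the affected σ-LSTM or β-LSTM hidden states.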

Overview

1 Introduction
  Overview of Dependency Parsing
  Transition Based Dependency Parsing

2 Related Work
  Linear Models and their Drawbacks
  Neural Network Models

3 Model
  Language Model
  MLP Parser
  Tree-stack LSTM Parser

4 Results
  MLP vs Tree-stack LSTM
  Morphological Feature Embeddings
  Static vs Dynamic Oracle Training
  Transfer Learning

5 Conclusion

6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations

Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18

Dependency parsing of 82 treebanks in 57 languages

All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations

Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Differences between CoNLL17 and CoNLL18: 1. Train/test split change 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of the official comparison:

1. If the annotation of the treebank is improved, the older parser is handicapped

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code         MLP     Tree-stack
ru taiga (10k)    58.89   60.55
hu szeged (20k)   66.21   68.18
tr imst (50k)     56.78   58.75
ar padt (120k)    67.83   68.14
en ewt (205k)     74.87   75.77
cs cac (473k)     83.39   83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code   MLP     Only Action   Only-β   Only-σ
hu szeged   66.21   66.87         66.94    67.03
sv lines    71.12   72.05         72.17    72.45
tr imst     57.12   56.87         57.02    57.12
ar padt     67.83   66.67         66.89    66.92
cs cac      83.89   82.23         83.13    83.17
en ewt      75.54   75.43         75.56    75.67

Table: Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

[Figure: Tree-stack LSTM overview with the t-RNN component highlighted]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code            without t-RNN   with t-RNN
no nynorsklia (3k)   51.78           53.33
ru taiga (11k)       59.13           60.55
gl treegal (15k)     69.76           70.45
hu szeged (20k)      66.12           68.18
sv lines (49k)       74.04           75.46
tr imst (50k)        58.12           58.75
ar padt (120k)       68.04           68.14
en ewt (204k)        74.87           75.77
cs cac (473k)        82.89           83.57
cs pdt (1M)          81.17           81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang        MLP     Only A   Only-β   Only-σ   w/o t-RNN   all
hu szeged   66.21   66.87    66.94    67.03    66.12       68.18
sv lines    71.12   72.05    72.17    74.04    72.17       75.46
tr imst     57.12   56.87    57.02    57.12    58.12       58.75
ar padt     67.83   66.67    66.89    66.92    68.04       68.14
cs cac      83.89   82.23    83.13    83.17    82.89       83.57
en ewt      75.54   75.43    75.56    75.67    74.87       75.77

Tree-stack LSTM beats the other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 44: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Experiments - Evaluation Metric

Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label

Economic news hadGold Tree LAS 1

SBJATT

Economic news had

PREDOBJPred 1 LAS 0

Economic news had

OBJATTPred 2 LAS (frac12)100

Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Context vectors provide independent contribution on top ofPOS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677p-fb 747 797 663

p-v-c 793 832 742

Our BiLSTM language model word vectors perform betterthan FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Both POS tags and context vectors have significantcontributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing correct state of parser still remains critical

We are unable to represent whole parsing history with featureextracting

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure: Stack LSTM [Dyer et al., 2015]

Represent each component (σ, β, A) with an LSTM. Modify the head word's embedding with the dependent's embedding.


Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of the LSTMs are not updated unless a reduce occurs

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al., 2013]


Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy


Tree-stack LSTM Overview

[Figure: Tree-stack LSTM — the t-RNN combines head word, dependent word, and dependency relation embeddings; the σ-LSTM, β-LSTM, and Action-LSTM states are concatenated and fed to an MLP]

We propose the Tree-stack LSTM model with 4 components:

1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN


Tree-stack LSTM

Input Representation


Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector

Every dependency relation is represented with a continuous vector


Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors
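The four vectors are simply concatenated into one word representation. A minimal sketch in plain Python (the vector names and toy dimensions are illustrative assumptions, not the actual sizes used in the thesis):

```python
def word_representation(char_vec, context_vec, pos_vec, morph_vec):
    """Concatenate the four inputs into one word representation.

    The arguments stand for the character-LSTM word vector, the BiLSTM
    context vector, the POS embedding, and the morph-feat embedding;
    plain lists of floats are used here instead of real model outputs."""
    return char_vec + context_vec + pos_vec + morph_vec

# Toy dimensions: 3-dim char vector, 3-dim context, 2-dim POS, 2-dim morph-feat
w = word_representation([0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [1.0, 0.0], [0.0, 1.0])
```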


Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs (morphological features of the word "It")

Figure Morph-feat Embeddings
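One way to turn a UD feature string like the one above into a single vector is to look up an embedding per Feature=Value pair and sum them. This is only a hedged sketch: the `emb` table and the summing (rather than concatenating) are assumptions for illustration.

```python
def morph_feat_vector(feat_string, emb, dim=4):
    """Build one vector from a UD morph string such as
    'Case=Nom|Gender=Neut|Number=Sing' by summing per-feature embeddings.

    Unknown Feature=Value pairs contribute zeros; '_' means no features."""
    vec = [0.0] * dim
    if feat_string == "_":
        return vec
    for pair in feat_string.split("|"):
        e = emb.get(pair, [0.0] * dim)
        vec = [a + b for a, b in zip(vec, e)]
    return vec

# Hypothetical embedding table with 4-dimensional feature embeddings
emb = {"Case=Nom": [1.0, 0.0, 0.0, 0.0], "Number=Sing": [0.0, 1.0, 0.0, 0.0]}
v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing", emb)
```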


Tree-stack LSTM

Model Components:

1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN


β-LSTM

[Figure: Tree-stack LSTM overview (β-LSTM component highlighted)]


β-LSTM

[Figure: Buffer's β-LSTM — an LSTM running over the buffer words w_i, w_i+1, w_i+2]
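As a rough sketch of the idea, the recurrence below runs a simplified, non-gated recurrent cell over the buffer words; it stands in for the actual LSTM cell, whose gates are omitted for brevity, and the toy weights are illustrative assumptions.

```python
import math

def buffer_rnn(words, W_x, W_h):
    """Run a simplified recurrent cell over the buffer words w_i, w_i+1, ...

    h_t = tanh(W_x @ x_t + W_h @ h_{t-1}); this non-gated update stands in
    for an LSTM cell (gates omitted). The final hidden state summarizes
    the remaining buffer."""
    h = [0.0] * len(W_h)
    for x in words:
        h = [math.tanh(sum(W_x[i][j] * x[j] for j in range(len(x)))
                       + sum(W_h[i][j] * h[j] for j in range(len(h))))
             for i in range(len(h))]
    return h

# Toy 2-dimensional word vectors and weights (illustrative values)
words = [[1.0, 0.0], [0.0, 1.0]]
W_x = [[0.5, 0.0], [0.0, 0.5]]
W_h = [[0.1, 0.0], [0.0, 0.1]]
h = buffer_rnn(words, W_x, W_h)
```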


σ-LSTM

[Figure: Tree-stack LSTM overview (σ-LSTM component highlighted)]


σ-LSTM

[Figure: Stack's σ-LSTM — an LSTM running over the stack words s_i, s_i+1, s_i+2]


Action-LSTM

[Figure: Tree-stack LSTM overview (Action-LSTM component highlighted)]


Action-LSTM

[Figure: Action-LSTM — an LSTM running over the past transition embeddings]


How are the components of the tree-stack LSTM connected?


Tree-RNN


Tree-RNN (t-RNN)

[Figure: t-RNN — combines the head word, dependent word, and dependency relation embeddings]

w_head_new = tanh(W_rnn · [w_head_old ; d_l ; w_dep] + b_rnn)    (1)
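Equation (1) can be written out directly in plain Python; the toy weight matrix `W` and the 2/1/2 vector dimensions below are illustrative assumptions:

```python
import math

def t_rnn(w_head, d_label, w_dep, W, b):
    """Eq. (1): w_head_new = tanh(W_rnn * [w_head ; d_label ; w_dep] + b_rnn).

    The new head embedding replaces the old one after an arc is added."""
    x = w_head + d_label + w_dep        # the concatenation [w_head; d_l; w_dep]
    return [math.tanh(sum(W[i][j] * x[j] for j in range(len(x))) + b[i])
            for i in range(len(W))]

# Toy sizes: 2-dim head, 1-dim relation label, 2-dim dependent -> 5-dim input
W = [[0.1, 0.1, 0.1, 0.1, 0.1],
     [-0.1, -0.1, -0.1, -0.1, -0.1]]
b = [0.0, 0.0]
new_head = t_rnn([0.5, 0.5], [1.0], [0.2, 0.3], W, b)
```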


Tree-RNN with

1. Left Transition
2. Right Transition


Left Transition


Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings


Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding


Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input


Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready to predict the next transition
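The left_d transition above can be mirrored on a plain parser state. The state layout — Python lists for σ and β, a set of (head, label, dependent) triples for A — is an assumption for illustration:

```python
def left_arc(stack, buffer, arcs, label):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}).

    The buffer front b becomes the head of the stack top s; s is popped
    and the arc (head, label, dependent) is recorded."""
    s = stack.pop()          # dependent: removed from the stack
    b = buffer[0]            # head: stays at the front of the buffer
    arcs.add((b, label, s))
    return stack, buffer, arcs

# Toy state: word ids on the stack and buffer, no arcs yet
stack, buffer, arcs = [0, 2], [3, 4], set()
left_arc(stack, buffer, arcs, "nsubj")
```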


Right Transition


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready to predict the next transition
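Likewise, the right_d transition pops the stack top t and attaches it to the element s below it; the state layout is the same illustrative assumption as before:

```python
def right_arc(stack, buffer, arcs, label):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}).

    The stack top t becomes a dependent of the element s right below it;
    t is popped and the arc (head, label, dependent) is recorded."""
    t = stack.pop()          # dependent: the old stack top
    s = stack[-1]            # head: the new stack top
    arcs.add((s, label, t))
    return stack, buffer, arcs

# Toy state: word ids on the stack and buffer, no arcs yet
stack, buffer, arcs = [0, 2, 5], [7], set()
right_arc(stack, buffer, arcs, "obj")
```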


Final overview of Tree-stack LSTM

[Figure: Final Tree-stack LSTM — σ-LSTM, β-LSTM, Action-LSTM, and t-RNN outputs are concatenated and fed to an MLP]
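The Concat → MLP step can be sketched as scoring transitions from the concatenated component states; the single tanh hidden layer, the softmax, and the toy dimensions below are assumptions for illustration:

```python
import math

def mlp_scores(features, W1, b1, W2, b2):
    """Score transitions from the concatenated component states:
    one tanh hidden layer followed by a softmax over transitions."""
    h = [math.tanh(sum(w * f for w, f in zip(row, features)) + bb)
         for row, bb in zip(W1, b1)]
    logits = [sum(w * v for w, v in zip(row, h)) + bb
              for row, bb in zip(W2, b2)]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

# Toy 2-dim concatenated state scored against 3 candidate transitions
W1 = [[0.1, 0.2], [0.3, -0.1]]
b1 = [0.0, 0.1]
W2 = [[1.0, -1.0], [0.5, 0.5], [0.0, 1.0]]
b2 = [0.0, 0.0, 0.0]
probs = mlp_scores([1.0, 2.0], W1, b1, W2, b2)
```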


Overview

1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2. Related Work: Linear Models and their Drawbacks; Neural Network Models

3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5. Conclusion

6. Future Work & Discussions


4 Results & Comparisons


Results & Comparisons

Dataset

CoNLL17:
- Dependency parsing of 81 treebanks in 49 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
- Dependency parsing of 82 treebanks in 57 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Differences between CoNLL17 and CoNLL18: 1. Train/test split change, 2. Annotation


MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets


MLP vs Tree-stack LSTM

2 possible problems of the official comparison:

1. If the annotation of the treebank is improved, the older parser is handicapped
2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly


MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP     Tree-stack
ru_taiga (10k)   58.89   60.55
hu_szeged (20k)  66.21   68.18
tr_imst (50k)    56.78   58.75
ar_padt (120k)   67.83   68.14
en_ewt (205k)    74.87   75.77
cs_cac (473k)    83.39   83.57

Tree-stack LSTM outperforms MLP


Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM


MLP Parser

[Figure: Initial model — MLP only]


Only Action LSTM

[Figure: Only Action-LSTM]


Only β-LSTM

[Figure: Only β-LSTM]


Only σ-LSTM

[Figure: Only σ-LSTM]


Ablation Analysis Results

Lang Code   MLP     Only Action   Only-β   Only-σ
hu_szeged   66.21   66.87         66.94    67.03
sv_lines    71.12   72.05         72.17    72.45
tr_imst     57.12   56.87         57.02    57.12
ar_padt     67.83   66.67         66.89    66.92
cs_cac      83.89   82.23         83.13    83.17
en_ewt      75.54   75.43         75.56    75.67

Table: Comparison between MLP and "Only" models


Ablation of t-RNN

[Figure: Tree-stack LSTM overview (t-RNN component highlighted)]


Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code            without t-RNN   with t-RNN
no_nynorsklia (3k)   51.78           53.33
ru_taiga (11k)       59.13           60.55
gl_treegal (15k)     69.76           70.45
hu_szeged (20k)      66.12           68.18
sv_lines (49k)       74.04           75.46
tr_imst (50k)        58.12           58.75
ar_padt (120k)       68.04           68.14
en_ewt (204k)        74.87           75.77
cs_cac (473k)        82.89           83.57
cs_pdt (1M)          81.17           81.164

t-RNN provides a comparative advantage for low-resource languages


Ablation Analysis

Overall results of ablation analysis

Lang        MLP     Only A   Only-β   Only-σ   w/o t-RNN   all
hu_szeged   66.21   66.87    66.94    67.03    66.12       68.18
sv_lines    71.12   72.05    72.17    74.04    72.17       75.46
tr_imst     57.12   56.87    57.02    57.12    58.12       58.75
ar_padt     67.83   66.67    66.89    66.92    68.04       68.14
cs_cac      83.89   82.23    83.13    83.17    82.89       83.57
en_ewt      75.54   75.43    75.56    75.67    74.87       75.77

Tree-stack LSTM beats other model variations


Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases

σ-LSTM provides more useful information, independent of dataset size

Interconnecting the model's components with the t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th of all and 2nd among transition based parsers)


What does Morphological Feature Embedding provide?


Contribution of Morph-feat Embeddings

Experimental Settings

We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k, less than 50k tokens

Languages having more than 50k, less than 100k tokens

Languages having 100k tokens or more


Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code       Morph-Feats   no Morph-Feats   # of tokens
no_nynorsklia   51.13         53.33            3,583
ru_taiga        58.32         60.55            10,479
sme_giella      52.78         53.39            16,385
la_perseus      49.93         51.60            18,184
ug_udt          52.78         53.39            19,262
sl_sst          46.72         48.77            19,473
hu_szeged       66.23         68.18            20,166

Not useful for languages having less than 20k training tokens


Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code       Morph-Feats   no Morph-Feats   # of tokens
sv_lines        72.18         74.81            48,325
fr_sequoia      84.36         82.17            50,543
en_gum          76.44         75.34            53,686
ko_gsd          73.74         72.54            56,687
eu_bdt          74.55         73.32            72,974
nl_lassysmall   76.7          75.8             75,134
gl_ctg          79.02         79.018           79,327
lv_lvtb         72.33         72.24            80,666
id_gsd          75.76         73.97            97,531

Beneficial for languages with 50k-100k training tokens


Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code   Morph-Feats   no Morph-Feats   # of tokens
fa_seraji   81.18         81.12            121,064
bg_btb      84.53         84.55            124,336
en_ewt      75.77         75.682           204,585
ar_padt     68.02         68.14            223,881
de_gsd      71.59         71.32            263,804
ca_ancora   85.89         85.874           417,587
es_ancora   84.99         84.78            444,617
cs_cac      83.57         83.63            472,608
cs_pdt      81.43         82.12            1,173,282

Neutral for languages having more than 100k training tokens


Static vs Dynamic Oracle Training

Static oracle: transitions follow the gold moves. Dynamic oracle: transitions follow the predicted moves.

In both cases, the log-probability of the gold moves is maximized
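The distinction can be sketched as a single training step: the loss always targets the gold move, but the state the parser follows next differs. The toy scores and the inlined log-softmax are illustrative assumptions:

```python
import math

def oracle_step(scores, gold, dynamic):
    """One training decision: the loss is always -log p(gold move),
    but a static oracle follows the gold move while a dynamic oracle
    follows the model's own prediction."""
    m = max(scores)
    log_z = math.log(sum(math.exp(s - m) for s in scores)) + m
    loss = log_z - scores[gold]          # -log softmax(scores)[gold]
    predicted = max(range(len(scores)), key=scores.__getitem__)
    follow = predicted if dynamic else gold
    return loss, follow

# Toy transition scores where the model disagrees with the gold move (index 1)
loss_s, move_s = oracle_step([2.0, 0.5, 0.1], gold=1, dynamic=False)
loss_d, move_d = oracle_step([2.0, 0.5, 0.1], gold=1, dynamic=True)
```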

[Figure: Tree-stack LSTM overview]


Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens less than 20k


Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens between 20k and 50k


Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens more than 50k


How about languages with less than 20k training tokens?


Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al., 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language       (1)            (2)     (3)     (4)
af_afribooms   not provided   75.46   77.43   78.12
kk_ktb         20.19          22.31   21.96   23.86
bxr_bdt        7.64           9.76    9.93    8.98
kmr_mg         20.12          22.57   22.78   23.39

Table: LAS values for strategies (1), (2), (3), and (4)


Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

Training an LM from scratch on very limited data does not produce useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al., 2017]


Projectivity

Transition based parsers can only build projective trees 6

6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
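Projectivity can be checked directly: a tree is projective iff no two arcs cross. The head-array encoding below (heads[i] is the head of word i+1, 0 for the root) is a common convention assumed here for illustration:

```python
def is_projective(heads):
    """heads[i] is the head of word i+1 (words are 1-based, 0 is the root).

    A dependency tree is projective iff no two arcs cross when drawn
    above the sentence, i.e. no pair of arcs strictly interleaves."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for i, (a, b) in enumerate(arcs):
        for c, d in arcs[i + 1:]:
            if a < c < b < d or c < a < d < b:   # strictly crossing spans
                return False
    return True
```

For example, arcs (1, 3) and (2, 4) interleave, so a tree containing both is non-projective.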


Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language      Projectivity   Best (LAS)   Our (LAS)
grc_perseus   90.7           79.39        55.03 (20)
eu_bdt        95.13          84.22        74.13 (17)
hu_szeged     97.8           82.66        68.18 (14)
da_ddt        98.26          86.28        76.40 (17)
en_gum        99.6           85.05        76.44 (15)
gl_treegal    100            74.25        70.45 (10)
gl_ctg        100            82.12        79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases

7 From the official results page and our projectivity table

Conclusions


Conclusion

In conclusion: We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition based dependency parsing.

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering

Tree-stack LSTM performed better with low-resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage


Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring performance improvements

Morphological Features

Finding a different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.


Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.


References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.


Thank you for your attention


Questions


  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 45: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Experiments (MLP)

CoNLL 2017 Results (all treebanks LAS)

Ranked 1st among transition based parsers 5

5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123

Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Context vectors provide independent contribution on top ofPOS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677p-fb 747 797 663

p-v-c 793 832 742

Our BiLSTM language model word vectors perform betterthan FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Both POS tags and context vectors have significantcontributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing correct state of parser still remains critical

We are unable to represent whole parsing history with featureextracting

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify stackrsquos word embeddings

Hidden states of LSTMS are not updated unless reduce

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose Tree-stack LSTM model with 4 components

β-LSTMσ-LSTMAction-LSTMTree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include explicit feature extractor We initiated wordrepresentation by concatenating

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morp-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure Bufferrsquos β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stackrsquos σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How do components of tree-stack LSTM are connected

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang        MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu_szeged   66.21  66.87   66.94   67.03   66.12      68.18
sv_lines    71.12  72.05   72.17   74.04   72.17      75.46
tr_imst     57.12  56.87   57.02   57.12   58.12      58.75
ar_padt     67.83  66.67   66.89   66.92   68.04      68.14
cs_cac      83.89  82.23   83.13   83.17   82.89      83.57
en_ewt      75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases

σ-LSTM provides more useful information, independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does the Morphological Feature Embedding provide?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings

We divide the CoNLL18 UD v2.2 dataset into 4 parts, based on the number of training tokens for each language, to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
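The grouping above can be sketched as a small helper; the boundaries are taken from the slides, the function name is illustrative:

```python
def size_bucket(n_tokens):
    """Size group used in the morph-feat experiments
    (boundaries from the slides)."""
    if n_tokens < 20_000:
        return "<20k"
    if n_tokens < 50_000:
        return "20k-50k"
    if n_tokens < 100_000:
        return "50k-100k"
    return ">=100k"
```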

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code       Morph-Feats  no Morph-Feats  # of tokens
no_nynorsklia   51.13        53.33           3,583
ru_taiga        58.32        60.55           10,479
sme_giella      52.78        53.39           16,385
la_perseus      49.93        51.60           18,184
ug_udt          52.78        53.39           19,262
sl_sst          46.72        48.77           19,473
hu_szeged       66.23        68.18           20,166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv_lines       72.18        74.81           48,325
fr_sequoia     84.36        82.17           50,543
en_gum         76.44        75.34           53,686
ko_gsd         73.74        72.54           56,687
eu_bdt         74.55        73.32           72,974
nl_lassysmall  76.70        75.80           75,134
gl_ctg         79.02        79.018          79,327
lv_lvtb        72.33        72.24           80,666
id_gsd         75.76        73.97           97,531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code   Morph-Feats  no Morph-Feats  # of tokens
fa_seraji   81.18        81.12           121,064
bg_btb      84.53        84.55           124,336
en_ewt      75.77        75.682          204,585
ar_padt     68.02        68.14           223,881
de_gsd      71.59        71.32           263,804
ca_ancora   85.89        85.874          417,587
es_ancora   84.99        84.78           444,617
cs_cac      83.57        83.63           472,608
cs_pdt      81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow the gold moves
Dynamic oracle: transitions follow the predicted moves

In both cases, the log-probability of the gold moves is maximized

[Figure: Tree-stack LSTM architecture — σ-, β- and Action-LSTM outputs are concatenated and fed to the MLP; the t-RNN combines head word, dependent word, and dependency relation embeddings]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
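The difference between the two training regimes can be sketched as follows. This is a toy under assumed interfaces, not the thesis code: `steps` and `scorer` are hypothetical stand-ins for the parser states and the move scorer, and the scores stand in for log-probabilities:

```python
import random

def train_pass(steps, scorer, dynamic, explore=0.5, seed=0):
    """One pass over a sentence's parser states.

    steps: list of (state, gold_move, legal_moves).
    In BOTH regimes the (stand-in) log-prob of the gold move is
    maximized; they differ only in which move is *executed* to reach
    the next state: always the gold move (static), or with some
    probability the model's own prediction (dynamic).
    """
    rng = random.Random(seed)
    loss, executed = 0.0, []
    for state, gold, legal in steps:
        scores = scorer(state)
        loss += -scores[gold]  # stand-in for -log p(gold move)
        if dynamic and rng.random() < explore:
            executed.append(max(legal, key=lambda m: scores[m]))  # predicted move
        else:
            executed.append(gold)  # gold move
    return loss, executed

# toy run: two parser states, a fixed scorer
steps = [(0, "shift", ["shift", "left", "right"]),
         (1, "left", ["shift", "left", "right"])]
scorer = lambda s: {"shift": 0.1, "left": 0.9, "right": 0.2}
static_loss, static_moves = train_pass(steps, scorer, dynamic=False)
```

With `explore=1.0` the dynamic regime always executes the model's prediction, while the loss term stays attached to the gold move.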

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors, trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af_afribooms  not provided  75.46  77.43  78.12
kk_ktb        20.19         22.31  21.96  23.86
bxr_bdt       7.64          9.76   9.93   8.98
kmr_mg        20.12         22.57  22.78  23.39

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

Training an LM from scratch does not bring useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition-based parsers can only build projective trees 6

6 Figure from http://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
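Projectivity can be checked by testing whether any two dependency arcs cross; a minimal sketch, assuming 1-based token indices with 0 for the root:

```python
def is_projective(heads):
    """heads[i] is the head of token i+1 (tokens 1-based, 0 = root).
    A dependency tree is projective iff no two arcs cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for i, (a1, b1) in enumerate(arcs):
        for a2, b2 in arcs[i + 1:]:
            # two arcs cross iff exactly one endpoint of one lies
            # strictly inside the span of the other
            if a1 < a2 < b1 < b2 or a2 < a1 < b2 < b1:
                return False
    return True
```

For example, `[3, 0, 2, 2]` (token 1 headed by token 3, root at 2, tokens 3 and 4 headed by 2) is non-projective because the arc 3→1 spans the root attachment of token 2.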

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language      Projectivity %  Best (LAS)  Our (LAS)
grc_perseus   90.7            79.39       55.03 (20)
eu_bdt        95.13           84.22       74.13 (17)
hu_szeged     97.8            82.66       68.18 (14)
da_ddt        98.26           86.28       76.40 (17)
en_gum        99.6            85.05       76.44 (15)
gl_treegal    100             74.25       70.45 (10)
gl_ctg        100             82.12       79.45 (14)

Table Our model's performance gap decreases as the projectivity ratio increases

7 From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:

We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition-based dependency parsing

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering

Tree-stack LSTM performed better on low-resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement

Morphological Features

Finding a different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123


Contributions in CoNLL17

Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c)

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63.0
c      72.2       76.0       63.5
v-c    76.0       79.0       67.6
p-c    78.0       82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p), word vector (v), and context vector (c)

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63.0
c      72.2       76.0       63.5
v-c    76.0       79.0       67.6
p-c    78.0       82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Context vectors provide an independent contribution on top of POS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63.0
c      72.2       76.0       63.5
v-c    76.0       79.0       67.6
p-c    78.0       82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Our BiLSTM language model word vectors perform better than FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63.0
c      72.2       76.0       63.5
v-c    76.0       79.0       67.6
p-c    78.0       82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Both POS tags and context vectors have significant contributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123
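The context vector (c) of a word comes from a BiLSTM language model: a forward state summarizing the left context and a backward state summarizing the right context. A deliberately tiny scalar sketch of the idea (illustrative only, with made-up weights; the thesis model uses real LSTMs and vector embeddings):

```python
import math

def rnn_states(xs, w=0.5, u=0.3):
    """Tiny scalar Elman RNN: h_t = tanh(w*x_t + u*h_{t-1})."""
    h, out = 0.0, []
    for x in xs:
        h = math.tanh(w * x + u * h)
        out.append(h)
    return out

def context_vectors(word_scalars):
    """Context vector of word i = (forward state over its left context,
    backward state over its right context), BiLSTM-LM style."""
    fwd = rnn_states(word_scalars)
    bwd = rnn_states(word_scalars[::-1])[::-1]
    n = len(word_scalars)
    return [(fwd[i - 1] if i > 0 else 0.0,       # left context state
             bwd[i + 1] if i + 1 < n else 0.0)   # right context state
            for i in range(n)]
```

Because the word itself is excluded from its own context states, c carries information complementary to the word vector v, which matches the v-c and p-v-c gains in the table.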

Issues with MLP

However

Choosing the correct parser state representation still remains critical

We are unable to represent the whole parsing history with feature extraction

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17

• Koc-University team with MLP Parser using Context Embeddings

CoNLL18

• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM
Modify the head word's embedding with the dependent's embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of the LSTMs are not updated unless a reduce occurs

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

[Figure: Tree-stack LSTM architecture — σ-, β- and Action-LSTM outputs are concatenated and fed to the MLP; the t-RNN combines head word, dependent word, and dependency relation embeddings]

We propose Tree-stack LSTM model with 4 components

β-LSTM, σ-LSTM, Action-LSTM, Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character-based LSTM's word vectors

Word-based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
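The concatenation itself is trivial; a sketch with illustrative (not actual) dimensions:

```python
def word_representation(char_vec, context_vec, pos_vec, morph_vec):
    """A word enters the parser as one concatenated vector:
    char-LSTM word vector + BiLSTM context vector
    + POS embedding + morph-feat embedding."""
    return char_vec + context_vec + pos_vec + morph_vec

# illustrative dimensions: 4 + 6 + 2 + 2 = 14
rep = word_representation([0.1] * 4, [0.2] * 6, [0.3] * 2, [0.4] * 2)
```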

Input Representation

Morp-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
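A UD FEATS string like the one above splits into key=value pairs, each of which can be looked up in an embedding table. A sketch with summation pooling, one plausible way to handle the variable number of features (table values and pooling choice are illustrative):

```python
def parse_morph_feats(feats):
    """Split a UD FEATS string such as 'Case=Nom|Number=Sing'
    into (feature, value) pairs; '_' means no features."""
    if feats in ("", "_"):
        return []
    return [tuple(kv.split("=", 1)) for kv in feats.split("|")]

def morph_vector(feats, table, dim=2):
    """Sum the embeddings of the individual pairs to pool a variable
    number of features into one fixed-size morph-feat vector."""
    vec = [0.0] * dim
    for pair in parse_morph_feats(feats):
        for i, v in enumerate(table.get(pair, [0.0] * dim)):
            vec[i] += v
    return vec

# toy embedding table (illustrative values)
table = {("Case", "Nom"): [1.0, 0.0], ("Number", "Sing"): [0.0, 1.0]}
```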

Tree-stack LSTM

Model Components: 1. β-LSTM  2. σ-LSTM  3. Action-LSTM  4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

[Figure: Tree-stack LSTM architecture with the β-LSTM component highlighted]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

w_i   w_{i+1}   w_{i+2}

Figure Buffer's β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

[Figure: Tree-stack LSTM architecture with the σ-LSTM component highlighted]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stack's σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

[Figure: Tree-stack LSTM architecture with the Action-LSTM component highlighted]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

w_head^new = tanh(W_rnn * [w_head^old ; d_l ; w_dep] + b_rnn)   (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
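Eq. (1) in executable form, with a pure-Python matrix-vector product and illustrative dimensions (head 2, relation 1, dependent 2):

```python
import math

def t_rnn(w_head, d_rel, w_dep, W, b):
    """Eq. (1): new head = tanh(W_rnn @ [head; relation; dependent] + b_rnn)."""
    x = w_head + d_rel + w_dep  # concatenation of the three embeddings
    return [math.tanh(sum(wij * xj for wij, xj in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

# toy parameters: 2x5 weight matrix picking one coordinate per row
W = [[1.0, 0.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.0, 1.0]]
b = [0.0, 0.0]
new_head = t_rnn([0.5, 0.0], [0.0], [0.0, 0.5], W, b)
```

The updated head embedding then replaces the old one in the stack or buffer, so later transitions see the subtree built so far.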

Tree-RNN with

1. Left Transition  2. Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

Head   Dependent

Figure Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
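The two transitions above, in executable form on toy integer tokens (a sketch of the arc rules as written on the slides; `d` is the dependency label, and token 0 stands for the root):

```python
def shift(stack, buffer):
    """shift: move the buffer front onto the stack."""
    stack.append(buffer.pop(0))

def left(stack, buffer, arcs, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    pop s; the buffer front b becomes its head with label d."""
    s = stack.pop()
    arcs.add((buffer[0], d, s))

def right(stack, buffer, arcs, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    pop t; the word s below it on the stack becomes its head."""
    t = stack.pop()
    arcs.add((stack[-1], d, t))

# toy parse of a 2-word sentence: root -> 1 -> 2
stack, buffer, arcs = [0], [1, 2], set()
shift(stack, buffer)
shift(stack, buffer)
right(stack, buffer, arcs, "obj")   # 2 attaches to 1
right(stack, buffer, arcs, "root")  # 1 attaches to the root
```

In the full parser each new arc additionally triggers the t-RNN update of the head's embedding and a recomputation of the affected LSTM state.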

Final overview of Tree-stack LSTM

[Figure: Tree-stack LSTM architecture — σ-, β- and Action-LSTM outputs are concatenated and fed to the MLP; the t-RNN combines head word, dependent word, and dependency relation embeddings]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2. Related Work: Linear Models and their Drawbacks; Neural Network Models

3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5. Conclusion
6. Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4. Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17:
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags and 37 universal dependency relations
• Koc-University ranked 7th out of 33 participants (1st among transition-based parsers)

CoNLL18:
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags and 37 universal dependency relations
• Koc-University ranked 16th out of 30 participants (2nd among transition-based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split change  2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1. If the annotation of the treebank is improved, the older parser is handicapped

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123



Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123

Context and Word Embeddings

Relative contributions of part-of-speech (p) word vector (v)context vector (c)

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Context vectors provide independent contribution on top ofPOS tags

Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677p-fb 747 797 663

p-v-c 793 832 742

Our BiLSTM language model word vectors perform betterthan FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Both POS tags and context vectors have significant contributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing the correct parser-state features still remains critical

We are unable to represent the whole parsing history with feature extraction

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17

• Koc-University team with MLP Parser using Context Embeddings

CoNLL18

• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM
Modify the head word's embedding with the dependent's embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of LSTMs are not updated unless a reduce occurs

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

Figure Tree-stack LSTM architecture: t-RNN (head word, dependent word, dependency relation) together with the β-, σ-, and Action-LSTM outputs, concatenated and fed to an MLP

We propose the Tree-stack LSTM model with 4 components:

1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN
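As a rough sketch of how the four component summaries combine, the β-, σ-, and Action-LSTM hidden states and the t-RNN output are concatenated and scored by an MLP. The toy dimensions, weight shapes, and function names below are my assumptions, not the thesis implementation:

```python
import math
import random

random.seed(0)

def score_transitions(components, hidden_W, out_W):
    """Concatenate component summary vectors (β-, σ-, Action-LSTM, t-RNN output),
    pass them through one tanh hidden layer, and return a score per transition."""
    x = [v for comp in components for v in comp]  # vector concatenation
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in hidden_W]
    return [sum(w * h for w, h in zip(row, hidden)) for row in out_W]

# toy sizes: 4 components of dim 3, hidden layer of 5, 3 transitions (shift/left/right)
dim, hid, n_trans = 3, 5, 3
hidden_W = [[random.uniform(-1, 1) for _ in range(4 * dim)] for _ in range(hid)]
out_W = [[random.uniform(-1, 1) for _ in range(hid)] for _ in range(n_trans)]
scores = score_transitions([[0.1] * dim] * 4, hidden_W, out_W)
```

The highest-scoring transition would then be applied to the parser state.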

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector

Every dependency relation is represented with a continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character-based LSTM's word vectors

Word-based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings
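The FEATS string in the figure can be turned into an embedding in the spirit of the slide; a minimal sketch, where the embedding table and pooling-by-sum are illustrative assumptions:

```python
def parse_feats(feats):
    """Split a CoNLL-U FEATS string such as the slide's example into (feature, value) pairs."""
    if feats == "_":  # CoNLL-U convention for "no features"
        return []
    return [tuple(pair.split("=", 1)) for pair in feats.split("|")]

def morph_feat_vector(feats, emb, dim=3):
    """Sum the embedding of every feature=value pair (zeros for unseen pairs)."""
    vec = [0.0] * dim
    for pair in parse_feats(feats):
        for i, x in enumerate(emb.get(pair, [0.0] * dim)):
            vec[i] += x
    return vec

# hypothetical 3-dimensional embedding table
emb = {("Case", "Nom"): [1.0, 0.0, 0.0], ("Number", "Sing"): [0.0, 1.0, 0.0]}
vec = morph_feat_vector("Case=Nom|Number=Sing", emb)
```

In practice the embeddings would be trained jointly with the parser rather than fixed.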

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components
1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

Figure Tree-stack LSTM architecture (β-LSTM highlighted)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

Figure Buffer's β-LSTM over the upcoming words wi, wi+1, wi+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

Figure Tree-stack LSTM architecture (σ-LSTM highlighted)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

Figure Stack's σ-LSTM over the stack items si, si+1, si+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

Figure Tree-stack LSTM architecture (Action-LSTM highlighted)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

Figure Action-LSTM over the transition history

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

Figure t-RNN combining the head word, dependent word, and dependency relation

w_head_new = tanh(W_rnn * [w_head_old; d_l; w_dep] + b_rnn)    (1)
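Eq. (1) can be written out directly; a small sketch with toy dimensions (the weight values are placeholders):

```python
import math

def trnn_compose(w_head, d_l, w_dep, W, b):
    """Eq. (1): new head embedding = tanh(W_rnn * [w_head_old; d_l; w_dep] + b_rnn)."""
    x = w_head + d_l + w_dep  # concatenation [head; relation; dependent]
    return [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

# toy sizes: word dim 2, relation dim 1 -> input dim 5, output dim 2
W = [[0.1] * 5, [-0.1] * 5]
b = [0.0, 0.0]
new_head = trnn_compose([0.5, -0.5], [0.0], [0.25, 0.25], W, b)
```

The tanh keeps the composed head embedding in the same range as the original embeddings, so the composition can be applied repeatedly as the tree grows.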

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition
2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure Left transition: each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
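The left_d and right_d definitions amount to simple list operations on the parser state (σ, β, A); a sketch, with arcs stored as (head, label, dependent) triples and all names mine:

```python
def left_arc(sigma, beta, arcs, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    the stack top s becomes a dependent of the buffer front b."""
    s = sigma.pop()
    arcs.add((beta[0], d, s))  # head is the buffer front
    return sigma, beta, arcs

def right_arc(sigma, beta, arcs, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    the stack top t becomes a dependent of the item s below it."""
    t = sigma.pop()
    arcs.add((sigma[-1], d, t))  # head is the new stack top
    return sigma, beta, arcs

# tiny example on word indices
s1, b1, a1 = left_arc([0, 1], [2, 3], set(), "nsubj")
s2, b2, a2 = right_arc([0, 1, 2], [3], set(), "obj")
```

In the full model, each such transition also triggers the t-RNN update of the head's embedding and a recomputation of the affected LSTM hidden states, as the following slides illustrate.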

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})


Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})


Figure β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})


Figure Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})


Figure Right transition: each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})


Figure Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})


Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})


Figure σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})


Figure Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

Figure Tree-stack LSTM architecture: t-RNN (head word, dependent word, dependency relation) together with the β-, σ-, and Action-LSTM outputs, concatenated and fed to an MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 Introduction
  Overview of Dependency Parsing
  Transition Based Dependency Parsing

2 Related Work
  Linear Models and their Drawbacks
  Neural Network Models

3 Model
  Language Model
  MLP Parser
  Tree-stack LSTM Parser

4 Results
  MLP vs Tree-stack LSTM
  Morphological Feature Embeddings
  Static vs Dynamic Oracle Training
  Transfer Learning

5 Conclusion
6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Differences from CoNLL17 to CoNLL18: 1 Train/test split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

Two possible problems of the official comparison:

1 If the annotation of the treebank is improved, the older parser is handicapped

2 If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru_taiga (10k)   58.89  60.55
hu_szeged (20k)  66.21  68.18
tr_imst (50k)    56.78  58.75
ar_padt (120k)   67.83  68.14
en_ewt (205k)    74.87  75.77
cs_cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser


Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM


Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM


Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM


Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code   MLP    Only Action  Only-β  Only-σ
hu_szeged   66.21  66.87        66.94   67.03
sv_lines    71.12  72.05        72.17   72.45
tr_imst     57.12  56.87        57.02   57.12
ar_padt     67.83  66.67        66.89   66.92
cs_cac      83.89  82.23        83.13   83.17
en_ewt      75.54  75.43        75.56   75.67

Table Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

Figure Tree-stack LSTM architecture (t-RNN highlighted)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code           without t-RNN  with t-RNN
no_nynorsklia (3k)  51.78          53.33
ru_taiga (11k)      59.13          60.55
gl_treegal (15k)    69.76          70.45
hu_szeged (20k)     66.12          68.18
sv_lines (49k)      74.04          75.46
tr_imst (50k)       58.12          58.75
ar_padt (120k)      68.04          68.14
en_ewt (204k)       74.87          75.77
cs_cac (473k)       82.89          83.57
cs_pdt (1M)         81.17          81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu_szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv_lines   71.12  72.05   72.17   74.04   72.17      75.46
tr_imst    57.12  56.87   57.02   57.12   58.12      58.75
ar_padt    67.83  66.67   66.89   66.92   68.04      68.14
cs_cac     83.89  82.23   83.13   83.17   82.89      83.57
en_ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases

σ-LSTM provides more useful information, independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th of all and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD dataset (v2.2) into 4 parts based on the number of training tokens for each language, to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k, less than 50k tokens

Languages having more than 50k, less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no_nynorsklia  51.13        53.33           3583
ru_taiga       58.32        60.55           10479
sme_giella     52.78        53.39           16385
la_perseus     49.93        51.6            18184
ug_udt         52.78        53.39           19262
sl_sst         46.72        48.77           19473
hu_szeged      66.23        68.18           20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv_lines       72.18        74.81           48325
fr_sequoia     84.36        82.17           50543
en_gum         76.44        75.34           53686
ko_gsd         73.74        72.54           56687
eu_bdt         74.55        73.32           72974
nl_lassysmall  76.7         75.8            75134
gl_ctg         79.02        79.018          79327
lv_lvtb        72.33        72.24           80666
id_gsd         75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa_seraji  81.18        81.12           121064
bg_btb     84.53        84.55           124336
en_ewt     75.77        75.682          204585
ar_padt    68.02        68.14           223881
de_gsd     71.59        71.32           263804
ca_ancora  85.89        85.874          417587
es_ancora  84.99        84.78           444617
cs_cac     83.57        83.63           472608
cs_pdt     81.43        82.12           1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow the gold moves
Dynamic oracle: transitions follow the predicted moves

In both cases, the log-probability of the gold moves is maximized
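The difference can be sketched as a tiny training-loop skeleton (the function names are hypothetical stand-ins, not the thesis code): the oracle choice only changes which move drives the parser to its next state, while the training pair always records the gold move.

```python
def train_epoch(state, gold_move, predict, step, n_steps, dynamic):
    """Collect (state, gold move) training pairs along one pass.
    Static training follows the gold move; dynamic training follows the
    model's predicted move. The supervision target is the gold move either way."""
    pairs = []
    for _ in range(n_steps):
        g = gold_move(state)
        pairs.append((state, g))
        move = predict(state) if dynamic else g  # the only difference between regimes
        state = step(state, move)
    return pairs

# toy example: states are integers, the gold move is +1, the model predicts -1
static_states = [s for s, _ in train_epoch(0, lambda s: 1, lambda s: -1,
                                           lambda s, m: s + m, 3, dynamic=False)]
dynamic_states = [s for s, _ in train_epoch(0, lambda s: 1, lambda s: -1,
                                            lambda s, m: s + m, 3, dynamic=True)]
```

The toy run makes the contrast visible: the static regime only ever sees gold states, while the dynamic regime trains on the states its own (wrong) predictions lead to.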

Figure Tree-stack LSTM architecture: t-RNN (head word, dependent word, dependency relation) together with the β-, σ-, and Action-LSTM outputs, concatenated and fed to an MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:
1 Using very limited data to train a LM for word and context vectors, and use them to train a parser from scratch
2 Using Facebook's word vectors to train a parser [Bojanowski et al 2017]
3 Using my own word and context vectors trained on a different language but from the same language family
4 Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af_afribooms  not provided  75.46  77.43  78.12
kk_ktb        20.19         22.31  21.96  23.86
bxr_bdt       7.64          9.76   9.93   8.98
kmr_mg        20.12         22.57  22.78  23.39

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

From-scratch LM training does not bring useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

A transition based parser can only build projective trees 6

6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
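Projectivity here means no two dependency arcs cross; assuming a CoNLL-U-style heads array (index 0 is the artificial root), a simple O(n²) check looks like:

```python
def is_projective(heads):
    """heads[i] is the head of word i (words are 1..n; heads[0] is a placeholder;
    head 0 means the artificial root). A tree is projective iff no two arcs cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads[1:], start=1)]
    for i, (l1, r1) in enumerate(arcs):
        for l2, r2 in arcs[i + 1:]:
            if l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1:  # properly interleaved spans
                return False
    return True

projective = is_projective([0, 2, 0, 2])        # 1->2, 2->root, 3->2: projective
nonprojective = is_projective([0, 0, 0, 1, 2])  # arcs (1,3) and (2,4) cross
```

Non-projective trees would require extra transitions (e.g. swap) or post-processing, which is why projectivity ratios matter in the comparison below.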

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language     Projectivity  Best (LAS)  Our (LAS)
grc_perseus  90.7          79.39       55.03 (20)
eu_bdt       95.13         84.22       74.13 (17)
hu_szeged    97.8          82.66       68.18 (14)
da_ddt       98.26         86.28       76.40 (17)
en_gum       99.6          85.05       76.44 (15)
gl_treegal   100           74.25       70.45 (10)
gl_ctg       100           82.12       79.45 (14)

Table Our model's performance gap decreases as the projectivity ratio increases

7

7 From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:
We introduced "Context, Word and Morph-feat" embeddings and showed their contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-crafted feature engineering

Tree-stack LSTM performed better with low-resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between the σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement

Morphological Features

Finding a different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Ömer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Ömer Kırnap, Berkay Furkan Önder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1. Association for Computational Linguistics, pages 673–682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123


Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 49: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Context and Word embeddings

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63
c       72.2       76         63.5
v-c     76         79         67.6
p-c     78         82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Our BiLSTM language model word vectors perform better than FB vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123

Context and Word embeddings

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63
c       72.2       76         63.5
v-c     76         79         67.6
p-c     78         82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Both POS tags and context vectors have significant contributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing the correct parser state still remains critical

We are unable to represent the whole parsing history with feature extraction

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17

• Koc-University team with MLP Parser using Context Embeddings

CoNLL18

• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c. Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM

Modify the head word's embedding with the dependent embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of the LSTMs are not updated unless a reduce occurs

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

Figure: Tree-stack LSTM overview (σ-LSTM, β-LSTM and Action-LSTM outputs are concatenated and fed to an MLP; t-RNN composes head word, dependent word and dependency relation)

We propose the Tree-stack LSTM model with 4 components:

1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector

Every dependency relation is represented with a continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize each word representation by concatenating:

Character-based LSTM's word vectors

Word-based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
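As a concrete illustration, the concatenation above can be sketched as follows; the dimensions are made-up placeholders, not the settings used in the thesis.

```python
import numpy as np

# Hypothetical dimensions, for illustration only.
D_WORD, D_CONTEXT, D_POS, D_FEAT = 64, 64, 16, 16

def word_input(char_word_vec, context_vec, pos_vec, feat_vec):
    # One parser input per word: char-LSTM word vector + BiLSTM context
    # vector + POS embedding + morph-feat embedding, concatenated.
    return np.concatenate([char_word_vec, context_vec, pos_vec, feat_vec])

x = word_input(np.zeros(D_WORD), np.zeros(D_CONTEXT), np.zeros(D_POS), np.zeros(D_FEAT))
assert x.shape == (D_WORD + D_CONTEXT + D_POS + D_FEAT,)
```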

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
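A FEATS string like the one in the figure can be split into individual features, each of which is then looked up in its own embedding table; this is a minimal sketch of the splitting step only.

```python
def parse_morph_feats(feats):
    # Split a CoNLL-U FEATS string such as
    # "Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs"
    # into a {feature: value} dict; "_" means no features.
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))

fs = parse_morph_feats("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
assert fs["Case"] == "Nom" and len(fs) == 5
```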

Tree-stack LSTM

Model Components

1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

Figure: Tree-stack LSTM with the β-LSTM highlighted

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

Figure: Buffer's β-LSTM runs over the upcoming words w_i, w_{i+1}, w_{i+2}

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

Figure: Tree-stack LSTM with the σ-LSTM highlighted

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

Figure: Stack's σ-LSTM runs over the stack items s_i, s_{i+1}, s_{i+2}

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

Figure: Tree-stack LSTM with the Action-LSTM highlighted

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

Figure: Action-LSTM runs over the transition history

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

Figure: t-RNN combines the head word, dependent word and dependency relation vectors

w_head_new = tanh(W_rnn · [w_head_old ; d_l ; w_dep] + b_rnn)    (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
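Equation (1) can be sketched numerically as below; the toy dimension and random initialization are illustrative assumptions, not the thesis settings.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8                                   # toy embedding size, for illustration
W_rnn = 0.1 * rng.standard_normal((D, 3 * D))
b_rnn = np.zeros(D)

def t_rnn(w_head, d_label, w_dep):
    # Eq. (1): the updated head vector is a tanh-squashed linear map of
    # [old head ; dependency-label ; dependent] concatenated.
    return np.tanh(W_rnn @ np.concatenate([w_head, d_label, w_dep]) + b_rnn)

new_head = t_rnn(rng.standard_normal(D), rng.standard_normal(D), rng.standard_normal(D))
assert new_head.shape == (D,) and np.all(np.abs(new_head) <= 1.0)
```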

Tree-RNN with

1. Left Transition
2. Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initialized by concatenating POS, language and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready for the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready for the next transition

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
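The two transitions above can be sketched as pure stack/set operations, assuming the arc-hybrid-style definitions on the preceding slides; the t-RNN head update and the LSTM recomputations shown in the figures are omitted here.

```python
def left_arc(stack, buffer, arcs, d):
    # left_d(sigma|s, b|beta, A) = (sigma, b|beta, A u {(b, d, s)}):
    # the buffer front b becomes the head of the stack top s; s is popped.
    s, b = stack[-1], buffer[0]
    return stack[:-1], buffer, arcs | {(b, d, s)}

def right_arc(stack, buffer, arcs, d):
    # right_d(sigma|s|t, beta, A) = (sigma|s, beta, A u {(s, d, t)}):
    # s, just below the top, becomes the head of the stack top t; t is popped.
    t, s = stack[-1], stack[-2]
    return stack[:-1], buffer, arcs | {(s, d, t)}

# "news had ...": attach "news" as nsubj of the upcoming head "had"
stack, buffer, arcs = left_arc(["ROOT", "news"], ["had"], set(), "nsubj")
assert stack == ["ROOT"] and arcs == {("had", "nsubj", "news")}
```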

Final overview of Tree-stack LSTM

Figure: Final overview of Tree-stack LSTM (β-LSTM, σ-LSTM, Action-LSTM and t-RNN feeding a concatenation layer and an MLP)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1. Introduction
   Overview of Dependency Parsing
   Transition Based Dependency Parsing

2. Related Work
   Linear Models and their Drawbacks
   Neural Network Models

3. Model
   Language Model
   MLP Parser
   Tree-stack LSTM Parser

4. Results
   MLP vs Tree-stack LSTM
   Morphological Feature Embeddings
   Static vs Dynamic Oracle Training
   Transfer Learning

5. Conclusion

6. Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17:
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation (17 universal part-of-speech tags, 37 universal dependency relations)
• Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation (17 universal part-of-speech tags, 37 universal dependency relations)
• Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split change 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

Two possible problems of the official comparison:

1. If the annotation of the treebank has improved, the older parser is handicapped

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser


Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM


Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM


Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM


Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code   MLP    Only Action  Only-β  Only-σ
hu szeged   66.21  66.87        66.94   67.03
sv lines    71.12  72.05        72.17   72.45
tr imst     57.12  56.87        57.02   57.12
ar padt     67.83  66.67        66.89   66.92
cs cac      83.89  82.23        83.13   83.17
en ewt      75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

Figure: Tree-stack LSTM with the t-RNN highlighted

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code            without t-RNN  with t-RNN
no nynorsklia (3k)   51.78          53.33
ru taiga (11k)       59.13          60.55
gl treegal (15k)     69.76          70.45
hu szeged (20k)      66.12          68.18
sv lines (49k)       74.04          75.46
tr imst (50k)        58.12          58.75
ar padt (120k)       68.04          68.14
en ewt (204k)        74.87          75.77
cs cac (473k)        82.89          83.57
cs pdt (1M)          81.17          81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases

σ-LSTM provides more useful information independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th of all and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings

We divide the CoNLL18 UD 2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k, less than 50k tokens

Languages having more than 50k, less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
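The four buckets above correspond to a simple threshold function on the training-token count (a few treebanks in the later tables sit near the boundaries):

```python
def size_bucket(n_tokens):
    # Bucket a treebank by its number of training tokens,
    # following the four groups listed above.
    if n_tokens < 20_000:
        return "<20k"
    if n_tokens < 50_000:
        return "20k-50k"
    if n_tokens < 100_000:
        return "50k-100k"
    return ">=100k"

assert size_bucket(10_479) == "<20k"      # e.g. ru taiga
assert size_bucket(204_585) == ">=100k"   # e.g. en ewt
```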

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.6            18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having tokens in between 50k and 100k

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48325
fr sequoia     84.36        82.17           50543
en gum         76.44        75.34           53686
ko gsd         73.74        72.54           56687
eu bdt         74.55        73.32           72974
nl lassysmall  76.7         75.8            75134
gl ctg         79.02        79.018          79327
lv lvtb        72.33        72.24           80666
id gsd         75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121064
bg btb     84.53        84.55           124336
en ewt     75.77        75.682          204585
ar padt    68.02        68.14           223881
de gsd     71.59        71.32           263804
ca ancora  85.89        85.874          417587
es ancora  84.99        84.78           444617
cs cac     83.57        83.63           472608
cs pdt     81.43        82.12           1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions using gold moves
Dynamic oracle: transitions using predicted moves

In both cases, the log probability of the gold moves is maximized

Figure: Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
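The difference between the two training regimes can be sketched as follows: the loss always targets the gold move, but the parser state is advanced with either the gold move (static) or the model's own prediction (dynamic). This is a toy sketch over raw scores, not the thesis implementation.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def training_step(scores, gold, dynamic):
    # Loss is always -log p(gold move); the oracles differ only in which
    # move is used to advance the parser state during training.
    loss = -math.log(softmax(scores)[gold])
    advance = max(range(len(scores)), key=lambda a: scores[a]) if dynamic else gold
    return loss, advance

_, move = training_step([2.0, 0.5, 0.1], gold=1, dynamic=False)
assert move == 1          # static: follow the gold move
_, move = training_step([2.0, 0.5, 0.1], gold=1, dynamic=True)
assert move == 0          # dynamic: follow the model's (wrong) prediction
```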

Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

From-scratch LM training does not bring useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition based parsers can only build projective trees [6]

[6] Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
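Projectivity as stated above can be checked directly: a tree is projective iff no two dependency arcs cross. A minimal check over head indices (1-based tokens, 0 = root):

```python
def is_projective(heads):
    # heads[i-1] is the head of token i; an arc spans (min(h, i), max(h, i)).
    # The tree is non-projective iff two arc spans partially overlap.
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for a1, b1 in arcs:
        for a2, b2 in arcs:
            if a1 < a2 < b1 < b2:   # crossing arcs
                return False
    return True

assert is_projective([2, 0, 2, 3, 2])       # a projective 5-token tree
assert not is_projective([2, 0, 5, 2, 2])   # arcs (2,4) and (3,5) cross
```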

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases [7]

[7] From the official results page and our projectivity table

Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:

We introduced "Context, Word and Morph-feat" embeddings and showed their contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-crafted feature engineering

Tree-stack LSTM performed better with low-resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding different ways to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gomez-Rodriguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 50: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Context and Word embeddings

Feats Hungarian En-ParTUT Latvianp 636 766 559

v 735 759 63

c 722 76 635

v-c 76 79 676

p-c 78 825 706

p-v 766 808 677

p-fb 747 797 663

p-v-c 793 832 742

Both POS tags and context vectors have significantcontributions on top of word vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123

Issues with MLP

However

Choosing correct state of parser still remains critical

We are unable to represent whole parsing history with featureextracting

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify stackrsquos word embeddings

Hidden states of LSTMS are not updated unless reduce

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

[Figure: Tree-stack LSTM architecture: hidden states of the σ-LSTM, β-LSTM, and Action-LSTM, together with t-RNN head-word updates, are concatenated and fed to an MLP]

We propose Tree-stack LSTM model with 4 components

β-LSTM
σ-LSTM
Action-LSTM
Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initiate the word representation by concatenating:

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors
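The concatenation above can be sketched as follows; the Python code and all dimensions are illustrative assumptions, not the thesis implementation:

```python
import numpy as np

# Hypothetical sketch: a word's input vector is the concatenation of the
# four embeddings listed on this slide. Dimensions are illustrative only.
def word_representation(char_vec, context_vec, pos_vec, feat_vec):
    # char_vec:    character-based LSTM word vector
    # context_vec: word-based BiLSTM context vector
    # pos_vec:     part-of-speech embedding
    # feat_vec:    morphological-feature embedding
    return np.concatenate([char_vec, context_vec, pos_vec, feat_vec])

x = word_representation(np.zeros(100), np.zeros(200), np.zeros(16), np.zeros(32))
assert x.shape == (348,)
```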

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morph-feat Vectors

"It": Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs

Figure Morph-feat Embeddings
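One plausible way to turn such a feature string into a single vector is to sum one embedding per key=value pair; this is only a sketch with random vectors, and the thesis may combine the pairs differently:

```python
import numpy as np

# Illustrative sketch (not the thesis implementation): embed a UD FEATS
# string by summing one vector per key=value pair, growing the table on demand.
rng = np.random.default_rng(0)
DIM = 32
table = {}  # key=value -> embedding

def morph_feat_vector(feats):
    vec = np.zeros(DIM)
    for pair in feats.split("|"):
        if pair not in table:
            table[pair] = rng.normal(size=DIM)
        vec += table[pair]
    return vec

v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
assert v.shape == (DIM,) and len(table) == 5
```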

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components
1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

[Figure: Tree-stack LSTM architecture, with the β-LSTM component highlighted]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

Figure: Buffer's β-LSTM running over the upcoming words w_i, w_i+1, w_i+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

[Figure: Tree-stack LSTM architecture, with the σ-LSTM component highlighted]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

Figure: Stack's σ-LSTM running over the stack items s_i, s_i+1, s_i+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

[Figure: Tree-stack LSTM architecture, with the Action-LSTM component highlighted]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

Figure: Action-LSTM running over the sequence of past transitions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of the tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

[Figure: t-RNN combining the head word, dependency relation, and dependent word embeddings]

w_head_new = tanh(W_rnn * [w_head_old; d_l; w_dep] + b_rnn)    (1)
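Eq. (1) can be transcribed directly; the dimensions and random parameters below are illustrative assumptions:

```python
import numpy as np

# Direct transcription of Eq. (1): the new head embedding is a tanh of an
# affine map over [head; relation; dependent]. Sizes are assumptions.
D = 64            # word-vector size (assumed)
R = 16            # dependency-relation embedding size (assumed)
rng = np.random.default_rng(0)
W_rnn = rng.normal(size=(D, 2 * D + R))
b_rnn = np.zeros(D)

def trnn(w_head_old, d_l, w_dep):
    x = np.concatenate([w_head_old, d_l, w_dep])
    return np.tanh(W_rnn @ x + b_rnn)

w_new = trnn(rng.normal(size=D), rng.normal(size=R), rng.normal(size=D))
assert w_new.shape == (D,) and np.all(np.abs(w_new) <= 1.0)
```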

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initiated by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
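The left-transition rule can be sketched as a state update; the (stack, buffer, arc-set) representation and the Python code are illustrative assumptions, not the thesis implementation:

```python
# Sketch of left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
# the top of the stack s becomes a d-dependent of the buffer front b.
def left(stack, buffer, arcs, d):
    s = stack.pop()          # s leaves the stack...
    b = buffer[0]            # ...and attaches to the buffer front b
    arcs.add((b, d, s))      # arc (head=b, label=d, dependent=s)
    return stack, buffer, arcs

stack, buffer, arcs = [0, 1], [2, 3], set()
left(stack, buffer, arcs, "nsubj")
assert stack == [0] and arcs == {(2, "nsubj", 1)}
```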

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initiated by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
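The right-transition rule can be sketched the same way; the state representation is an illustrative assumption:

```python
# Sketch of right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
# the stack top t becomes a d-dependent of the element s below it.
def right(stack, buffer, arcs, d):
    t = stack.pop()
    s = stack[-1]
    arcs.add((s, d, t))      # arc (head=s, label=d, dependent=t)
    return stack, buffer, arcs

stack, buffer, arcs = [0, 1, 2], [3], set()
right(stack, buffer, arcs, "obj")
assert stack == [0, 1] and arcs == {(1, "obj", 2)}
```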

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

[Figure: Tree-stack LSTM architecture: hidden states of the σ-LSTM, β-LSTM, and Action-LSTM, together with t-RNN head-word updates, are concatenated and fed to an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

CoNLL17: Dependency parsing of 81 treebanks in 49 languages. All treebanks use standardized annotation: 17 universal part-of-speech tags and 37 universal dependency relations. Koc-University ranked 7th out of 33 participants (1st among transition based parsers).

CoNLL18: Dependency parsing of 82 treebanks in 57 languages. All treebanks use standardized annotation: 17 universal part-of-speech tags and 37 universal dependency relations. Koc-University ranked 16th out of 30 participants (2nd among transition based parsers).

Changes from CoNLL17 to CoNLL18: 1. Train/test split change 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1 If the annotation of the treebank has improved, the older parser is handicapped

2 If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code   MLP    Only Action  Only-β  Only-σ
hu szeged   66.21  66.87        66.94   67.03
sv lines    71.12  72.05        72.17   72.45
tr imst     57.12  56.87        57.02   57.12
ar padt     67.83  66.67        66.89   66.92
cs cac      83.89  82.23        83.13   83.17
en ewt      75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

[Figure: Tree-stack LSTM architecture, with the t-RNN component highlighted]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.16

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases

σ-LSTM provides more useful information independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k but less than 50k tokens

Languages having more than 50k but less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33             3583
ru taiga       58.32        60.55            10479
sme giella     52.78        53.39            16385
la perseus     49.93        51.6             18184
ug udt         52.78        53.39            19262
sl sst         46.72        48.77            19473
hu szeged      66.23        68.18            20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k tokens

Lang code    Morph-Feats  no Morph-Feats  # of tokens
sv lines     72.18        74.81            48325
fr sequoia   84.36        82.17            50543
en gum       76.44        75.34            53686
ko gsd       73.74        72.54            56687
eu bdt       74.55        73.32            72974
nl lassymal  76.7         75.8             75134
gl ctg       79.02        79.018           79327
lv lvtb      72.33        72.24            80666
id gsd       75.76        73.97            97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12            121064
bg btb     84.53        84.55            124336
en ewt     75.77        75.682           204585
ar padt    68.02        68.14            223881
de gsd     71.59        71.32            263804
ca ancora  85.89        85.874           417587
es ancora  84.99        84.78            444617
cs cac     83.57        83.63            472608
cs pdt     81.43        82.12           1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow the gold moves. Dynamic oracle: transitions follow the predicted moves.

In both cases, the log-probability of the gold moves is maximized
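The distinction can be sketched as a single training step: the loss always targets the gold move, but the oracle type decides which move the parser actually executes next. The score function and all names here are illustrative stand-ins for the actual network:

```python
import numpy as np

rng = np.random.default_rng(0)

def scores(state):                 # stand-in for the parser network
    return rng.normal(size=3)      # one score per candidate transition

def log_softmax(z):
    z = z - z.max()
    return z - np.log(np.exp(z).sum())

def training_move(state, gold_move, dynamic):
    logp = log_softmax(scores(state))
    loss = -logp[gold_move]        # both regimes maximize log p(gold move)
    # static oracle: advance with the gold move; dynamic: with the model's choice
    next_move = int(np.argmax(logp)) if dynamic else gold_move
    return loss, next_move

loss, mv = training_move(None, gold_move=1, dynamic=False)
assert mv == 1 and loss >= 0
```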

[Figure: Tree-stack LSTM architecture: hidden states of the σ-LSTM, β-LSTM, and Action-LSTM, together with t-RNN head-word updates, are concatenated and fed to an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:
1 Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2 Using Facebook's word vectors to train a parser [Bojanowski et al., 2017]
3 Using my own word and context vectors trained on a different language from the same language family
4 Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt        7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

Training the LM from scratch does not yield useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al., 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition based parsers can only build projective trees

Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
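A projectivity check can be sketched as a crossing-arcs test over token positions; the (head, dependent) arc format is an assumption for illustration:

```python
# A dependency tree is projective iff no two arcs cross when drawn above
# the sentence. Sketch: compare the position spans of every pair of arcs.
def is_projective(arcs):
    spans = [(min(h, d), max(h, d)) for h, d in arcs]
    for i, (l1, r1) in enumerate(spans):
        for l2, r2 in spans[i + 1:]:
            if l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1:
                return False     # the two arcs cross
    return True

assert is_projective([(2, 1), (0, 2), (2, 3)])   # nested arcs: projective
assert not is_projective([(1, 3), (2, 4)])       # crossing arcs
```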

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus   90.7          79.39       55.03 (20)
eu bdt        95.13         84.22       74.13 (17)
hu szeged     97.8          82.66       68.18 (14)
da ddt        98.26         86.28       76.40 (17)
en gum        99.6          85.05       76.44 (15)
gl treegal   100            74.25       70.45 (10)
gl ctg       100            82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases

From the official results page and our projectivity table

Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion: We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition based dependency parsing

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering

Tree-stack LSTM performed better on low-resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement

Morphological Features

Finding a different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gomez-Rodriguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 51: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Issues with MLP

However

Choosing correct state of parser still remains critical

We are unable to represent whole parsing history with featureextracting

Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123

Solution

Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify stackrsquos word embeddings

Hidden states of LSTMS are not updated unless reduce

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose Tree-stack LSTM model with 4 components

β-LSTMσ-LSTMAction-LSTMTree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include explicit feature extractor We initiated wordrepresentation by concatenating

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morp-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure Bufferrsquos β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stackrsquos σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How do components of tree-stack LSTM are connected

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code       Morph-Feats  no Morph-Feats  # of tokens
sv_lines        72.18        74.81            48325
fr_sequoia      84.36        82.17            50543
en_gum          76.44        75.34            53686
ko_gsd          73.74        72.54            56687
eu_bdt          74.55        73.32            72974
nl_lassysmall   76.7         75.8             75134
gl_ctg          79.02        79.018           79327
lv_lvtb         72.33        72.24            80666
id_gsd          75.76        73.97            97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code   Morph-Feats  no Morph-Feats  # of tokens
fa_seraji   81.18        81.12             121064
bg_btb      84.53        84.55             124336
en_ewt      75.77        75.682            204585
ar_padt     68.02        68.14             223881
de_gsd      71.59        71.32             263804
ca_ancora   85.89        85.874            417587
es_ancora   84.99        84.78             444617
cs_cac      83.57        83.63             472608
cs_pdt      81.43        82.12            1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions using gold moves
Dynamic oracle: transitions using predicted moves

In both cases, the log probability (log p) of the gold moves is maximized

[Figure: Tree-stack LSTM architecture diagram]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
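The two training regimes above can be sketched in a few lines. Everything here (the toy state, the scorer, and the oracle) is illustrative, not the thesis implementation: a static oracle always follows gold transitions, while a dynamic oracle sometimes follows the model's own predictions, but the loss always maximizes log p of a gold move.

```python
import random
random.seed(0)

# Toy setup: a "state" is just how many transitions were taken, and a
# parse ends after N_MOVES moves; all interfaces here are illustrative.
N_MOVES = 5

def gold_oracle(state):
    return ["shift"]                       # pretend the gold move is always shift

def model_scores(state):
    return {"shift": -0.1, "left": -2.0, "right": -3.0}   # log-probabilities

def run_episode(dynamic, explore=0.5):
    state, loss = 0, 0.0
    while state < N_MOVES:
        scores = model_scores(state)
        gold = gold_oracle(state)
        loss -= max(scores[m] for m in gold)    # maximize log p of gold moves
        if dynamic and random.random() < explore:
            move = max(scores, key=scores.get)  # dynamic: follow the prediction
        else:
            move = gold[0]                      # static: follow the gold move
        state += 1                              # "apply" the chosen move
    return loss

print(round(run_episode(dynamic=False), 2))  # 0.5
```

In this toy the loss is identical either way; the regimes differ only in which states the parser visits during training, which is exactly what lets a dynamic oracle teach the model to recover from its own mistakes.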

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:
1 Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2 Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3 Using my own word and context vectors trained on a different language from the same language family
4 Applying transfer learning with a pre-trained parser

Language       (1)            (2)     (3)     (4)
af_afribooms   not provided   75.46   77.43   78.12
kk_ktb         20.19          22.31   21.96   23.86
bxr_bdt         7.64           9.76    9.93    8.98
kmr_mg         20.12          22.57   22.78   23.39

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

From-scratch LM training does not yield useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
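Projectivity can be checked by testing whether any two dependency arcs cross. A minimal sketch (a hypothetical helper, not the thesis code), where heads[i-1] is the head of token i and 0 is the artificial root:

```python
def is_projective(heads):
    """True if no two dependency arcs cross (tokens numbered 1..n, head 0 = root)."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for l1, r1 in arcs:
        for l2, r2 in arcs:
            # Two arcs cross when exactly one endpoint of the second
            # lies strictly inside the span of the first.
            if l1 < l2 < r1 < r2:
                return False
    return True

print(is_projective([2, 3, 0, 5, 3]))  # True: a projective tree
print(is_projective([3, 4, 0, 3]))     # False: arcs (1,3) and (2,4) cross
```

A transition based parser of the kind described here can only produce trees for which this check succeeds.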

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language      Projectivity  Best (LAS)  Our (LAS)
grc_perseus    90.7         79.39       55.03 (20)
eu_bdt         95.13        84.22       74.13 (17)
hu_szeged      97.8         82.66       68.18 (14)
da_ddt         98.26        86.28       76.40 (17)
en_gum         99.6         85.05       76.44 (15)
gl_treegal    100           74.25       70.45 (10)
gl_ctg        100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases

7 From the official results page and our projectivity table.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion: We introduced "Context, Word and Morph-feat" embeddings and showed their contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-crafted feature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring performance improvements

Morphological Features

Finding a different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

• Introduction
  • Overview of Dependency Parsing
  • Transition Based Dependency Parsing
• Related Work
  • Linear Models and their Drawbacks
  • Neural Network Models
• Model
  • Language Model
  • MLP Parser
  • Tree-stack LSTM Parser
• Results
  • MLP vs Tree-stack LSTM
  • Morphological Feature Embeddings
  • Static vs Dynamic Oracle Training
  • Transfer Learning
• Conclusion
• Future Work & Discussions
Page 52: Transition Based Dependency Parsing with Deep Learning, Omer Kırnap (Koc University), okirnap@ku.edu.tr, September 27, 2018

Solution

Find a recurrent architecture such that it can summarize the parsing history as well as the word sequences in the buffer and stack

Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17

• Koc-University team with MLP Parser using Context Embeddings

CoNLL18

• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ, β, A) with an LSTM. Modify the head word's embedding with the dependent's embedding.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of the LSTMs are not updated unless a reduce occurs

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

[Figure: Tree-stack LSTM architecture diagram]

We propose the Tree-stack LSTM model with 4 components:

1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector

Every dependency relation is represented with a continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initiate the word representation by concatenating:

Character-based LSTM's word vectors

Word-based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
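The four pieces are simply concatenated into a single input vector per word; a minimal sketch with made-up dimensions (the thesis's actual vector sizes may differ):

```python
# Illustrative dimensions, not the thesis's hyper-parameters.
d_char, d_ctx, d_pos, d_morph = 6, 8, 3, 4

def word_input(char_vec, context_vec, pos_vec, morph_vec):
    """Word representation = [char-LSTM word vec; BiLSTM context vec; POS vec; morph-feat vec]."""
    return char_vec + context_vec + pos_vec + morph_vec  # plain list concatenation

x = word_input([0.0] * d_char, [0.0] * d_ctx, [0.0] * d_pos, [0.0] * d_morph)
print(len(x))  # 21
```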

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
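A UD morph-feat string like the one in the figure can be embedded by giving each Feature=Value pair its own vector and combining them. One plausible encoding is sketched below; the averaging step is an assumption for illustration, and the thesis may compose the per-feature vectors differently:

```python
import random
random.seed(1)

DIM = 8
feat_table = {}   # "Feature=Value" -> embedding, grown on demand

def morph_feat_vector(feats):
    """Embed a UD morph-feat string such as 'Case=Nom|Number=Sing'.

    Each Feature=Value pair gets its own embedding and the pairs are
    averaged; this composition is illustrative, not the thesis's exact method.
    """
    if feats == "_":                      # UD marks "no features" with _
        return [0.0] * DIM
    vecs = []
    for pair in feats.split("|"):
        if pair not in feat_table:
            feat_table[pair] = [random.gauss(0, 1) for _ in range(DIM)]
        vecs.append(feat_table[pair])
    return [sum(col) / len(vecs) for col in zip(*vecs)]

v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
print(len(v), len(feat_table))  # 8 5
```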

Tree-stack LSTM

Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

[Figure: Tree-stack LSTM architecture diagram]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

Figure: Buffer's β-LSTM, an LSTM running over the buffer words w_i, w_i+1, w_i+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

[Figure: Tree-stack LSTM architecture diagram]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

Figure: Stack's σ-LSTM, an LSTM running over the stack items s_i, s_i+1, s_i+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

[Figure: Tree-stack LSTM architecture diagram]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

Figure: Action-LSTM, an LSTM running over the transition history

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

Figure: t-RNN composes the head word, dependent word, and dependency relation into a new head representation

w_head_new = tanh(W_rnn · [w_head_old; d_l; w_dep] + b_rnn)   (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
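Equation (1) can be sketched in plain Python; the sizes and the random initialization below are illustrative stand-ins, not the thesis's trained parameters:

```python
import math
import random
random.seed(0)

D_WORD, D_REL = 4, 2     # toy sizes, not the thesis's dimensions
D_IN = 2 * D_WORD + D_REL

# Randomly initialized parameters standing in for the learned W_rnn, b_rnn.
W_rnn = [[random.gauss(0, 0.5) for _ in range(D_IN)] for _ in range(D_WORD)]
b_rnn = [0.0] * D_WORD

def t_rnn(w_head_old, d_l, w_dep):
    """w_head_new = tanh(W_rnn @ [w_head_old; d_l; w_dep] + b_rnn)  (Eq. 1)."""
    x = w_head_old + d_l + w_dep                      # concatenation
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W_rnn, b_rnn)]

new_head = t_rnn([0.1] * D_WORD, [0.2] * D_REL, [0.3] * D_WORD)
print(len(new_head))  # 4
```

Note that the output has the same dimensionality as a word vector, so it can replace the head's embedding in the stack after a left or right transition.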

Tree-RNN with

1 Left Transition
2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initiated by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready for the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initiated by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready for the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
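The left and right transitions of the preceding slides can be sketched as plain stack/buffer operations. This is an illustrative toy (word ids instead of embeddings, no t-RNN update), not the thesis code:

```python
def shift(stack, buffer):
    stack.append(buffer.pop(0))

def left_arc(stack, buffer, arcs, d):
    # left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    # the stack top s is attached as a dependent of the buffer front b.
    s = stack.pop()
    arcs.add((buffer[0], d, s))

def right_arc(stack, buffer, arcs, d):
    # right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    # the stack top t is attached as a dependent of s, the item below it.
    t = stack.pop()
    arcs.add((stack[-1], d, t))

# Tokens are word ids; 0 is the artificial root.
stack, buffer, arcs = [0], [1, 2, 3], set()
shift(stack, buffer)                   # σ = [0, 1], β = [2, 3]
left_arc(stack, buffer, arcs, "det")   # 1 becomes a dependent of 2
shift(stack, buffer)                   # σ = [0, 2], β = [3]
shift(stack, buffer)                   # σ = [0, 2, 3], β = []
right_arc(stack, buffer, arcs, "obj")  # 3 becomes a dependent of 2
print(sorted(arcs))  # [(2, 'det', 1), (2, 'obj', 3)]
```

In the full model, each arc operation additionally runs the t-RNN of Equation (1) to refresh the head's embedding before parsing continues.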

Final overview of Tree-stack LSTM

[Figure: Final Tree-stack LSTM architecture: β-LSTM, σ-LSTM, Action-LSTM and t-RNN outputs are concatenated and fed to an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2 Related Work: Linear Models and their Drawbacks; Neural Network Models

3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5 Conclusion
6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

CoNLL17:
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Differences between the two tasks: 1 Train/test split change, 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the treebank is improved, the older parser is handicapped

2 If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru_taiga (10k)   58.89  60.55
hu_szeged (20k)  66.21  68.18
tr_imst (50k)    56.78  58.75
ar_padt (120k)   67.83  68.14
en_ewt (205k)    74.87  75.77
cs_cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser


Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM


Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM


Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM


Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code   MLP    Only Action  Only-β  Only-σ
hu_szeged   66.21  66.87        66.94   67.03
sv_lines    71.12  72.05        72.17   72.45
tr_imst     57.12  56.87        57.02   57.12
ar_padt     67.83  66.67        66.89   66.92
cs_cac      83.89  82.23        83.13   83.17
en_ewt      75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

[Figure: Tree-stack LSTM architecture diagram]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code            without t-RNN  with t-RNN
no_nynorsklia (3k)   51.78          53.33
ru_taiga (11k)       59.13          60.55
gl_treegal (15k)     69.76          70.45
hu_szeged (20k)      66.12          68.18
sv_lines (49k)       74.04          75.46
tr_imst (50k)        58.12          58.75
ar_padt (120k)       68.04          68.14
en_ewt (204k)        74.87          75.77
cs_cac (473k)        82.89          83.57
cs_pdt (1M)          81.17          81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Page 53: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Model Overview

2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies

CoNLL17

bull Koc-University team with MLP Parser using Context Embeddings

CoNLL18

bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify stackrsquos word embeddings

Hidden states of LSTMS are not updated unless reduce

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose Tree-stack LSTM model with 4 components

β-LSTMσ-LSTMAction-LSTMTree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include explicit feature extractor We initiated wordrepresentation by concatenating

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morp-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure Bufferrsquos β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stackrsquos σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How do components of tree-stack LSTM are connected

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})


Figure Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})


Figure Each embedding is initialized by concatenating POS, language and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})


Figure Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})


Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})


Figure σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})


Figure Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
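The left and right transitions above can be sketched directly from their definitions. The state representation, words and labels here are illustrative toy values, not the thesis implementation:

```python
# A parser state is (stack, buffer, arcs); an arc (h, d, t) connects
# head h to dependent t with dependency label d.

def left(state, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})"""
    stack, buf, arcs = state
    s, b = stack[-1], buf[0]          # stack top s, buffer front b
    return stack[:-1], buf, arcs | {(b, d, s)}

def right(state, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})"""
    stack, buf, arcs = state
    s, t = stack[-2], stack[-1]       # two topmost stack items
    return stack[:-1], buf, arcs | {(s, d, t)}

state = (["ROOT", "news"], ["had", "effect"], set())
state = left(state, "nsubj")          # adds arc (had, nsubj, news), pops "news"
state2 = right((["ROOT", "had"], [], set()), "root")  # adds arc (ROOT, root, had)
print(state)
print(state2)
```

In both transitions the dependent is removed from the stack, which is why the top LSTM of the stack is reduced and the t-RNN recomputes the head's embedding.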

Final overview of Tree-stack LSTM

Figure Tree-stack LSTM overview: t-RNN (head word, dependent word, dependency relation), σ-LSTM, β-LSTM and Action-LSTM outputs are concatenated and fed to an MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2 Related Work: Linear Models and their Drawbacks; Neural Network Models

3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5 Conclusion

6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17:
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
Dependency parsing of 82 treebanks in 57 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Differences between the shared tasks: 1 Train/test split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1 If the annotation of the treebank is improved, the older parser is handicapped

2 If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code         MLP    Tree-stack
ru taiga (10k)    58.89  60.55
hu szeged (20k)   66.21  68.18
tr imst (50k)     56.78  58.75
ar padt (120k)    67.83  68.14
en ewt (205k)     74.87  75.77
cs cac (473k)     83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser


Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM


Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM


Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM


Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code   MLP    Only Action  Only-β  Only-σ
hu szeged   66.21  66.87        66.94   67.03
sv lines    71.12  72.05        72.17   72.45
tr imst     57.12  56.87        57.02   57.12
ar padt     67.83  66.67        66.89   66.92
cs cac      83.89  82.23        83.13   83.17
en ewt      75.54  75.43        75.56   75.67

Table Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

Figure Tree-stack LSTM overview: t-RNN (head word, dependent word, dependency relation), σ-LSTM, β-LSTM and Action-LSTM outputs are concatenated and fed to an MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.16

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases

σ-LSTM provides more useful information, independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD 2.2 dataset into 4 parts, based on the number of training tokens for each language, to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k but less than 50k tokens

Languages having more than 50k but less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33             3583
ru taiga       58.32        60.55            10479
sme giella     52.78        53.39            16385
la perseus     49.93        51.60            18184
ug udt         52.78        53.39            19262
sl sst         46.72        48.77            19473
hu szeged      66.23        68.18            20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81            48325
fr sequoia     84.36        82.17            50543
en gum         76.44        75.34            53686
ko gsd         73.74        72.54            56687
eu bdt         74.55        73.32            72974
nl lassysmall  76.7         75.8             75134
gl ctg         79.02        79.018           79327
lv lvtb        72.33        72.24            80666
id gsd         75.76        73.97            97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code   Morph-Feats  no Morph-Feats  # of tokens
fa seraji   81.18        81.12             121064
bg btb      84.53        84.55             124336
en ewt      75.77        75.682            204585
ar padt     68.02        68.14             223881
de gsd      71.59        71.32             263804
ca ancora   85.89        85.874            417587
es ancora   84.99        84.78             444617
cs cac      83.57        83.63             472608
cs pdt      81.43        82.12            1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions using gold moves. Dynamic oracle: transitions using predicted moves.

In both cases, the log-probability (log p) of the gold moves is maximized.

Figure Tree-stack LSTM overview: t-RNN (head word, dependent word, dependency relation), σ-LSTM, β-LSTM and Action-LSTM outputs are concatenated and fed to an MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
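The difference between the two training regimes can be sketched as a toy training step. The scorer, move set and state here are placeholders, not the thesis model:

```python
import math

def apply_move(state, move):
    # Toy state update: just record the moves taken
    # (a real parser would update stack/buffer/arcs here).
    return state + [move]

def uniform_scores(state):
    # Toy scorer: uniform distribution over two moves.
    return {"SHIFT": 0.5, "REDUCE": 0.5}

def train_step(state, gold_moves, score_fn, dynamic=False):
    """Accumulate log p(gold move) along one transition sequence.
    Static oracle: advance the state with the gold move.
    Dynamic oracle: advance the state with the model's predicted move."""
    logp = 0.0
    for gold in gold_moves:
        probs = score_fn(state)
        logp += math.log(probs[gold])           # gold move is scored either way
        move = max(probs, key=probs.get) if dynamic else gold
        state = apply_move(state, move)
    return logp

print(train_step([], ["SHIFT", "SHIFT", "REDUCE"], uniform_scores))  # 3 * log(0.5)
```

Under the dynamic oracle the model visits states produced by its own (possibly wrong) predictions, which is what exposes it to error states seen at test time.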

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:
1 Using very limited data to train a LM for word and context vectors, and using them to train a parser from scratch
2 Using Facebook's word vectors to train a parser [Bojanowski et al 2017]
3 Using my own word and context vectors trained on a different language from the same language family
4 Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt        7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table LAS values for strategies (1), (2), (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

From-scratch LM training does not yield useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition-based parsers can only build projective trees

6 Figure from http://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
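A projective tree is one whose arcs, drawn above the sentence, never cross. This can be checked with a small sketch (heads are given as 1-based word indices, 0 denoting the root; the example trees are illustrative):

```python
def is_projective(heads):
    """heads[i-1] is the head index of word i (0 = root, words 1-based).
    A dependency tree is projective iff no two arcs cross."""
    # Normalize each arc to an (earlier, later) index pair.
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for (i, j) in arcs:
        for (k, l) in arcs:
            if i < k < j < l:      # arcs (i, j) and (k, l) cross
                return False
    return True

print(is_projective([2, 0, 2]))     # simple chain, no crossing -> True
print(is_projective([0, 4, 1, 1]))  # arc 2-4 crosses arc 1-3    -> False
```

Non-projective arcs are exactly the cases a transition-based parser of this kind cannot recover, which is why the performance gap below shrinks as the projectivity ratio grows.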

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language     Projectivity %  Best (LAS)  Our (LAS)
grc perseus   90.7           79.39       55.03 (20)
eu bdt        95.13          84.22       74.13 (17)
hu szeged     97.8           82.66       68.18 (14)
da ddt        98.26          86.28       76.40 (17)
en gum        99.6           85.05       76.44 (15)
gl treegal   100             74.25       70.45 (10)
gl ctg       100             82.12       79.45 (14)

Table Our model's performance gap decreases as the projectivity ratio increases

7 From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion: We introduced "Context, Word and Morph-feat" embeddings and showed their contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering

Tree-stack LSTM performed better with low-resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM or Action-LSTM states may bring a performance improvement

Morphological Features

Finding different ways to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gomez-Rodriguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 54: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

c Tree-stack LSTM Parser (CoNLL18)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123

Related Work - Stack LSTM

Figure Stack LSTM [Dyer et al 2015]

Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123

Problems with Stack LSTM

They only modify stackrsquos word embeddings

Hidden states of LSTMS are not updated unless reduce

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose Tree-stack LSTM model with 4 components

β-LSTMσ-LSTMAction-LSTMTree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include explicit feature extractor We initiated wordrepresentation by concatenating

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morp-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure Bufferrsquos β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stackrsquos σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How do components of tree-stack LSTM are connected

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.


References

Marco Kuhlmann, Carlos Gomez-Rodriguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.


Thank you for your attention


Questions



Related Work - Stack LSTM

Figure: Stack LSTM [Dyer et al., 2015]

Represent each component (σ, β, A) with an LSTM. Modify the head word's embedding with the dependent's embedding.


Problems with Stack LSTM

They only modify the stack's word embeddings

Hidden states of the LSTMs are not updated unless a reduce occurs

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al., 2013]


Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy


Tree-stack LSTM Overview

Figure: Tree-stack LSTM overview (σ-LSTM over the stack, β-LSTM over the buffer, an Action-LSTM over past transitions, and a t-RNN combining head word, dependent word and dependency relation; the components' outputs are concatenated and fed to an MLP).

We propose the Tree-stack LSTM model with 4 components:

1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN


Tree-stack LSTM

Input Representation


Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector

Every dependency relation is represented with a continuous vector


Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character-based LSTM's word vectors

Word-based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

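As a minimal sketch of this concatenation (the dimensions here are illustrative assumptions, not the thesis' actual sizes):

```python
import numpy as np

# Illustrative sketch of the word representation; all sizes are assumptions.
char_vec    = np.zeros(100)  # character-based LSTM word vector
context_vec = np.zeros(200)  # word-based BiLSTM context vector
pos_vec     = np.zeros(32)   # POS embedding
morph_vec   = np.zeros(32)   # morph-feat embedding

# The parser's input for one word is the concatenation of the four pieces.
word_repr = np.concatenate([char_vec, context_vec, pos_vec, morph_vec])
print(word_repr.shape)  # (364,)
```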

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure: Morph-feat Embeddings

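One hedged sketch of how such a feats string could be turned into a single morph-feat vector; splitting on `|` follows the UD feats format, while `DIM`, the random initialization and the averaging are illustrative assumptions (the thesis' exact composition may differ):

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8
feat_emb = {}  # "Feature=Value" -> embedding, grown on demand

def morph_feat_vector(feats):
    # Split "Case=Nom|Gender=Neut|..." into per-feature embeddings and average.
    pairs = feats.split("|")
    vecs = [feat_emb.setdefault(p, rng.normal(size=DIM)) for p in pairs]
    return np.mean(vecs, axis=0)

v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
print(v.shape)  # (8,)
```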

Tree-stack LSTM

Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN


β-LSTM

Figure: Tree-stack LSTM overview, β-LSTM highlighted.

β-LSTM

Figure: Buffer's β-LSTM (an LSTM over the buffer word vectors w_i, w_{i+1}, w_{i+2}).

σ-LSTM

Figure: Tree-stack LSTM overview, σ-LSTM highlighted.

σ-LSTM

Figure: Stack's σ-LSTM (an LSTM over the stack word vectors s_i, s_{i+1}, s_{i+2}).

Action-LSTM

Figure: Tree-stack LSTM overview, Action-LSTM highlighted.

Action-LSTM

Figure: Action-LSTM (an LSTM over the sequence of past transition actions).

How are the components of the tree-stack LSTM connected?


Tree-RNN


Tree-RNN (t-RNN)

Figure: t-RNN combines the head word, dependency relation and dependent word embeddings into a new head embedding.

w_head_new = tanh(W_rnn * [w_head_old; d_l; w_dep] + b_rnn)    (1)
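Equation (1) can be sketched directly in NumPy; `trnn_update` and the dimensions are illustrative, not the thesis' implementation:

```python
import numpy as np

def trnn_update(w_head, d_l, w_dep, W_rnn, b_rnn):
    # Eq. (1): new head embedding from old head, relation and dependent vectors
    x = np.concatenate([w_head, d_l, w_dep])
    return np.tanh(W_rnn @ x + b_rnn)

dim = 4
W = np.zeros((dim, 3 * dim))  # toy parameters, zero-initialized
b = np.zeros(dim)
new_head = trnn_update(np.ones(dim), np.ones(dim), np.ones(dim), W, b)
print(new_head.shape)  # (4,)
```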

Tree-RNN with

1. Left Transition
2. Right Transition

Left Transition


Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initialized by concatenating POS, language and morph-feat embeddings.
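A minimal sketch of these transitions on a (stack, buffer, arcs) state, following the formulas on these slides; the function names and the toy word indices/labels are illustrative:

```python
def shift(stack, buffer, arcs):
    # Move the buffer front onto the stack.
    return stack + [buffer[0]], buffer[1:], arcs

def left(stack, buffer, arcs, d):
    # left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    # buffer front b becomes the head of the stack top s.
    s, b = stack[-1], buffer[0]
    return stack[:-1], buffer, arcs | {(b, d, s)}

def right(stack, buffer, arcs, d):
    # right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    # s becomes the head of the top item t.
    s, t = stack[-2], stack[-1]
    return stack[:-1], buffer, arcs | {(s, d, t)}

# Toy run on words 1 2 3 ("economic news had"): 2 heads 1, 3 heads 2.
state = shift([], [1, 2, 3], set())
state = left(*state, d="amod")
state = shift(*state)
state = left(*state, d="nsubj")
state = shift(*state)
print(sorted(state[2]))  # [(2, 'amod', 1), (3, 'nsubj', 2)]
```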

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: The stack's top LSTM is reduced.

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding.

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input.

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready for the next transition.

Right Transition


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language and morph-feat embeddings.

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: The stack's top LSTM is reduced.

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding.

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input.

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready for the next transition.

Final overview of Tree-stack LSTM

Figure: Final Tree-stack LSTM overview: the σ-LSTM, β-LSTM and Action-LSTM hidden states are concatenated and fed to an MLP, while the t-RNN composes head embeddings from dependents and dependency relations.
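The decision step above can be sketched as follows; all names and dimensions are illustrative assumptions, not the thesis' actual sizes:

```python
import numpy as np

# Concatenate the components' hidden states and score transitions with an MLP.
rng = np.random.default_rng(1)

h_sigma  = rng.normal(size=16)   # σ-LSTM hidden state
h_beta   = rng.normal(size=16)   # β-LSTM hidden state
h_action = rng.normal(size=8)    # Action-LSTM hidden state

W1, b1 = rng.normal(size=(32, 40)), np.zeros(32)  # 16+16+8 = 40 inputs
W2, b2 = rng.normal(size=(75, 32)), np.zeros(75)  # e.g. labeled transitions

h = np.concatenate([h_sigma, h_beta, h_action])
scores = W2 @ np.tanh(W1 @ h + b1) + b2
best = int(np.argmax(scores))    # predicted transition id
print(scores.shape)  # (75,)
```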


4. Results & Comparisons


Results & Comparisons

Dataset

CoNLL17: Dependency parsing of 81 treebanks in 49 languages. All treebanks use standardized annotation (17 universal part-of-speech tags, 37 universal dependency relations). Koc-University ranked 7th out of 33 participants (1st among transition based parsers).

CoNLL18: Dependency parsing of 82 treebanks in 57 languages. All treebanks use standardized annotation (17 universal part-of-speech tags, 37 universal dependency relations). Koc-University ranked 16th out of 30 participants (2nd among transition based parsers).

Changes from CoNLL17 to CoNLL18: 1. train/test split change, 2. annotation.

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets.


MLP vs Tree-stack LSTM

2 possible problems with the official comparison:

1. If the annotation of the treebank has improved, the older parser is handicapped.

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.


MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP


Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

Figure: Initial model (MLP).


Only Action LSTM

Figure: Only the Action-LSTM.

Only β-LSTM

Figure: Only the β-LSTM.

Only σ-LSTM

Figure: Only the σ-LSTM.

Ablation Analysis Results

Lang Code   MLP    Only Action  Only-β  Only-σ
hu szeged   66.21  66.87        66.94   67.03
sv lines    71.12  72.05        72.17   72.45
tr imst     57.12  56.87        57.02   57.12
ar padt     67.83  66.67        66.89   66.92
cs cac      83.89  82.23        83.13   83.17
en ewt      75.54  75.43        75.56   75.67

Table: Comparison between the MLP and "Only" models.


Ablation of t-RNN

Figure: Tree-stack LSTM overview, t-RNN highlighted.

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code            without t-RNN  with t-RNN
no nynorsklia (3k)   51.78          53.33
ru taiga (11k)       59.13          60.55
gl treegal (15k)     69.76          70.45
hu szeged (20k)      66.12          68.18
sv lines (49k)       74.04          75.46
tr imst (50k)        58.12          58.75
ar padt (120k)       68.04          68.14
en ewt (204k)        74.87          75.77
cs cac (473k)        82.89          83.57
cs pdt (1M)          81.17          81.16

t-RNN provides a comparative advantage for low-resource languages.


Ablation Analysis

Overall results of ablation analysis

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations


Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases

σ-LSTM provides more useful information, independent of dataset size

Interconnecting the model's components with the t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)


What does Morphological Feature Embedding provide?


Contribution of Morph-feat Embeddings

Experimental Settings

We divide the CoNLL18 UD dataset (v2.2) into 4 parts, based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k, less than 50k tokens

Languages having more than 50k, less than 100k tokens

Languages having 100k tokens or more

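The four buckets can be expressed as a small helper (thresholds from the slide; the bucket labels are ours):

```python
def size_bucket(n_tokens):
    # Map a language's training-token count to its experiment bucket.
    if n_tokens < 20_000:
        return "<20k"
    if n_tokens < 50_000:
        return "20k-50k"
    if n_tokens < 100_000:
        return "50k-100k"
    return ">=100k"

print(size_bucket(10479), size_bucket(97531), size_bucket(204585))
# <20k 50k-100k >=100k
```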

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.60           18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166

Not useful for languages having less than 20k training tokens.


Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48325
fr sequoia     84.36        82.17           50543
en gum         76.44        75.34           53686
ko gsd         73.74        72.54           56687
eu bdt         74.55        73.32           72974
nl lassysmall  76.70        75.80           75134
gl ctg         79.02        79.018          79327
lv lvtb        72.33        72.24           80666
id gsd         75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens.


Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121064
bg btb     84.53        84.55           124336
en ewt     75.77        75.682          204585
ar padt    68.02        68.14           223881
de gsd     71.59        71.32           263804
ca ancora  85.89        85.874          417587
es ancora  84.99        84.78           444617
cs cac     83.57        83.63           472608
cs pdt     81.43        82.12           1173282

Neutral for languages having more than 100k training tokens.


Static vs Dynamic Oracle Training

Static oracle: transitions follow gold moves. Dynamic oracle: transitions follow predicted moves.

In both cases, log p of the gold moves is maximized.

Figure: Tree-stack LSTM overview.

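A toy contrast of the two regimes, showing only which move advances the parser state; the random "model" and "oracle" here are stand-ins for the real components, and in both regimes the training loss is -log p(gold move):

```python
import random

def rollout(n_steps, dynamic, seed=0):
    rnd = random.Random(seed)
    on_gold_path = 0
    for _ in range(n_steps):
        gold = rnd.randrange(3)          # oracle's correct move (stand-in)
        pred = rnd.randrange(3)          # model's argmax move (stand-in)
        move = pred if dynamic else gold  # which move updates the state
        on_gold_path += (move == gold)
    return on_gold_path

print(rollout(10, dynamic=False))  # 10: static training always follows gold
```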

Static vs Dynamic Oracle Training

Figure: Results are very close for training sets with fewer than 20k tokens.


Static vs Dynamic Oracle Training

Figure: Results are very close for training sets with between 20k and 50k tokens.


Static vs Dynamic Oracle Training

Figure: Results are very close for training sets with more than 50k tokens.


How about languages with less than 20k training tokens?


Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al., 2017]
3. Using my own word and context vectors trained with a different language, but from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3) and (4).

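Strategy (4) can be sketched as a warm start; the parameter names and values below are purely illustrative stand-ins:

```python
# Initialize a new parser from one pre-trained on a related language,
# then continue training on the low-resource treebank.
pretrained = {"word_emb": [0.1, 0.2], "mlp_w": [0.3, 0.4]}  # stand-in weights
parser     = {"word_emb": [0.0, 0.0], "mlp_w": [0.0, 0.0]}

parser.update(pretrained)   # copy the pre-trained weights (warm start)
# ...fine-tuning on the target language would follow here
print(parser["word_emb"])  # [0.1, 0.2]
```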

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

From-scratch LM training does not yield useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al., 2017]


Projectivity

Transition-based parsers can only build projective trees.

(Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf)

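Projectivity can be checked by looking for crossing arcs; this small helper assumes 1-indexed words with 0 denoting the root:

```python
def is_projective(heads):
    # heads[i-1] = head of word i (0 denotes the root)
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for a, b in arcs:
        for c, e in arcs:
            if a < c < b < e:   # arcs (a,b) and (c,e) cross
                return False
    return True

print(is_projective([2, 0, 2]))     # True
print(is_projective([3, 4, 0, 3]))  # False: arcs (1,3) and (2,4) cross
```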

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 56: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Problems with Stack LSTM

They only modify stackrsquos word embeddings

Hidden states of LSTMS are not updated unless reduce

Actions are not explicitly represented

They only used word2vec embeddings [Mikolov et al 2013]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123

Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose Tree-stack LSTM model with 4 components

β-LSTMσ-LSTMAction-LSTMTree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include explicit feature extractor We initiated wordrepresentation by concatenating

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morp-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure Bufferrsquos β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stackrsquos σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How do components of tree-stack LSTM are connected

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What do Morphological Feature Embeddings provide?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings
We divide the CoNLL18 UD (v2.2) dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k and less than 50k tokens

Languages having more than 50k and less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
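As a sketch, the bucketing above can be written as a simple threshold function (the bucket labels are ours, for illustration):

```python
def size_bucket(n_tokens):
    """Assign a treebank to one of the four training-size buckets
    used in the experiments."""
    if n_tokens < 20_000:
        return "<20k"
    if n_tokens < 50_000:
        return "20k-50k"
    if n_tokens < 100_000:
        return "50k-100k"
    return ">=100k"

print(size_bucket(3_583))    # <20k     (e.g. no nynorsklia)
print(size_bucket(48_325))   # 20k-50k  (e.g. sv lines)
print(size_bucket(204_585))  # >=100k   (e.g. en ewt)
```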

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3,583
ru taiga       58.32        60.55           10,479
sme giella     52.78        53.39           16,385
la perseus     49.93        51.60           18,184
ug udt         52.78        53.39           19,262
sl sst         46.72        48.77           19,473
hu szeged      66.23        68.18           20,166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48,325
fr sequoia     84.36        82.17           50,543
en gum         76.44        75.34           53,686
ko gsd         73.74        72.54           56,687
eu bdt         74.55        73.32           72,974
nl lassysmall  76.70        75.80           75,134
gl ctg         79.02        79.018          79,327
lv lvtb        72.33        72.24           80,666
id gsd         75.76        73.97           97,531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121,064
bg btb     84.53        84.55           124,336
en ewt     75.77        75.68           204,585
ar padt    68.02        68.14           223,881
de gsd     71.59        71.32           263,804
ca ancora  85.89        85.87           417,587
es ancora  84.99        84.78           444,617
cs cac     83.57        83.63           472,608
cs pdt     81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow gold moves.
Dynamic oracle: transitions follow predicted moves.

In both cases, the log-probability of gold moves is maximized.

Figure: Tree-stack LSTM architecture (σ-LSTM, β-LSTM, Action-LSTM, and t-RNN feeding a concatenation layer and MLP)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
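A minimal sketch of the difference, assuming a scored set of candidate moves (function and variable names here are illustrative, not the thesis's implementation):

```python
def train_step(gold_move, model_scores, dynamic):
    """One training decision, sketched. In both regimes the loss term is
    -log p(gold_move); they differ only in which move is *executed* to
    produce the next parser state."""
    loss = -model_scores[gold_move]  # stand-in for -log p(gold)
    if dynamic:
        # dynamic oracle: follow the model's own prediction
        executed = max(model_scores, key=model_scores.get)
    else:
        # static oracle: follow the gold transition
        executed = gold_move
    return loss, executed

scores = {"shift": -0.2, "left": -1.5, "right": -2.0}  # toy log-probabilities
print(train_step("left", scores, dynamic=False)[1])  # left
print(train_step("left", scores, dynamic=True)[1])   # shift
```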

Static vs Dynamic Oracle Training

Figure: Results are very close for training sets with fewer than 20k tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure: Results are very close for training sets between 20k and 50k tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure: Results are very close for training sets with more than 50k tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors, trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial.

Training an LM from scratch does not yield useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017].

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition-based parsers can only build projective trees. [6]

[6] Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
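Projectivity can be checked by testing whether any two dependency arcs cross; a small sketch (CoNLL-style head indices, 0 for the root):

```python
def is_projective(heads):
    """heads[i] is the head of token i+1 (0 denotes the root), CoNLL-style.
    A dependency tree is projective iff no two arcs cross."""
    arcs = [(min(h, i + 1), max(h, i + 1)) for i, h in enumerate(heads)]
    for a, b in arcs:
        for c, d in arcs:
            if a < c < b < d:  # arcs (a, b) and (c, d) cross
                return False
    return True

print(is_projective([2, 0, 2]))     # True  (root 2 with children 1 and 3)
print(is_projective([3, 4, 0, 3]))  # False (arc 2<-4 crosses arc 1<-3)
```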

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7%         79.39       55.03 (rank 20)
eu bdt       95.13%        84.22       74.13 (rank 17)
hu szeged    97.8%         82.66       68.18 (rank 14)
da ddt       98.26%        86.28       76.40 (rank 17)
en gum       99.6%         85.05       76.44 (rank 15)
gl treegal   100%          74.25       70.45 (rank 10)
gl ctg       100%          82.12       79.45 (rank 14)

Table: Our model's performance gap decreases as the projectivity ratio increases. [7]

[7] From the official results page and our projectivity table.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:
We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition-based dependency parsing.

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring performance improvements.

Morphological Features

Finding different ways to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Önder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123


Our solution

We propose

Context embeddings should improve parsing accuracy

Dependency relations should be explicitly represented

Morphological Features of a word may enhance parsing accuracy

Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123

Tree-stack LSTM Overview

Figure: Tree-stack LSTM architecture: σ-LSTM (stack), β-LSTM (buffer), Action-LSTM, and t-RNN (head word, dependent word, dependency relation), concatenated and fed to an MLP

We propose the Tree-stack LSTM model with 4 components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector

Every dependency relation is represented with a continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize word representations by concatenating:

Character-based LSTM's word vectors

Word-based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
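As a sketch, the concatenation above can be written as follows (all vector sizes are hypothetical, not the thesis's actual hyperparameters):

```python
import numpy as np

rng = np.random.default_rng(0)

def word_representation(char_vec, context_vec, pos_vec, feat_vec):
    """Concatenate the four component vectors into one word representation."""
    return np.concatenate([char_vec, context_vec, pos_vec, feat_vec])

w = word_representation(rng.normal(size=350),   # char-LSTM word vector
                        rng.normal(size=300),   # BiLSTM context vector
                        rng.normal(size=128),   # POS embedding
                        rng.normal(size=128))   # morph-feat embedding
print(w.shape)  # (906,)
```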

Input Representation

Morph-feat Vectors

Figure: Morph-feat embedding for the word "It", with features Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
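A minimal sketch of turning a UD FEATS string into a morph-feat vector; summing per-feature embeddings is an illustrative composition choice, not necessarily the thesis's exact scheme:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 32            # hypothetical embedding size
feat_table = {}     # one embedding per feature=value pair, created on demand

def feat_vector(feats):
    """Combine embeddings of the feature=value pairs in a FEATS string
    such as 'Case=Nom|Number=Sing' into one vector (by summation here)."""
    vec = np.zeros(DIM)
    if feats == "_":  # token without morphological features
        return vec
    for pair in feats.split("|"):
        if pair not in feat_table:
            feat_table[pair] = rng.normal(size=DIM)
        vec += feat_table[pair]
    return vec

v = feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
print(v.shape, len(feat_table))  # (32,) 5
```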

Tree-stack LSTM

Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

Figure: Tree-stack LSTM architecture (locating the β-LSTM)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

Figure: Buffer's β-LSTM, running over the buffer words w_i, w_i+1, w_i+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

Figure: Tree-stack LSTM architecture (locating the σ-LSTM)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

Figure: Stack's σ-LSTM, running over the stack items s_i, s_i+1, s_i+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

Figure: Tree-stack LSTM architecture (locating the Action-LSTM)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

Figure: Action-LSTM, running over the sequence of past actions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

Figure: t-RNN composing the head word, dependency relation, and dependent word

w_head_new = tanh(W_rnn * [w_head_old; d_l; w_dep] + b_rnn)    (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
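Equation (1) can be sketched directly (dimensions and initialization are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # hypothetical embedding size

# Parameters of the composition in Eq. (1); shapes are assumptions.
W_rnn = rng.normal(scale=0.1, size=(d, 3 * d))
b_rnn = np.zeros(d)

def t_rnn(w_head_old, d_l, w_dep):
    """Eq. (1): w_head_new = tanh(W_rnn [w_head_old; d_l; w_dep] + b_rnn)."""
    x = np.concatenate([w_head_old, d_l, w_dep])
    return np.tanh(W_rnn @ x + b_rnn)

new_head = t_rnn(rng.normal(size=d), rng.normal(size=d), rng.normal(size=d))
print(new_head.shape)  # (64,)
```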

Tree-RNN with

1. Left Transition
2. Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready for the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready for the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
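The left and right transitions above can be sketched as operations on a (stack, buffer, arc-set) state; the labels and word indices in the toy run below are invented for illustration:

```python
def left_arc(stack, buffer, arcs, d):
    """left_d(σ|s, b|β, A) -> (σ, b|β, A ∪ {(b, d, s)}):
    the buffer front b becomes head of the stack top s."""
    s = stack.pop()
    arcs.add((buffer[0], d, s))

def right_arc(stack, buffer, arcs, d):
    """right_d(σ|s|t, β, A) -> (σ|s, β, A ∪ {(s, d, t)}):
    the second stack item s becomes head of the stack top t."""
    t = stack.pop()
    arcs.add((stack[-1], d, t))

def shift(stack, buffer):
    stack.append(buffer.pop(0))

# Tiny worked example on word indices 1..3 (toy sentence, toy labels)
stack, buffer, arcs = [], [1, 2, 3], set()
shift(stack, buffer)                        # σ=[1], β=[2,3]
left_arc(stack, buffer, arcs, "det")        # adds arc (2, det, 1)
shift(stack, buffer); shift(stack, buffer)  # σ=[2,3], β=[]
right_arc(stack, buffer, arcs, "obj")       # adds arc (2, obj, 3)
print(sorted(arcs))  # [(2, 'det', 1), (2, 'obj', 3)]
```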

Final overview of Tree-stack LSTM

Figure: Tree-stack LSTM architecture: σ-LSTM (stack), β-LSTM (buffer), Action-LSTM, and t-RNN (head word, dependent word, dependency relation), concatenated and fed to an MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123


4. Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17: Dependency parsing of 81 treebanks in 49 languages. All treebanks use standardized annotation: 17 universal part-of-speech tags and 37 universal dependency relations. Koc University ranked 7th out of 33 participants (1st among transition-based parsers).

CoNLL18: Dependency parsing of 82 treebanks in 57 languages. All treebanks use standardized annotation: 17 universal part-of-speech tags and 37 universal dependency relations. Koc University ranked 16th out of 30 participants (2nd among transition-based parsers).

Changes from CoNLL17 to CoNLL18: 1. Train/test split 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1. If the annotation of the treebank is improved, the older parser is handicapped.

2. If the training-test split has changed and old training data are now in the test data, the old parser is favored undeservedly.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

Figure: Initial MLP-only model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

Figure: Only Action-LSTM model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

Figure: Only β-LSTM model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

Figure: Only σ-LSTM model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 58: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Tree-stack LSTM Overview

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

We propose Tree-stack LSTM model with 4 components

β-LSTMσ-LSTMAction-LSTMTree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123

Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include explicit feature extractor We initiated wordrepresentation by concatenating

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morp-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure Bufferrsquos β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stackrsquos σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How do components of tree-stack LSTM are connected

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))


Figure σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})


Figure Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
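The left and right transitions on the preceding slides act on a configuration (stack σ, buffer β, arc set A). A minimal list-based sketch of the two rules; the function names and relation labels are illustrative, not the thesis implementation:

```python
def left_arc(stack, buffer, arcs, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    the buffer front b becomes the head of the stack top s."""
    s = stack.pop()      # dependent: popped from the stack
    b = buffer[0]        # head: front of the buffer (stays in place)
    arcs.add((b, d, s))

def right_arc(stack, arcs, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    the second stack item s becomes the head of the stack top t."""
    t = stack.pop()      # dependent: popped stack top
    s = stack[-1]        # head: the new stack top
    arcs.add((s, d, t))

# Toy configuration over word indices 0..3 (labels are made up)
stack, buffer, arcs = [0, 1, 2], [3], set()
left_arc(stack, buffer, arcs, "obj")    # adds (3, "obj", 2)
right_arc(stack, arcs, "flat")          # adds (0, "flat", 1)
```

In both rules the dependent is removed from the stack, which is why the diagrams show the stack's top LSTM being reduced after each transition.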

Final overview of Tree-stack LSTM

(Architecture diagram: the σ-LSTM, β-LSTM, and Action-LSTM hidden states are concatenated and fed to an MLP; the t-RNN composes the head word, dependent word, and dependency relation embeddings)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
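The overview diagram reduces to a concatenate-then-score step. A numeric sketch with made-up dimensions (the real hidden sizes and layer widths are model hyperparameters from the thesis, not shown here):

```python
import numpy as np

H, N_ACTIONS = 4, 3          # hypothetical hidden size and transition count
rng = np.random.default_rng(0)

# Stand-ins for the final hidden states of the three component LSTMs
h_sigma = rng.standard_normal(H)     # σ-LSTM (stack)
h_beta = rng.standard_normal(H)      # β-LSTM (buffer)
h_action = rng.standard_normal(H)    # Action-LSTM (transition history)

x = np.concatenate([h_sigma, h_beta, h_action])                     # "Concat" box
W1, b1 = rng.standard_normal((8, 3 * H)), np.zeros(8)               # MLP hidden layer
W2, b2 = rng.standard_normal((N_ACTIONS, 8)), np.zeros(N_ACTIONS)   # output layer

scores = W2 @ np.tanh(W1 @ x + b1) + b2   # one score per transition
best_transition = int(np.argmax(scores))  # greedy choice of the next move
```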

Overview

1 Introduction
    Overview of Dependency Parsing
    Transition Based Dependency Parsing

2 Related Work
    Linear Models and their Drawbacks
    Neural Network Models

3 Model
    Language Model
    MLP Parser
    Tree-stack LSTM Parser

4 Results
    MLP vs Tree-stack LSTM
    Morphological Feature Embeddings
    Static vs Dynamic Oracle Training
    Transfer Learning

5 Conclusion

6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17
    Dependency parsing of 81 treebanks in 49 languages
    All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
    Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18
    Dependency parsing of 82 treebanks in 57 languages
    All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
    Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Differences: 1. Train/test split change  2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1 If the annotation of the treebank is improved, the older parser is handicapped

2 If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code         MLP    Tree-stack
ru taiga (10k)    58.89  60.55
hu szeged (20k)   66.21  68.18
tr imst (50k)     56.78  58.75
ar padt (120k)    67.83  68.14
en ewt (205k)     74.87  75.77
cs cac (473k)     83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM


Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM


Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM


Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code   MLP    Only Action  Only-β  Only-σ
hu szeged   66.21  66.87        66.94   67.03
sv lines    71.12  72.05        72.17   72.45
tr imst     57.12  56.87        57.02   57.12
ar padt     67.83  66.67        66.89   66.92
cs cac      83.89  82.23        83.13   83.17
en ewt      75.54  75.43        75.56   75.67

Table Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN


Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code            without t-RNN  with t-RNN
no nynorsklia (3k)   51.78          53.33
ru taiga (11k)       59.13          60.55
gl treegal (15k)     69.76          70.45
hu szeged (20k)      66.12          68.18
sv lines (49k)       74.04          75.46
tr imst (50k)        58.12          58.75
ar padt (120k)       68.04          68.14
en ewt (204k)        74.87          75.77
cs cac (473k)        82.89          83.57
cs pdt (1M)          81.17          81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases

σ-LSTM provides more useful information independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD 2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k and less than 50k tokens

Languages having more than 50k and less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
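The four groups above amount to a simple bucketing of treebanks by training-token count. A sketch (boundaries as stated on the slide; the actual group assignments in the tables that follow may differ slightly at the edges):

```python
def size_bucket(n_tokens):
    """Assign a treebank to one of the four groups above by its
    number of training tokens."""
    if n_tokens < 20_000:
        return "<20k"
    if n_tokens < 50_000:
        return "20k-50k"
    if n_tokens < 100_000:
        return "50k-100k"
    return ">=100k"

size_bucket(3_583)     # "<20k"  (e.g. no_nynorsklia)
size_bucket(204_585)   # ">=100k" (e.g. en_ewt)
```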

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3,583
ru taiga       58.32        60.55           10,479
sme giella     52.78        53.39           16,385
la perseus     49.93        51.6            18,184
ug udt         52.78        53.39           19,262
sl sst         46.72        48.77           19,473
hu szeged      66.23        68.18           20,166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48,325
fr sequoia     84.36        82.17           50,543
en gum         76.44        75.34           53,686
ko gsd         73.74        72.54           56,687
eu bdt         74.55        73.32           72,974
nl lassysmall  76.7         75.8            75,134
gl ctg         79.02        79.018          79,327
lv lvtb        72.33        72.24           80,666
id gsd         75.76        73.97           97,531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code   Morph-Feats  no Morph-Feats  # of tokens
fa seraji   81.18        81.12           121,064
bg btb      84.53        84.55           124,336
en ewt      75.77        75.682          204,585
ar padt     68.02        68.14           223,881
de gsd      71.59        71.32           263,804
ca ancora   85.89        85.874          417,587
es ancora   84.99        84.78           444,617
cs cac      83.57        83.63           472,608
cs pdt      81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow gold moves
Dynamic oracle: transitions follow predicted moves

In both cases, the log-probability of gold moves is maximized


Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
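The contrast between the two regimes can be sketched as a single decision function; `oracle` (returns the gold move for a configuration) and `model_scores` (the parser's scores per move) are illustrative placeholders, not the thesis API:

```python
def next_transition(config, model_scores, oracle, dynamic):
    gold = oracle(config)  # the loss maximizes log p(gold) in both regimes
    if dynamic:
        # Dynamic oracle: the parser follows its own (possibly wrong)
        # prediction, so training also visits configurations off the gold path
        return max(model_scores, key=model_scores.get)
    # Static oracle: the parser always follows the gold move
    return gold

scores = {"shift": 0.2, "left-arc": 0.7, "right-arc": 0.1}
gold_oracle = lambda config: "shift"
static_move = next_transition(None, scores, gold_oracle, dynamic=False)   # "shift"
dynamic_move = next_transition(None, scores, gold_oracle, dynamic=True)   # "left-arc"
```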

Static vs Dynamic Oracle Training

Figure Results are very close for fewer than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for between 20k and 50k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for more than 50k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1 Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2 Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3 Using my own word and context vectors trained on a different language from the same language family
4 Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table LAS values for strategies (1), (2), (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

Training an LM from scratch does not yield useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
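Projectivity can be checked directly: a dependency tree is projective iff no two arcs cross when drawn above the sentence. A small sketch (quadratic in the number of arcs, fine for single sentences):

```python
def is_projective(heads):
    """heads[i] is the head index of word i (index 0 is the artificial
    root, whose own entry is ignored). Returns True iff no two
    dependency arcs cross."""
    arcs = [(min(d, h), max(d, h)) for d, h in enumerate(heads) if d > 0]
    for i, j in arcs:
        for k, l in arcs:
            if i < k < j < l:   # arcs (i, j) and (k, l) cross
                return False
    return True

proj = is_projective([0, 2, 0, 2])      # arcs (1,2), (0,2), (2,3) nest: True
nonproj = is_projective([0, 2, 0, 1])   # arcs (0,2) and (1,3) cross: False
```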

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table Our model's performance gap decreases as the projectivity ratio increases

7 From the official results page and our projectivity table

Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion: We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering

Tree-stack LSTM performed better with low-resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement

Morphological Features

Finding a different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics 5:135-146.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123


Tree-stack LSTM

Input Representation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with a continuous vector

Every dependency relation is represented with a continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character-based LSTM's word vectors

Word-based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
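The concatenation above is a one-liner; the dimensions below are illustrative only (the real sizes are hyperparameters of the thesis models):

```python
import numpy as np

char_lstm_vec = np.zeros(350)    # character-based LSTM word vector
context_vec = np.zeros(300)      # word-based BiLSTM context vector
pos_vec = np.zeros(128)          # part-of-speech embedding
morph_feat_vec = np.zeros(128)   # morph-feat embedding

# The word representation is the concatenation of the four pieces
word_repr = np.concatenate([char_lstm_vec, context_vec, pos_vec, morph_feat_vec])
```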

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs (for the word "It")

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
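One way to embed such a UD feature string is to look up a vector per Feature=Value pair and combine them. This is an illustrative scheme, not necessarily the exact one used in the thesis:

```python
import numpy as np

DIM = 8                         # hypothetical embedding size
rng = np.random.default_rng(1)
feat_table = {}                 # Feature=Value -> vector, grown on demand

def morph_feat_vector(feat_string):
    """Sum the embeddings of the individual Feature=Value pairs."""
    pairs = feat_string.split("|")
    for p in pairs:
        feat_table.setdefault(p, rng.standard_normal(DIM))
    return np.sum([feat_table[p] for p in pairs], axis=0)

v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
```

Summing (rather than concatenating) keeps the vector size fixed regardless of how many features a word carries.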

Tree-stack LSTM

Model Components:
1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM


Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

Figure Buffer's β-LSTM: an LSTM over the buffer words w_i, w_i+1, w_i+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM


Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

Figure Stack's σ-LSTM: an LSTM over the stack words s_i, s_i+1, s_i+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM


Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

Figure Action-LSTM: an LSTM over the history of parser actions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

Figure t-RNN: composes the head word, dependent word, and dependency relation embeddings

w_head_new = tanh(W_rnn · [w_head_old; d_l; w_dep] + b_rnn)   (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
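Equation (1) written as code, with illustrative dimensions (the real embedding size is a model hyperparameter):

```python
import numpy as np

D = 4                            # hypothetical embedding size
rng = np.random.default_rng(2)
W_rnn = rng.standard_normal((D, 3 * D))
b_rnn = np.zeros(D)

def t_rnn(w_head_old, d_l, w_dep):
    """w_head_new = tanh(W_rnn · [w_head_old; d_l; w_dep] + b_rnn)"""
    x = np.concatenate([w_head_old, d_l, w_dep])
    return np.tanh(W_rnn @ x + b_rnn)

w_new = t_rnn(rng.standard_normal(D),   # old head embedding
              rng.standard_normal(D),   # dependency relation embedding
              rng.standard_normal(D))   # dependent embedding
```

After each left/right transition, the head's embedding is replaced by this composed vector, so the head accumulates information from its attached dependents.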

Tree-RNN with

1 Left Transition
2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 60: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Input Representation

Action and Dependency Relation Embeddings

Every action is represented with continuous vector

Every dependency relation is represented with continuous vector

Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123

Input Representation

We do not include explicit feature extractor We initiated wordrepresentation by concatenating

Character-based LSTM's word vectors

Word-based BiLSTM's context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
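The concatenation above can be sketched as follows; the dimensions and function name are hypothetical placeholders, not the configuration used in the thesis:

```python
import numpy as np

# Hypothetical vector sizes for the four input components.
D_CHAR, D_CTX, D_POS, D_FEAT = 100, 200, 32, 32

def word_representation(char_lstm_vec, context_vec, pos_vec, feat_vec):
    """Concatenate the four component vectors into one word representation."""
    return np.concatenate([char_lstm_vec, context_vec, pos_vec, feat_vec])

x = word_representation(np.zeros(D_CHAR), np.zeros(D_CTX),
                        np.zeros(D_POS), np.zeros(D_FEAT))
assert x.shape == (D_CHAR + D_CTX + D_POS + D_FEAT,)
```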

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
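One plausible way to realize such an embedding is to look up a vector per key=value feature of the FEATS string and pool them. Averaging, the lookup table, and the dimension here are illustrative assumptions, not necessarily the thesis's scheme:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 32                       # hypothetical embedding size
feat_table = {}              # one vector per key=value feature, grown on demand

def morph_feat_embedding(feat_string):
    """Embed 'Case=Nom|Gender=Neut|...' by averaging per-feature vectors."""
    vecs = []
    for feat in feat_string.split("|"):
        if feat not in feat_table:
            feat_table[feat] = rng.normal(size=D)
        vecs.append(feat_table[feat])
    return np.mean(vecs, axis=0)

v = morph_feat_embedding("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
assert v.shape == (D,)
```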

Tree-stack LSTM

Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

Figure Overall architecture: the β-, σ-, and Action-LSTM outputs and the t-RNN (head word, dependent word, dependency relation) are concatenated and fed to an MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

Figure Buffer's β-LSTM over w_i, w_i+1, w_i+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

Figure Overall architecture: the β-, σ-, and Action-LSTM outputs and the t-RNN (head word, dependent word, dependency relation) are concatenated and fed to an MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

Figure Stack's σ-LSTM over s_i, s_i+1, s_i+2

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

Figure Overall architecture: the β-, σ-, and Action-LSTM outputs and the t-RNN (head word, dependent word, dependency relation) are concatenated and fed to an MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

Figure t-RNN combining the head word, dependency relation, and dependent word

w_head_new = tanh(W_rnn · [w_head_old; d_l; w_dep] + b_rnn)    (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
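Eq. (1) can be sketched directly. W_rnn and b_rnn would be learned parameters in the real model; the dimension and random initialization here are hypothetical:

```python
import numpy as np

D = 64  # hypothetical embedding size
rng = np.random.default_rng(0)
W_rnn = rng.normal(scale=0.1, size=(D, 3 * D))  # learned in practice
b_rnn = np.zeros(D)

def t_rnn(w_head_old, d_l, w_dep):
    """Eq. (1): compose head, relation, and dependent into a new head embedding."""
    x = np.concatenate([w_head_old, d_l, w_dep])
    return np.tanh(W_rnn @ x + b_rnn)

h = t_rnn(np.ones(D), np.ones(D), np.ones(D))
assert h.shape == (D,) and np.all(np.abs(h) <= 1.0)
```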

Tree-RNN with

1. Left Transition
2. Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure The stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure The stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
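The left and right transitions above can be sketched on a (stack, buffer, arcs) state; integer word positions stand in for the full embeddings, and the relation labels in the toy run are arbitrary examples:

```python
def left_arc(stack, buffer, arcs, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    the buffer front b becomes head of the stack top s."""
    s = stack.pop()
    arcs.add((buffer[0], d, s))

def right_arc(stack, buffer, arcs, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    the second stack item s becomes head of the stack top t."""
    t = stack.pop()
    arcs.add((stack[-1], d, t))

# Toy run: integers stand for word positions.
stack, buffer, arcs = [0, 1], [2, 3], set()
left_arc(stack, buffer, arcs, "nsubj")   # adds arc 2 -nsubj-> 1
stack.append(2)
right_arc(stack, buffer, arcs, "obj")    # adds arc 0 -obj-> 2
```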

Final overview of Tree-stack LSTM

Figure Overall architecture: the β-, σ-, and Action-LSTM outputs and the t-RNN (head word, dependent word, dependency relation) are concatenated and fed to an MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2. Related Work: Linear Models and their Drawbacks; Neural Network Models

3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5. Conclusion

6. Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4. Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

CoNLL17: dependency parsing of 81 treebanks in 49 languages. All treebanks use standardized annotation: 17 universal part-of-speech tags and 37 universal dependency relations. Koc University ranked 7th out of 33 participants (1st among transition based parsers).

CoNLL18: dependency parsing of 82 treebanks in 57 languages, with the same standardized annotation. Koc University ranked 16th out of 30 participants (2nd among transition based parsers).

Changes from CoNLL17 to CoNLL18: 1. Train/test split 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1. If the annotation of the treebank is improved, the older parser is handicapped.

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code         MLP    Tree-stack
ru taiga (10k)    58.89  60.55
hu szeged (20k)   66.21  68.18
tr imst (50k)     56.78  58.75
ar padt (120k)    67.83  68.14
en ewt (205k)     74.87  75.77
cs cac (473k)     83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

Figure Initial model (MLP)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

Figure Only Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code   MLP    Only-Action  Only-β  Only-σ
hu szeged   66.21  66.87        66.94   67.03
sv lines    71.12  72.05        72.17   72.45
tr imst     57.12  56.87        57.02   57.12
ar padt     67.83  66.67        66.89   66.92
cs cac      83.89  82.23        83.13   83.17
en ewt      75.54  75.43        75.56   75.67

Table Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

Figure Overall architecture: the β-, σ-, and Action-LSTM outputs and the t-RNN (head word, dependent word, dependency relation) are concatenated and fed to an MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.16

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang        MLP    Only-A  Only-β  Only-σ  w/o t-RNN  all
hu szeged   66.21  66.87   66.94   67.03   66.12      68.18
sv lines    71.12  72.05   72.17   74.04   72.17      75.46
tr imst     57.12  56.87   57.02   57.12   58.12      58.75
ar padt     67.83  66.67   66.89   66.92   68.04      68.14
cs cac      83.89  82.23   83.13   83.17   82.89      83.57
en ewt      75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases

σ-LSTM provides more useful information independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k, less than 50k tokens

Languages having more than 50k, less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3,583
ru taiga       58.32        60.55           10,479
sme giella     52.78        53.39           16,385
la perseus     49.93        51.60           18,184
ug udt         52.78        53.39           19,262
sl sst         46.72        48.77           19,473
hu szeged      66.23        68.18           20,166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48,325
fr sequoia     84.36        82.17           50,543
en gum         76.44        75.34           53,686
ko gsd         73.74        72.54           56,687
eu bdt         74.55        73.32           72,974
nl lassysmall  76.70        75.80           75,134
gl ctg         79.02        79.018          79,327
lv lvtb        72.33        72.24           80,666
id gsd         75.76        73.97           97,531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code   Morph-Feats  no Morph-Feats  # of tokens
fa seraji   81.18        81.12           121,064
bg btb      84.53        84.55           124,336
en ewt      75.77        75.682          204,585
ar padt     68.02        68.14           223,881
de gsd      71.59        71.32           263,804
ca ancora   85.89        85.874          417,587
es ancora   84.99        84.78           444,617
cs cac      83.57        83.63           472,608
cs pdt      81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow gold moves.
Dynamic oracle: transitions follow predicted moves.

In both cases, log p of the gold moves is maximized.

Figure Overall architecture: the β-, σ-, and Action-LSTM outputs and the t-RNN (head word, dependent word, dependency relation) are concatenated and fed to an MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
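A minimal sketch of the difference, assuming hypothetical score_moves and apply_move helpers: the loss term is identical in both regimes, only the move the parser actually follows changes:

```python
import random

def train_step(state, gold_oracle, score_moves, apply_move,
               dynamic=False, p_explore=0.1):
    """One training step. The loss always maximizes log p of the gold move;
    the oracles differ only in which move the parser follows afterwards."""
    gold = gold_oracle(state)
    scores = score_moves(state)            # hypothetical: move -> log-probability
    loss = -scores[gold]                   # maximize log p(gold move)
    if dynamic and random.random() < p_explore:
        move = max(scores, key=scores.get) # dynamic oracle: follow the prediction
    else:
        move = gold                        # static oracle: follow the gold move
    return apply_move(state, move), loss
```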

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table LAS values for strategies (1), (2), (3), and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial.

Training the LM from scratch does not yield useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017].

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition based parsers can only build projective trees. 6

6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios.

Language     Projectivity (%)  Best (LAS)  Our (LAS)
grc perseus  90.7              79.39       55.03 (20)
eu bdt       95.13             84.22       74.13 (17)
hu szeged    97.8              82.66       68.18 (14)
da ddt       98.26             86.28       76.40 (17)
en gum       99.6              85.05       76.44 (15)
gl treegal   100               74.25       70.45 (10)
gl ctg       100               82.12       79.45 (14)

Table Our model's performance gap decreases as the projectivity ratio increases 7

7 From the official results page and our projectivity table

Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion: we introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition based dependency parsing.

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

As the training dataset size increases, tree-stack LSTM loses its advantage.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring performance improvements.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

Page 61: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Input Representation

We do not include explicit feature extractor We initiated wordrepresentation by concatenating

Character Based LSTMrsquos word vectors

Word Based BiLSTMrsquos context vectors

Part-of-speech (POS) vectors

Morph-feat vectors

Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123

Input Representation

Morp-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123

Tree-stack LSTM

Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure Bufferrsquos β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stackrsquos σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How do components of tree-stack LSTM are connected

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios.

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases. 7

7From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:

We introduced "Context, Word and Morph-feat" embeddings and showed their contribution in transition based dependency parsing.

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering.

Tree-stack LSTM performed better with low-resource languages.

When the training dataset size increases, tree-stack LSTM loses its advantage.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between the σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gomez-Rodriguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

Input Representation

Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure Morph-feat Embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
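A FEATS string like the one above can be mapped to a single vector by embedding each key=value pair and combining them. A minimal sketch, where the summation, the on-demand table, the dimension, and the names are my illustrative assumptions, not the thesis implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
FEAT_DIM = 8        # illustrative embedding size
feat_table = {}     # hypothetical embedding table: one vector per key=value pair

def morph_feat_vector(feats):
    """Embed a CoNLL-U FEATS string such as
    'Case=Nom|Gender=Neut|Number=Sing' as the sum of the embeddings
    of its key=value pairs ('_' means no features)."""
    vec = np.zeros(FEAT_DIM)
    if feats == "_":
        return vec
    for pair in feats.split("|"):
        if pair not in feat_table:          # grow the table on first sight
            feat_table[pair] = rng.normal(size=FEAT_DIM)
        vec += feat_table[pair]
    return vec
```

Because the pairs are summed, the representation is invariant to the order in which features are listed; whether to sum or concatenate per-feature vectors is a design choice.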

Tree-stack LSTM

Model Components:
1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

[Figure: the β-LSTM highlighted within the full Tree-stack LSTM architecture — the σ-, β-, and Action-LSTM outputs are concatenated and fed to an MLP; the t-RNN composes head word, dependent word, and dependency relation]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

[Figure: Buffer's β-LSTM — an LSTM over the upcoming buffer words w_i, w_{i+1}, w_{i+2}]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

[Figure: the σ-LSTM highlighted within the full Tree-stack LSTM architecture]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

[Figure: Stack's σ-LSTM — an LSTM over the stack items s_i, s_{i+1}, s_{i+2}]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

[Figure: the Action-LSTM highlighted within the full Tree-stack LSTM architecture]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

[Figure: Action-LSTM — an LSTM over the sequence of past parser actions]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

w_head_new = tanh(W_rnn ∗ [w_head_old ; d_l ; w_dep] + b_rnn)    (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
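Equation (1) can be sketched in a few lines; the dimensions, the initialization, and the names W_rnn / b_rnn matching the slide are illustrative assumptions rather than the thesis hyperparameters:

```python
import numpy as np

D, R = 6, 3                       # word / relation embedding sizes (illustrative)
rng = np.random.default_rng(1)
W_rnn = rng.normal(scale=0.1, size=(D, 2 * D + R))  # projects [head; rel; dep]
b_rnn = np.zeros(D)

def t_rnn(w_head_old, d_l, w_dep):
    """Equation (1): fold a new dependent (with its relation embedding d_l)
    into the head embedding, producing w_head_new."""
    x = np.concatenate([w_head_old, d_l, w_dep])
    return np.tanh(W_rnn @ x + b_rnn)
```

The tanh keeps the recomputed head embedding in the same range as the inputs, so the t-RNN can be applied repeatedly as the head collects more dependents.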

Tree-RNN with

1 Left Transition
2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready for the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready for the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
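The two transitions walked through above reduce to simple stack/buffer operations. A minimal sketch that tracks only token indices and the arc set, omitting the t-RNN and LSTM recalculations shown in the figures (the function names are mine, not the thesis's):

```python
def left_arc(stack, buffer, arcs, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    pop the stack top s and attach it as a d-dependent of the buffer front b."""
    s = stack.pop()
    arcs.append((buffer[0], d, s))

def right_arc(stack, buffer, arcs, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    pop the stack top t and attach it as a d-dependent of the new top s."""
    t = stack.pop()
    arcs.append((stack[-1], d, t))

def shift(stack, buffer):
    """Move the buffer front onto the stack."""
    stack.append(buffer.pop(0))
```

In the full model each arc addition would also trigger the t-RNN head update of Equation (1) and a recalculation of the affected σ- or β-LSTM hidden state.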

Final overview of Tree-stack LSTM

[Figure: final overview of the Tree-stack LSTM — the σ-, β-, and Action-LSTM outputs are concatenated and fed to an MLP; the t-RNN composes head word, dependent word, and dependency relation]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 Introduction
Overview of Dependency Parsing
Transition Based Dependency Parsing

2 Related Work
Linear Models and their Drawbacks
Neural Network Models

3 Model
Language Model
MLP Parser
Tree-stack LSTM Parser

4 Results
MLP vs Tree-stack LSTM
Morphological Feature Embeddings
Static vs Dynamic Oracle Training
Transfer Learning

5 Conclusion

6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

CoNLL17:

Dependency parsing of 81 treebanks in 49 languages

All treebanks use standardized annotation

17 universal part-of-speech tags

37 universal dependency relations

Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:

Dependency parsing of 82 treebanks in 57 languages

All treebanks use standardized annotation

17 universal part-of-speech tags

37 universal dependency relations

Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Differences between CoNLL17 and CoNLL18: 1. Train/test split change 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1 If the annotation of the treebank is improved, the older parser is handicapped.

2 If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser


Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM


Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM


Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM


Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code  MLP    Only Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

[Figure: the t-RNN highlighted within the full Tree-stack LSTM architecture]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases.

σ-LSTM provides more useful information independent of dataset size.

Interconnecting the model's components with the t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does the Morphological Feature Embedding provide?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings

We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k, less than 50k tokens

Languages having more than 50k, less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
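The four-way split above can be written down directly (the function name is mine, and the bucket labels are just shorthand for the categories listed on the slide):

```python
def size_bucket(n_train_tokens):
    """Assign a treebank to one of the four training-size buckets."""
    if n_train_tokens < 20_000:
        return "<20k"
    if n_train_tokens < 50_000:
        return "20k-50k"
    if n_train_tokens < 100_000:
        return "50k-100k"
    return ">=100k"
```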

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.6            18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having tokens between 50k and 100k:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48325
fr sequoia     84.36        82.17           50543
en gum         76.44        75.34           53686
ko gsd         73.74        72.54           56687
eu bdt         74.55        73.32           72974
nl lassysmall  76.7         75.8            75134
gl ctg         79.02        79.018          79327
lv lvtb        72.33        72.24           80666
id gsd         75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121064
bg btb     84.53        84.55           124336
en ewt     75.77        75.682          204585
ar padt    68.02        68.14           223881
de gsd     71.59        71.32           263804
ca ancora  85.89        85.874          417587
es ancora  84.99        84.78           444617
cs cac     83.57        83.63           472608
cs pdt     81.43        82.12           1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow the gold moves
Dynamic oracle: transitions follow the predicted moves

In both cases, log p of the gold moves is maximized.

[Figure: the full Tree-stack LSTM architecture]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
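The difference between the two regimes can be sketched with a toy state and scorer (all class and function names here are illustrative toys, not the thesis code): the loss term is identical in both, and only the move the parser follows changes.

```python
class ToyState:
    """Stand-in parser state: just counts applied transitions."""
    def __init__(self, n_steps):
        self.n_steps, self.done = n_steps, 0
    def is_final(self):
        return self.done >= self.n_steps
    def apply(self, move):
        self.done += 1

def train_sentence(state, model_scores, gold_moves, dynamic=False):
    """Accumulate -log p(gold move); follow the gold move (static oracle)
    or the model's own prediction (dynamic oracle)."""
    loss, followed = 0.0, []
    while not state.is_final():
        scores = model_scores(state)          # log-probabilities over moves
        gold = gold_moves(state)              # zero-cost moves in this state
        loss -= max(scores[m] for m in gold)  # maximize log p of gold moves
        if dynamic:
            move = max(scores, key=scores.get)          # parser's own choice
        else:
            move = max(gold, key=lambda m: scores[m])   # best gold move
        followed.append(move)
        state.apply(move)
    return loss, followed
```

With a real dynamic oracle, gold_moves would recompute the zero-cost moves from the (possibly erroneous) current state, which is what lets the parser learn to recover from its own mistakes.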

Static vs Dynamic Oracle Training

Figure: Results are very close for languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 63: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Tree-stack LSTM

Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123

β-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

LSTM LSTM LSTM

wi+2wi+1wi

Figure Bufferrsquos β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stackrsquos σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How do components of tree-stack LSTM are connected

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code | Morph-Feats | no Morph-Feats | # of tokens
fa seraji | 81.18 | 81.12 | 121064
bg btb | 84.53 | 84.55 | 124336
en ewt | 75.77 | 75.682 | 204585
ar padt | 68.02 | 68.14 | 223881
de gsd | 71.59 | 71.32 | 263804
ca ancora | 85.89 | 85.874 | 417587
es ancora | 84.99 | 84.78 | 444617
cs cac | 83.57 | 83.63 | 472608
cs pdt | 81.43 | 82.12 | 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow the gold moves. Dynamic oracle: transitions follow the predicted moves.

In both cases, the log-probability of the gold moves is maximized.

[Figure: Tree-stack LSTM overview: σ-LSTM, β-LSTM and Action-LSTM outputs are concatenated with the t-RNN embeddings (head word, dependent word, dependency relation) and fed to an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
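The difference between the two regimes can be sketched in a few lines. This is an illustrative sketch, not the thesis code: the `parser`, `state` and `gold_oracle` interfaces and the `explore` probability are assumed names for illustration.

```python
import random

def train_sentence(parser, gold_oracle, dynamic=False, explore=0.1):
    """One training pass over a sentence. Both regimes maximize the
    log-probability of the gold move; they differ only in which move
    is *executed*: the gold move (static oracle) or, with some
    probability, the model's own best-scoring move (dynamic oracle)."""
    loss = 0.0
    state = parser.initial_state()
    while not state.is_final():
        gold = gold_oracle(state)               # correct move from this state
        scores = parser.score_moves(state)      # log-probabilities per move
        loss -= scores[gold]                    # accumulate -log p(gold)
        if dynamic and random.random() < explore:
            move = max(scores, key=scores.get)  # follow the model's prediction
        else:
            move = gold                         # follow the gold transition
        state = state.apply(move)
    return loss
```

Under a static oracle the parser only ever sees gold configurations at training time; the dynamic oracle also exposes it to the configurations its own mistakes produce.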

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors trained on a different language, but from the same language family
4. Applying transfer learning with a pre-trained parser

Language | (1) | (2) | (3) | (4)
af afribooms | not provided | 75.46 | 77.43 | 78.12
kk ktb | 20.19 | 22.31 | 21.96 | 23.86
bxr bdt | 7.64 | 9.76 | 9.93 | 8.98
kmr mg | 20.12 | 22.57 | 22.78 | 23.39

Table: LAS values for strategies (1), (2), (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

From-scratch LM training does not bring useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

A transition based parser can only build projective trees 6

6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
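The projectivity constraint can be checked directly: a tree is projective iff no two dependency arcs cross when drawn above the sentence. A minimal sketch (the `heads` encoding, with index 0 reserved for the artificial root, is an assumption for illustration):

```python
def is_projective(heads):
    """heads[i] is the head of token i (tokens are 1..n, heads[0] unused,
    head 0 means the artificial root). Projective iff no two arcs cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads[1:], start=1)]
    for l1, r1 in arcs:
        for l2, r2 in arcs:
            if l1 < l2 < r1 < r2:  # one arc starts inside the other and ends outside
                return False
    return True

print(is_projective([0, 2, 0, 2]))     # tokens 1 and 3 attach to token 2: True
print(is_projective([0, 3, 4, 0, 3]))  # arcs (1,3) and (2,4) cross: False
```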

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language | Projectivity | Best (LAS) | Our (LAS)
grc perseus | 90.7 | 79.39 | 55.03 (20)
eu bdt | 95.13 | 84.22 | 74.13 (17)
hu szeged | 97.8 | 82.66 | 68.18 (14)
da ddt | 98.26 | 86.28 | 76.40 (17)
en gum | 99.6 | 85.05 | 76.44 (15)
gl treegal | 100 | 74.25 | 70.45 (10)
gl ctg | 100 | 82.12 | 79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases 7

7 From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion: We introduced "Context, Word and Morph-feat" embeddings and showed their contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering

Tree-stack LSTM performed better on low-resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123


β-LSTM

[Figure: Tree-stack LSTM overview with the β-LSTM highlighted: σ-LSTM, β-LSTM and Action-LSTM outputs are concatenated with the t-RNN embeddings (head word, dependent word, dependency relation) and fed to an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123

β-LSTM

Figure: Buffer's β-LSTM, an LSTM running over the buffer words w_i, w_{i+1}, w_{i+2}, ...

Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123

σ-LSTM

[Figure: Tree-stack LSTM overview with the σ-LSTM highlighted: σ-LSTM, β-LSTM and Action-LSTM outputs are concatenated with the t-RNN embeddings (head word, dependent word, dependency relation) and fed to an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

Figure: Stack's σ-LSTM, an LSTM running over the stack items s_i, s_{i+1}, s_{i+2}, ...

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

[Figure: Tree-stack LSTM overview with the Action-LSTM highlighted: σ-LSTM, β-LSTM and Action-LSTM outputs are concatenated with the t-RNN embeddings (head word, dependent word, dependency relation) and fed to an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

Figure: Action-LSTM, an LSTM running over the sequence of past parser actions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

[Figure: t-RNN composing the head word, dependent word and dependency relation embeddings]

w_head,new = tanh(W_rnn * [w_head,old; d_l; w_dep] + b_rnn)   (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
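Equation (1) can be sketched in plain Python. This is a pedagogical sketch: the real model uses learned matrices and library tensor operations, and the dimensions below are illustrative.

```python
import math

def trnn_compose(w_head_old, d_label, w_dep, W_rnn, b_rnn):
    """w_head_new = tanh(W_rnn * [w_head_old; d_l; w_dep] + b_rnn):
    fold a dependent and its label into the head word's embedding."""
    x = w_head_old + d_label + w_dep                 # list concatenation = [;]
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W_rnn, b_rnn)]

# Toy sizes: 2-dim word embeddings, 1-dim label embedding, so W_rnn is 2x5.
W = [[0.1, 0.1, 0.1, 0.1, 0.1],
     [0.0, 0.0, 0.0, 0.0, 0.0]]
b = [0.0, 0.0]
new_head = trnn_compose([1.0, 0.0], [0.5], [0.0, 1.0], W, b)
```

Because the output replaces the head's embedding, repeated reductions accumulate subtree information into a single vector, which is what lets the parser see partially built trees.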

Tree-RNN with

1. Left Transition
2. Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initialized by concatenating POS, language and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready for the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready for the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
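The two transitions above can be sketched as operations on a (stack σ, buffer β, arc set A) configuration. This is a minimal sketch under stated assumptions: token indices stand in for the LSTM-embedded items of the actual model, and `shift` is included only to complete the toy parser.

```python
def left_arc(stack, buffer, arcs, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    the buffer front b becomes the head of the popped stack top s."""
    s = stack.pop()
    arcs.add((buffer[0], d, s))

def right_arc(stack, buffer, arcs, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    the second stack item s becomes the head of the popped top t."""
    t = stack.pop()
    arcs.add((stack[-1], d, t))

def shift(stack, buffer, arcs):
    stack.append(buffer.pop(0))

# Toy run on token indices 1..3 (0 = artificial root):
stack, buffer, arcs = [0], [1, 2, 3], set()
shift(stack, buffer, arcs)              # σ = [0, 1], β = [2, 3]
left_arc(stack, buffer, arcs, "amod")   # token 2 heads token 1
shift(stack, buffer, arcs)              # σ = [0, 2], β = [3]
shift(stack, buffer, arcs)              # σ = [0, 2, 3], β = []
right_arc(stack, buffer, arcs, "obj")   # token 2 heads token 3
right_arc(stack, buffer, arcs, "root")  # root heads token 2
```

Note that both arc transitions remove the dependent from the configuration, so each token receives exactly one head, as a dependency tree requires.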

Final overview of Tree-stack LSTM

[Figure: Tree-stack LSTM overview: σ-LSTM, β-LSTM and Action-LSTM outputs are concatenated with the t-RNN embeddings (head word, dependent word, dependency relation) and fed to an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
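The Concat and MLP steps in the overview can be sketched as follows. This is a toy sketch with made-up dimensions: the thesis model concatenates the σ-, β- and Action-LSTM summaries (together with the t-RNN-composed embeddings) before the MLP, and the function and parameter names here are assumptions.

```python
import math

def score_actions(sigma_h, beta_h, action_h, W_h, b_h, W_o, b_o):
    """Concatenate the component summaries and score parser actions
    with a one-hidden-layer MLP (tanh hidden activation)."""
    x = sigma_h + beta_h + action_h                  # Concat
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W_h, b_h)]
    return [sum(w * hi for w, hi in zip(row, hidden)) + b
            for row, b in zip(W_o, b_o)]

# Toy sizes: 1-dim summaries, 1 hidden unit, 2 candidate actions.
scores = score_actions([1.0], [0.0], [0.0],
                       W_h=[[1.0, 0.0, 0.0]], b_h=[0.0],
                       W_o=[[1.0], [0.0]], b_o=[0.0, 0.0])
```

The highest-scoring action would then be executed, updating the stack, buffer and arc set before the next scoring step.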

Overview

1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2. Related Work: Linear Models and their Drawbacks; Neural Network Models

3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5. Conclusion
6. Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4. Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

CoNLL17:
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation
17 universal part-of-speech tags
37 universal dependency relations
Koc University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
Dependency parsing of 82 treebanks in 57 languages
All treebanks use standardized annotation
17 universal part-of-speech tags
37 universal dependency relations
Koc University ranked 16th out of 30 participants (2nd among transition based parsers)

Differences between CoNLL17 and CoNLL18: 1. Train/test split change 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of the official comparison:

1. If the annotation of the treebank is improved, the older parser is handicapped

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models:

Lang Code | MLP | Tree-stack
ru taiga (10k) | 58.89 | 60.55
hu szeged (20k) | 66.21 | 68.18
tr imst (50k) | 56.78 | 58.75
ar padt (120k) | 67.83 | 68.14
en ewt (205k) | 74.87 | 75.77
cs cac (473k) | 83.39 | 83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

Figure: Initial model (MLP only)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

Figure: Only Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

Figure: Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

Figure: Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code | MLP | Only Action | Only-β | Only-σ
hu szeged | 66.21 | 66.87 | 66.94 | 67.03
sv lines | 71.12 | 72.05 | 72.17 | 72.45
tr imst | 57.12 | 56.87 | 57.02 | 57.12
ar padt | 67.83 | 66.67 | 66.89 | 66.92
cs cac | 83.89 | 82.23 | 83.13 | 83.17
en ewt | 75.54 | 75.43 | 75.56 | 75.67

Table: Comparison between the MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

[Figure: Tree-stack LSTM overview: σ-LSTM, β-LSTM and Action-LSTM outputs are concatenated with the t-RNN embeddings (head word, dependent word, dependency relation) and fed to an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN:

Lang Code | without t-RNN | with t-RNN
no nynorsklia (3k) | 51.78 | 53.33
ru taiga (11k) | 59.13 | 60.55
gl treegal (15k) | 69.76 | 70.45
hu szeged (20k) | 66.12 | 68.18
sv lines (49k) | 74.04 | 75.46
tr imst (50k) | 58.12 | 58.75
ar padt (120k) | 68.04 | 68.14
en ewt (204k) | 74.87 | 75.77
cs cac (473k) | 82.89 | 83.57
cs pdt (1M) | 81.17 | 81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of the ablation analysis:

Lang | MLP | Only A | Only-β | Only-σ | w/o t-RNN | all
hu szeged | 66.21 | 66.87 | 66.94 | 67.03 | 66.12 | 68.18
sv lines | 71.12 | 72.05 | 72.17 | 74.04 | 72.17 | 75.46
tr imst | 57.12 | 56.87 | 57.02 | 57.12 | 58.12 | 58.75
ar padt | 67.83 | 66.67 | 66.89 | 66.92 | 68.04 | 68.14
cs cac | 83.89 | 82.23 | 83.13 | 83.17 | 82.89 | 83.57
en ewt | 75.54 | 75.43 | 75.56 | 75.67 | 74.87 | 75.77

Tree-stack LSTM beats the other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123


Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 66: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

σ-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123

σ-LSTM

LSTM LSTM LSTM

s_i  s_{i+1}  s_{i+2}

Figure Stack's σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

$w_{head}^{new} = \tanh(W_{rnn} \cdot [w_{head}^{old}; d_l; w_{dep}] + b_{rnn})$   (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
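Equation (1) is a single tanh layer over the concatenation of the old head, relation, and dependent vectors. A minimal numpy sketch (the dimensions here are illustrative assumptions, not the thesis settings):

```python
import numpy as np

def trnn_update(w_head, d_rel, w_dep, W_rnn, b_rnn):
    """Equation (1): compose a new head embedding from the old head
    embedding, the dependency-relation embedding, and the dependent
    embedding."""
    x = np.concatenate([w_head, d_rel, w_dep])  # [w_head_old ; d_l ; w_dep]
    return np.tanh(W_rnn @ x + b_rnn)

# Illustrative sizes (assumptions): 4-dim word vectors, 2-dim relation vectors.
rng = np.random.default_rng(0)
W_rnn = rng.normal(size=(4, 4 + 2 + 4))
b_rnn = np.zeros(4)
new_head = trnn_update(rng.normal(size=4), rng.normal(size=2),
                       rng.normal(size=4), W_rnn, b_rnn)
```

Successive left and right transitions reapply this update, so a head's embedding gradually accumulates information from its subtree.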

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
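The caption's initialization step, concatenating POS, language, and morph-feat embeddings, is plain vector concatenation. A toy sketch with made-up lookup tables (the real tables and dimensions in the thesis differ):

```python
import numpy as np

# Hypothetical toy embedding tables; real vocabularies and sizes differ.
pos_emb  = {"NOUN": np.array([1.0, 0.0]), "VERB": np.array([0.0, 1.0])}
lang_emb = {"en": np.array([0.5]), "hu": np.array([-0.5])}
feat_emb = {"Number=Sing": np.array([0.2, 0.3]),
            "Number=Plur": np.array([0.4, 0.1])}

def init_embedding(pos, lang, feats):
    """Initial token vector = [POS ; language ; morph-feat] concatenation."""
    return np.concatenate([pos_emb[pos], lang_emb[lang], feat_emb[feats]])

vec = init_embedding("NOUN", "en", "Number=Sing")
```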

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
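The left-arc rule, left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}), can be sketched directly on Python lists (the word ids and the `left_arc` helper are illustrative, not the thesis implementation):

```python
def left_arc(stack, buffer, arcs, d):
    """left_d(sigma|s, b|beta, A) = (sigma, b|beta, A + {(b, d, s)}):
    pop the stack top s; the buffer front b becomes its head."""
    s = stack.pop()           # reduce the stack top
    b = buffer[0]             # head stays at the buffer front
    arcs.append((b, d, s))    # arc stored as (head, relation, dependent)
    return stack, buffer, arcs

# Word ids: stack [ROOT=0, 1], buffer [2, 3]; attach 1 as nsubj of 2.
stack, buffer, arcs = left_arc([0, 1], [2, 3], [], "nsubj")
```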

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to predict the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
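The right-arc rule, right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}), has the same shape: the stack top is reduced, but its head is the word below it on the stack (again an illustrative sketch, not the thesis code):

```python
def right_arc(stack, buffer, arcs, d):
    """right_d(sigma|s|t, beta, A) = (sigma|s, beta, A + {(s, d, t)}):
    pop the stack top t; the word s below it becomes the head."""
    t = stack.pop()           # reduce the stack top
    s = stack[-1]             # head is the new stack top
    arcs.append((s, d, t))    # arc stored as (head, relation, dependent)
    return stack, buffer, arcs

# Word ids: stack [ROOT=0, 1, 2]; attach 2 as obj of 1.
stack, buffer, arcs = right_arc([0, 1, 2], [3], [], "obj")
```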

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 Introduction
    Overview of Dependency Parsing
    Transition Based Dependency Parsing

2 Related Work
    Linear Models and their Drawbacks
    Neural Network Models

3 Model
    Language Model
    MLP Parser
    Tree-stack LSTM Parser

4 Results
    MLP vs Tree-stack LSTM
    Morphological Feature Embeddings
    Static vs Dynamic Oracle Training
    Transfer Learning

5 Conclusion

6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

CoNLL17: dependency parsing of 81 treebanks in 49 languages. All treebanks use standardized annotation: 17 universal part-of-speech tags and 37 universal dependency relations. Koc-University ranked 7th out of 33 participants (1st among transition based parsers).

CoNLL18: dependency parsing of 82 treebanks in 57 languages, with the same standardized annotation. Koc-University ranked 16th out of 30 participants (2nd among transition based parsers).

Changes from CoNLL17 to CoNLL18: 1 train/test split change, 2 annotation.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

Two possible problems of the official comparison:

1 If the annotation of the treebank is improved, the older parser is handicapped

2 If the training-test split has changed and the old training data are now in test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code   MLP    Only Action  Only-β  Only-σ
hu szeged   66.21  66.87        66.94   67.03
sv lines    71.12  72.05        72.17   72.45
tr imst     57.12  56.87        57.02   57.12
ar padt     67.83  66.67        66.89   66.92
cs cac      83.89  82.23        83.13   83.17
en ewt      75.54  75.43        75.56   75.67

Table Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code            without t-RNN  with t-RNN
no nynorsklia (3k)   51.78          53.33
ru taiga (11k)       59.13          60.55
gl treegal (15k)     69.76          70.45
hu szeged (20k)      66.12          68.18
sv lines (49k)       74.04          75.46
tr imst (50k)        58.12          58.75
ar padt (120k)       68.04          68.14
en ewt (204k)        74.87          75.77
cs cac (473k)        82.89          83.57
cs pdt (1M)          81.17          81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang        MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged   66.21  66.87   66.94   67.03   66.12      68.18
sv lines    71.12  72.05   72.17   74.04   72.17      75.46
tr imst     57.12  56.87   57.02   57.12   58.12      58.75
ar padt     67.83  66.67   66.89   66.92   68.04      68.14
cs cac      83.89  82.23   83.13   83.17   82.89      83.57
en ewt      75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases

σ-LSTM provides more useful information independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings
We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k and less than 50k tokens

Languages having more than 50k and less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
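The grouping above can be expressed as a small helper. The thresholds come from the slide; treating each boundary as exclusive is my assumption (the tables themselves are fuzzy at the edges, e.g. hu szeged with 20,166 tokens appears in the under-20k table):

```python
def size_group(n_train_tokens):
    """Assign a treebank to one of the four CoNLL18 training-size groups."""
    if n_train_tokens < 20_000:
        return "<20k"
    if n_train_tokens < 50_000:
        return "20k-50k"
    if n_train_tokens < 100_000:
        return "50k-100k"
    return ">=100k"

# Token counts taken from the tables in this section.
groups = [size_group(n) for n in (3583, 48325, 97531, 204585)]
```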

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3,583
ru taiga       58.32        60.55           10,479
sme giella     52.78        53.39           16,385
la perseus     49.93        51.60           18,184
ug udt         52.78        53.39           19,262
sl sst         46.72        48.77           19,473
hu szeged      66.23        68.18           20,166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48,325
fr sequoia     84.36        82.17           50,543
en gum         76.44        75.34           53,686
ko gsd         73.74        72.54           56,687
eu bdt         74.55        73.32           72,974
nl lassysmall  76.70        75.80           75,134
gl ctg         79.02        79.018          79,327
lv lvtb        72.33        72.24           80,666
id gsd         75.76        73.97           97,531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code   Morph-Feats  no Morph-Feats  # of tokens
fa seraji   81.18        81.12           121,064
bg btb      84.53        84.55           124,336
en ewt      75.77        75.682          204,585
ar padt     68.02        68.14           223,881
de gsd      71.59        71.32           263,804
ca ancora   85.89        85.874          417,587
es ancora   84.99        84.78           444,617
cs cac      83.57        83.63           472,608
cs pdt      81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions using gold moves
Dynamic oracle: transitions using predicted moves

In both cases, the log-probability of gold moves is maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
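Schematically, the two regimes differ only in which move is executed next; the loss term is the same in both. A stand-in training step (the model and oracle below are placeholders, not the thesis code):

```python
import math

def train_step(state, move_probs, gold_oracle, dynamic, losses):
    """One parsing step: always accumulate -log p(gold move), but execute
    the gold move (static oracle) or the model's argmax move (dynamic)."""
    probs = move_probs(state)                 # move -> model probability
    gold = gold_oracle(state)
    losses.append(-math.log(probs[gold]))     # -log p(gold), to be minimized
    if dynamic:
        return max(probs, key=probs.get)      # follow the predicted move
    return gold                               # follow the gold move

# Stand-in model and oracle: the model prefers SHIFT, gold is LEFT.
probs = lambda s: {"SHIFT": 0.7, "LEFT": 0.3}
oracle = lambda s: "LEFT"
losses = []
static_move = train_step(None, probs, oracle, dynamic=False, losses=losses)
dynamic_move = train_step(None, probs, oracle, dynamic=True, losses=losses)
```

Under the dynamic oracle the parser visits states its own mistakes produce, which is what makes training on predicted moves different from simply replaying gold transitions.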

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:
1 Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2 Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3 Using my own word and context vectors trained on a different language from the same language family
4 Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

From-scratch LM training does not produce useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

A transition based parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
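Projectivity can be checked by testing whether any two dependency arcs cross. A minimal sketch over head-index arrays (a hypothetical helper, not the thesis code):

```python
def is_projective(heads):
    """heads[i] = head index of word i (-1 for the root). A tree is
    projective iff no two dependency arcs cross."""
    spans = [(min(i, h), max(i, h)) for i, h in enumerate(heads) if h >= 0]
    for a, b in spans:
        for c, d in spans:
            if a < c < b < d:   # arcs (a,b) and (c,d) cross
                return False
    return True

chain_ok = is_projective([-1, 0, 1])     # 0 <- 1 <- 2: projective chain
crossing = is_projective([-1, 3, 0, 0])  # arcs (1,3) and (0,2) cross
```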

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.70         79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.80         82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.60         85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table Our model's performance gap decreases as the projectivity ratio increases

7 From official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:
We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM states, β-LSTM states, or Action-LSTM states may bring a performance improvement

Morphological Features

Finding different ways to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Ömer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Ömer Kırnap, Berkay Furkan Önder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

• Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
• Related Work
    • Linear Models and their Drawbacks
    • Neural Network Models
• Model
    • Language Model
    • MLP Parser
    • Tree-stack LSTM Parser
• Results
    • MLP vs Tree-stack LSTM
    • Morphological Feature Embeddings
    • Static vs Dynamic Oracle Training
    • Transfer Learning
• Conclusion
• Future Work & Discussions
Page 67: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

σ-LSTM

LSTM LSTM LSTM

si si+1 si+2

Figure Stackrsquos σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123

Action-LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How do components of tree-stack LSTM are connected

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

[Figure: Only the β-LSTM]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

[Figure: Only the σ-LSTM]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code  MLP    Only Action  Only-β  Only-σ
hu_szeged  66.21  66.87        66.94   67.03
sv_lines   71.12  72.05        72.17   72.45
tr_imst    57.12  56.87        57.02   57.12
ar_padt    67.83  66.67        66.89   66.92
cs_cac     83.89  82.23        83.13   83.17
en_ewt     75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

[Figure: Tree-stack LSTM with the t-RNN component highlighted; the t-RNN combines the head word, dependent word, and dependency relation, and the σ-LSTM, β-LSTM, and Action-LSTM outputs are concatenated and fed to an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN:

Lang Code           without t-RNN  with t-RNN
no_nynorsklia (3k)  51.78          53.33
ru_taiga (11k)      59.13          60.55
gl_treegal (15k)    69.76          70.45
hu_szeged (20k)     66.12          68.18
sv_lines (49k)      74.04          75.46
tr_imst (50k)       58.12          58.75
ar_padt (120k)      68.04          68.14
en_ewt (204k)       74.87          75.77
cs_cac (473k)       82.89          83.57
cs_pdt (1M)         81.17          81.16

t-RNN provides a comparative advantage for low-resource languages.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of the ablation analysis:

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu_szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv_lines   71.12  72.05   72.17   74.04   72.17      75.46
tr_imst    57.12  56.87   57.02   57.12   58.12      58.75
ar_padt    67.83  66.67   66.89   66.92   68.04      68.14
cs_cac     83.89  82.23   83.13   83.17   82.89      83.57
en_ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats the other model variations.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

- t-RNN's performance contribution increases as the training size decreases.

- σ-LSTM provides more useful information, independent of dataset size.

- Interconnecting the model's components with the t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does the Morphological Feature Embedding provide?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings
We divide the CoNLL18 UD dataset (v2.2) into 4 parts, based on the number of training tokens for each language, to better understand our contributions:

- Languages having less than 20k tokens

- Languages having more than 20k and less than 50k tokens

- Languages having more than 50k and less than 100k tokens

- Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
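The four-way split above can be written as a simple bucketing function. This is an illustrative sketch (the function name and the example token counts, taken from the tables in this section, are my own):

```python
def bucket(n_tokens):
    """Assign a treebank to one of the four training-size buckets."""
    if n_tokens < 20_000:
        return "<20k"
    elif n_tokens < 50_000:
        return "20k-50k"
    elif n_tokens < 100_000:
        return "50k-100k"
    else:
        return ">=100k"

# Illustrative counts from the tables below
treebanks = {"ru_taiga": 10_479, "id_gsd": 97_531, "cs_pdt": 1_173_282}
buckets = {name: bucket(n) for name, n in treebanks.items()}
```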

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no_nynorsklia  51.13        53.33           3,583
ru_taiga       58.32        60.55           10,479
sme_giella     52.78        53.39           16,385
la_perseus     49.93        51.6            18,184
ug_udt         52.78        53.39           19,262
sl_sst         46.72        48.77           19,473
hu_szeged      66.23        68.18           20,166

Not useful for languages having less than 20k training tokens.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv_lines       72.18        74.81           48,325
fr_sequoia     84.36        82.17           50,543
en_gum         76.44        75.34           53,686
ko_gsd         73.74        72.54           56,687
eu_bdt         74.55        73.32           72,974
nl_lassysmall  76.7         75.8            75,134
gl_ctg         79.02        79.018          79,327
lv_lvtb        72.33        72.24           80,666
id_gsd         75.76        73.97           97,531

Beneficial for languages with 50k-100k training tokens.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa_seraji  81.18        81.12           121,064
bg_btb     84.53        84.55           124,336
en_ewt     75.77        75.682          204,585
ar_padt    68.02        68.14           223,881
de_gsd     71.59        71.32           263,804
ca_ancora  85.89        85.874          417,587
es_ancora  84.99        84.78           444,617
cs_cac     83.57        83.63           472,608
cs_pdt     81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions are chosen using gold moves.
Dynamic oracle: transitions are chosen using predicted moves.

In both cases, the log-probability of the gold moves is maximized.

[Figure: Tree-stack LSTM architecture, with the t-RNN combining head word, dependent word, and dependency relation]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
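The difference between the two training regimes can be sketched as a single loop. This is a hedged illustration, not the thesis implementation: the `model`, `oracle`, and `state` interfaces are hypothetical, and the exploration probability `explore_p` is an assumed hyperparameter.

```python
import math
import random

def train_sentence(model, oracle, state, dynamic=False, explore_p=0.1):
    """One pass over a sentence; accumulates -log p(gold move) at each step.

    Static oracle (dynamic=False): the parser always follows the gold move.
    Dynamic oracle (dynamic=True): the parser sometimes follows its own
    prediction, so training also visits states the static oracle never sees.
    """
    loss = 0.0
    while not state.is_final():
        probs = model.predict(state)          # distribution over transitions
        gold = oracle.gold_move(state)        # best move w.r.t. the gold tree
        loss -= math.log(probs[gold])         # log p of gold moves is maximized
        if dynamic and random.random() < explore_p:
            move = max(probs, key=probs.get)  # follow the model's prediction
        else:
            move = gold                       # follow the gold move
        state.apply(move)
    return loss
```

In both branches the same gold-move negative log-likelihood is minimized; only the sequence of visited states differs.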

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors, trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af_afribooms  not provided  75.46  77.43  78.12
kk_ktb        20.19         22.31  21.96  23.86
bxr_bdt       7.64          9.76   9.93   8.98
kmr_mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
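Strategy (4) amounts to initializing a low-resource parser from the weights of a parser trained on a related language. A minimal sketch, assuming a hypothetical parameter-dict interface (the parameter names are illustrative, not the thesis's):

```python
import numpy as np

def warm_start(target_params, source_params):
    """Copy every source parameter whose name and shape match the target.

    target_params / source_params: dicts mapping parameter names to arrays.
    Returns the list of parameter names that were transferred.
    """
    copied = []
    for name, weights in source_params.items():
        if name in target_params and target_params[name].shape == weights.shape:
            target_params[name] = weights.copy()
            copied.append(name)
    return copied
```

Parameters with no counterpart (e.g. language-specific embeddings of a different vocabulary size) are simply left at their fresh initialization.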

Transfer Learning

Conclusions of Transfer Learning Experiments

- Applying transfer learning with a pre-trained parser is the most beneficial.

- From-scratch LM training does not yield useful word and context vectors.

- Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017].

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition based parsers can only build projective trees.

(Figure from http://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
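Projectivity can be checked directly: a tree is projective iff no two dependency arcs cross. A minimal sketch (the head-array encoding is my own choice; 0 denotes the root):

```python
def is_projective(heads):
    """Check projectivity of a dependency tree.

    heads[i] is the head of token i+1 (tokens are 1-based); 0 denotes the root.
    Two arcs (l1, r1) and (l2, r2) cross iff one starts strictly inside the
    other and ends strictly outside it.
    """
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for i, (l1, r1) in enumerate(arcs):
        for (l2, r2) in arcs[i + 1:]:
            if l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1:  # crossing arcs
                return False
    return True
```

For example, the introduction's "Economic news had little effect ..." fragment yields a projective tree, while any sentence with a long-distance attachment that jumps over an intervening arc does not.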

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios:

Language     Projectivity  Best (LAS)  Our (LAS)
grc_perseus  90.7          79.39       55.03 (20)
eu_bdt       95.13         84.22       74.13 (17)
hu_szeged    97.8          82.66       68.18 (14)
da_ddt       98.26         86.28       76.40 (17)
en_gum       99.6          85.05       76.44 (15)
gl_treegal   100           74.25       70.45 (10)
gl_ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases.
(From the official results page and our projectivity table.)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:
We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition based dependency parsing.

- Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.

- Tree-stack LSTM performed better on low-resource languages.

- As the training dataset size increases, tree-stack LSTM loses its advantage.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123


Action-LSTM

[Figure: Tree-stack LSTM architecture with the Action-LSTM component highlighted; the σ-LSTM, β-LSTM, and Action-LSTM outputs are concatenated and fed to an MLP, and the t-RNN combines head word, dependent word, and dependency relation]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123

Action-LSTM

[Figure: Action-LSTM]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How are the components of tree-stack LSTM connected?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

[Figure: t-RNN combining the dependent word, dependency relation, and head word]

w_head_new = tanh(W_rnn * [w_head_old; d_l; w_dep] + b_rnn)    (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
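Equation (1) composes a new head representation from the old head vector, the relation embedding, and the dependent vector. A minimal numpy sketch (function name and dimensions are illustrative, not the thesis code):

```python
import numpy as np

def t_rnn(w_head_old, d_label, w_dep, W_rnn, b_rnn):
    """w_head_new = tanh(W_rnn @ [w_head_old; d_label; w_dep] + b_rnn) -- Eq. (1)."""
    x = np.concatenate([w_head_old, d_label, w_dep])
    return np.tanh(W_rnn @ x + b_rnn)

dim, label_dim = 8, 4           # illustrative sizes
rng = np.random.default_rng(0)
W = rng.normal(size=(dim, 2 * dim + label_dim)) * 0.1
b = np.zeros(dim)
new_head = t_rnn(rng.normal(size=dim), rng.normal(size=label_dim),
                 rng.normal(size=dim), W, b)
```

The output has the same dimensionality as a word vector, so the new head can replace the old one on the stack and be composed again by later transitions.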

Tree-RNN with:

1. Left Transition
2. Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

[Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

[Figure: The stack's top LSTM is reduced]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

[Figure: The t-RNN calculates the new head embedding]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

[Figure: The β-LSTM recalculates its hidden state based on the new input]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

[Figure: The tree-stack LSTM is ready to predict the next transition]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

[Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

[Figure: The stack's top LSTM is reduced]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

[Figure: The t-RNN calculates the new head embedding]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

[Figure: The σ-LSTM recalculates its hidden state from the new input]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

[Figure: The tree-stack LSTM is ready to predict the next transition]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
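The left and right transition definitions above, together with shift, can be sketched as plain operations on a (stack, buffer, arcs) state. This is an illustration of the transition system only, not the thesis implementation (which additionally predicts the label d with a classifier); arcs are stored as (head, label, dependent) triples:

```python
def shift(stack, buffer, arcs):
    """Move the buffer front onto the stack."""
    stack.append(buffer.pop(0))

def left_arc(stack, buffer, arcs, label):
    # left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    # the buffer front b becomes the head of the stack top s, which is popped.
    s = stack.pop()
    arcs.add((buffer[0], label, s))

def right_arc(stack, buffer, arcs, label):
    # right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    # the second-top s becomes the head of the top t, which is popped.
    t = stack.pop()
    arcs.add((stack[-1], label, t))
```

For the three-token sequence [1, 2, 3] with 2 heading both 1 and 3, a valid derivation is: shift, left_arc, shift, shift, right_arc.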

Final overview of Tree-stack LSTM

[Figure: Full tree-stack LSTM; the σ-LSTM, β-LSTM, and Action-LSTM outputs are concatenated and fed to an MLP, while the t-RNN combines head word, dependent word, and dependency relation]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2. Related Work: Linear Models and their Drawbacks; Neural Network Models
3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5. Conclusion
6. Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 69: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Action-LSTM

LSTM LSTM LSTM

Figure Action-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123

How do components of tree-stack LSTM are connected

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1. If the annotation of the treebank is improved, the older parser is handicapped.

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

Figure: Initial model (MLP)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

Figure: Only the action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

Figure: Only the β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

Figure: Only the σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code  MLP    Only Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

Figure: Overview of the tree-stack LSTM (σ-LSTM, β-LSTM and action-LSTM states concatenated with the t-RNN representations of head word, dependent word and dependency relation, then passed to an MLP).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats the other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases.

σ-LSTM provides more useful information, independent of dataset size.

Interconnecting the model's components with the t-RNN makes the tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings

We divide the CoNLL18 UD dataset (v2.2) into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k, less than 50k tokens

Languages having more than 50k, less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
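The four-way split above can be expressed as a small helper; the token counts used below come from the tables in the following slides, not the full CoNLL18 list:

```python
# Bucket treebanks by their number of training tokens, following the
# <20k / 20k-50k / 50k-100k / >=100k split used in the experiments.

def bucket(n_tokens):
    if n_tokens < 20_000:
        return "<20k"
    if n_tokens < 50_000:
        return "20k-50k"
    if n_tokens < 100_000:
        return "50k-100k"
    return ">=100k"

treebanks = {"no_nynorsklia": 3_583, "sv_lines": 48_325,
             "id_gsd": 97_531, "cs_pdt": 1_173_282}
groups = {code: bucket(n) for code, n in treebanks.items()}
# e.g. groups["sv_lines"] == "20k-50k"
```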

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33             3,583
ru taiga       58.32        60.55            10,479
sme giella     52.78        53.39            16,385
la perseus     49.93        51.6             18,184
ug udt         52.78        53.39            19,262
sl sst         46.72        48.77            19,473
hu szeged      66.23        68.18            20,166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code    Morph-Feats  no Morph-Feats  # of tokens
sv lines     72.18        74.81            48,325
fr sequoia   84.36        82.17            50,543
en gum       76.44        75.34            53,686
ko gsd       73.74        72.54            56,687
eu bdt       74.55        73.32            72,974
nl lassymal  76.7         75.8             75,134
gl ctg       79.02        79.018           79,327
lv lvtb      72.33        72.24            80,666
id gsd       75.76        73.97            97,531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12             121,064
bg btb     84.53        84.55             124,336
en ewt     75.77        75.682            204,585
ar padt    68.02        68.14             223,881
de gsd     71.59        71.32             263,804
ca ancora  85.89        85.874            417,587
es ancora  84.99        84.78             444,617
cs cac     83.57        83.63             472,608
cs pdt     81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow gold moves.
Dynamic oracle: transitions follow predicted moves.

In both cases, the log-probability of the gold moves is maximized.

Figure: Overview of the tree-stack LSTM (σ-LSTM, β-LSTM and action-LSTM states concatenated with the t-RNN representations of head word, dependent word and dependency relation, then passed to an MLP).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
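The two regimes can be contrasted in a schematic training step. `predict` and `gold_move` are hypothetical stand-ins for the model and the oracle; only the move the parser *follows* differs, while the loss is always −log p of the gold move:

```python
# Static oracle: follow the gold move. Dynamic oracle: follow the
# model's predicted move. Either way, maximize log p(gold move).
import math
import random

random.seed(1)
MOVES = ["shift", "left", "right"]

def predict(state):                       # stand-in model: random scores
    return {m: random.random() for m in MOVES}

def gold_move(state):                     # stand-in oracle
    return MOVES[state % len(MOVES)]

def training_step(state, dynamic):
    scores = predict(state)
    z = sum(math.exp(s) for s in scores.values())
    loss = -math.log(math.exp(scores[gold_move(state)]) / z)  # -log p(gold)
    follow = (max(scores, key=scores.get) if dynamic          # predicted move
              else gold_move(state))                          # gold move
    return loss, follow

loss_s, move_s = training_step(0, dynamic=False)
loss_d, move_d = training_step(0, dynamic=True)
# move_s is always the gold move; move_d may differ from it
```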

Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al., 2017]
3. Using my own word and context vectors, trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt        7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
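Strategy (4) can be sketched as copying shared parameters from a related language's trained parser before fine-tuning on the low-resource treebank. The parameter names and values below are hypothetical placeholders, not the thesis implementation:

```python
# Initialize a target-language parser from a pre-trained parser:
# copy the shared layers (e.g. the transition MLP), keep the
# target-specific ones (e.g. the word lookup table).

def transfer(pretrained, target_params, shared_keys):
    out = dict(target_params)
    for k in shared_keys:
        out[k] = pretrained[k]          # overwrite with pre-trained weights
    return out

pretrained = {"mlp": "weights_from_related_language", "word_emb": "src_vocab"}
target = {"mlp": "random_init", "word_emb": "kk_ktb_vocab"}
params = transfer(pretrained, target, shared_keys=["mlp"])
# params["mlp"] comes from the pre-trained parser;
# params["word_emb"] stays target-specific
```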

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial.

From-scratch LM training does not produce useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al., 2017].

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

A transition-based parser can only build projective trees.6

6. Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
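A tree is projective exactly when no two dependency arcs cross. A quick check, assuming a 1-based head list with 0 standing for the root:

```python
# heads[i-1] is the head index of word i (0 = root). Two arcs cross
# when they overlap without one nesting inside the other.

def is_projective(heads):
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for (l1, r1) in arcs:
        for (l2, r2) in arcs:
            if l1 < l2 < r1 < r2:      # overlapping, non-nested arcs
                return False
    return True

print(is_projective([2, 0, 2]))        # projective tree
print(is_projective([3, 4, 0, 3]))     # arcs (1,3) and (2,4) cross
```

This is why the gap to the best (often graph-based) systems widens on treebanks with many non-projective sentences, as the next table shows.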

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios:

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus   90.7         79.39       55.03 (20)
eu bdt        95.13        84.22       74.13 (17)
hu szeged     97.8         82.66       68.18 (14)
da ddt        98.26        86.28       76.40 (17)
en gum        99.6         85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases.7

7. From the official results page and our projectivity table.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:

We introduced "Context, Word and Morph-feat" embeddings and showed their contribution in transition-based dependency parsing.

Our tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering.

The tree-stack LSTM performed better on low-resource languages.

As the training dataset size increases, the tree-stack LSTM loses its advantage.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM or action-LSTM states may bring a performance improvement.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Ömer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Ömer Kırnap, Berkay Furkan Önder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, pages 673-682. Association for Computational Linguistics.

Sandra Kübler, Ryan McDonald and Joakim Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

• Introduction
  • Overview of Dependency Parsing
  • Transition Based Dependency Parsing
• Related Work
  • Linear Models and their Drawbacks
  • Neural Network Models
• Model
  • Language Model
  • MLP Parser
  • Tree-stack LSTM Parser
• Results
  • MLP vs Tree-stack LSTM
  • Morphological Feature Embeddings
  • Static vs Dynamic Oracle Training
  • Transfer Learning
• Conclusion
• Future Work & Discussions
Page 70: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

How do components of tree-stack LSTM are connected

Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 71: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Tree-RNN

Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123

Tree-RNN (t-RNN)

t-RNN

Dependent word

Dependency Relation

Head word

Figure t-RNN

whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17:
Dependency parsing of 81 treebanks in 49 languages.
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations.
Koç University ranked 7th out of 33 participants (1st among transition based parsers).

CoNLL18:
Dependency parsing of 82 treebanks in 57 languages.
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations.
Koç University ranked 16th out of 30 participants (2nd among transition based parsers).

Changes from CoNLL17 to CoNLL18: 1. Train/test split change; 2. Annotation.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems evaluated on the same test sets.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems with the official comparison:

1. If the annotation of the treebank has been improved, the older parser is handicapped.

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

Figure: Initial model (MLP only).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

Figure: Only the action LSTM.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

Figure: Only β-LSTM.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

Figure: Only σ-LSTM.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code   MLP    Only Action  Only-β  Only-σ
hu szeged   66.21  66.87        66.94   67.03
sv lines    71.12  72.05        72.17   72.45
tr imst     57.12  56.87        57.02   57.12
ar padt     67.83  66.67        66.89   66.92
cs cac      83.89  82.23        83.13   83.17
en ewt      75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

Figure: Full Tree-stack LSTM architecture, highlighting the t-RNN component.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN:

Lang Code            without t-RNN  with t-RNN
no nynorsklia (3k)   51.78          53.33
ru taiga (11k)       59.13          60.55
gl treegal (15k)     69.76          70.45
hu szeged (20k)      66.12          68.18
sv lines (49k)       74.04          75.46
tr imst (50k)        58.12          58.75
ar padt (120k)       68.04          68.14
en ewt (204k)        74.87          75.77
cs cac (473k)        82.89          83.57
cs pdt (1M)          81.17          81.164

t-RNN provides a comparative advantage for low-resource languages.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of the ablation analysis:

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases.

σ-LSTM provides more useful information, independent of dataset size.

Interconnecting the model's components with t-RNN makes Tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD v2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
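The four-way split above can be expressed as a simple bucketing over per-language training-token counts. A small sketch; the sample counts come from the tables on the following slides:

```python
# Bucket languages by training-token count, mirroring the four groups
# used in the analysis. Sample counts come from the slides' tables.
token_counts = {"no nynorsklia": 3583, "sv lines": 48325,
                "id gsd": 97531, "cs pdt": 1173282}

def bucket(n_tokens):
    if n_tokens < 20_000:
        return "<20k"
    if n_tokens < 50_000:
        return "20k-50k"
    if n_tokens < 100_000:
        return "50k-100k"
    return ">=100k"

buckets = {lang: bucket(n) for lang, n in token_counts.items()}
```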

Contribution of Morph-feat embeddings

Morph-feat experiments for languages with fewer than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.6            18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages with 50k to 100k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48325
fr sequoia     84.36        82.17           50543
en gum         76.44        75.34           53686
ko gsd         73.74        72.54           56687
eu bdt         74.55        73.32           72974
nl lassysmall  76.7         75.8            75134
gl ctg         79.02        79.018          79327
lv lvtb        72.33        72.24           80666
id gsd         75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages with more than 100k training tokens:

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121064
bg btb     84.53        84.55           124336
en ewt     75.77        75.682          204585
ar padt    68.02        68.14           223881
de gsd     71.59        71.32           263804
ca ancora  85.89        85.874          417587
es ancora  84.99        84.78           444617
cs cac     83.57        83.63           472608
cs pdt     81.43        82.12           1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow gold moves.
Dynamic oracle: transitions follow predicted moves.

In both cases, the log-probability of gold moves is maximized.

Figure: Tree-stack LSTM architecture.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
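The distinction can be sketched as one training loop with two rollout policies: both accumulate the negative log-probability of the gold move, but the state advances with either the gold move (static) or the predicted move (dynamic). The toy state and uniform model below are hypothetical stand-ins, not the thesis implementation:

```python
import math

# Static vs. dynamic oracle training: both accumulate -log p(gold move),
# but the state advances with the gold move (static) or the model's
# predicted move (dynamic). ToyState and the uniform model are hypothetical.

class ToyState:
    def __init__(self, n):
        self.remaining = n
    def is_final(self):
        return self.remaining == 0
    def apply(self, move):
        return ToyState(self.remaining - 1)

def train_sentence(state, log_prob, predict, gold_move, mode="static"):
    loss = 0.0
    while not state.is_final():
        gold = gold_move(state)
        loss -= log_prob(state, gold)                  # -log p(gold) in both modes
        move = gold if mode == "static" else predict(state)
        state = state.apply(move)                      # gold vs. predicted rollout
    return loss

# A uniform 3-move model contributes -log(1/3) per step over 4 steps:
loss = train_sentence(ToyState(4), lambda s, m: math.log(1 / 3),
                      lambda s: "shift", lambda s: "shift", mode="dynamic")
```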

Static vs Dynamic Oracle Training

Figure: Results are very close for training sets with fewer than 20k tokens.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure: Results are very close for training sets between 20k and 50k tokens.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure: Results are very close for training sets with more than 50k tokens.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

What about languages with fewer than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors, trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial.

Training an LM from scratch does not yield useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017].

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition based parsers can only build projective trees.6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
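Projectivity can be checked directly: a dependency tree is projective iff no two arcs cross when drawn above the sentence. A minimal sketch over (head, dependent) position pairs, assuming 1-based word positions with 0 for the root:

```python
# A tree is projective iff no two arcs cross. Arcs are (head, dependent)
# pairs over word positions (0 = root); the convention is illustrative.

def is_projective(arcs):
    spans = [tuple(sorted(a)) for a in arcs]
    for a1, b1 in spans:
        for a2, b2 in spans:
            if a1 < a2 < b1 < b2:   # strictly interleaved endpoints: crossing
                return False
    return True

projective = is_projective([(0, 2), (2, 1), (2, 3)])       # simple chain
non_projective = is_projective([(0, 1), (1, 3), (2, 4)])   # arcs (1,3),(2,4) cross
```

Transition systems like the ones above can only derive arc sets that pass this check, which motivates the projectivity comparison on the next slide.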

Projective vs Non-projective

We compared our model with the best model at different projectivity ratios.

Language     Projectivity (%)  Best (LAS)  Our (LAS)
grc perseus  90.7              79.39       55.03 (20)
eu bdt       95.13             84.22       74.13 (17)
hu szeged    97.8              82.66       68.18 (14)
da ddt       98.26             86.28       76.40 (17)
en gum       99.6              85.05       76.44 (15)
gl treegal   100               74.25       70.45 (10)
gl ctg       100               82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases.7

7 From the official results page and our projectivity table.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:
We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition based dependency parsing.

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

As the training dataset size increases, Tree-stack LSTM loses its advantage.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring performance improvements.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Ömer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Ömer Kırnap, Berkay Furkan Önder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

Page 73: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Tree-RNN with

1 Left Transition2 Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123

Left Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17:
Dependency parsing of 81 treebanks in 49 languages.
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations.
Koc-University ranked 7th out of 33 participants (1st among transition based parsers).

CoNLL18:
Dependency parsing of 82 treebanks in 57 languages.
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations.
Koc-University ranked 16th out of 30 participants (2nd among transition based parsers).

Differences between CoNLL17 and CoNLL18: 1. Train/test split change 2. Annotation

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets


MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1. If the annotation of the treebank is improved, the older parser is handicapped.

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code         MLP    Tree-stack
ru taiga (10k)    58.89  60.55
hu szeged (20k)   66.21  68.18
tr imst (50k)     56.78  58.75
ar padt (120k)    67.83  68.14
en ewt (205k)     74.87  75.77
cs cac (473k)     83.39  83.57

Tree-stack LSTM outperforms MLP


Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM


MLP Parser

Figure: Initial model (MLP)

Only Action LSTM

Figure: Only action LSTM

Only β-LSTM

Figure: Only β-LSTM

Only σ-LSTM

Figure: Only σ-LSTM

Ablation Analysis Results

Lang Code   MLP    Only Action  Only-β  Only-σ
hu szeged   66.21  66.87        66.94   67.03
sv lines    71.12  72.05        72.17   72.45
tr imst     57.12  56.87        57.02   57.12
ar padt     67.83  66.67        66.89   66.92
cs cac      83.89  82.23        83.13   83.17
en ewt      75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models

Ablation of t-RNN

Figure: Tree-stack LSTM architecture with the t-RNN that composes head word, dependent word, and dependency relation embeddings

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.16

t-RNN provides a comparative advantage for low-resource languages


Ablation Analysis

Overall results of ablation analysis

Lang        MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged   66.21  66.87   66.94   67.03   66.12      68.18
sv lines    71.12  72.05   72.17   74.04   72.17      75.46
tr imst     57.12  56.87   57.02   57.12   58.12      58.75
ar padt     67.83  66.67   66.89   66.92   68.04      68.14
cs cac      83.89  82.23   83.13   83.17   82.89      83.57
en ewt      75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations


Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases

σ-LSTM provides more useful information independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)

What does Morphological Feature Embedding provide?

Contribution of Morph-feat Embeddings

Experimental Settings: we divide the CoNLL18 UD 2.2 dataset into 4 parts based on the number of training tokens per language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k, less than 50k tokens

Languages having more than 50k, less than 100k tokens

Languages having 100k tokens or more

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3,583
ru taiga       58.32        60.55           10,479
sme giella     52.78        53.39           16,385
la perseus     49.93        51.60           18,184
ug udt         52.78        53.39           19,262
sl sst         46.72        48.77           19,473
hu szeged      66.23        68.18           20,166

Not useful for languages having less than 20k training tokens

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48,325
fr sequoia     84.36        82.17           50,543
en gum         76.44        75.34           53,686
ko gsd         73.74        72.54           56,687
eu bdt         74.55        73.32           72,974
nl lassysmall  76.70        75.80           75,134
gl ctg         79.02        79.018          79,327
lv lvtb        72.33        72.24           80,666
id gsd         75.76        73.97           97,531

Beneficial for languages with 50k-100k training tokens

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code   Morph-Feats  no Morph-Feats  # of tokens
fa seraji   81.18        81.12           121,064
bg btb      84.53        84.55           124,336
en ewt      75.77        75.682          204,585
ar padt     68.02        68.14           223,881
de gsd      71.59        71.32           263,804
ca ancora   85.89        85.874          417,587
es ancora   84.99        84.78           444,617
cs cac      83.57        83.63           472,608
cs pdt      81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens

Static vs Dynamic Oracle Training

Static oracle: transitions follow the gold moves. Dynamic oracle: transitions follow the model's predicted moves.

In both cases, log p of the gold moves is maximized.
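The two regimes can be contrasted in a toy training loop: both accumulate -log p of the gold move, and they differ only in which move the parser actually follows. The uniform scorer below stands in for the real model and is purely illustrative.

```python
import math

def train_step(state, gold_moves, score, dynamic):
    """One pass over a toy parse; returns the summed -log p of gold moves.

    Static oracle: the parser follows the gold move at every step.
    Dynamic oracle: the parser follows its own predicted move, but the
    loss is still the -log p of the gold move for the current state.
    """
    loss = 0.0
    for gold in gold_moves:
        probs = score(state)                     # dict: move -> probability
        loss -= math.log(probs[gold])
        move = max(probs, key=probs.get) if dynamic else gold
        state = state + (move,)                  # toy state transition
    return loss

# Toy state-independent scorer over three moves (placeholder for the model).
score = lambda state: {"shift": 0.5, "left": 0.25, "right": 0.25}
gold = ["shift", "shift", "right"]
static_loss = train_step((), gold, score, dynamic=False)
dynamic_loss = train_step((), gold, score, dynamic=True)
```

With a real model the two regimes visit different states, so the dynamic oracle teaches the parser to recover from its own mistakes.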

Figure: Tree-stack LSTM architecture (σ-LSTM, β-LSTM, Action-LSTM, and t-RNN feeding the MLP)

Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens less than 20k

Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens between 20k and 50k

Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens more than 50k

How about languages with less than 20k training tokens?

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch

2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]

3. Using my own word and context vectors trained on a different language from the same language family

4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4)

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

From-scratch LM training does not produce useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]

Projectivity

Transition based parsers can only build projective trees.

Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
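Projectivity can be checked with a simple crossing-arcs test, treating ROOT as position 0. This is a generic sketch, not the thesis code; `heads` maps each 1-based token position to the position of its head.

```python
def is_projective(heads):
    """heads[i] is the head of token i (1-based positions, 0 = ROOT).

    A dependency tree is projective iff no two arcs cross when drawn
    above the sentence.
    """
    arcs = [tuple(sorted((h, d))) for d, h in heads.items()]
    for i, (a, b) in enumerate(arcs):
        for c, d in arcs[i + 1:]:
            # Two arcs cross if exactly one endpoint of one arc lies
            # strictly inside the span of the other.
            if a < c < b < d or c < a < d < b:
                return False
    return True

projective = is_projective({1: 2, 2: 0, 3: 2})       # small head-in-the-middle tree
crossing = is_projective({1: 3, 2: 3, 3: 0, 4: 2})   # arc (2,4) crosses arc (1,3)
```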

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios (from the official results page and our projectivity table).

Language      Projectivity (%)  Best (LAS)  Our (LAS)
grc perseus   90.7              79.39       55.03 (20)
eu bdt        95.13             84.22       74.13 (17)
hu szeged     97.8              82.66       68.18 (14)
da ddt        98.26             86.28       76.40 (17)
en gum        99.6              85.05       76.44 (15)
gl treegal    100               74.25       70.45 (10)
gl ctg        100               82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases

Conclusions


Conclusion

In conclusion:

We introduced "Context, Word and Morph-feat" embeddings and showed their contribution in transition based dependency parsing.

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

When the training dataset size increases, tree-stack LSTM loses its advantage.

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring performance improvements.

Morphological Features

Finding different ways to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Thank you for your attention

Questions?


Left Transition


Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

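The initialization described in the caption can be sketched as follows. The lookup tables, the dimensions, and the choice to average multiple morph-feat vectors are assumptions for illustration, not the thesis configuration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical lookup tables; sizes are illustrative.
pos_emb = {"NOUN": rng.normal(size=4), "VERB": rng.normal(size=4)}
lang_emb = {"en": rng.normal(size=2)}
feat_emb = {"Number=Sing": rng.normal(size=3), "Tense=Past": rng.normal(size=3)}

def init_embedding(pos, lang, feats):
    """Concatenate POS, language, and (averaged) morph-feat embeddings."""
    feat_vec = np.mean([feat_emb[f] for f in feats], axis=0)
    return np.concatenate([pos_emb[pos], lang_emb[lang], feat_vec])

vec = init_embedding("VERB", "en", ["Tense=Past"])  # 4 + 2 + 3 = 9 dims
```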

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready to predict the next transition

Right Transition


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced

Page 75: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM LSTM

Left transition

t-RNN

Dependency Relation

HeadDependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code  MLP    Only Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67

Table Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.16

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   72.45   74.04      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats the other model variants

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases

σ-LSTM provides more useful information, independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD v2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k and less than 50k tokens

Languages having more than 50k and less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
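The four-way split above can be written as a small helper (thresholds as stated on the slide; the slides' own tables occasionally straddle a boundary):

```python
def size_bucket(n_tokens):
    # Token-count partition used for the morph-feat experiments
    if n_tokens < 20_000:
        return "<20k"
    if n_tokens < 50_000:
        return "20k-50k"
    if n_tokens < 100_000:
        return "50k-100k"
    return ">=100k"
```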

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3,583
ru taiga       58.32        60.55           10,479
sme giella     52.78        53.39           16,385
la perseus     49.93        51.60           18,184
ug udt         52.78        53.39           19,262
sl sst         46.72        48.77           19,473
hu szeged      66.23        68.18           20,166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48,325
fr sequoia     84.36        82.17           50,543
en gum         76.44        75.34           53,686
ko gsd         73.74        72.54           56,687
eu bdt         74.55        73.32           72,974
nl lassysmall  76.7         75.8            75,134
gl ctg         79.02        79.018          79,327
lv lvtb        72.33        72.24           80,666
id gsd         75.76        73.97           97,531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121,064
bg btb     84.53        84.55           124,336
en ewt     75.77        75.682          204,585
ar padt    68.02        68.14           223,881
de gsd     71.59        71.32           263,804
ca ancora  85.89        85.874          417,587
es ancora  84.99        84.78           444,617
cs cac     83.57        83.63           472,608
cs pdt     81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions using gold moves.
Dynamic oracle: transitions using predicted moves.

In both cases, log p of the gold moves is maximized.

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
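The two regimes differ only in which move advances the parser state; both maximize log p of the gold move. A schematic training step over a toy probability table (all names illustrative, not the thesis implementation):

```python
import math

def oracle_step(probs, gold_move, dynamic):
    # probs: dict mapping each transition to its predicted probability
    loss = -math.log(probs[gold_move])     # maximize log p of the gold move
    predicted = max(probs, key=probs.get)
    # Static oracle advances with the gold move; dynamic with the predicted one
    next_move = predicted if dynamic else gold_move
    return loss, next_move
```

The loss is identical under both oracles; only the state the parser explores next changes, which is why the dynamic oracle can teach the model to recover from its own mistakes.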

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al., 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table LAS values for strategies (1), (2), (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
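All scores in these tables are LAS (labeled attachment score): the percentage of tokens whose predicted head and dependency label are both correct. A minimal computation:

```python
def las(gold, pred):
    # gold, pred: one (head, label) pair per token, in sentence order
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return 100.0 * correct / len(gold)
```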

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

From-scratch LM training does not yield useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al., 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

A transition based parser can only build projective trees.

Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
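A tree is projective exactly when no two dependency arcs cross, which can be checked directly from the head indices. A small helper (1-based tokens, head 0 = root; names illustrative):

```python
def is_projective(heads):
    # heads[i-1] is the head of token i (tokens numbered 1..n, head 0 = root)
    n = len(heads)
    arcs = [tuple(sorted((i, heads[i - 1]))) for i in range(1, n + 1)]
    for a, b in arcs:
        for c, d in arcs:
            if a < c < b < d:  # one endpoint inside (a, b), the other outside
                return False   # crossing arcs => non-projective
    return True
```

For example, heads `[3, 4, 0, 3]` encode the crossing arcs (1, 3) and (2, 4), so the tree is non-projective and unreachable for this parser.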

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table Our model's performance gap decreases as the projectivity ratio increases

7 From the official results page and our projectivity table

Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion: We introduced "Context, Word and Morph-feat" embeddings and showed their contribution to transition based dependency parsing.

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

As the training dataset size increases, tree-stack LSTM loses its advantage.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding different ways to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Ömer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Ömer Kırnap, Berkay Furkan Önder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

Figure Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden state based on the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to produce the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding is initialized by concatenating POS, language, and morph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
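As the caption says, each token embedding starts out as the concatenation of POS, language, and morph-feat embeddings. With plain lists standing in for vectors (a sketch, not the thesis code):

```python
def init_embedding(pos_vec, lang_vec, morph_vec):
    # Token representation = [POS ; language ; morph-feat] concatenation
    return pos_vec + lang_vec + morph_vec  # list concatenation
```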

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stack's top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden state from the new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to produce the next transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
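The left/right rules above match the arc-hybrid transition system (Kuhlmann et al., 2011, cited in the references): left-arc attaches the stack top to the buffer front, right-arc attaches it to the next item on the stack. A runnable sketch on plain lists, with the state as (stack, buffer, arc set) and arcs stored as (head, label, dependent) triples:

```python
def shift(stack, buf, arcs):
    stack.append(buf.pop(0))

def left_arc(stack, buf, arcs, d):
    s = stack.pop()              # dependent: top of stack
    arcs.add((buf[0], d, s))     # head: front of buffer

def right_arc(stack, buf, arcs, d):
    t = stack.pop()              # dependent: top of stack
    arcs.add((stack[-1], d, t))  # head: new top of stack

# Example derivation for tokens 1..3 with root 0 on the stack:
# shift; left_arc("nsubj"); shift; shift; right_arc("obj"); right_arc("root")
```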


  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 77: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM

LSTM

LSTM

Left transition

t-RNN

Dependency Relation

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure β-LSTM recalculates its hidden based on new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

Figure: Tree-stack LSTM with t-RNN: head word, dependent word, and dependency relation embeddings are composed by the t-RNN; the σ-, β-, and action-LSTM outputs are concatenated and fed to an MLP.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN:

Lang Code            without t-RNN  with t-RNN
no_nynorsklia (3k)   51.78          53.33
ru_taiga (11k)       59.13          60.55
gl_treegal (15k)     69.76          70.45
hu_szeged (20k)      66.12          68.18
sv_lines (49k)       74.04          75.46
tr_imst (50k)        58.12          58.75
ar_padt (120k)       68.04          68.14
en_ewt (204k)        74.87          75.77
cs_cac (473k)        82.89          83.57
cs_pdt (1M)          81.17          81.16

t-RNN provides a comparative advantage for low-resource languages.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of the ablation analysis:

Lang        MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu_szeged   66.21  66.87   66.94   67.03   66.12      68.18
sv_lines    71.12  72.05   72.17   74.04   72.17      75.46
tr_imst     57.12  56.87   57.02   57.12   58.12      58.75
ar_padt     67.83  66.67   66.89   66.92   68.04      68.14
cs_cac      83.89  82.23   83.13   83.17   82.89      83.57
en_ewt      75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats the other model variations.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of the ablation experiments:

t-RNN's performance contribution increases as the training size decreases.

σ-LSTM provides more useful information independent of dataset size.

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD 2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k, less than 50k tokens

Languages having more than 50k, less than 100k tokens

Languages having 100k tokens or more
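The grouping above can be sketched as a small helper; the thresholds follow the four groups listed, while the function name and labels are illustrative:

```python
def size_bucket(n_tokens: int) -> str:
    """Map a treebank's training-token count to one of the four groups."""
    if n_tokens < 20_000:
        return "<20k"
    if n_tokens < 50_000:
        return "20k-50k"
    if n_tokens < 100_000:
        return "50k-100k"
    return ">=100k"
```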

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang code       Morph-Feats  no Morph-Feats  # of tokens
no_nynorsklia   51.13        53.33           3,583
ru_taiga        58.32        60.55           10,479
sme_giella      52.78        53.39           16,385
la_perseus      49.93        51.60           18,184
ug_udt          52.78        53.39           19,262
sl_sst          46.72        48.77           19,473
hu_szeged       66.23        68.18           20,166

Not useful for languages having less than 20k training tokens.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv_lines       72.18        74.81           48,325
fr_sequoia     84.36        82.17           50,543
en_gum         76.44        75.34           53,686
ko_gsd         73.74        72.54           56,687
eu_bdt         74.55        73.32           72,974
nl_lassysmall  76.7         75.8            75,134
gl_ctg         79.02        79.018          79,327
lv_lvtb        72.33        72.24           80,666
id_gsd         75.76        73.97           97,531

Beneficial for languages with 50k-100k training tokens.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code   Morph-Feats  no Morph-Feats  # of tokens
fa_seraji   81.18        81.12           121,064
bg_btb      84.53        84.55           124,336
en_ewt      75.77        75.682          204,585
ar_padt     68.02        68.14           223,881
de_gsd      71.59        71.32           263,804
ca_ancora   85.89        85.874          417,587
es_ancora   84.99        84.78           444,617
cs_cac      83.57        83.63           472,608
cs_pdt      81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow the gold moves.
Dynamic oracle: transitions follow the predicted moves.

In both cases, the log-probability of the gold moves is maximized.

Figure: Tree-stack LSTM with t-RNN: head word, dependent word, and dependency relation embeddings are composed by the t-RNN; the σ-, β-, and action-LSTM outputs are concatenated and fed to an MLP.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
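The training-regime difference can be sketched as a small helper; this is an illustrative reduction, not the thesis code, and `gold_moves`/`predicted_moves` are hypothetical stand-ins for the oracle output and the parser's argmax prediction:

```python
def training_trace(gold_moves, predicted_moves, dynamic):
    """Pair each step's loss target with the move used to advance the parser.

    Static oracle: the parser state follows the gold move.
    Dynamic oracle: the parser state follows the predicted move.
    In both regimes the loss maximizes log p(gold move).
    """
    trace = []
    for gold, pred in zip(gold_moves, predicted_moves):
        loss_target = gold                 # -log p(gold) in both cases
        next_move = pred if dynamic else gold
        trace.append((loss_target, next_move))
    return trace
```

With dynamic training the parser thus visits configurations produced by its own (possibly wrong) moves, while the loss still pushes probability toward the gold move.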

Static vs Dynamic Oracle Training

Figure: Results are very close for training sets with fewer than 20k tokens.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure: Results are very close for training sets between 20k and 50k tokens.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure: Results are very close for training sets with more than 50k tokens.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af_afribooms  not provided  75.46  77.43  78.12
kk_ktb        20.19         22.31  21.96  23.86
bxr_bdt       7.64          9.76   9.93   8.98
kmr_mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4).

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial.

From-scratch LM training does not yield useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017].

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

A transition-based parser can only build projective trees.

Figure from http://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
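Projectivity can be checked by testing whether any two dependency arcs cross; a minimal sketch under the usual convention (1-based token indices, `heads[i-1]` giving the head of token `i`, 0 for the root):

```python
def is_projective(heads):
    """Return True iff no two arcs (drawn above the sentence) cross.

    heads[i-1] is the head of token i (1-based); 0 marks the root.
    """
    spans = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for l1, r1 in spans:
        for l2, r2 in spans:
            # two arcs cross iff exactly one endpoint of one span lies
            # strictly inside the other span
            if l1 < l2 < r1 < r2:
                return False
    return True
```

For example, "Economic news had little effect" with heads [2, 3, 0, 5, 3] is projective, so a transition-based parser can produce it.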

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios:

Language     Projectivity  Best (LAS)  Our (LAS)
grc_perseus  90.7          79.39       55.03 (20)
eu_bdt       95.13         84.22       74.13 (17)
hu_szeged    97.8          82.66       68.18 (14)
da_ddt       98.26         86.28       76.40 (17)
en_gum       99.6          85.05       76.44 (15)
gl_treegal   100           74.25       70.45 (10)
gl_ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases.

From the official results page and our projectivity table.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:
We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition-based dependency parsing.

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

When the training dataset size increases, tree-stack LSTM loses its advantage.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM states, or within the β-LSTM or Action-LSTM, may bring a performance improvement.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gomez-Rodriguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123


Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123

Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready for a new transition.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
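The left and right transitions above can be sketched on a plain (stack, buffer, arc-set) configuration; this is a minimal illustration of the transition definitions, not the thesis implementation (the LSTM and t-RNN updates are omitted, and the function names are illustrative):

```python
def shift(stack, buffer, arcs):
    """shift: move the front of the buffer onto the stack."""
    return stack + [buffer[0]], buffer[1:], arcs

def left(stack, buffer, arcs, d="dep"):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}): pop s, attach it to b."""
    s, b = stack[-1], buffer[0]
    return stack[:-1], buffer, arcs | {(b, d, s)}

def right(stack, buffer, arcs, d="dep"):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}): pop t, attach it to s."""
    t, s = stack[-1], stack[-2]
    return stack[:-1], buffer, arcs | {(s, d, t)}
```

For tokens [1, 2, 3] with 3 as root, the sequence shift, left, shift, left, shift yields the arcs {(2, 'dep', 1), (3, 'dep', 2)}.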

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: The stack's top LSTM is reduced.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN computes the new head embedding.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready for a new transition.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

Figure: Tree-stack LSTM overview: head word, dependent word, and dependency relation embeddings are composed by the t-RNN; the σ-, β-, and action-LSTM outputs are concatenated and fed to an MLP.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
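The t-RNN composition step in the overview — producing a new head embedding from the head, dependent, and relation embeddings — can be sketched with a single tanh layer. The weight layout and the function name are illustrative assumptions, not the thesis code:

```python
import math

def trnn_compose(head, dependent, relation, W):
    """New head vector = tanh(W · [head; dependent; relation]).

    head, dependent, relation: lists of floats of equal length dim.
    W: dim rows, each of length 3*dim, given as nested lists.
    """
    x = head + dependent + relation  # concatenation of the three embeddings
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]
```

Each time an arc is added, the composed vector replaces the head's embedding in the stack or buffer, so subtrees are summarized recursively.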

Overview

1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2. Related Work: Linear Models and their Drawbacks; Neural Network Models

3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5. Conclusion

6. Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

CoNLL17:
- Dependency parsing of 81 treebanks in 49 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koc University ranked 7th out of 33 participants (1st among transition-based parsers)

CoNLL18:
- Dependency parsing of 82 treebanks in 57 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koc University ranked 16th out of 30 participants (2nd among transition-based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split change 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 79: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Transitions - Left

leftd(σ|s b|βA) = (σ b|βA cup (b d s))

LSTM LSTM LSTM

Left transition

t-RNN New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123

Right Transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48,325
fr sequoia     84.36        82.17           50,543
en gum         76.44        75.34           53,686
ko gsd         73.74        72.54           56,687
eu bdt         74.55        73.32           72,974
nl lassysmall  76.70        75.80           75,134
gl ctg         79.02        79.018          79,327
lv lvtb        72.33        72.24           80,666
id gsd         75.76        73.97           97,531

Beneficial for languages with 50k-100k training tokens


Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121,064
bg btb     84.53        84.55           124,336
en ewt     75.77        75.682          204,585
ar padt    68.02        68.14           223,881
de gsd     71.59        71.32           263,804
ca ancora  85.89        85.874          417,587
es ancora  84.99        84.78           444,617
cs cac     83.57        83.63           472,608
cs pdt     81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens


Static vs Dynamic Oracle Training

Static oracle: transitions using gold moves. Dynamic oracle: transitions using predicted moves.

In both cases, the log-probability of the gold moves is maximized.
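The two regimes differ only in which move the parser follows during training; a minimal sketch of the loop (`score_moves`, `oracle_gold`, `apply_move`, and `is_final` are hypothetical stand-ins, not the thesis implementation):

```python
def train_sentence(score_moves, oracle_gold, apply_move, is_final, state,
                   mode="static"):
    """One training pass over a sentence.

    mode='static':  follow gold transitions while training.
    mode='dynamic': follow the model's own predicted transitions, so
                    training visits states the parser actually reaches
                    at test time.
    In both modes the loss maximizes log p of the gold move.
    """
    loss = 0.0
    while not is_final(state):
        scores = score_moves(state)       # dict: move -> log-probability
        gold = oracle_gold(state)         # best move w.r.t. the gold tree
        loss -= scores[gold]              # accumulate -log p(gold move)
        move = gold if mode == "static" else max(scores, key=scores.get)
        state = apply_move(state, move)
    return loss
```

With a static oracle the loop only ever visits gold-derived states; with a dynamic oracle it also learns to recover from its own mistakes.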

[Figure: Tree-stack LSTM architecture: the σ-, β-, and action-LSTM outputs and the t-RNN head word, dependent word, and dependency relation embeddings are concatenated and fed to an MLP]


Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens less than 20k


Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens between 20k and 50k


Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens above 50k


How about languages with less than 20k training tokens?


Transfer Learning

There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al., 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4)


Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

Training an LM from scratch does not produce useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al., 2017]


Projectivity

Transition-based parsers can only build projective trees

Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
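A dependency tree is projective exactly when no two arcs cross; this can be checked directly. A small self-contained sketch (not tied to any particular parser):

```python
def is_projective(heads):
    """heads[i] is the head index of token i+1 (0 = artificial root),
    tokens numbered 1..n. Returns True iff no two arcs cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for (l1, r1) in arcs:
        for (l2, r2) in arcs:
            # two arcs cross when exactly one endpoint of one arc lies
            # strictly inside the span of the other
            if l1 < l2 < r1 < r2:
                return False
    return True
```

A transition-based parser of the kind above can only produce head lists for which this check returns True.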


Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language     Projectivity (%)  Best (LAS)  Our (LAS)
grc perseus  90.7              79.39       55.03 (20)
eu bdt       95.13             84.22       74.13 (17)
hu szeged    97.8              82.66       68.18 (14)
da ddt       98.26             86.28       76.40 (17)
en gum       99.6              85.05       76.44 (15)
gl treegal   100               74.25       70.45 (10)
gl ctg       100               82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases

From the official results page and our projectivity table.

Conclusions


Conclusion

In conclusion: we introduced "Context", "Word", and "Morph-feat" embeddings and showed their contribution to transition-based dependency parsing

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering

Tree-stack LSTM performed better on low-resource languages

As the training dataset size increases, the tree-stack LSTM loses its advantage


Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention across σ-LSTM, β-LSTM, or action-LSTM states may bring a performance improvement.

Morphological Features

Finding different ways to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.


Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.


References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.


Thank you for your attention


Questions



Right Transition


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})


Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings
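The initialization described in the caption can be sketched as follows (vocabularies, dimensions, and the summing of multiple morphological features are illustrative assumptions, not the thesis settings):

```python
import random

random.seed(0)
POS_DIM, LANG_DIM, FEAT_DIM = 16, 4, 8

def embed(dim):
    return [random.gauss(0.0, 1.0) for _ in range(dim)]

# Hypothetical lookup tables with toy vocabularies.
pos_table  = {p: embed(POS_DIM)  for p in ["NOUN", "VERB", "ADJ"]}
lang_table = {l: embed(LANG_DIM) for l in ["en", "hu", "tr"]}
feat_table = {f: embed(FEAT_DIM) for f in ["Number=Sing", "Case=Nom"]}

def token_embedding(pos, lang, feats):
    """Concatenate POS, language, and (summed) morph-feat embeddings."""
    feat_vec = [0.0] * FEAT_DIM
    for f in feats:                              # summing several features
        for i, x in enumerate(feat_table[f]):    # is an assumption here
            feat_vec[i] += x
    return pos_table[pos] + lang_table[lang] + feat_vec

v = token_embedding("NOUN", "hu", ["Number=Sing", "Case=Nom"])
assert len(v) == POS_DIM + LANG_DIM + FEAT_DIM
```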


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})


Figure: The stack's top LSTM is reduced


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})


Figure: t-RNN calculates the new head embedding


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})


Figure: σ-LSTM recalculates its hidden state from the new input


Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})


Figure: The tree-stack LSTM is ready to predict the next transition

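The right_d transition above can be sketched as a pure function on the parser state (σ, β, A); the `compose` argument stands in for the t-RNN that recomputes the head representation (a sketch under assumptions, not the thesis code):

```python
def right_arc(state, d, compose=lambda head, dep, rel: head):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}).

    state is (sigma, beta, arcs); sigma is a list with the top at the end.
    `compose` is a trivial placeholder for the t-RNN that builds the new
    head embedding from head, dependent, and relation.
    """
    sigma, beta, arcs = state
    t = sigma[-1]                      # dependent: topmost stack item
    s = sigma[-2]                      # head: second stack item
    new_arcs = arcs | {(s, d, t)}      # add the arc (head, relation, dependent)
    new_head = compose(s, t, d)        # t-RNN recomputes the head representation
    new_sigma = sigma[:-2] + [new_head]
    return (new_sigma, beta, new_arcs)
```

For example, applying a right arc labeled "obj" to stack [0, 1, 2] pops 2, attaches it to 1, and leaves 1 (recomposed) on top of the stack.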

Final overview of Tree-stack LSTM

[Figure: Final tree-stack LSTM architecture: the σ-, β-, and action-LSTM outputs and the t-RNN head word, dependent word, and dependency relation embeddings are concatenated and fed to an MLP]


Overview

1 Introduction
   Overview of Dependency Parsing
   Transition Based Dependency Parsing

2 Related Work
   Linear Models and their Drawbacks
   Neural Network Models

3 Model
   Language Model
   MLP Parser
   Tree-stack LSTM Parser

4 Results
   MLP vs Tree-stack LSTM
   Morphological Feature Embeddings
   Static vs Dynamic Oracle Training
   Transfer Learning

5 Conclusion

6 Future Work & Discussions


4 Results & Comparisons


Results & Comparisons

Dataset

CoNLL17:
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation
17 universal part-of-speech tags
37 universal dependency relations
Koç University ranked 7th out of 33 participants (1st among transition-based parsers)

CoNLL18:
Dependency parsing of 82 treebanks in 57 languages
All treebanks use standardized annotation
17 universal part-of-speech tags
37 universal dependency relations
Koç University ranked 16th out of 30 participants (2nd among transition-based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split change 2. Annotation


MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets


MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1. If the annotation of the treebank was improved, the older parser is handicapped.

2. If the train-test split has changed and old training data are now in the test data, the old parser is favored undeservedly.


MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP


Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM


MLP Parser


Figure: Initial model


Only Action LSTM


Figure: Only action LSTM


Only β-LSTM


Figure: Only β-LSTM


Only σ-LSTM


Figure: Only σ-LSTM


Ablation Analysis Results

Lang Code  MLP    Only-Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models


Ablation of t-RNN

[Figure: Tree-stack LSTM with t-RNN: the σ-, β-, and action-LSTM outputs and the t-RNN head word, dependent word, and dependency relation embeddings are concatenated and fed to an MLP]


Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.164

t-RNN provides a comparative advantage for low-resource languages


Page 81: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings

Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model at different projectivity ratios.

Language      Projectivity (%)  Best (LAS)  Our (LAS)
grc perseus    90.7              79.39      55.03 (20)
eu bdt         95.13             84.22      74.13 (17)
hu szeged      97.8              82.66      68.18 (14)
da ddt         98.26             86.28      76.40 (17)
en gum         99.6              85.05      76.44 (15)
gl treegal    100                74.25      70.45 (10)
gl ctg        100                82.12      79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases.

7. From the official results page and our projectivity table.

Conclusions


Conclusion

In conclusion:
We introduced "Context, Word and Morph-feat" embeddings and showed their contribution to transition-based dependency parsing.

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

As the training dataset size increases, Tree-stack LSTM loses its advantage.


Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention over the σ-LSTM, β-LSTM, or action-LSTM states may bring a performance improvement.

Morphological Features

Finding different ways to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.


Publications

Ömer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Ömer Kırnap, Berkay Furkan Önder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.


References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.


Thank you for your attention


Questions


  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 82: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

Figure Stackrsquos top LSTM is reduced

Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 83: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM

LSTM

LSTM

t-RNN

Dependency Relation

Right Transition

Head

Dependent

New Head

Figure t-RNN calculates new head embedding

Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure Tree-stack LSTM is ready to give new transition

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 84: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Transitions - Right

rightd(σ|s|t βA) = (σ|s βA cup (s d t))

LSTM LSTM LSTM

t-RNN

Right Transition

New Head

Figure σ-LSTM recalculates its hidden from new input

Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123

Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready to predict the next transition (diagram: stack LSTM cells and t-RNN after the Right transition, with the new head in place)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123

Final overview of Tree-stack LSTM

Figure: Full architecture: the σ-, β-, and action-LSTM hidden states, together with t-RNN encodings of the head word, dependent word, and dependency relation, are concatenated and fed to an MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
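The Concat + MLP step in the figure can be sketched as follows. The dimensions, transition count, and random weights here are illustrative assumptions, not the thesis configuration:

```python
# Illustrative sketch: the final hidden states of the sigma-LSTM, beta-LSTM,
# action-LSTM and t-RNN are concatenated and scored by a small MLP that
# outputs a softmax distribution over candidate transitions.

import math
import random

random.seed(0)

def rand_vec(n):
    return [random.gauss(0.0, 0.1) for _ in range(n)]

h_sigma, h_beta, h_action, h_tree = rand_vec(8), rand_vec(8), rand_vec(4), rand_vec(4)
x = h_sigma + h_beta + h_action + h_tree             # the Concat step

n_hidden, n_transitions = 16, 5                      # assumed sizes
W1 = [rand_vec(len(x)) for _ in range(n_hidden)]
W2 = [rand_vec(n_hidden) for _ in range(n_transitions)]

hidden = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in W1]  # ReLU layer
scores = [sum(w * h for w, h in zip(row, hidden)) for row in W2]

m = max(scores)                                      # softmax over transitions
probs = [math.exp(s - m) for s in scores]
total = sum(probs)
probs = [p / total for p in probs]
print(len(probs), round(sum(probs), 6))
```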

Overview

1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing

2 Related Work: Linear Models and their Drawbacks; Neural Network Models

3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser

4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning

5 Conclusion

6 Future Work & Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17:

• Dependency parsing of 81 treebanks in 49 languages

• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations

• Koç University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:

• Dependency parsing of 82 treebanks in 57 languages

• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations

• Koç University ranked 16th out of 30 participants (2nd among transition based parsers)

Differences between the two tasks: 1. Train/test split change 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1. If the annotation of the treebank has improved, the older parser is handicapped

2. If the training-test split has changed and old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models:

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

Figure: Initial model (MLP only)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

Figure: Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

Figure: Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

Figure: Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code  MLP    Only Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

Figure: Tree-stack LSTM architecture with the t-RNN component (encoding head word, dependent word, and dependency relation) feeding the Concat + MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN:

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.16

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of the ablation analysis:

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats the other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of the Ablation Experiments

t-RNN's performance contribution increases as the training size decreases

σ-LSTM provides more useful information, independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings

We divide the CoNLL18 UD v2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k, less than 50k tokens

Languages having more than 50k, less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
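The four size buckets above can be sketched as a small helper. The thresholds are taken from this slide; the example token counts come from the result tables that follow:

```python
# Bucket a treebank by its number of training tokens, using the four
# thresholds from the experimental setting above.

def size_bucket(n_tokens):
    if n_tokens < 20_000:
        return "<20k"
    if n_tokens < 50_000:
        return "20k-50k"
    if n_tokens < 100_000:
        return "50k-100k"
    return ">=100k"

# Token counts for a few treebanks, as reported in the result tables
for name, n in [("ru_taiga", 10_479), ("sv_lines", 48_325),
                ("eu_bdt", 72_974), ("en_ewt", 204_585)]:
    print(name, size_bucket(n))
```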

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3,583
ru taiga       58.32        60.55           10,479
sme giella     52.78        53.39           16,385
la perseus     49.93        51.60           18,184
ug udt         52.78        53.39           19,262
sl sst         46.72        48.77           19,473
hu szeged      66.23        68.18           20,166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48,325
fr sequoia     84.36        82.17           50,543
en gum         76.44        75.34           53,686
ko gsd         73.74        72.54           56,687
eu bdt         74.55        73.32           72,974
nl lassysmall  76.70        75.80           75,134
gl ctg         79.02        79.018          79,327
lv lvtb        72.33        72.24           80,666
id gsd         75.76        73.97           97,531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121,064
bg btb     84.53        84.55           124,336
en ewt     75.77        75.682          204,585
ar padt    68.02        68.14           223,881
de gsd     71.59        71.32           263,804
ca ancora  85.89        85.874          417,587
es ancora  84.99        84.78           444,617
cs cac     83.57        83.63           472,608
cs pdt     81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow gold moves
Dynamic oracle: transitions follow predicted moves

In both cases, the log-probability of the gold moves is maximized

Figure: Tree-stack LSTM architecture (σ-, β-, and action-LSTM states and t-RNN encodings feeding the Concat + MLP)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
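The contrast between the two regimes can be sketched as follows. The parser, oracle, and model interfaces here are toy stand-ins, not the thesis code: both regimes maximize log p(gold move), but they differ in which move advances the parser state:

```python
# Toy contrast of static vs dynamic oracle training: the loss is the same
# (negative log-probability of the gold move); only the applied move differs.

import math

class ToyState:
    """Hypothetical 3-step parse used only to illustrate the two regimes."""
    def __init__(self):
        self.steps = []
    def is_final(self):
        return len(self.steps) == 3
    def apply(self, move):
        self.steps.append(move)

class ToyOracle:
    def gold_move(self, state):
        return "shift"                      # gold is always 'shift' here

class ToyModel:
    def predict(self, state):               # fixed, slightly wrong model
        return {"shift": 0.4, "right": 0.6}

def train_sentence(state, oracle, model, dynamic=False):
    loss = 0.0
    while not state.is_final():
        probs = model.predict(state)
        gold = oracle.gold_move(state)
        loss -= math.log(probs[gold])       # maximize log p(gold) in both cases
        move = max(probs, key=probs.get) if dynamic else gold
        state.apply(move)                   # dynamic: follow the prediction;
    return loss                             # static: follow the gold move

static_state, dynamic_state = ToyState(), ToyState()
train_sentence(static_state, ToyOracle(), ToyModel(), dynamic=False)
train_sentence(dynamic_state, ToyOracle(), ToyModel(), dynamic=True)
print(static_state.steps, dynamic_state.steps)
# ['shift', 'shift', 'shift'] ['right', 'right', 'right']
```

With a dynamic oracle the model is trained on states produced by its own (possibly wrong) predictions, which is intended to reduce error propagation at test time.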

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch

2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]

3. Using my own word and context vectors trained on a different language from the same language family

4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of the Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

Training an LM from scratch does not yield useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition based parsers can only build projective trees 6

6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
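Since the parser builds only projective trees, it helps to check projectivity directly: a tree is projective iff no two dependency arcs cross. A minimal sketch (my own encoding: heads[i-1] is the head of word i, using 1-based word positions and 0 for ROOT):

```python
# A dependency tree is projective iff no two arcs cross, i.e. there are no
# arcs (l1, r1) and (l2, r2) with l1 < l2 < r1 < r2.

def is_projective(heads):
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for (l1, r1) in arcs:
        for (l2, r2) in arcs:
            if l1 < l2 < r1 < r2:      # the two arcs cross
                return False
    return True

print(is_projective([2, 0, 2]))        # True: arcs 2->1, ROOT->2, 2->3
print(is_projective([3, 4, 0, 3]))     # False: arcs (1,3) and (2,4) cross
```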

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios:

Language     Projectivity (%)  Best (LAS)  Our (LAS)
grc perseus  90.7              79.39       55.03 (20)
eu bdt       95.13             84.22       74.13 (17)
hu szeged    97.8              82.66       68.18 (14)
da ddt       98.26             86.28       76.40 (17)
en gum       99.6              85.05       76.44 (15)
gl treegal   100               74.25       70.45 (10)
gl ctg       100               82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases 7

7 From the official results page and our projectivity table

Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:

We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition based dependency parsing

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering

Tree-stack LSTM performed better on low-resource languages

As the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding different ways to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123


Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 87: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Overview

1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing

2 Related WorkLinear Models and their DrawbacksNeural Network Models

3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser

4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning

5 Conclusion6 Future Work amp Discussions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123

4 Results & Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results & Comparisons

Dataset

CoNLL17: Dependency parsing of 81 treebanks in 49 languages. All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations. Koc-University ranked 7th out of 33 participants (1st among transition based parsers).

CoNLL18: Dependency parsing of 82 treebanks in 57 languages. All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations. Koc-University ranked 16th out of 30 participants (2nd among transition based parsers).

Differences between CoNLL17 and CoNLL18: 1. Train/test split change 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1. If the annotation of the treebank is improved, the older parser is handicapped

2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
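The scores in these tables are LAS (labeled attachment score): the percentage of words whose predicted head and dependency label both match the gold annotation. A minimal sketch of the metric (function and variable names are illustrative, not from the thesis code):

```python
def las(gold_heads, gold_labels, pred_heads, pred_labels):
    """Labeled attachment score: % of words with correct head AND label."""
    assert len(gold_heads) == len(pred_heads)
    correct = sum(
        1
        for gh, gl, ph, pl in zip(gold_heads, gold_labels, pred_heads, pred_labels)
        if gh == ph and gl == pl
    )
    return 100.0 * correct / len(gold_heads)

# Toy 4-word sentence: heads are word indices (0 = root), labels are relations.
gold_h, gold_l = [2, 0, 2, 3], ["amod", "root", "nsubj", "obj"]
pred_h, pred_l = [2, 0, 2, 2], ["amod", "root", "nsubj", "obj"]
print(las(gold_h, gold_l, pred_h, pred_l))  # 75.0: one head is wrong
```

Unlabeled attachment score (UAS) is the same computation with the label test dropped.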

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

Figure Initial model (MLP)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code   MLP    Only Action  Only-β  Only-σ
hu szeged   66.21  66.87        66.94   67.03
sv lines    71.12  72.05        72.17   72.45
tr imst     57.12  56.87        57.02   57.12
ar padt     67.83  66.67        66.89   66.92
cs cac      83.89  82.23        83.13   83.17
en ewt      75.54  75.43        75.56   75.67

Table Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

Figure The t-RNN unit: LSTM encodings of the head word, the dependent word, and the dependency relation are concatenated and fed to an MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
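The composition in the figure above can be approximated in a few lines. This is a hedged illustration, not the thesis implementation: a single tanh layer stands in for the recurrent cell, and all names and dimensions are made up.

```python
import math
import random

random.seed(0)

def t_rnn_compose(head_vec, dep_vec, rel_vec, weights):
    """Compose head-word, dependent-word, and relation vectors into a single
    representation with one tanh layer (a stand-in for the t-RNN cell)."""
    x = head_vec + dep_vec + rel_vec  # concatenation of the three inputs
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]

dim = 4
head, dep, rel = ([random.uniform(-1, 1) for _ in range(dim)] for _ in range(3))
# Weight matrix maps the concatenated (3 * dim) input back to dim outputs.
W = [[random.uniform(-0.1, 0.1) for _ in range(3 * dim)] for _ in range(dim)]
composed = t_rnn_compose(head, dep, rel, W)
print(len(composed))  # 4: same dimensionality as the word vectors
```

Keeping the output the same size as a word vector lets the composed subtree representation be pushed back onto the stack in place of its head word.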

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang        MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged   66.21  66.87   66.94   67.03   66.12      68.18
sv lines    71.12  72.05   72.17   74.04   72.17      75.46
tr imst     57.12  56.87   57.02   57.12   58.12      58.75
ar padt     67.83  66.67   66.89   66.92   68.04      68.14
cs cac      83.89  82.23   83.13   83.17   82.89      83.57
en ewt      75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases

σ-LSTM provides more useful information independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th of all and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings

We divide the CoNLL18 UD dataset (v2.2) into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k, less than 50k tokens

Languages having more than 50k, less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
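The grouping above is mechanical; a small sketch of it (the helper name is ours, and the token counts are the ones reported in the following slides):

```python
def size_bucket(n_tokens):
    """Assign a language to one of the 4 training-size groups used here."""
    if n_tokens < 20_000:
        return "<20k"
    elif n_tokens < 50_000:
        return "20k-50k"
    elif n_tokens < 100_000:
        return "50k-100k"
    return ">=100k"

counts = {"ru taiga": 10_479, "sv lines": 48_325, "id gsd": 97_531, "en ewt": 204_585}
print({lang: size_bucket(n) for lang, n in counts.items()})
# {'ru taiga': '<20k', 'sv lines': '20k-50k', 'id gsd': '50k-100k', 'en ewt': '>=100k'}
```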

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33             3583
ru taiga       58.32        60.55            10479
sme giella     52.78        53.39            16385
la perseus     49.93        51.6             18184
ug udt         52.78        53.39            19262
sl sst         46.72        48.77            19473
hu szeged      66.23        68.18            20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81            48325
fr sequoia     84.36        82.17            50543
en gum         76.44        75.34            53686
ko gsd         73.74        72.54            56687
eu bdt         74.55        73.32            72974
nl lassysmall  76.7         75.8             75134
gl ctg         79.02        79.018           79327
lv lvtb        72.33        72.24            80666
id gsd         75.76        73.97            97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12             121064
bg btb     84.53        84.55             124336
en ewt     75.77        75.682            204585
ar padt    68.02        68.14             223881
de gsd     71.59        71.32             263804
ca ancora  85.89        85.874            417587
es ancora  84.99        84.78             444617
cs cac     83.57        83.63             472608
cs pdt     81.43        82.12            1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow gold moves. Dynamic oracle: transitions follow predicted moves.

In both cases, the log-probability of gold moves is maximized.

Figure The t-RNN parser architecture used in both training regimes (head-word, dependent-word, and dependency-relation LSTMs concatenated and fed to an MLP)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
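The difference between the two regimes can be sketched as a training-loop skeleton. Everything here is illustrative, not the thesis implementation: the scorer is a random stand-in and the move inventory is a simplified arc-standard set.

```python
import math
import random

random.seed(1)

MOVES = ["SHIFT", "LEFT", "RIGHT"]

def scores(state):
    """Stand-in scorer: returns a probability distribution over moves.
    (A real parser would score the state; `state` is ignored here.)"""
    logits = [random.uniform(-1, 1) for _ in MOVES]
    z = sum(math.exp(l) for l in logits)
    return [math.exp(l) / z for l in logits]

def train_step(state, gold_move, dynamic):
    """One oracle-training step: the loss is -log p(gold) in both regimes;
    only the move used to advance the parser state differs."""
    p = scores(state)
    loss = -math.log(p[MOVES.index(gold_move)])
    # Static oracle advances with the gold move; dynamic with the model's argmax.
    taken = MOVES[p.index(max(p))] if dynamic else gold_move
    return loss, taken

loss, taken = train_step(state=0, gold_move="SHIFT", dynamic=False)
print(taken)  # SHIFT: the static oracle always advances with the gold move
```

Because the dynamic oracle exposes the model to states reached by its own (possibly wrong) predictions, it can reduce the train/test mismatch of static training.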

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train a LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors trained on a different language but from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt        7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table LAS values for strategies (1), (2), (3), and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

Training a LM from scratch does not bring useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition based parsers can only build projective trees 6

6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
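Projectivity is easy to check mechanically: a tree is projective iff no two dependency arcs cross when drawn above the sentence. A small sketch (the function name is ours; head indices are 1-based with 0 as the root):

```python
def is_projective(heads):
    """heads[i] is the head of word i+1 (words are 1-based; 0 = root).
    A dependency tree is projective iff no two arcs cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for (a1, b1) in arcs:
        for (a2, b2) in arcs:
            # Two arcs cross when exactly one endpoint of one interval
            # lies strictly inside the other interval.
            if a1 < a2 < b1 < b2:
                return False
    return True

print(is_projective([2, 0, 2, 2]))     # True: all arcs are nested
print(is_projective([3, 4, 0, 3, 2]))  # False: arcs (1,3) and (2,4) cross
```

Non-projective treebanks therefore cap the attainable score of a purely projective transition system, which motivates the comparison on the next slide.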

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus   90.7          79.39       55.03 (20)
eu bdt        95.13         84.22       74.13 (17)
hu szeged     97.8          82.66       68.18 (14)
da ddt        98.26         86.28       76.40 (17)
en gum        99.6          85.05       76.44 (15)
gl treegal   100            74.25       70.45 (10)
gl ctg       100            82.12       79.45 (14)

Table Our model's performance gap decreases as the projectivity ratio increases

7 From the official results page and our projectivity table

Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:

We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-crafted feature engineering

Tree-stack LSTM performed better with low-resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics 5:135-146.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 88: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

4 Results amp Comparisons

Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123

Results amp Comparisons

Dataset

Dependency parsing of 81

treebanks in 49 languages

All treebanks use standardized

annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 7th out

of 33 participants (1st among

transition based parsers)

Dependency parsing of 82

treebanks in 57 languages

All treebanks use

standardized annotation

17 universal

part-of-speech tags

37 universal dependency

relations

Koc-University ranked 16th

out of 30 participants (2nd

among transition based

parsers)

CoNLL17 CoNLL181 Traintest split change 2 Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training


Results & Comparisons

Dataset

CoNLL17:
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koç University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koç University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split change 2. Annotation

Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123

MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1. If the annotation of the treebank has improved, the older parser is handicapped

2. If the training-test split has changed and old training data are now in the test data, the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
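The scores in these result tables are LAS (labeled attachment score). As a minimal sketch of the metric (not the official CoNLL evaluation script, which additionally handles tokenization mismatches), LAS can be computed as:

```python
def las(gold, pred):
    """Labeled attachment score: the percentage of tokens whose
    predicted (head, label) pair exactly matches the gold one.
    gold, pred: lists of (head_index, dependency_label), one per token."""
    assert len(gold) == len(pred)
    correct = sum(g == p for g, p in zip(gold, pred))
    return 100.0 * correct / len(gold)
```

For example, if one of two tokens gets the correct head and label, the LAS is 50.0.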

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

Figure: Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

Figure: Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

Figure: Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

Figure: Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code  MLP    Only Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

Figure: t-RNN architecture (head word, dependent word and dependency relation embeddings pass through LSTMs, are concatenated, and feed an MLP)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.164

t-RNN provides a comparative advantage for low-resource languages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

• t-RNN's performance contribution increases as the training size decreases

• σ-LSTM provides more useful information, independent of dataset size

• Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
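As background, one common way to realize a morphological feature embedding is to split the UD FEATS string into Feature=Value pairs, embed each pair, and sum the vectors. The sketch below is illustrative only (random vectors instead of learned ones; the thesis's exact scheme may differ):

```python
import random

DIM = 8
table = {}  # one vector per "Feature=Value" pair, created on demand

def embed_feats(feats):
    """Embed a UD feature string such as 'Case=Nom|Number=Sing'
    as the sum of per-feature embeddings; '_' (no features) maps
    to the zero vector."""
    vec = [0.0] * DIM
    if feats == "_":
        return vec
    for fv in feats.split("|"):
        if fv not in table:
            rnd = random.Random(fv)  # deterministic stand-in for a learned embedding
            table[fv] = [rnd.uniform(-1, 1) for _ in range(DIM)]
        vec = [a + b for a, b in zip(vec, table[fv])]
    return vec
```

Because the pair embeddings are summed, the representation is insensitive to the order in which features are listed.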

Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD dataset (v2.2) into 4 parts based on the number of training tokens for each language, to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k and less than 50k tokens

Languages having more than 50k and less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.6            18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48325
fr sequoia     84.36        82.17           50543
en gum         76.44        75.34           53686
ko gsd         73.74        72.54           56687
eu bdt         74.55        73.32           72974
nl lassysmall  76.7         75.8            75134
gl ctg         79.02        79.018          79327
lv lvtb        72.33        72.24           80666
id gsd         75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121064
bg btb     84.53        84.55           124336
en ewt     75.77        75.682          204585
ar padt    68.02        68.14           223881
de gsd     71.59        71.32           263804
ca ancora  85.89        85.874          417587
es ancora  84.99        84.78           444617
cs cac     83.57        83.63           472608
cs pdt     81.43        82.12           1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow gold moves.
Dynamic oracle: transitions follow predicted moves.

In both cases, the log-probability of gold moves is maximized.

Figure: t-RNN architecture (head word, dependent word and dependency relation embeddings pass through LSTMs, are concatenated, and feed an MLP)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
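The distinction can be sketched as a single training step. The softmax scorer and function names below are illustrative assumptions, not the thesis implementation; the point is that the loss is identical in both regimes, and only the transition actually followed differs:

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def oracle_step(scores, gold_move, dynamic=False):
    """One oracle-training step. The loss always maximizes
    log p(gold move); the parser then continues from the gold
    move (static oracle) or the model's argmax (dynamic oracle)."""
    probs = softmax(scores)
    loss = -math.log(probs[gold_move])
    if dynamic:
        followed = max(range(len(scores)), key=scores.__getitem__)
    else:
        followed = gold_move
    return loss, followed
```

With scores [2.0, 1.0, 0.1] and gold move 1, both regimes incur the same loss, but static training continues from move 1 while dynamic training continues from the model's prediction, move 0, exposing the model to its own mistakes.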

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors, trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

• Applying transfer learning with a pre-trained parser is the most beneficial

• From-scratch LM training does not yield useful word and context vectors

• Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition based parsers can only build projective trees 6

6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
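A tree is projective when no two dependency arcs cross if drawn above the sentence. A minimal sketch of a checker (the `heads` encoding here is an assumption chosen for illustration):

```python
def is_projective(heads):
    """heads[i] is the head of token i+1 (1-based token positions);
    0 denotes the root. A tree is projective iff no two arcs cross."""
    arcs = [(min(i + 1, h), max(i + 1, h)) for i, h in enumerate(heads)]
    for a in arcs:
        for b in arcs:
            # arcs (a0, a1) and (b0, b1) cross iff a0 < b0 < a1 < b1
            if a[0] < b[0] < a[1] < b[1]:
                return False
    return True
```

For example, heads = [3, 4, 0, 3] is non-projective because the arcs 1→3 and 2→4 cross, so such a tree is out of reach for a purely transition based (arc-projective) parser.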

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases 7

7 From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:
• We introduced "Context, Word and Morph-feat" embeddings and showed their contribution in transition based dependency parsing

• Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering

• Tree-stack LSTM performed better on low-resource languages

• As the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM or Action-LSTM states may bring a performance improvement

Morphological Features

Finding different ways to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Önder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
  • Related Work
    • Linear Models and their Drawbacks
    • Neural Network Models
  • Model
    • Language Model
    • MLP Parser
    • Tree-stack LSTM Parser
  • Results
    • MLP vs Tree-stack LSTM
    • Morphological Feature Embeddings
    • Static vs Dynamic Oracle Training
    • Transfer Learning
  • Conclusion
  • Future Work & Discussions
Page 90: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

MLP vs Tree-stack LSTM

CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets

Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 91: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

MLP vs Tree-stack LSTM

2 possible problems of official comparison

1 If the annotation of the tree bank is improved the older parser ishandicapped

2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly

Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357

Tree-stack LSTM outperforms MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP

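All comparisons on these slides use LAS. As a reminder, a minimal sketch of how LAS is computed; the `(head, label)` tuple representation here is an illustrative assumption, not the CoNLL-U format:

```python
def las(gold, pred):
    # Labeled attachment score: percentage of words whose predicted
    # head index AND dependency label both match the gold annotation.
    assert len(gold) == len(pred)
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return 100.0 * correct / len(gold)

# Toy 4-word sentence: one predicted head is wrong, so LAS = 3/4.
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "advmod")]
pred = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "advmod")]
print(las(gold, pred))  # 75.0
```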

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM


MLP Parser

Figure: Initial model (the MLP parser).

Only Action LSTM

Figure: Only action LSTM.

Only β-LSTM

Figure: Only β-LSTM.

Only σ-LSTM

Figure: Only σ-LSTM.

Ablation Analysis Results

Lang Code   MLP    Only Action  Only-β  Only-σ
hu szeged   66.21  66.87        66.94   67.03
sv lines    71.12  72.05        72.17   72.45
tr imst     57.12  56.87        57.02   57.12
ar padt     67.83  66.67        66.89   66.92
cs cac      83.89  82.23        83.13   83.17
en ewt      75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models.


Ablation of t-RNN

Figure: The t-RNN component: the head word, dependent word, and dependency relation representations are concatenated and passed through an MLP.
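The composition step sketched above can be written in a few lines; the dimensions and the single hidden layer are illustrative assumptions, not the thesis hyper-parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not the thesis settings).
d_word, d_rel, d_hid = 8, 4, 16

# Stand-ins for the LSTM outputs of the head/dependent words and the
# embedding of the dependency relation connecting them.
head = rng.normal(size=d_word)
dependent = rng.normal(size=d_word)
relation = rng.normal(size=d_rel)

# MLP weights: concat -> hidden (ReLU) -> new head representation.
w1 = rng.normal(size=(d_hid, 2 * d_word + d_rel))
w2 = rng.normal(size=(d_word, d_hid))

x = np.concatenate([head, dependent, relation])
composed = w2 @ np.maximum(0.0, w1 @ x)

# The composed vector replaces the head word's representation,
# so the subtree built so far is summarized recursively.
print(composed.shape)  # (8,)
```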

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code            without t-RNN  with t-RNN
no nynorsklia (3k)   51.78          53.33
ru taiga (11k)       59.13          60.55
gl treegal (15k)     69.76          70.45
hu szeged (20k)      66.12          68.18
sv lines (49k)       74.04          75.46
tr imst (50k)        58.12          58.75
ar padt (120k)       68.04          68.14
en ewt (204k)        74.87          75.77
cs cac (473k)        82.89          83.57
cs pdt (1M)          81.17          81.16

t-RNN provides a comparative advantage for low-resource languages.


Ablation Analysis

Overall results of ablation analysis

Lang        MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged   66.21  66.87   66.94   67.03   66.12      68.18
sv lines    71.12  72.05   72.17   74.04   72.17      75.46
tr imst     57.12  56.87   57.02   57.12   58.12      58.75
ar padt     67.83  66.67   66.89   66.92   68.04      68.14
cs cac      83.89  82.23   83.13   83.17   82.89      83.57
en ewt      75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations


Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases as the training size decreases.

σ-LSTM provides more useful information, independent of dataset size.

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers).


What does Morphological Feature Embedding provide?


Contribution of Morph-feat Embeddings

Experimental Settings:
We divide the CoNLL18 UD dataset (v2.2) into 4 parts based on the number of training tokens for each language, to better understand our contributions.

Languages having less than 20k tokens

Languages having more than 20k and less than 50k tokens

Languages having more than 50k and less than 100k tokens

Languages having 100k tokens or more

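The split above can be expressed as a small helper; the bucket labels are my own shorthand, while the thresholds come from the slide:

```python
def size_bucket(n_train_tokens):
    # Four dataset-size buckets used to group the CoNLL18 UD treebanks.
    if n_train_tokens < 20_000:
        return "<20k"
    if n_train_tokens < 50_000:
        return "20k-50k"
    if n_train_tokens < 100_000:
        return "50k-100k"
    return ">=100k"

# First and last counts are from the result tables that follow;
# 35_000 is an arbitrary value for the middle bucket.
print(size_bucket(10_479))    # ru taiga -> <20k
print(size_bucket(35_000))    # -> 20k-50k
print(size_bucket(204_585))   # en ewt   -> >=100k
```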

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33             3,583
ru taiga       58.32        60.55            10,479
sme giella     52.78        53.39            16,385
la perseus     49.93        51.6             18,184
ug udt         52.78        53.39            19,262
sl sst         46.72        48.77            19,473
hu szeged      66.23        68.18            20,166

Not useful for languages having less than 20k training tokens


Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48,325
fr sequoia     84.36        82.17           50,543
en gum         76.44        75.34           53,686
ko gsd         73.74        72.54           56,687
eu bdt         74.55        73.32           72,974
nl lassysmall  76.7         75.8            75,134
gl ctg         79.02        79.018          79,327
lv lvtb        72.33        72.24           80,666
id gsd         75.76        73.97           97,531

Beneficial for languages with 50k-100k training tokens


Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code   Morph-Feats  no Morph-Feats  # of tokens
fa seraji   81.18        81.12             121,064
bg btb      84.53        84.55             124,336
en ewt      75.77        75.682            204,585
ar padt     68.02        68.14             223,881
de gsd      71.59        71.32             263,804
ca ancora   85.89        85.874            417,587
es ancora   84.99        84.78             444,617
cs cac      83.57        83.63             472,608
cs pdt      81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens


Static vs Dynamic Oracle Training

Static oracle: transitions follow gold moves.
Dynamic oracle: transitions follow predicted moves.

In both cases, the log-probability of gold moves is maximized.

Figure: Tree-stack LSTM with t-RNN (head word, dependent word, and dependency relation are fed through LSTMs, concatenated, and passed to an MLP).
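The difference between the two regimes can be sketched as a training loop; the toy `predict` scorer, the transition names, and the list-based `state` are hypothetical stand-ins for a real parser configuration:

```python
import math

def train_sentence(gold_moves, predict, follow_predicted):
    # Schematic oracle training for one sentence. `state` here is just
    # the list of transitions taken so far; a real parser would hold a
    # full configuration (stack, buffer, arcs).
    state, loss = [], 0.0
    for gold in gold_moves:
        probs = predict(state)                 # distribution over moves
        loss -= math.log(probs[gold])          # always maximize log p(gold)
        if follow_predicted:                   # dynamic oracle:
            move = max(probs, key=probs.get)   #   follow the model's move
        else:                                  # static oracle:
            move = gold                        #   follow the gold move
        state.append(move)                     # apply the transition
    return loss

# Toy scorer that ignores the state, so both losses coincide here;
# with a state-dependent model, the states explored (and loss) differ.
def predict(state):
    return {"SHIFT": 0.5, "LEFT-ARC": 0.3, "RIGHT-ARC": 0.2}

gold = ["SHIFT", "SHIFT", "LEFT-ARC"]
print(round(train_sentence(gold, predict, follow_predicted=False), 3))
print(round(train_sentence(gold, predict, follow_predicted=True), 3))
```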

Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens less than 20k.

Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens between 20k and 50k.

Static vs Dynamic Oracle Training

Figure: Results are very close for training tokens more than 50k.

How about languages with less than 20k training tokens?


Transfer Learning

There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al., 2017]
3. Using my own word and context vectors, trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language       (1)           (2)    (3)    (4)
af afribooms   not provided  75.46  77.43  78.12
kk ktb         20.19         22.31  21.96  23.86
bxr bdt         7.64          9.76   9.93   8.98
kmr mg         20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4).


Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial.

From-scratch LM training does not yield useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al., 2017].


Projectivity

Transition-based parsers can only build projective trees.

Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

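A tree is projective when no arcs cross; equivalently, every word strictly inside an arc's span must be a descendant of that arc's head. A minimal check under that definition, with heads given as a 1-indexed list (0 = root) and the tree assumed cycle-free:

```python
def is_projective(heads):
    # heads[i] is the head of word i+1; 0 denotes the artificial root.
    def ancestors(i):
        seen = set()
        while i != 0:
            i = heads[i - 1]
            seen.add(i)
        return seen

    for d in range(1, len(heads) + 1):
        h = heads[d - 1]
        for k in range(min(h, d) + 1, max(h, d)):
            # Every word strictly inside the arc (h, d) must be
            # dominated by h; otherwise some arc crosses this one.
            if k != h and h not in ancestors(k):
                return False
    return True

print(is_projective([2, 0, 2]))     # True: no crossing arcs
print(is_projective([3, 0, 2, 2]))  # False: arc 3->1 crosses arc 0->2
```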

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios.

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus   90.7         79.39       55.03 (20)
eu bdt        95.13        84.22       74.13 (17)
hu szeged     97.8         82.66       68.18 (14)
da ddt        98.26        86.28       76.40 (17)
en gum        99.6         85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases. (From the official results page and our projectivity table.)

Conclusions


Conclusion

In conclusion:
We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition-based dependency parsing.

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

When the training dataset size increases, tree-stack LSTM loses its advantage.


Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding different ways to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.


Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.


References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.


Thank you for your attention


Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 93: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Ablation Analysis of Tree-stack LSTM

An evolution from MLP to Tree-stack LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 94: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

MLP Parser

MLP

Figure Initial model

Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments:

Applying transfer learning with a pre-trained parser is the most beneficial.

Training an LM from scratch does not yield useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al., 2017].

Projectivity

A transition-based parser can only build projective trees.6

6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
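A tree is projective when no two dependency arcs cross; equivalently, every word between a head and its dependent is a descendant of that head. A minimal crossing-arcs check (illustrative, not the thesis code):

```python
def is_projective(heads):
    """True iff no two dependency arcs cross.

    heads[i] is the head of word i (1-indexed; 0 is the root).
    """
    # Each arc as an (left, right) position interval, root arcs included.
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads) if d > 0]
    for l1, r1 in arcs:
        for l2, r2 in arcs:
            if l1 < l2 < r1 < r2:    # the two arcs cross
                return False
    return True
```

For example, a tree with arcs over positions (1, 3) and (2, 4) is rejected as non-projective.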

Projective vs Non-projective

We compared our model with the best model at different projectivity ratios.

Language       Projectivity %  Best (LAS)  Our (LAS)
grc perseus    90.7            79.39       55.03 (20)
eu bdt         95.13           84.22       74.13 (17)
hu szeged      97.8            82.66       68.18 (14)
da ddt         98.26           86.28       76.40 (17)
en gum         99.6            85.05       76.44 (15)
gl treegal     100             74.25       70.45 (10)
gl ctg         100             82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases.7

7 From the official results page and our projectivity table.

Conclusions

Conclusion

In conclusion:

We introduced "Context, Word and Morph-feat" embeddings and showed their contribution to transition-based dependency parsing.

Our tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

As the training dataset size increases, tree-stack LSTM loses its advantage.

Future Research Directions

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM or Action-LSTM states may bring performance improvements.

Morphological Features

Finding different ways to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Publications

Ömer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Ömer Kırnap, Berkay Furkan Önder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

References

Marco Kuhlmann, Carlos Gómez-Rodríguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Thank you for your attention

Questions?

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 95: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Only Action LSTM

LSTM LSTM

Figure Only action LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 96: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Only β-LSTM

LSTM LSTM LSTM

LSTM LSTM MLP

Figure Only β-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 97: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Only σ-LSTM

LSTM LSTM

LSTM LSTM MLP

Figure Only σ-LSTM

Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123

Ablation Analysis Results

Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692

cs cac 8389 8223 8313 8317

en ewt 7554 7543 7556 7567

Table Comparison between MLP and rdquoOnlyrdquo models

Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123

Ablation of t-RNN

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123

Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN

Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875

ar padt (120k) 6804 6814

en ewt (204k) 7487 7577

cs cac (473k) 8289 8357

cs pdt (1M) 8117 81164

t-RNN provides comparative advantage for low-resourcelanguages

Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123

Ablation Analysis

Overall results of ablation analysis

Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language       Projectivity (%)   Best (LAS)   Our (LAS)
grc_perseus    90.7               79.39        55.03 (20)
eu_bdt         95.13              84.22        74.13 (17)
hu_szeged      97.8               82.66        68.18 (14)
da_ddt         98.26              86.28        76.40 (17)
en_gum         99.6               85.05        76.44 (15)
gl_treegal     100                74.25        70.45 (10)
gl_ctg         100                82.12        79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases.7

7 From the official results page and our projectivity table.

Conclusions


Conclusion

In conclusion:

We introduced "Context, Word and Morph-feat" embeddings and showed their contribution to transition-based dependency parsing.

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.

The Tree-stack LSTM performed better on low-resource languages.

As the training dataset size increases, the tree-stack LSTM loses its advantage.


Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM states, or over the β-LSTM or Action-LSTM, may bring a performance improvement.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.


Publications

Ömer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Ömer Kırnap, Berkay Furkan Önder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.


References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, pages 673-682. Association for Computational Linguistics.

Sandra Kübler, Ryan McDonald, and Joakim Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5:135-146.


Thank you for your attention


Questions?


Page 101: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Ablation Analysis

Overall results of ablation analysis

Lang        MLP     Only-A  Only-β  Only-σ  w/o t-RNN  all
hu szeged   66.21   66.87   66.94   67.03   66.12      68.18
sv lines    71.12   72.05   72.17   74.04   72.17      75.46
tr imst     57.12   56.87   57.02   57.12   58.12      58.75
ar padt     67.83   66.67   66.89   66.92   68.04      68.14
cs cac      83.89   82.23   83.13   83.17   82.89      83.57
en ewt      75.54   75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats other model variations

Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123

Ablation Analysis

Conclusions of Ablation Experiments

t-RNN's performance contribution increases when the training size decreases

σ-LSTM provides more useful information independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental Settings
We divide the CoNLL 2018 UD dataset (v2.2) into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k, less than 50k tokens

Languages having more than 50k, less than 100k tokens

Languages having 100k tokens or more
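The partition above is a simple thresholding on training-set size; a minimal sketch (the bucket labels are my own, boundaries as stated on the slide):

```python
def size_bucket(n_tokens):
    """Assign a treebank to one of the four training-size buckets
    used in the experiments (boundaries from the slide)."""
    if n_tokens < 20_000:
        return "<20k"
    elif n_tokens < 50_000:
        return "20k-50k"
    elif n_tokens < 100_000:
        return "50k-100k"
    else:
        return ">=100k"

# e.g. hu_szeged with 20,166 training tokens falls in the 20k-50k bucket
```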

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3,583
ru taiga       58.32        60.55           10,479
sme giella     52.78        53.39           16,385
la perseus     49.93        51.6            18,184
ug udt         52.78        53.39           19,262
sl sst         46.72        48.77           19,473
hu szeged      66.23        68.18           20,166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48,325
fr sequoia     84.36        82.17           50,543
en gum         76.44        75.34           53,686
ko gsd         73.74        72.54           56,687
eu bdt         74.55        73.32           72,974
nl lassysmall  76.7         75.8            75,134
gl ctg         79.02        79.018          79,327
lv lvtb        72.33        72.24           80,666
id gsd         75.76        73.97           97,531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens

Lang code   Morph-Feats  no Morph-Feats  # of tokens
fa seraji   81.18        81.12           121,064
bg btb      84.53        84.55           124,336
en ewt      75.77        75.682          204,585
ar padt     68.02        68.14           223,881
de gsd      71.59        71.32           263,804
ca ancora   85.89        85.874          417,587
es ancora   84.99        84.78           444,617
cs cac      83.57        83.63           472,608
cs pdt      81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle: transitions follow gold moves
Dynamic oracle: transitions follow predicted moves

In both cases, the log-probability of gold moves is maximized
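The two regimes differ only in which move is taken after the loss is computed. A toy sketch of the difference, with a stand-in state and scorer (all names here are hypothetical illustrations, not the real parser, which uses LSTM-based scores over arc-standard transitions):

```python
import random

MOVES = ["shift", "left-arc", "right-arc"]

class ToyState:
    """Stand-in for a parser configuration: just tracks how many
    transitions have been taken out of a fixed gold sequence."""
    def __init__(self, golds, step=0):
        self.golds, self.step = list(golds), step
    def is_final(self):
        return self.step == len(self.golds)
    def apply(self, move):
        return ToyState(self.golds, self.step + 1)

def score_moves(state):
    # Stand-in for the network: fixed log-probabilities per move.
    return {"shift": -0.5, "left-arc": -1.5, "right-arc": -2.0}

def train_sentence(golds, dynamic=False, explore=0.1, seed=0):
    """Accumulate -log p(gold move) along the visited path.
    Static oracle: always follow the gold move.
    Dynamic oracle: with probability `explore`, follow the model's
    own best move instead (error exploration), while supervision
    still comes from the best move at the reached state."""
    rng = random.Random(seed)
    state, loss = ToyState(golds), 0.0
    while not state.is_final():
        gold = state.golds[state.step]   # oracle's best move here
        scores = score_moves(state)
        loss += -scores[gold]            # maximize log p of gold move
        if dynamic and rng.random() < explore:
            move = max(MOVES, key=scores.get)  # model's prediction
        else:
            move = gold
        state = state.apply(move)
    return loss
```

In both branches the same negative log-likelihood of the gold move is accumulated; only the transition actually executed differs.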

[Figure: t-RNN — LSTM states for the head word, the dependent word, and the dependency relation are concatenated and fed to an MLP that outputs the action A.]
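The concat-then-MLP composition shown in the figure can be sketched in a few lines of numpy (dimensions, weights, and the number of actions are made up for illustration; the real model's encoders are trained LSTMs):

```python
import numpy as np

rng = np.random.default_rng(0)
H, A = 8, 3  # hidden size per encoder and number of actions (assumed)

# Stand-ins for the LSTM summary vectors of the head word, the
# dependent word, and the dependency-relation embedding.
head_vec, dep_vec, rel_vec = rng.normal(size=(3, H))

# Concat -> MLP -> softmax over parser actions, as in the figure.
x = np.concatenate([head_vec, dep_vec, rel_vec])      # shape (3H,)
W1, b1 = rng.normal(size=(16, 3 * H)), np.zeros(16)
W2, b2 = rng.normal(size=(A, 16)), np.zeros(A)
hidden = np.tanh(W1 @ x + b1)
logits = W2 @ hidden + b2
action_probs = np.exp(logits - logits.max())
action_probs /= action_probs.sum()                    # distribution over moves
```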

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table LAS values for strategies (1), (2), (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial

Training an LM from scratch does not yield useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

A transition-based parser can only build projective trees 6

6 Figure from http://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
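Projectivity can be checked directly from the head indices: a tree is non-projective exactly when two dependency arcs cross. A small sketch (CoNLL-style heads, 0 = root):

```python
def is_projective(heads):
    """heads[i] is the head (0 = root) of token i+1, CoNLL-style.
    Returns False iff some pair of arcs crosses."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for l1, r1 in arcs:
        for l2, r2 in arcs:
            # Arcs cross when exactly one endpoint of the second
            # lies strictly inside the span of the first.
            if l1 < l2 < r1 < r2:
                return False
    return True
```

Applying this check over a treebank gives the per-language projectivity ratios compared in the next slide.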

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios

Language     Projectivity  Best (LAS)  Ours (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table Our model's performance gap decreases as the projectivity ratio increases

7

7 From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusion:
We introduced "Context, Word and Morph-feat" embeddings and showed their contribution to transition-based dependency parsing

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring performance improvements

Morphological Features

Finding different ways to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Önder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann, Carlos Gómez-Rodríguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 102: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Ablation Analysis

Conclusions of Ablation Experiments

t-RNNrsquos performance contribution increases when the training sizedecreases

σ-LSTM provides more useful information independent from datasetsize

Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 103: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

What does Morphological Feature Embedding provide

Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 104: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Contribution of Morph-feat Embeddings

Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions

Languages having less than 20k tokens

Languages having more than 20k less than 50k tokens

Languages having more than 50k less than 100k tokens

Languages having 100k tokens or more

Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having less than 20k training tokens

Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583

ru taiga 5832 6055 10479

sme giella 5278 5339 16385

la perseus 4993 516 18184

ug udt 5278 5339 19262

sl sst 4672 4877 19473

hu szeged 6623 6818 20166

Not useful for languages having less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Contribution of Morph-feat embeddings

Morph-feat experiments for languages with fewer than 20k training tokens:

Lang code       Morph-Feats   no Morph-Feats   # of tokens
no nynorsklia   51.13         53.33            3,583
ru taiga        58.32         60.55            10,479
sme giella      52.78         53.39            16,385
la perseus      49.93         51.60            18,184
ug udt          52.78         53.39            19,262
sl sst          46.72         48.77            19,473
hu szeged       66.23         68.18            20,166

Morph-feat embeddings are not useful for languages with fewer than 20k training tokens.
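As a concrete sketch of how morph-feat embeddings can enter a token representation, each morphological feature value can be looked up in an embedding table, averaged, and concatenated to the word vector. The vocabularies, dimensions and the averaging choice below are illustrative assumptions, not the thesis configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature vocabulary: each morphological feature value gets
# its own row in an embedding table (names and sizes are illustrative).
MORPH_VOCAB = {"Case=Nom": 0, "Case=Acc": 1, "Number=Sing": 2, "Number=Plur": 3}
EMBED_DIM = 4

morph_table = rng.normal(size=(len(MORPH_VOCAB), EMBED_DIM))

def morph_feat_vector(feats):
    """Average the embeddings of a token's morphological feature values.

    Averaging (rather than summing) keeps the scale independent of how
    many features a token carries; tokens with no features get zeros.
    """
    rows = [morph_table[MORPH_VOCAB[f]] for f in feats if f in MORPH_VOCAB]
    if not rows:
        return np.zeros(EMBED_DIM)
    return np.mean(rows, axis=0)

word_vec = rng.normal(size=8)            # stand-in for a pre-trained word vector
feats = ["Case=Nom", "Number=Sing"]
token_repr = np.concatenate([word_vec, morph_feat_vector(feats)])
print(token_repr.shape)                  # (12,)
```

The concatenated vector is what a parser's scoring network would consume in place of the word vector alone.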

Contribution of Morph-feat embeddings

Morph-feat experiments for languages with between 50k and 100k training tokens:

Lang code      Morph-Feats   no Morph-Feats   # of tokens
sv lines       72.18         74.81            48,325
fr sequoia     84.36         82.17            50,543
en gum         76.44         75.34            53,686
ko gsd         73.74         72.54            56,687
eu bdt         74.55         73.32            72,974
nl lassysmall  76.70         75.80            75,134
gl ctg         79.02         79.018           79,327
lv lvtb        72.33         72.24            80,666
id gsd         75.76         73.97            97,531

Morph-feat embeddings are beneficial for languages with 50k-100k training tokens.

Contribution of Morph-feat embeddings

Morph-feat experiments for languages with more than 100k training tokens:

Lang code   Morph-Feats   no Morph-Feats   # of tokens
fa seraji   81.18         81.12            121,064
bg btb      84.53         84.55            124,336
en ewt      75.77         75.682           204,585
ar padt     68.02         68.14            223,881
de gsd      71.59         71.32            263,804
ca ancora   85.89         85.874           417,587
es ancora   84.99         84.78            444,617
cs cac      83.57         83.63            472,608
cs pdt      81.43         82.12            1,173,282

Morph-feat embeddings are neutral for languages with more than 100k training tokens.

Static vs Dynamic Oracle Training

Static oracle: transitions follow the gold moves.
Dynamic oracle: transitions follow the predicted moves.

In both cases, the log-probability of the gold moves is maximized.

Figure: t-RNN architecture. LSTMs over the head word, the dependent word and the dependency relation are concatenated and fed to an MLP.
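The two training regimes can be sketched as one loop that always takes its loss against the oracle-best move but differs in which move it actually follows. The parser interface here (ToyParser, score_moves, oracle_move) is a deterministic hypothetical stand-in, not the thesis code:

```python
import random

class ToyState:
    """Minimal parser state: just counts remaining tokens."""
    def __init__(self, n): self.remaining = n
    def is_final(self): return self.remaining == 0
    def apply(self, move): return ToyState(self.remaining - 1)

class ToyParser:
    """Deterministic stand-in exposing the interface the sketch assumes."""
    def initial_state(self, sentence): return ToyState(len(sentence))
    def oracle_move(self, state, gold_tree): return "shift"
    def score_moves(self, state):
        # Fixed log-probabilities per transition, for illustration only.
        return {"shift": -0.1, "left-arc": -2.3, "right-arc": -2.3}

def train_step(parser, sentence, gold_tree, oracle="static", explore=0.1):
    loss = 0.0
    state = parser.initial_state(sentence)
    while not state.is_final():
        gold_move = parser.oracle_move(state, gold_tree)
        scores = parser.score_moves(state)      # log-probabilities per move
        loss -= scores[gold_move]               # maximize log p(gold move)
        if oracle == "dynamic" and random.random() < explore:
            move = max(scores, key=scores.get)  # follow the parser's prediction
        else:
            move = gold_move                    # follow the gold transition
        state = state.apply(move)
    return loss

loss = train_step(ToyParser(), ["Economic", "news"], gold_tree=None)
print(round(loss, 2))  # 0.2
```

Note that the loss term is identical in both regimes; only the state sequence the model is exposed to changes, which is what lets a dynamic oracle teach recovery from its own mistakes.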

Static vs Dynamic Oracle Training

Figure: Results are very close for training sets with fewer than 20k tokens.

Static vs Dynamic Oracle Training

Figure: Results are very close for training sets between 20k and 50k tokens.

Static vs Dynamic Oracle Training

Figure: Results are very close for training sets with more than 50k tokens.

What about languages with fewer than 20k training tokens?

Transfer Learning

There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al., 2017]
3. Using my own word and context vectors, trained with a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language       (1)            (2)      (3)      (4)
af afribooms   not provided   75.46    77.43    78.12
kk ktb         20.19          22.31    21.96    23.86
bxr bdt        7.64           9.76     9.93     8.98
kmr mg         20.12          22.57    22.78    23.39

Table: LAS values for strategies (1), (2), (3) and (4)

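Strategy (4) can be sketched as a warm start: copy the pre-trained parser's hidden-layer weights and re-initialize only the language-specific embedding table before fine-tuning on the target treebank. The parameter names and shapes below are illustrative assumptions, not the thesis model:

```python
import numpy as np

rng = np.random.default_rng(1)

def warm_start(pretrained, target_vocab_size, embed_dim):
    """Initialize a new parser's parameters from a pre-trained one.

    Hidden-layer weights are copied; the word-embedding table is
    re-initialized because the target language has its own vocabulary.
    """
    params = {k: v.copy() for k, v in pretrained.items() if k != "embeddings"}
    params["embeddings"] = rng.normal(scale=0.01, size=(target_vocab_size, embed_dim))
    return params

source = {
    "embeddings": rng.normal(size=(5_000, 64)),  # source-language vocabulary
    "lstm_W": rng.normal(size=(256, 64)),
    "mlp_W": rng.normal(size=(3, 256)),          # one score per transition
}
target = warm_start(source, target_vocab_size=800, embed_dim=64)
print(target["embeddings"].shape, np.array_equal(target["lstm_W"], source["lstm_W"]))
# (800, 64) True
```

Fine-tuning would then continue training all of `target`'s parameters on the low-resource treebank.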

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the most beneficial.

From-scratch LM training does not bring useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al., 2017].

Projectivity

A transition based parser can only build projective trees. 6

6. Figure from http://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
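Projectivity can be checked by testing whether any two dependency arcs cross. This small sketch uses CoNLL-style 1-based head indices with 0 denoting the root:

```python
def is_projective(heads):
    """Check projectivity of a dependency tree.

    `heads[i]` is the head of token i+1, with 0 denoting the root
    (CoNLL-style 1-based indexing). A tree is projective iff no two
    dependency arcs cross when drawn above the sentence.
    """
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for i, (a, b) in enumerate(arcs):
        for c, e in arcs[i + 1:]:
            # Arcs (a,b) and (c,e) cross iff exactly one endpoint of one
            # arc lies strictly inside the span of the other.
            if a < c < b < e or c < a < e < b:
                return False
    return True

# "Economic news had little effect on financial markets" (projective)
print(is_projective([2, 3, 0, 5, 3, 5, 8, 6]))  # True
```

Running the same check over a treebank gives the projectivity ratios compared in the next slide.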

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios.

Language      Projectivity   Best (LAS)   Our (LAS)
grc perseus   90.70          79.39        55.03 (20)
eu bdt        95.13          84.22        74.13 (17)
hu szeged     97.80          82.66        68.18 (14)
da ddt        98.26          86.28        76.40 (17)
en gum        99.60          85.05        76.44 (15)
gl treegal    100            74.25        70.45 (10)
gl ctg        100            82.12        79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases. 7

7. From the official results page and our projectivity table.

Conclusions


Conclusion

In conclusion:

We introduced "Context", "Word" and "Morph-feat" embeddings and showed their contribution in transition based dependency parsing.

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

When the training dataset size increases, tree-stack LSTM loses its advantage.

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between the σ-LSTM, β-LSTM or Action-LSTM states may bring a performance improvement.
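One plausible form of this proposal is scaled dot-product attention over the stacked LSTM states. This is a sketch of the idea under that assumption, not an implemented component of the thesis:

```python
import numpy as np

def attention(query, keys, values):
    """Scaled dot-product attention over a sequence of LSTM states.

    `query` is one state vector; `keys`/`values` are the stacked states.
    """
    d = query.shape[-1]
    scores = keys @ query / np.sqrt(d)   # similarity of query to each state
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()             # softmax over positions
    return weights @ values              # weighted sum of the states

rng = np.random.default_rng(2)
states = rng.normal(size=(5, 16))        # e.g. five sigma-LSTM hidden states
context = attention(states[-1], states, states)
print(context.shape)  # (16,)
```

The resulting context vector could be concatenated with the existing parser-state features before the MLP.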

Morphological Features

Finding different ways to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.

Publications

Ömer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Ömer Kırnap, Berkay Furkan Önder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Thank you for your attention


Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 106: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having tokens in between 50k and100k

Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325

fr sequoia 8436 8217 50543

en gum 7644 7534 53686

ko gsd 7374 7254 56687

eu bdt 7455 7332 72974

nl lassymal 767 758 75134

gl ctg 7902 79018 79327

lv lvtb 7233 7224 80666

id gsd 7576 7397 97531

Beneficial for languages with 50k-100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 107: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Contribution of Morph-feat embeddings

Morp-feat experiments for languages having more than 100k trainingtokens

Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064

bg btb 8453 8455 124336

en ewt 7577 75682 204585

ar padt 6802 6814 223881

de gsd 7159 7132 263804

ca ancora 8589 85874 417587

es ancora 8499 8478 444617

cs cac 8357 8363 472608

cs pdt 8143 8212 1173282

Neutral for languages having more than 100k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 108: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Static vs Dynamic Oracle Training

Static oracle transitions using gold movesDynamic oracle transitions using predicted moves

In both cases logp of gold moves maximized

t-RNN

Head word

Dependent word Dependency Relation

LSTM LSTM LSTM LSTM LSTM

LSTM LSTM A

Concat

MLP

Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 109: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens less than 20k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

A transition-based parser can only build projective trees. 6

6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf

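A tree is projective iff no two dependency arcs cross when drawn above the sentence. A minimal check over gold head indices (an illustrative sketch, not code from the thesis):

```python
def is_projective(heads):
    """heads[i] is the head of token i+1 (0 = artificial root).
    The tree is projective iff no two dependency arcs cross."""
    spans = [tuple(sorted((h, d))) for d, h in enumerate(heads, start=1)]
    for lo1, hi1 in spans:
        for lo2, hi2 in spans:
            # two arcs cross iff exactly one endpoint of one arc
            # lies strictly inside the other arc's span
            if lo1 < lo2 < hi1 < hi2:
                return False
    return True
```

This is the check behind the projectivity ratios in the next table: sentences failing it cannot be produced exactly by the transition systems discussed here.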

Projective vs Non-projective

We compared our model with the best model for different projectivity ratios.

Language      Projectivity   Best (LAS)   Our (LAS)

grc_perseus   90.7           79.39        55.03 (20)

eu_bdt        95.13          84.22        74.13 (17)

hu_szeged     97.8           82.66        68.18 (14)

da_ddt        98.26          86.28        76.40 (17)

en_gum        99.6           85.05        76.44 (15)

gl_treegal    100            74.25        70.45 (10)

gl_ctg        100            82.12        79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases.

7 From the official results page and our projectivity table.
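The tables above report LAS (labeled attachment score). For reference, LAS and UAS over one sentence can be computed as follows (a generic sketch; the official CoNLL shared-task evaluation script additionally handles tokenization mismatches):

```python
def las_uas(gold, pred):
    """gold, pred: per-token (head, deprel) pairs for one sentence.
    UAS counts correct heads; LAS additionally requires the correct
    dependency label."""
    assert len(gold) == len(pred)
    n = len(gold)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n
    las = sum(g == p for g, p in zip(gold, pred)) / n
    return las, uas
```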

Conclusions


Conclusion

In conclusion:

We introduced "Context", "Word", and "Morph-feat" embeddings and showed their contribution to transition-based dependency parsing.

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

As the training dataset size increases, the Tree-stack LSTM loses its advantage.


Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between the σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.

Morphological Features

Finding different ways to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.


Publications

Ömer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Ömer Kırnap, Berkay Furkan Önder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.


References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.


Thank you for your attention


Questions


  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 110: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens in between 20k and 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 111: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Static vs Dynamic Oracle Training

Figure Results are very close for training tokens more than 50k

Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 112: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

How about languages with less than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 113: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Transfer Learning

There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and

use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al

2017]3 Using my own word and context vectors trained with different

language but from the same language family4 Applying transfer learning with a pre-trained parser

Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898

kmr mg 2012 2257 2278 2339

Table LAS values for strategies (1) (2) (3) and (4)

Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 114: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Transfer Learning

Conclusions of Transfer Learning Experiments

Applying transfer learning with a pre-trained parser is the mostbeneficial

From scratch LM training does not bring useful word and contextvectors

Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]

Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies

Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies

Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123

References

Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682

S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US

Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075

Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123

Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123

Questions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123

  • Introduction
    • Overview of Dependency Parsing
    • Transition Based Dependency Parsing
      • Related Work
        • Linear Models and their Drawbacks
        • Neural Network Models
          • Model
            • Language Model
            • MLP Parser
            • Tree-stack LSTM Parser
              • Results
                • MLP vs Tree-stack LSTM
                • Morphological Feature Embeddings
                • Static vs Dynamic Oracle Training
                • Transfer Learning
                  • Conclusion
                  • Future Work amp Discussions
Page 115: Transition Based Dependency Parsing with Deep LearningTransition Based Dependency Parsing with Deep Learning Omer K rnap Ko˘c University okirnap@ku.edu.tr September 27, 2018 Omer

Projectivity

Transition Based Parser can only build projective trees 6

6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf

Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123

Projective vs Non-projective

We compared our model with the best model for different projectivityratios

Language Projectiviy Best (LAS) Our (LAS)

grc perseus 907 7939 5503 (20)

eu bdt 9513 8422 7413 (17)

hu szeged 978 8266 6818 (14)

da ddt 9826 8628 7640 (17)

en gum 996 8505 7644 (15)

gl treegal 100 7425 7045 (10)

gl ctg 100 8212 7945 (14)

Table Our models performance gap decreases as the projectivity ratio increases

7

7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123

Conclusions

Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123

Conclusion

In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering

Tree-stack LSTM performed better with low resource languages

When the training dataset size increases tree-stack LSTM losses itsadvantage

Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123

Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings

Attention Mechanism

Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement

Morphological Features

Finding different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem

Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123

Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Önder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.


References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, pages 673–682. Association for Computational Linguistics.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR, abs/1505.08075.


Thank you for your attention


Questions

