Transition Based Dependency Parsing with Deep Learning
Ömer Kırnap
Koç University
okirnap@ku.edu.tr
September 27, 2018
Ömer Kırnap (Koç University) MSc Thesis September 27, 2018 1 / 123
Overview
1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2. Related Work: Linear Models and their Drawbacks; Neural Network Models
3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5. Conclusion
6. Future Work & Discussions
1 Introduction
Introduction
What is dependency parsing
Dependency parsing aims to detect word relations by finding the tree structure of a sentence, inspired by dependency grammar.
Figure: Dependency annotations for the sentence "Economic news had little effect on financial markets." [1]

[1] Figure from S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.
Introduction
Why do we need dependency parsing
Dependencies resolve ambiguity
Useful for some downstream tasks in NLP [2]

[2] Figure from http://www.phontron.com/slides/nlp-programming-en-11-depend.pdf
Introduction
Dependency Parsing Categorization
Grammar Based: relying on a formal grammar defining a formal language, and asking whether a given input sentence is in the language defined by the grammar or not.

Data-driven: making essential use of machine learning from linguistic data in order to parse new sentences. [3]

[3] From S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.
Introduction
Data-driven Dependency Parsing
Graph Based Algorithms
Using maximum spanning tree algorithms from graph theory
Transition Based Algorithms
Capitalizing on greedy stack-based algorithms to build the dependency tree with incremental steps in linear time [4]

[4] From S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.
Introduction
Transition Based Dependency Parsing
Transition System: an abstract machine with a set of configurations (states) and transitions. We use the Arc-Hybrid transition system [Kuhlmann et al., 2011].

Configurations (σ, β, A):
• σ: stack of tree fragments, initially empty
• β: buffer of words, initially containing the whole sentence
• A: set of dependency arcs (head, relation, modifier), initially empty

Transitions:
• shift(σ, b|β, A) = (σ|b, β, A)
• left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
• right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
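The three transitions above can be sketched as plain functions over a configuration. This is an illustrative toy implementation (the function names and the toy sentence are invented, not the thesis code):

```python
# Minimal sketch of the arc-hybrid transition system.
# Configuration: (stack, buffer, arcs); arcs hold (head, relation, modifier).

def shift(stack, buffer, arcs):
    # shift(sigma, b|beta, A) = (sigma|b, beta, A)
    return stack + [buffer[0]], buffer[1:], arcs

def left(stack, buffer, arcs, rel):
    # left_d(sigma|s, b|beta, A) = (sigma, b|beta, A ∪ {(b, d, s)})
    return stack[:-1], buffer, arcs | {(buffer[0], rel, stack[-1])}

def right(stack, buffer, arcs, rel):
    # right_d(sigma|s|t, beta, A) = (sigma|s, beta, A ∪ {(s, d, t)})
    return stack[:-1], buffer, arcs | {(stack[-2], rel, stack[-1])}

# Parse "news had effect": news <-SBJ- had -OBJ-> effect, ROOT -> had
stack, buffer, arcs = [], ["news", "had", "effect", "ROOT"], set()
stack, buffer, arcs = shift(stack, buffer, arcs)         # push "news"
stack, buffer, arcs = left(stack, buffer, arcs, "SBJ")   # had -SBJ-> news
stack, buffer, arcs = shift(stack, buffer, arcs)         # push "had"
stack, buffer, arcs = shift(stack, buffer, arcs)         # push "effect"
stack, buffer, arcs = right(stack, buffer, arcs, "OBJ")  # had -OBJ-> effect
stack, buffer, arcs = left(stack, buffer, arcs, "ROOT")  # ROOT -> had
print(sorted(arcs))
```

A full parser would also check preconditions (e.g. left requires a non-empty stack and buffer); they are omitted here for brevity.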
An example of parsing a sentence
Problem Definition
Find a model that learns to decide the correct transition from the current state.
2 Related Work
Related Work
Neural Networks for Feature Conjunctions
Neural networks can handle feature conjunctions and nonlinearity. However, they are impractical for high-dimensional inputs: they scale linearly in input dimension (in both time and space, assuming a fixed number of hidden units).
Related Work
Solution: use dense embeddings for input features.
3 Model
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings
a Language Model
Language Model (LM)
The LM is used to obtain context and word embeddings, with two components:
• Character-based LSTM extracts word vectors
• Word-based BiLSTM extracts context vectors
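The two-level encoder can be sketched as follows. This is a simplified toy: a plain tanh-RNN step stands in for the LSTM cells, and all dimensions, weights, and the embedding scheme are invented for illustration:

```python
# Two-level language model sketch: characters -> word vector,
# words -> bidirectional context vectors.
import math
import random

D = 4  # toy embedding size

def rnn_step(h, x_vec, w_h, w_x):
    # simple elementwise tanh-RNN stand-in for an LSTM cell
    return [math.tanh(w_h * hi + w_x * xi) for hi, xi in zip(h, x_vec)]

def char_embed(ch):
    random.seed(ord(ch))  # deterministic toy embedding per character
    return [random.uniform(-1, 1) for _ in range(D)]

def word_vector(word):
    # character-level pass: last hidden state is the word vector
    h = [0.0] * D
    for ch in word:
        h = rnn_step(h, char_embed(ch), 0.5, 0.5)
    return h

def context_vectors(words):
    # word-level bidirectional pass over the word vectors
    vecs = [word_vector(w) for w in words]
    fwd, h = [], [0.0] * D
    for v in vecs:
        h = rnn_step(h, v, 0.5, 0.5)
        fwd.append(h)
    bwd, h = [], [0.0] * D
    for v in reversed(vecs):
        h = rnn_step(h, v, 0.5, 0.5)
        bwd.append(h)
    bwd.reverse()
    return [f + b for f, b in zip(fwd, bwd)]  # concat forward || backward

ctx = context_vectors(["Economic", "news", "had"])
print(len(ctx), len(ctx[0]))  # 3 words, each 2*D = 8 dims
```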
Language Model - Word vectors
Character-based LSTM generates word vectors.

Figure: Character LSTM, from Kırnap et al., 2017
Language Model - Context Vectors
Word-based BiLSTM generates context vectors.

Figure: Word BiLSTM, from Kırnap et al., 2017
b MLP Parser (CoNLL17)
MLP Parser
The MLP Parser consists of 4 components:
• Character-based LSTM extracts word vectors
• Word-based BiLSTM extracts context vectors
• Feature extractor describes the current state
• Decision module (MLP) decides the next transition
MLP Parser - Feature Extraction
The feature extractor describes the current state.

Figure: Kırnap et al., 2017
MLP Parser - Decision Module
The decision module (MLP) decides the next transition.
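A minimal sketch of such a decision module, assuming the extracted features are already a flat numeric vector. Sizes and weights are toy values, not the thesis hyperparameters:

```python
# One-hidden-layer MLP mapping state features to a softmax over transitions.
import math
import random

random.seed(0)
TRANSITIONS = ["shift", "left", "right"]
IN, HID = 8, 16

# toy randomly initialized weights
W1 = [[random.uniform(-1, 1) for _ in range(IN)] for _ in range(HID)]
b1 = [0.0] * HID
W2 = [[random.uniform(-1, 1) for _ in range(HID)] for _ in range(len(TRANSITIONS))]
b2 = [0.0] * len(TRANSITIONS)

def decide(features):
    # hidden = tanh(W1 @ features + b1); scores = W2 @ hidden + b2
    hidden = [math.tanh(sum(w * f for w, f in zip(row, features)) + b)
              for row, b in zip(W1, b1)]
    scores = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(W2, b2)]
    # numerically stable softmax over the 3 transitions
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    return TRANSITIONS[probs.index(max(probs))], probs

move, probs = decide([random.uniform(-1, 1) for _ in range(IN)])
print(move)
```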
Experiments & Dataset (MLP), CoNLL17
CoNLL17 Dataset
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations
Experiments - Evaluation Metric
Labeled Attachment Score (LAS): the percentage of words assigned both the correct syntactic head and the correct dependency label.

Figure: for "Economic news had", the gold tree (ATT, SBJ arcs) scores LAS 1; Pred 1 (OBJ, PRED arcs) scores LAS 0; Pred 2 (ATT, OBJ arcs) scores LAS (1/2) · 100 = 50.
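The metric is straightforward to compute; here is a hedged sketch with toy data mirroring the slide's example (the dictionary representation is an illustrative choice, not the evaluation script's format):

```python
# LAS: fraction of words whose predicted (head, label) pair matches gold.

def las(gold, pred):
    # gold/pred: {word: (head, label)}
    correct = sum(1 for w, hl in gold.items() if pred.get(w) == hl)
    return 100.0 * correct / len(gold)

gold  = {"Economic": ("news", "ATT"), "news": ("had", "SBJ")}
pred1 = {"Economic": ("news", "OBJ"), "news": ("had", "PRED")}  # both wrong
pred2 = {"Economic": ("news", "ATT"), "news": ("had", "OBJ")}   # one right

print(las(gold, gold))   # 100.0
print(las(gold, pred1))  # 0.0
print(las(gold, pred2))  # 50.0
```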
Experiments (MLP)
CoNLL 2017 Results (all treebanks, LAS)

Ranked 1st among transition-based parsers [5]

[5] Source: CoNLL17 official results page
Contributions in CoNLL17
Context and Word Embeddings
Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2
Context and Word Embeddings
Relative contributions of part-of-speech (p), word vector (v), and context vector (c) (same table as on the previous slide).

Context vectors provide an independent contribution on top of POS tags.
Context and Word embeddings
(Same table as on the previous slides.)

Our BiLSTM language model word vectors perform better than the FB vectors (p-fb row).
Context and Word embeddings
(Same table as on the previous slides.)

Both POS tags and context vectors make significant contributions on top of word vectors.
Issues with MLP
However:
• Choosing the correct state representation for the parser remains critical.
• We are unable to represent the whole parsing history with feature extraction.
Solution
Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack.
c Tree-stack LSTM Parser (CoNLL18)
Related Work - Stack LSTM
Figure Stack LSTM [Dyer et al 2015]
Represents each component (σ, β, A) with an LSTM, modifying the head word's embedding with the dependent's embedding.
Problems with Stack LSTM
• They only modify the stack's word embeddings.
• Hidden states of the LSTMs are not updated unless a reduce occurs.
• Actions are not explicitly represented.
• They only used word2vec embeddings [Mikolov et al., 2013].
Our solution
We propose:
• Context embeddings should improve parsing accuracy.
• Dependency relations should be explicitly represented.
• Morphological features of a word may enhance parsing accuracy.
Tree-stack LSTM Overview
Figure: Tree-stack LSTM: the β-LSTM, σ-LSTM, Action-LSTM, and t-RNN outputs are concatenated and fed to an MLP.

We propose the Tree-stack LSTM model with 4 components:
• β-LSTM
• σ-LSTM
• Action-LSTM
• Tree-RNN
Tree-stack LSTM
Input Representation
Input Representation
Action and Dependency Relation Embeddings
• Every action is represented with a continuous vector.
• Every dependency relation is represented with a continuous vector.
Input Representation
We do not include an explicit feature extractor. We initialize each word representation by concatenating:
• the character-based LSTM's word vector
• the word-based BiLSTM's context vector
• the part-of-speech (POS) vector
• the morph-feat vector
Input Representation
Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure: Morph-feat Embeddings
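A sketch of how such a morph-feat vector can be built: the feature string is split on "|", each key=value pair gets its own learned vector, and the vectors are combined (summation here is an illustrative assumption; the thesis's exact combination may differ):

```python
# Toy morph-feat embedding: one vector per key=value feature, summed.
import random

random.seed(42)
DIM = 4
table = {}  # feature -> embedding, created on first use (stand-in for a learned table)

def feat_vector(feat):
    if feat not in table:
        table[feat] = [random.uniform(-0.1, 0.1) for _ in range(DIM)]
    return table[feat]

def morph_embedding(feat_string):
    vec = [0.0] * DIM
    for feat in feat_string.split("|"):
        vec = [a + b for a, b in zip(vec, feat_vector(feat))]
    return vec

emb = morph_embedding("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
print(len(emb))  # DIM-dimensional vector, one entry added per feature
```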
Tree-stack LSTM
Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN
β-LSTM
Figure: Tree-stack LSTM architecture with the β-LSTM highlighted.
β-LSTM
Figure: Buffer's β-LSTM reading words w_i, w_{i+1}, w_{i+2}.
σ-LSTM
Figure: Tree-stack LSTM architecture with the σ-LSTM highlighted.
σ-LSTM
Figure: Stack's σ-LSTM reading stack items s_i, s_{i+1}, s_{i+2}.
Action-LSTM
Figure: Tree-stack LSTM architecture with the Action-LSTM highlighted.
Action-LSTM
Figure: Action-LSTM over the transition history.
How are the components of the tree-stack LSTM connected?
Tree-RNN
Tree-RNN (t-RNN)
Figure: t-RNN combines the head word, dependency relation, and dependent word.

w_head_new = tanh(W_rnn · [w_head_old; d_l; w_dep] + b_rnn)   (1)
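Equation (1) amounts to one affine transform of the concatenated vectors followed by a tanh. A toy sketch (dimensions and weights are illustrative, not the thesis hyperparameters):

```python
# t-RNN composition: w_head_new = tanh(W_rnn · [w_head; d_l; w_dep] + b_rnn)
import math
import random

random.seed(1)
D = 3  # toy embedding size

def t_rnn(w_head, d_label, w_dep, W, b):
    x = w_head + d_label + w_dep  # concatenation, length 3*D
    return [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

W = [[random.uniform(-0.5, 0.5) for _ in range(3 * D)] for _ in range(D)]
b = [0.0] * D
new_head = t_rnn([0.1] * D, [0.2] * D, [0.3] * D, W, b)
print(len(new_head))  # new head embedding has the same dimension D
```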
Tree-RNN with
1. Left Transition
2. Right Transition
Left Transition
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings.
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: The stack's top LSTM is reduced.
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding.
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state from the new input.
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: The tree-stack LSTM is ready to predict the next transition.
Right Transition
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings.
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: The stack's top LSTM is reduced.
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding.
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input.
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: The tree-stack LSTM is ready to predict the next transition.
Final overview of Tree-stack LSTM
Figure: Tree-stack LSTM: the β-LSTM, σ-LSTM, Action-LSTM, and t-RNN outputs are concatenated and fed to an MLP.
4 Results & Comparisons
Results amp Comparisons
Dataset

CoNLL17:
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 7th out of 33 participants (1st among transition-based parsers)

CoNLL18:
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 16th out of 30 participants (2nd among transition-based parsers)

Differences between the two: 1. train/test split change, 2. annotation.
MLP vs Tree-stack LSTM
The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets.
MLP vs Tree-stack LSTM
Two possible problems with the official comparison:
1. If the annotation of the treebank has been improved, the older parser is handicapped.
2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code        MLP    Tree-stack
ru_taiga (10k)   58.89  60.55
hu_szeged (20k)  66.21  68.18
tr_imst (50k)    56.78  58.75
ar_padt (120k)   67.83  68.14
en_ewt (205k)    74.87  75.77
cs_cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP.
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
MLP Parser
Figure: Initial model (MLP only).
Only Action LSTM
Figure: Only the action LSTM added.
Only β-LSTM
Figure: Only the β-LSTM added.
Only σ-LSTM
Figure: Only the σ-LSTM added.
Ablation Analysis Results
Lang Code  MLP    Only Action  Only-β  Only-σ
hu_szeged  66.21  66.87        66.94   67.03
sv_lines   71.12  72.05        72.17   72.45
tr_imst    57.12  56.87        57.02   57.12
ar_padt    67.83  66.67        66.89   66.92
cs_cac     83.89  82.23        83.13   83.17
en_ewt     75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models.
Ablation of t-RNN
Figure: Tree-stack LSTM architecture with the t-RNN highlighted.
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code           without t-RNN  with t-RNN
no_nynorsklia (3k)  51.78          53.33
ru_taiga (11k)      59.13          60.55
gl_treegal (15k)    69.76          70.45
hu_szeged (20k)     66.12          68.18
sv_lines (49k)      74.04          75.46
tr_imst (50k)       58.12          58.75
ar_padt (120k)      68.04          68.14
en_ewt (204k)       74.87          75.77
cs_cac (473k)       82.89          83.57
cs_pdt (1M)         81.17          81.16

t-RNN provides a comparative advantage for low-resource languages.
Ablation Analysis
Overall results of ablation analysis
Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu_szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv_lines   71.12  72.05   72.17   74.04   72.17      75.46
tr_imst    57.12  56.87   57.02   57.12   58.12      58.75
ar_padt    67.83  66.67   66.89   66.92   68.04      68.14
cs_cac     83.89  82.23   83.13   83.17   82.89      83.57
en_ewt     75.54  75.43   75.56   75.67   74.87      75.77

The full tree-stack LSTM beats the other model variations.
Ablation Analysis
Conclusions of Ablation Experiments
• t-RNN's performance contribution increases as the training size decreases.
• σ-LSTM provides useful information independent of dataset size.
• Interconnecting the model's components with the t-RNN makes the tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers).
What does the morphological feature embedding provide?
Contribution of Morph-feat Embeddings
Experimental Settings: we divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens per language, to better understand our contributions:
• languages having less than 20k tokens
• languages having more than 20k, less than 50k tokens
• languages having more than 50k, less than 100k tokens
• languages having 100k tokens or more
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens:

Lang Code      Morph-Feats  no Morph-Feats  # of tokens
no_nynorsklia  51.13        53.33           3,583
ru_taiga       58.32        60.55           10,479
sme_giella     52.78        53.39           16,385
la_perseus     49.93        51.60           18,184
ug_udt         52.78        53.39           19,262
sl_sst         46.72        48.77           19,473
hu_szeged      66.23        68.18           20,166

Not useful for languages having less than 20k training tokens.
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang Code      Morph-Feats  no Morph-Feats  # of tokens
sv_lines       72.18        74.81           48,325
fr_sequoia     84.36        82.17           50,543
en_gum         76.44        75.34           53,686
ko_gsd         73.74        72.54           56,687
eu_bdt         74.55        73.32           72,974
nl_lassysmall  76.70        75.80           75,134
gl_ctg         79.02        79.02           79,327
lv_lvtb        72.33        72.24           80,666
id_gsd         75.76        73.97           97,531

Beneficial for languages with 50k-100k training tokens.
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens:

Lang Code  Morph-Feats  no Morph-Feats  # of tokens
fa_seraji  81.18        81.12           121,064
bg_btb     84.53        84.55           124,336
en_ewt     75.77        75.68           204,585
ar_padt    68.02        68.14           223,881
de_gsd     71.59        71.32           263,804
ca_ancora  85.89        85.87           417,587
es_ancora  84.99        84.78           444,617
cs_cac     83.57        83.63           472,608
cs_pdt     81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens.
Static vs Dynamic Oracle Training
Static oracle: training transitions follow the gold moves.
Dynamic oracle: training transitions follow the model's predicted moves.

In both cases, the log-probability of the gold moves is maximized.
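The difference between the two regimes can be sketched as follows. `model_probs` and the state handling are toy stand-ins (a random scorer, invented names), not the thesis training code; in both regimes the loss is -log p(gold move):

```python
# Static vs dynamic oracle: same loss, different choice of next transition.
import math
import random

random.seed(0)
MOVES = ["shift", "left", "right"]

def model_probs(state):
    # stand-in for the parser's distribution over transitions
    scores = [random.random() for _ in MOVES]
    total = sum(scores)
    return [s / total for s in scores]

def train_step(state, gold_move, dynamic):
    probs = model_probs(state)
    loss = -math.log(probs[MOVES.index(gold_move)])  # maximize log p(gold)
    if dynamic:
        # dynamic oracle: continue from the model's own (possibly wrong) prediction
        next_move = MOVES[probs.index(max(probs))]
    else:
        # static oracle: always continue from the gold move
        next_move = gold_move
    return loss, next_move

loss, nxt = train_step(state=0, gold_move="shift", dynamic=False)
loss2, nxt2 = train_step(state=0, gold_move="shift", dynamic=True)
print(nxt)  # static training always follows the gold move
```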
Static vs Dynamic Oracle Training
Figure: Results are very close for languages with fewer than 20k training tokens.
Static vs Dynamic Oracle Training
Figure: Results are very close for languages with between 20k and 50k training tokens.
Static vs Dynamic Oracle Training
Figure: Results are very close for languages with more than 50k training tokens.
How about languages with less than 20k training tokens?
Transfer Learning
There are 4 possible types of transfer learning:
1. Use very limited data to train the LM for word and context vectors, and use them to train a parser from scratch.
2. Use Facebook's word vectors to train a parser [Bojanowski et al., 2017].
3. Use our own word and context vectors trained on a different language from the same language family.
4. Apply transfer learning with a pre-trained parser.

Language      (1)           (2)    (3)    (4)
af_afribooms  not provided  75.46  77.43  78.12
kk_ktb        20.19         22.31  21.96  23.86
bxr_bdt       7.64          9.76   9.93   8.98
kmr_mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4).
Transfer Learning
Conclusions of the transfer learning experiments:
• Applying transfer learning with a pre-trained parser is the most beneficial.
• Training the LM from scratch does not yield useful word and context vectors.
• Our word and context vectors are still more useful than Facebook's [Bojanowski et al., 2017].
Projectivity
Transition-based parsers can only build projective trees [6]

[6] Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
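Projectivity is easy to test mechanically: a tree is projective iff no two dependency arcs cross. A hedged sketch (the list-of-heads representation and examples are illustrative):

```python
# Projectivity check: a tree is projective iff no two arcs cross.

def is_projective(heads):
    # heads[i] = head position of word i+1 (1-based positions, 0 = root)
    arcs = [(min(h, d), max(h, d))
            for d, h in enumerate(heads, start=1) if h != 0]
    for (l1, r1) in arcs:
        for (l2, r2) in arcs:
            if l1 < l2 < r1 < r2:  # arcs (l1,r1) and (l2,r2) cross
                return False
    return True

print(is_projective([2, 0, 2]))     # simple chain: projective
print(is_projective([3, 4, 0, 3]))  # arcs (1,3) and (2,4) cross: non-projective
```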
Projective vs Non-projective
We compared our model with the best model for different projectivityratios
Language     Projectivity  Best (LAS)  Ours (LAS)
grc_perseus  90.70         79.39       55.03 (20)
eu_bdt       95.13         84.22       74.13 (17)
hu_szeged    97.80         82.66       68.18 (14)
da_ddt       98.26         86.28       76.40 (17)
en_gum       99.60         85.05       76.44 (15)
gl_treegal   100           74.25       70.45 (10)
gl_ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases. [7]

[7] From the official results page and our projectivity table.
Conclusions
Conclusion
In conclusion, we introduced "context, word, and morph-feat" embeddings and showed their contribution to transition-based dependency parsing.
• Our tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering.
• The tree-stack LSTM performed better on low-resource languages.
• As the training dataset size increases, the tree-stack LSTM loses its advantage.
Future Research Direction
End-to-End Training
Systems jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism
Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring performance improvements.

Morphological Features
Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.
Publications
Ömer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Ömer Kırnap, Berkay Furkan Önder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
References
Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, pages 673-682. Association for Computational Linguistics.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Thank you for your attention
Questions
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
Overview
1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing
2 Related WorkLinear Models and their DrawbacksNeural Network Models
3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser
4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning
5 Conclusion6 Future Work amp Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 2 123
1 Introduction
Omer Kırnap (Koc University) MSc Thesis September 27 2018 3 123
Introduction
What is dependency parsing
Dependency parsing aims to detect word relations by finding the treestructure of a sentence inspired by dependency grammar
Figure Dependency annotations for a sentence ldquo Economic news had little effecton financial marketsrdquo
1
1Figure from S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US
Omer Kırnap (Koc University) MSc Thesis September 27 2018 4 123
Introduction
Why do we need dependency parsing
Dependencies resolve ambiguity
Useful for some down-stream tasks in NLP
2
2Figure from httpwwwphontroncomslidesnlp-programming-en-11-dependpdfOmer Kırnap (Koc University) MSc Thesis September 27 2018 5 123
Introduction
Dependency Parsing Categorization
Grammar BasedRelying on a formal grammardefining a formal languageasking whether a given inputsentence is in the languagedefined by the grammar or not
Data-drivenMaking essential use of machinelearning from linguistic data in orderto parse new sentences
3
3From S Kbler R McDonald and J Nivre 2009 Dependency parsing Morgan ampClaypool US
Omer Kırnap (Koc University) MSc Thesis September 27 2018 6 123
Introduction
Data-driven Dependency Parsing
Graph Based Algorithms
Using maximum spanning tree algorithms from graph theory
Transition Based Algorithms
Capitalizing on greedy stack based algorithms to build dependency treewith incremental steps in linear time
4
4From S Kbler R McDonald and J Nivre 2009 Dependency parsing Morgan ampClaypool US
Omer Kırnap (Koc University) MSc Thesis September 27 2018 7 123
Introduction
Transition Based Dependency Parsing
Transition System Abstract machine with a set of configurations(states) and transitions We use the ArcHybrid transition system[Kuhlmann et al 2011]
Configurations (σ β A)bull σ Stack of tree fragments initially emptybull β Buffer of words initially containing the whole sentencebull A Set of dependency arcs (head relation modifier) initially empty
Transitionsbull shift(σ b|βA) = (σ|b βA)bull leftd(σ|s b|βA) = (σ b|βA cup (b d s))bull rightd(σ|s|t βA) = (σ|s βA cup (s d t))
Omer Kırnap (Koc University) MSc Thesis September 27 2018 8 123
An example parsing of a sentence
Problem Definition
Find a model that learns to decide the correct transition from the current state.
2 Related Work
Related Work
Neural Networks for Feature Conjunctions
Neural networks can handle feature conjunctions and nonlinearity. However, they are impractical for high-dimensional inputs: they scale linearly in the input dimension (in both time and space, assuming a fixed number of hidden units).
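The linear-scaling claim is easy to verify with a back-of-the-envelope count: with a fixed number h of hidden units, the first layer of an MLP has d_in · h weights, so time and space grow linearly in the input dimension d_in. The numbers below are illustrative.

```python
def first_layer_params(d_in, hidden=200):
    # weight matrix (d_in x hidden) plus bias vector (hidden)
    return d_in * hidden + hidden

small = first_layer_params(10_000)     # modest one-hot feature space
large = first_layer_params(1_000_000)  # after feature conjunctions blow up
# ~100x more input dimensions -> ~100x more first-layer parameters
```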
Related Work
Solution: using dense embeddings for input features.
Overview
1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2. Related Work: Linear Models and their Drawbacks; Neural Network Models
3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5. Conclusion
6. Future Work & Discussions
3 Model
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:
• CoNLL17: Koc-University team with MLP Parser using Context Embeddings
• CoNLL18: KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings
a Language Model
Language Model (LM)
The LM is used to obtain Context and Word embeddings with two components:
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Language Model - Word vectors
A character-based LSTM generates word vectors.
Figure: Character LSTM, from Kırnap et al. 2017
Language Model - Context Vectors
A word-based BiLSTM generates context vectors.
Figure: Word BiLSTM, from Kırnap et al. 2017
b MLP Parser (CoNLL17)
MLP Parser
The MLP Parser consists of 4 components:
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Feature extractor describes current state
Decision module (MLP) decides the next transition
MLP Parser - Feature Extraction
Feature extractor describes current state
Figure: Kırnap et al. 2017
MLP Parser - Decision Module
Decision module (MLP) decides the next transition
Experiments amp Dataset (MLP) CoNLL17
CoNLL17 Dataset
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations
Experiments - Evaluation Metric
Labeled Attachment Score (LAS): the percentage of words assigned both the correct syntactic head and the correct dependency label.
Example over "Economic news had": the gold tree has arcs ATT and SBJ.
• Pred 1 (OBJ, PRED: both arcs wrong): LAS 0
• Pred 2 (ATT correct, OBJ instead of SBJ): LAS (1/2) · 100 = 50
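The metric can be computed in a few lines. This is a minimal sketch of LAS for the example above, assuming trees are given as {dependent: (head, label)} mappings (an illustrative format, not the thesis's CoNLL-U reader).

```python
def las(gold, pred):
    # a word counts as correct only if both its head and its label match
    correct = sum(1 for dep, arc in gold.items() if pred.get(dep) == arc)
    return 100.0 * correct / len(gold)

gold  = {"Economic": ("news", "ATT"), "news": ("had", "SBJ")}
pred1 = {"Economic": ("news", "OBJ"), "news": ("had", "PRED")}  # both arcs wrong
pred2 = {"Economic": ("news", "ATT"), "news": ("had", "OBJ")}   # one of two correct
```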
Experiments (MLP)
CoNLL 2017 Results (all treebanks LAS)
Ranked 1st among transition based parsers 5
⁵Source: CoNLL17 official results page
Contributions in CoNLL17
Context and Word Embeddings
Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats | Hungarian | En-ParTUT | Latvian
p     | 63.6      | 76.6      | 55.9
v     | 73.5      | 75.9      | 63.0
c     | 72.2      | 76.0      | 63.5
v-c   | 76.0      | 79.0      | 67.6
p-c   | 78.0      | 82.5      | 70.6
p-v   | 76.6      | 80.8      | 67.7
p-fb  | 74.7      | 79.7      | 66.3
p-v-c | 79.3      | 83.2      | 74.2
Takeaways:
• Context vectors provide an independent contribution on top of POS tags.
• Our BiLSTM language model word vectors perform better than the pretrained Facebook vectors (p-fb).
• Both POS tags and context vectors have significant contributions on top of word vectors.
Issues with MLP
However:
• Choosing the correct features to describe the parser state remains critical.
• We are unable to represent the whole parsing history with feature extraction.
Solution
Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack.
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:
• CoNLL17: Koc-University team with MLP Parser using Context Embeddings
• CoNLL18: KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings
c Tree-stack LSTM Parser (CoNLL18)
Related Work - Stack LSTM
Figure Stack LSTM [Dyer et al 2015]
Represent each component (σ, β, A) with an LSTM; the head word's embedding is modified with the dependent's embedding.
Problems with Stack LSTM
They only modify the stack's word embeddings.
Hidden states of the LSTMs are not updated unless a reduce occurs.
Actions are not explicitly represented.
They only use word2vec embeddings [Mikolov et al., 2013].
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Tree-stack LSTM Overview
[Figure: Tree-stack LSTM overview — the β-, σ- and Action-LSTM outputs are concatenated and fed to an MLP; the t-RNN composes head word, dependent word and dependency relation.]
We propose Tree-stack LSTM model with 4 components
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN
Tree-stack LSTM
Input Representation
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with a continuous vector.
Every dependency relation is represented with a continuous vector.
Input Representation
We do not include an explicit feature extractor. We initialize the word representation by concatenating:
Character Based LSTMrsquos word vectors
Word Based BiLSTMrsquos context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
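The concatenation above is simple to sketch. Dimensions here are illustrative assumptions, not the thesis's actual hyperparameters.

```python
import numpy as np

def word_repr(word_vec, context_vec, pos_vec, morph_vec):
    # the parser's input for a word is just the four vectors stacked end to end
    return np.concatenate([word_vec, context_vec, pos_vec, morph_vec])

# assumed (illustrative) dimensions: 350 + 300 + 128 + 128 = 906
x = word_repr(np.zeros(350), np.zeros(300), np.zeros(128), np.zeros(128))
```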
Input Representation
Morph-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure Morph-feat Embeddings
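One plausible way to embed a morph-feat string like the one above is to look up a learned vector per feature=value pair and sum them; the exact combination used in the thesis may differ, so treat this as an assumed sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
feat_table = {}  # feature=value -> embedding, grown on demand (stand-in for a learned table)

def morph_vec(feats, dim=128):
    total = np.zeros(dim)
    for fv in feats.split("|"):        # e.g. "Case=Nom", "Gender=Neut", ...
        if fv not in feat_table:
            feat_table[fv] = rng.standard_normal(dim)
        total += feat_table[fv]
    return total

v = morph_vec("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
```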
Tree-stack LSTM
Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN
β-LSTM
Figure: The buffer's β-LSTM runs over the buffer words w_i, w_{i+1}, w_{i+2}.
σ-LSTM
Figure: The stack's σ-LSTM runs over the stack items s_i, s_{i+1}, s_{i+2}.
Action-LSTM
Figure: The Action-LSTM runs over the sequence of past transitions.
How are the components of the tree-stack LSTM connected?
Tree-RNN (t-RNN)
Figure: t-RNN composing the dependent word, dependency relation and head word.
w_head_new = tanh(W_rnn · [w_head_old ; d_l ; w_dep] + b_rnn)   (1)
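Equation (1) can be sketched in a few lines of numpy: the new head embedding is a tanh of an affine map of the concatenated old head, relation and dependent vectors. Shapes and the zero initialization are illustrative; in practice W_rnn and b_rnn are learned.

```python
import numpy as np

d = 64                      # illustrative embedding dimension
W_rnn = np.zeros((d, 3 * d))  # stand-in for a learned matrix
b_rnn = np.zeros(d)           # stand-in for a learned bias

def t_rnn(w_head_old, d_label, w_dep):
    # Eq. (1): compose head, relation and dependent into the new head embedding
    return np.tanh(W_rnn @ np.concatenate([w_head_old, d_label, w_dep]) + b_rnn)

w_head_new = t_rnn(np.ones(d), np.ones(d), np.ones(d))
```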
Tree-RNN with:
1. Left Transition
2. Right Transition
Left Transition

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Step by step:
1. Each embedding is initialized by concatenating POS, language and morph-feat embeddings.
2. The stack's top LSTM is reduced (the dependent s is popped).
3. The t-RNN calculates the new head embedding from the head, the dependent and the dependency relation.
4. The β-LSTM recalculates its hidden state based on the new input (the new head embedding replaces the buffer front's).
5. The tree-stack LSTM is ready to predict the next transition.
Right Transition

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Step by step:
1. Each embedding is initialized by concatenating POS, language and morph-feat embeddings.
2. The stack's top LSTM is reduced (the dependent t is popped).
3. The t-RNN calculates the new head embedding from the head, the dependent and the dependency relation.
4. The σ-LSTM recalculates its hidden state from the new input (the new head embedding replaces the stack top's).
5. The tree-stack LSTM is ready to predict the next transition.
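The bookkeeping for the two reduce transitions can be sketched with plain lists standing in for the σ- and β-LSTM inputs and a stub for the t-RNN composition of Eq. (1); names are illustrative, not the thesis implementation.

```python
def t_rnn_stub(head, dep, rel):
    # placeholder for the Eq. (1) composition of head, relation and dependent
    return (head, rel, dep)

def left_reduce(stack, buffer, rel):
    dep = stack.pop()                            # stack top is reduced
    buffer[0] = t_rnn_stub(buffer[0], dep, rel)  # new head replaces the buffer front
    # ... after which the beta-LSTM re-reads buffer[0]
    return stack, buffer

def right_reduce(stack, buffer, rel):
    dep = stack.pop()                            # stack top is reduced
    stack[-1] = t_rnn_stub(stack[-1], dep, rel)  # new head stays on the stack
    # ... after which the sigma-LSTM re-reads stack[-1]
    return stack, buffer
```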
Final overview of Tree-stack LSTM
[Figure: The full tree-stack LSTM — β-, σ- and Action-LSTM outputs are concatenated and fed to an MLP; the t-RNN ties the components together.]
Overview
1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2. Related Work: Linear Models and their Drawbacks; Neural Network Models
3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5. Conclusion
6. Future Work & Discussions
4 Results & Comparisons
Results & Comparisons

CoNLL17: dependency parsing of 81 treebanks in 49 languages. All treebanks use standardized annotation (17 universal part-of-speech tags, 37 universal dependency relations). Koc-University ranked 7th out of 33 participants (1st among transition based parsers).

CoNLL18: dependency parsing of 82 treebanks in 57 languages, with the same standardized annotation. Koc-University ranked 16th out of 30 participants (2nd among transition based parsers).

Changes from CoNLL17 to CoNLL18: 1. Train/test split changes 2. Annotation changes
MLP vs Tree-stack LSTM
The CoNLL 2018 committee released comparison results for CoNLL17 and CoNLL18 systems tested on the same test sets.
MLP vs Tree-stack LSTM
Two possible problems with the official comparison:
1. If the annotation of the treebank has been improved, the older parser is handicapped.
2. If the training-test split has changed and old training data are now in the test data, the old parser is favored undeservedly.
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models:

Lang Code       | MLP   | Tree-stack
ru_taiga (10k)  | 58.89 | 60.55
hu_szeged (20k) | 66.21 | 68.18
tr_imst (50k)   | 56.78 | 58.75
ar_padt (120k)  | 67.83 | 68.14
en_ewt (205k)   | 74.87 | 75.77
cs_cac (473k)   | 83.39 | 83.57
Tree-stack LSTM outperforms MLP
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
MLP Parser
Figure: Initial model (MLP).
Only Action LSTM
Figure: Only the action LSTM.
Only β-LSTM
Figure: Only the β-LSTM.
Only σ-LSTM
Figure: Only the σ-LSTM.
Ablation Analysis Results
Lang Code | MLP   | Only Action | Only-β | Only-σ
hu_szeged | 66.21 | 66.87       | 66.94  | 67.03
sv_lines  | 71.12 | 72.05       | 72.17  | 72.45
tr_imst   | 57.12 | 56.87       | 57.02  | 57.12
ar_padt   | 67.83 | 66.67       | 66.89  | 66.92
cs_cac    | 83.89 | 82.23       | 83.13  | 83.17
en_ewt    | 75.54 | 75.43       | 75.56  | 75.67

Table: Comparison between MLP and "Only" models.
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN:

Lang Code          | without t-RNN | with t-RNN
no_nynorsklia (3k) | 51.78         | 53.33
ru_taiga (11k)     | 59.13         | 60.55
gl_treegal (15k)   | 69.76         | 70.45
hu_szeged (20k)    | 66.12         | 68.18
sv_lines (49k)     | 74.04         | 75.46
tr_imst (50k)      | 58.12         | 58.75
ar_padt (120k)     | 68.04         | 68.14
en_ewt (204k)      | 74.87         | 75.77
cs_cac (473k)      | 82.89         | 83.57
cs_pdt (1M)        | 81.17         | 81.164
t-RNN provides a comparative advantage for low-resource languages.
Ablation Analysis
Overall results of ablation analysis
Lang      | MLP   | Only A | Only-β | Only-σ | w/o t-RNN | all
hu_szeged | 66.21 | 66.87  | 66.94  | 67.03  | 66.12     | 68.18
sv_lines  | 71.12 | 72.05  | 72.17  | 74.04  | 72.17     | 75.46
tr_imst   | 57.12 | 56.87  | 57.02  | 57.12  | 58.12     | 58.75
ar_padt   | 67.83 | 66.67  | 66.89  | 66.92  | 68.04     | 68.14
cs_cac    | 83.89 | 82.23  | 83.13  | 83.17  | 82.89     | 83.57
en_ewt    | 75.54 | 75.43  | 75.56  | 75.67  | 74.87     | 75.77
Tree-stack LSTM beats other model variations
Ablation Analysis
Conclusions of Ablation Experiments
t-RNN's performance contribution increases as the training size decreases.
σ-LSTM provides useful information independent of dataset size.
Interconnecting the model's components with the t-RNN makes the tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers).
What does Morphological Feature Embedding provide?
Contribution of Morph-feat Embeddings
Experimental Settings: we divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:
Languages having less than 20k tokens
Languages having more than 20k and less than 50k tokens
Languages having more than 50k and less than 100k tokens
Languages having 100k tokens or more
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens:

Lang code     | Morph-Feats | no Morph-Feats | # of tokens
no_nynorsklia | 51.13       | 53.33          | 3583
ru_taiga      | 58.32       | 60.55          | 10479
sme_giella    | 52.78       | 53.39          | 16385
la_perseus    | 49.93       | 51.6           | 18184
ug_udt        | 52.78       | 53.39          | 19262
sl_sst        | 46.72       | 48.77          | 19473
hu_szeged     | 66.23       | 68.18          | 20166
Not useful for languages having less than 20k training tokens
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code     | Morph-Feats | no Morph-Feats | # of tokens
sv_lines      | 72.18       | 74.81          | 48325
fr_sequoia    | 84.36       | 82.17          | 50543
en_gum        | 76.44       | 75.34          | 53686
ko_gsd        | 73.74       | 72.54          | 56687
eu_bdt        | 74.55       | 73.32          | 72974
nl_lassysmall | 76.7        | 75.8           | 75134
gl_ctg        | 79.02       | 79.018         | 79327
lv_lvtb       | 72.33       | 72.24          | 80666
id_gsd        | 75.76       | 73.97          | 97531
Beneficial for languages with 50k-100k training tokens
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens:

Lang code | Morph-Feats | no Morph-Feats | # of tokens
fa_seraji | 81.18       | 81.12          | 121064
bg_btb    | 84.53       | 84.55          | 124336
en_ewt    | 75.77       | 75.682         | 204585
ar_padt   | 68.02       | 68.14          | 223881
de_gsd    | 71.59       | 71.32          | 263804
ca_ancora | 85.89       | 85.874         | 417587
es_ancora | 84.99       | 84.78          | 444617
cs_cac    | 83.57       | 83.63          | 472608
cs_pdt    | 81.43       | 82.12          | 1173282
Neutral for languages having more than 100k training tokens
Static vs Dynamic Oracle Training
Static oracle: training transitions follow the gold moves. Dynamic oracle: training transitions follow the model's predicted moves.
In both cases the log-probability of the gold moves is maximized.
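The difference between the two regimes fits in one function. This is a skeleton with hypothetical stand-ins: `scores(state)` returns a log-probability per transition and `oracle(state)` the gold move. Both regimes maximize log p(gold); they differ only in which move the parser actually executes.

```python
def train_step(state, scores, oracle, step, dynamic=False):
    logp = scores(state)                 # {transition: log-probability}
    gold = oracle(state)
    loss = -logp[gold]                   # static and dynamic share this loss
    # static oracle follows the gold move; dynamic follows the model's argmax
    taken = max(logp, key=logp.get) if dynamic else gold
    return step(state, taken), loss
```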
Static vs Dynamic Oracle Training
Figure: Results are very close for languages with fewer than 20k training tokens.
Static vs Dynamic Oracle Training
Figure: Results are very close for training tokens between 20k and 50k.
Static vs Dynamic Oracle Training
Figure: Results are very close for more than 50k training tokens.
How about languages with less than 20k training tokens?
Transfer Learning
There are 4 possible types of transfer learning:
1. Using very limited data to train the LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al., 2017]
3. Using my own word and context vectors, trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language     | (1)          | (2)   | (3)   | (4)
af_afribooms | not provided | 75.46 | 77.43 | 78.12
kk_ktb       | 20.19        | 22.31 | 21.96 | 23.86
bxr_bdt      | 7.64         | 9.76  | 9.93  | 8.98
kmr_mg       | 20.12        | 22.57 | 22.78 | 23.39

Table: LAS values for strategies (1), (2), (3) and (4).
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial.
Training the LM from scratch on very limited data does not yield useful word and context vectors.
Our word and context vectors are still more useful than Facebook's [Bojanowski et al., 2017].
Projectivity
A transition based parser can only build projective trees.⁶
⁶Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
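Projectivity is easy to test mechanically: a dependency tree is projective iff no two arcs cross. A minimal sketch over arcs given as (head, dependent) index pairs, with 0 as the artificial root:

```python
def is_projective(arcs):
    # two arcs cross iff one endpoint of each lies strictly inside the other's span
    spans = [(min(h, d), max(h, d)) for h, d in arcs]
    for i, (l1, r1) in enumerate(spans):
        for l2, r2 in spans[i + 1:]:
            if l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1:
                return False  # the two arcs cross
    return True
```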
Projective vs Non-projective
We compared our model with the best model at different projectivity ratios:

Language    | Projectivity | Best (LAS) | Ours (LAS)
grc_perseus | 90.7         | 79.39      | 55.03 (20)
eu_bdt      | 95.13        | 84.22      | 74.13 (17)
hu_szeged   | 97.8         | 82.66      | 68.18 (14)
da_ddt      | 98.26        | 86.28      | 76.40 (17)
en_gum      | 99.6         | 85.05      | 76.44 (15)
gl_treegal  | 100          | 74.25      | 70.45 (10)
gl_ctg      | 100          | 82.12      | 79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases.⁷
⁷From the official results page and our projectivity table.
Conclusions
Conclusion
In conclusion, we introduced "Context, Word and Morph-feat" embeddings and showed their contribution to transition based dependency parsing.
Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.
The Tree-stack LSTM performed better on low-resource languages.
As the training dataset size increases, the tree-stack LSTM loses its advantage.
Future Research Direction
End-to-End Training
Systems jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.
Attention Mechanism
Applying attention between the σ-LSTM, β-LSTM or Action-LSTM states may bring a performance improvement.
Morphological Features
Finding a different way to represent morphological features.
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.
Publications
Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap, Berkay Furkan Onder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
References
Marco Kuhlmann, Carlos Gómez-Rodríguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.
S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Thank you for your attention
Questions
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
1 Introduction
Omer Kırnap (Koc University) MSc Thesis September 27 2018 3 123
Introduction
What is dependency parsing
Dependency parsing aims to detect word relations by finding the treestructure of a sentence inspired by dependency grammar
Figure Dependency annotations for a sentence ldquo Economic news had little effecton financial marketsrdquo
1
1Figure from S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US
Omer Kırnap (Koc University) MSc Thesis September 27 2018 4 123
Introduction
Why do we need dependency parsing
Dependencies resolve ambiguity
Useful for some down-stream tasks in NLP
2
2Figure from httpwwwphontroncomslidesnlp-programming-en-11-dependpdfOmer Kırnap (Koc University) MSc Thesis September 27 2018 5 123
Introduction
Dependency Parsing Categorization
Grammar BasedRelying on a formal grammardefining a formal languageasking whether a given inputsentence is in the languagedefined by the grammar or not
Data-drivenMaking essential use of machinelearning from linguistic data in orderto parse new sentences
3
3From S Kbler R McDonald and J Nivre 2009 Dependency parsing Morgan ampClaypool US
Omer Kırnap (Koc University) MSc Thesis September 27 2018 6 123
Introduction
Data-driven Dependency Parsing
Graph Based Algorithms
Using maximum spanning tree algorithms from graph theory
Transition Based Algorithms
Capitalizing on greedy stack based algorithms to build dependency treewith incremental steps in linear time
4
4From S Kbler R McDonald and J Nivre 2009 Dependency parsing Morgan ampClaypool US
Omer Kırnap (Koc University) MSc Thesis September 27 2018 7 123
Introduction
Transition Based Dependency Parsing
Transition System Abstract machine with a set of configurations(states) and transitions We use the ArcHybrid transition system[Kuhlmann et al 2011]
Configurations (σ β A)bull σ Stack of tree fragments initially emptybull β Buffer of words initially containing the whole sentencebull A Set of dependency arcs (head relation modifier) initially empty
Transitionsbull shift(σ b|βA) = (σ|b βA)bull leftd(σ|s b|βA) = (σ b|βA cup (b d s))bull rightd(σ|s|t βA) = (σ|s βA cup (s d t))
Omer Kırnap (Koc University) MSc Thesis September 27 2018 8 123
An example parsing of a sentence
Omer Kırnap (Koc University) MSc Thesis September 27 2018 9 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 10 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 11 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 12 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 13 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 14 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 15 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 16 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 17 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 18 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 19 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 20 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 21 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 22 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 23 123
Problem Definition
Find a model learning to decide correct transition from current state
Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123
2 Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123
Related Work
Neural Networks for Feature Conjunctions
Neural networks can handle feature conjunctions and nonlinearityHoweverImpractical for high dimensional inputs they scale linearly in inputdimensions (in both time and space assuming fixed number of hiddenunits)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123
Related Work
Solution Using dense embeddings for input features
Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123
Overview
1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing
2 Related WorkLinear Models and their DrawbacksNeural Network Models
3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser
4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning
5 Conclusion6 Future Work amp Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123
3 Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies
CoNLL17bull Koc-University team with MLP Parser using Context Embeddings
CoNLL18bull KParse team with Tree-stack LSTM Parser using Context and
Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies
CoNLL17
bull Koc-University team with MLP Parser using ContextEmbeddings
CoNLL18
bull KParse team with Tree-stack LSTM Parser using Context and Morph-featEmbeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123
a Language Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123
Language Model (LM)
LM is used to obtain Context and Word embeddings with twocomponents
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123
Language Model - Word vectors
Character based LSTM generates word Vectors
Figure Character LSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123
Language Model - Context Vectors
Word based BiLSTM generates Context Vectors
Figure Word BiLSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123
b MLP Parser (CoNLL17)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123
MLP Parser
MLP Parser consists of 4 components
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Feature extractor describes current state
Decision module (MLP) decides the next transition.
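The decision module at the end of this pipeline can be sketched as a small feed-forward network that maps a state-feature vector to a distribution over transitions. This is a minimal illustrative sketch with toy dimensions and random weights, not the thesis's actual layer sizes or parameters:

```python
import math
import random

random.seed(0)

# Toy dimensions; the real parser's sizes are not given on these slides.
N_FEATURES, HIDDEN, N_TRANSITIONS = 8, 16, 3  # shift, left, right

W1 = [[random.gauss(0, 0.1) for _ in range(N_FEATURES)] for _ in range(HIDDEN)]
b1 = [0.0] * HIDDEN
W2 = [[random.gauss(0, 0.1) for _ in range(HIDDEN)] for _ in range(N_TRANSITIONS)]
b2 = [0.0] * N_TRANSITIONS

def mlp_decide(features):
    """Score the candidate transitions for the current parser state."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(W1, b1)]
    scores = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(W2, b2)]
    # Softmax over shift / left / right
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = mlp_decide([0.5] * N_FEATURES)
```

At parsing time the argmax of `probs` (restricted to valid transitions) would be executed.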
Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123
MLP Parser - Feature Extraction
Feature extractor describes current state
Figure: Kırnap et al., 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123
MLP Parser - Decision Module
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123
Experiments amp Dataset (MLP) CoNLL17
CoNLL17 Dataset
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation:
17 universal part-of-speech tags
37 universal dependency relations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123
Experiments - Evaluation Metric
Labeled Attachment Score (LAS): the percentage of words correctly assigned both the correct syntactic head and the correct dependency label.
Example with "Economic news had":
Gold tree: arcs ATT and SBJ (LAS = 1)
Pred 1: arcs OBJ and PRED (LAS = 0)
Pred 2: arcs ATT and OBJ (LAS = (1/2) * 100 = 50)
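The LAS computation on the slide can be written in a few lines. The dict encoding of a tree (word to (head, label), with the root word's own entry omitted for simplicity) is an illustrative choice, not the shared task's official evaluation script:

```python
def las(gold, pred):
    """Labeled Attachment Score: percentage of words whose predicted
    head AND dependency label both match the gold tree.
    gold, pred: dicts mapping word -> (head, label)."""
    correct = sum(1 for w in gold if pred.get(w) == gold[w])
    return 100.0 * correct / len(gold)

# The slide's example: two scored words, "Economic" and "news".
gold = {"Economic": ("news", "ATT"), "news": ("had", "SBJ")}
pred2 = {"Economic": ("news", "ATT"), "news": ("had", "OBJ")}
las(gold, pred2)  # one of two words fully correct -> 50.0
```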
Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
Experiments (MLP)
CoNLL 2017 Results (all treebanks LAS)
Ranked 1st among transition based parsers 5
5 Source: CoNLL17 official results page
Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123
Contributions in CoNLL17
Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123
Context and Word Embeddings
Relative contributions of part-of-speech (p), word vector (v), context vector (c):

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123
Context and Word Embeddings
Context vectors provide an independent contribution on top of POS tags.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123
Context and Word embeddings
Our BiLSTM language model word vectors perform better than the Facebook (FB) pre-trained vectors.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123
Context and Word embeddings
Both POS tags and context vectors have significant contributions on top of word vectors.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123
Issues with MLP
However
Choosing correct state of parser still remains critical
We are unable to represent the whole parsing history with feature extraction.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123
Solution
Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies
CoNLL17: Koc-University team with MLP Parser using Context Embeddings
CoNLL18: KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123
c Tree-stack LSTM Parser (CoNLL18)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123
Related Work - Stack LSTM
Figure Stack LSTM [Dyer et al 2015]
Represent each component (σ, β, A) with an LSTM. Modify the head word's embedding with the dependent's embedding.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123
Problems with Stack LSTM
They only modify the stack's word embeddings
Hidden states of the LSTMs are not updated unless a reduce occurs
Actions are not explicitly represented
They only used word2vec embeddings [Mikolov et al., 2013]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123
Tree-stack LSTM Overview
Figure: Tree-stack LSTM overview (β-LSTM over the buffer, σ-LSTM over the stack, Action-LSTM, and t-RNN over head word, dependent word, and dependency relation; outputs are concatenated and fed to an MLP)
We propose the Tree-stack LSTM model with 4 components:
β-LSTM, σ-LSTM, Action-LSTM, Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123
Tree-stack LSTM
Input Representation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with a continuous vector
Every dependency relation is represented with a continuous vector
Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123
Input Representation
We do not include an explicit feature extractor. We initialize the word representation by concatenating:
Character Based LSTMrsquos word vectors
Word Based BiLSTMrsquos context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
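The concatenation above can be sketched with toy vectors; the dimensions below are illustrative, not the sizes used in the thesis:

```python
def word_representation(word_vec, context_vec, pos_vec, feat_vec):
    """Initial word representation: concatenation of the character-LSTM
    word vector, the BiLSTM context vector, the POS embedding, and the
    morph-feat embedding (vectors as plain Python lists)."""
    return word_vec + context_vec + pos_vec + feat_vec  # list concatenation

# Toy dimensions: 4 + 6 + 2 + 3 = 15
rep = word_representation([0.1] * 4, [0.2] * 6, [0.3] * 2, [0.4] * 3)
```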
Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
Input Representation
Morph-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
Tree-stack LSTM
Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123
β-LSTM
Figure: Tree-stack LSTM overview diagram
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
Figure: Buffer's β-LSTM running over w_i, w_i+1, w_i+2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
Figure: Tree-stack LSTM overview diagram
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
Figure: Stack's σ-LSTM running over s_i, s_i+1, s_i+2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
Figure: Tree-stack LSTM overview diagram
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
Figure: Action-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How are the components of the tree-stack LSTM connected?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
Figure: t-RNN combining the head word, dependency relation, and dependent word embeddings

w_head_new = tanh(W_rnn * [w_head_old; d_l; w_dep] + b_rnn)   (1)
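Equation (1) can be sketched directly: the old head embedding, the dependency-label embedding, and the dependent embedding are concatenated and passed through a tanh layer. Dimensions and initialization below are illustrative, not the thesis's settings:

```python
import math
import random

random.seed(0)
D = 5  # toy embedding size

# W_rnn maps the concatenation [w_head; d_l; w_dep] back to size D
W_rnn = [[random.gauss(0, 0.1) for _ in range(3 * D)] for _ in range(D)]
b_rnn = [0.0] * D

def trnn_update(w_head, d_label, w_dep):
    """Equation (1): w_head_new = tanh(W_rnn [w_head; d_l; w_dep] + b_rnn)."""
    x = w_head + d_label + w_dep  # concatenation
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W_rnn, b_rnn)]

new_head = trnn_update([0.1] * D, [0.2] * D, [0.3] * D)
```

Each left/right transition applies this update, so the head's embedding accumulates information from its dependents.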
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
Tree-RNN with
1. Left Transition
2. Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: The tree-stack LSTM is ready to predict the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: The tree-stack LSTM is ready to predict the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
Final overview of Tree-stack LSTM
Figure: Tree-stack LSTM overview diagram
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2. Related Work: Linear Models and their Drawbacks; Neural Network Models
3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5. Conclusion
6. Future Work & Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results amp Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results amp Comparisons
Dataset

CoNLL17: Dependency parsing of 81 treebanks in 49 languages. All treebanks use standardized annotation (17 universal part-of-speech tags, 37 universal dependency relations). Koc-University ranked 7th out of 33 participants (1st among transition based parsers).

CoNLL18: Dependency parsing of 82 treebanks in 57 languages. All treebanks use standardized annotation (17 universal part-of-speech tags, 37 universal dependency relations). Koc-University ranked 16th out of 30 participants (2nd among transition based parsers).

Differences between CoNLL17 and CoNLL18: 1. Train/test split change 2. Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
Two possible problems with the official comparison:
1. If the annotation of the treebank is improved, the older parser is handicapped.
2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models:

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
Figure: Initial MLP model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
Figure: Only Action-LSTM model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
Figure: Only β-LSTM model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
Figure: Only σ-LSTM model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code  MLP    Only Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
Figure: Tree-stack LSTM overview diagram
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN:

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.16
t-RNN provides a comparative advantage for low-resource languages.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNN's performance contribution increases as the training size decreases
σ-LSTM provides more useful information independent of dataset size
Interconnecting the model's components with t-RNN makes the tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental Settings: we divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:
Languages having less than 20k tokens
Languages having more than 20k and less than 50k tokens
Languages having more than 50k and less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33            3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.6            18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code    Morph-Feats  no Morph-Feats  # of tokens
sv lines     72.18        74.81           48325
fr sequoia   84.36        82.17           50543
en gum       76.44        75.34           53686
ko gsd       73.74        72.54           56687
eu bdt       74.55        73.32           72974
nl lassymal  76.7         75.8            75134
gl ctg       79.02        79.018          79327
lv lvtb      72.33        72.24           80666
id gsd       75.76        73.97           97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens:

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12            121064
bg btb     84.53        84.55            124336
en ewt     75.77        75.682           204585
ar padt    68.02        68.14            223881
de gsd     71.59        71.32            263804
ca ancora  85.89        85.874           417587
es ancora  84.99        84.78            444617
cs cac     83.57        83.63            472608
cs pdt     81.43        82.12           1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle: transitions follow the gold moves.
Dynamic oracle: transitions follow the predicted moves.
In both cases, the log probability of the gold moves is maximized.
Figure: Tree-stack LSTM overview diagram
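The distinction above can be sketched as a toy training loop: both oracles take the loss on the gold move, but they differ in which move is executed to advance the parser state. `gold_move` and `predicted_move` are illustrative stand-ins for the oracle and the parser's argmax, not the thesis's training code:

```python
import random

random.seed(0)
MOVES = ["shift", "left", "right"]

def predicted_move(state):
    # Stand-in for the parser's argmax over MLP transition scores
    return random.choice(MOVES)

def gold_move(state):
    # Stand-in for the oracle's optimal move in this state
    return "shift"

def training_moves(n_steps, dynamic):
    """Return the moves that advance the parser state during training:
    the gold moves (static oracle) or the model's own predictions
    (dynamic oracle). The loss -log p(gold | state) is taken either way."""
    executed = []
    for state in range(n_steps):
        gold = gold_move(state)
        # -log p(gold | state) would be accumulated here
        move = predicted_move(state) if dynamic else gold
        executed.append(move)
    return executed

static_path = training_moves(5, dynamic=False)   # always the gold moves
dynamic_path = training_moves(5, dynamic=True)   # may leave the gold path
```

Following predicted moves exposes the model to its own mistakes at training time, which is the motivation for dynamic oracles.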
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al., 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt        7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial
Training the LM from scratch does not produce useful word and context vectors
Our word and context vectors are still more useful than Facebook's [Bojanowski et al., 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition based parsers can only build projective trees 6
6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
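A tree is projective when no two dependency arcs cross. A common way to check this is a pairwise crossing test over all arcs; the quadratic sketch below and its `heads` encoding (index 0 as a placeholder for the artificial root) are illustrative assumptions:

```python
def is_projective(heads):
    """Check whether a dependency tree is projective, i.e. no two arcs
    cross. heads[d] is the head position of word d; words are numbered
    from 1, and heads[0] is an unused placeholder for the root."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads[1:], start=1)]
    for (l1, r1) in arcs:
        for (l2, r2) in arcs:
            if l1 < l2 < r1 < r2:  # strictly interleaved spans -> crossing
                return False
    return True

is_projective([0, 2, 0, 2])     # chain-like tree: projective
is_projective([0, 3, 4, 0, 3])  # arcs (1,3) and (2,4) cross: non-projective
```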
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
Projective vs Non-projective
We compared our model with the best model for different projectivity ratios:

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus   90.7         79.39       55.03 (20)
eu bdt        95.13        84.22       74.13 (17)
hu szeged     97.8         82.66       68.18 (14)
da ddt        98.26        86.28       76.40 (17)
en gum        99.6         85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases
7 From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusion: we introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition based dependency parsing.
Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering
The Tree-stack LSTM performed better on low-resource languages
As the training dataset size increases, the tree-stack LSTM loses its advantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.
Attention Mechanism
Applying attention between σ-LSTM states, or within the β-LSTM or Action-LSTM, may bring performance improvements.
Morphological Features
Finding a different way to represent morphological features.
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann, Carlos Gomez-Rodriguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.
S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
Introduction
Data-driven Dependency Parsing
Graph Based Algorithms
Using maximum spanning tree algorithms from graph theory
Transition Based Algorithms
Capitalizing on greedy stack based algorithms to build the dependency tree with incremental steps in linear time
4
4From S Kbler R McDonald and J Nivre 2009 Dependency parsing Morgan ampClaypool US
Omer Kırnap (Koc University) MSc Thesis September 27 2018 7 123
Introduction
Transition Based Dependency Parsing
Transition System: an abstract machine with a set of configurations (states) and transitions. We use the ArcHybrid transition system [Kuhlmann et al., 2011].
Configurations (σ, β, A):
σ: stack of tree fragments, initially empty
β: buffer of words, initially containing the whole sentence
A: set of dependency arcs (head, relation, modifier), initially empty
Transitions:
shift(σ, b|β, A) = (σ|b, β, A)
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
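The arc-hybrid system above can be sketched as a small state machine. The class name, integer word ids, and relation names are illustrative, and no transition preconditions are checked; it only mirrors the three rules on the slide:

```python
class ArcHybridState:
    """Minimal arc-hybrid configuration (σ, β, A).
    Words are integer positions; arcs are (head, relation, dependent)."""

    def __init__(self, sentence):
        self.stack = []                # σ, top of stack at the end
        self.buffer = list(sentence)   # β, front of buffer at index 0
        self.arcs = set()              # A

    def shift(self):
        # (σ, b|β, A) -> (σ|b, β, A)
        self.stack.append(self.buffer.pop(0))

    def left(self, rel):
        # adds (b, rel, s): the buffer front heads the stack top
        s, b = self.stack.pop(), self.buffer[0]
        self.arcs.add((b, rel, s))

    def right(self, rel):
        # adds (s, rel, t): the second stack item heads the stack top
        t = self.stack.pop()
        s = self.stack[-1]
        self.arcs.add((s, rel, t))

state = ArcHybridState([1, 2, 3])
state.shift(); state.shift()
state.right("obj")    # word 1 heads word 2
state.left("nsubj")   # word 3 heads word 1
state.shift()
# state.arcs == {(1, "obj", 2), (3, "nsubj", 1)}
```

Parsing terminates when the buffer is empty and only the root remains on the stack.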
Omer Kırnap (Koc University) MSc Thesis September 27 2018 8 123
An example parsing of a sentence
Omer Kırnap (Koc University) MSc Thesis September 27 2018 9 123
Problem Definition
Find a model that learns to decide the correct transition from the current state.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123
2 Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123
Related Work
Neural Networks for Feature Conjunctions
Neural networks can handle feature conjunctions and nonlinearity. However, they are impractical for high-dimensional inputs: they scale linearly in the input dimensions (in both time and space, assuming a fixed number of hidden units).
Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123
Related Work
Solution: use dense embeddings for input features.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123
Overview
1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing
2 Related WorkLinear Models and their DrawbacksNeural Network Models
3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser
4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning
5 Conclusion6 Future Work amp Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123
3 Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies
CoNLL17bull Koc-University team with MLP Parser using Context Embeddings
CoNLL18bull KParse team with Tree-stack LSTM Parser using Context and
Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies
CoNLL17
bull Koc-University team with MLP Parser using ContextEmbeddings
CoNLL18
bull KParse team with Tree-stack LSTM Parser using Context and Morph-featEmbeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123
a Language Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123
Language Model (LM)
LM is used to obtain Context and Word embeddings with twocomponents
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123
Language Model - Word vectors
Character based LSTM generates word Vectors
Figure Character LSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123
Language Model - Context Vectors
Word based BiLSTM generates Context Vectors
Figure Word BiLSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123
b MLP Parser (CoNLL17)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123
MLP Parser
MLP Parser consists of 4 components
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Feature extractor describes current state
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123
MLP Parser - Feature Extraction
Feature extractor describes current state
Figure Kırnap et al 2017Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123
MLP Parser - Decision Module
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123
Experiments amp Dataset (MLP) CoNLL17
CoNLL17 Dataset
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123
Experiments - Evaluation Metric
Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label
Economic news hadGold Tree LAS 1
SBJATT
Economic news had
PREDOBJPred 1 LAS 0
Economic news had
OBJATTPred 2 LAS (frac12)100
Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
Experiments (MLP)
CoNLL 2017 Results (all treebanks LAS)
Ranked 1st among transition based parsers 5
5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123
Contributions in CoNLL17
Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123
Context and Word Embeddings
Relative contributions of part-of-speech (p) word vector (v)context vector (c)
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123
Context and Word Embeddings
Relative contributions of part-of-speech (p) word vector (v)context vector (c)
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Context vectors provide independent contribution on top ofPOS tags
Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123
Context and Word embeddings
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677p-fb 747 797 663
p-v-c 793 832 742
Our BiLSTM language model word vectors perform betterthan FB vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123
Context and Word embeddings
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Both POS tags and context vectors have significantcontributions on top of word vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123
Issues with MLP
However
Choosing correct state of parser still remains critical
We are unable to represent whole parsing history with featureextracting
Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123
Solution
Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack
Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies
CoNLL17
bull Koc-University team with MLP Parser using Context Embeddings
CoNLL18
bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123
c Tree-stack LSTM Parser (CoNLL18)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123
Related Work - Stack LSTM
Figure Stack LSTM [Dyer et al 2015]
Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123
Problems with Stack LSTM
They only modify the stack's word embeddings
Hidden states of the LSTMs are not updated unless a reduce occurs
Actions are not explicitly represented
They only used word2vec embeddings [Mikolov et al 2013]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123
Tree-stack LSTM Overview
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
We propose Tree-stack LSTM model with 4 components
1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123
Tree-stack LSTM
Input Representation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with a continuous vector
Every dependency relation is represented with a continuous vector
Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123
Input Representation
We do not include an explicit feature extractor. We initiate the word representation by concatenating:
Character Based LSTMrsquos word vectors
Word Based BiLSTMrsquos context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
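As a rough sketch, the input representation above is simply a concatenation of the four vectors. The dimensions below are illustrative placeholders, not the thesis' actual sizes:

```python
import numpy as np

# Stand-ins for the four components of the word representation.
# All dimensions here are assumptions for illustration only.
word_vec = np.zeros(350)      # character-based LSTM word vector
context_vec = np.zeros(600)   # word-based BiLSTM context vector
pos_vec = np.zeros(128)       # POS tag embedding
feat_vec = np.zeros(128)      # morph-feat embedding

# The parser input is the concatenation of the four parts.
x = np.concatenate([word_vec, context_vec, pos_vec, feat_vec])
print(x.shape)  # (1206,)
```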
Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
Input Representation
Morph-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure Morph-feat Embeddings
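One plausible way to realize such morph-feat embeddings (an illustrative sketch, not necessarily the thesis' exact scheme) is to give each Feature=Value pair its own embedding and sum the embeddings of a word's pairs:

```python
import numpy as np

# Hypothetical morph-feat embedding: one vector per "Feature=Value" pair,
# summed over the pairs a word carries. Table is grown lazily.
rng = np.random.default_rng(0)
EMB_DIM = 128
table = {}

def morph_feat_vector(feats: str) -> np.ndarray:
    vec = np.zeros(EMB_DIM)
    if feats == "_":          # CoNLL-U convention: no morphological annotation
        return vec
    for pair in feats.split("|"):
        if pair not in table:
            table[pair] = rng.normal(size=EMB_DIM)
        vec += table[pair]
    return vec

v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
print(v.shape)  # (128,)
```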
Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
Tree-stack LSTM
Model Components
1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123
β-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
LSTM LSTM LSTM
wi+2wi+1wi
Figure Bufferrsquos β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
LSTM LSTM LSTM
si si+1 si+2
Figure Stackrsquos σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
LSTM LSTM LSTM
Figure Action-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How are the components of the tree-stack LSTM connected?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
t-RNN
Dependent word
Dependency Relation
Head word
Figure t-RNN
w_head_new = tanh(W_rnn * [w_head_old; d_l; w_dep] + b_rnn) (1)
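Equation (1) translates almost directly into code. The dimension D and the random parameters below are illustrative placeholders:

```python
import numpy as np

# Eq. (1): the new head embedding is a tanh-squashed linear map of
# [old head; dependency-label embedding; dependent embedding].
D = 64                                   # illustrative dimension
rng = np.random.default_rng(1)
W_rnn = rng.normal(size=(D, 3 * D)) * 0.1
b_rnn = np.zeros(D)

def t_rnn(w_head_old, d_label, w_dep):
    x = np.concatenate([w_head_old, d_label, w_dep])
    return np.tanh(W_rnn @ x + b_rnn)

new_head = t_rnn(rng.normal(size=D), rng.normal(size=D), rng.normal(size=D))
assert new_head.shape == (D,)
```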
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
Tree-RNN with
1 Left Transition2 Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
LSTM LSTM LSTM LSTM
Left transition
t-RNN
Dependency Relation
HeadDependent
Figure Each embedding is initiated by concatenating POS, language and morph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
Figure Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure β-LSTM recalculates its hidden state based on the new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure Tree-stack LSTM is ready to predict the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
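The left-transition walkthrough above can be condensed into a few lines. The t_rnn placeholder stands in for the head-embedding update of Eq. (1), and the data structures are deliberately simplistic:

```python
# Minimal sketch of the arc-hybrid left transition, on plain Python lists
# (top of the stack is the last element). t_rnn is a placeholder for the
# head-embedding composition, not a real network.
def t_rnn(head, label, dep):
    return ("composed", head, label, dep)

def left(stack, buffer, arcs, label):
    s = stack.pop()                    # dependent: popped from the stack
    b = buffer[0]                      # head: front of the buffer
    arcs.add((b, label, s))            # arc (head, relation, modifier)
    buffer[0] = t_rnn(b, label, s)     # β-LSTM re-reads the new head
    return stack, buffer, arcs

stack, buffer, arcs = ["news"], ["had", "little"], set()
left(stack, buffer, arcs, "nsubj")
print(arcs)  # {('had', 'nsubj', 'news')}
```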
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
LSTM LSTM LSTM LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Each embedding is initiated by concatenating POS, language and morph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Stackrsquos top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure σ-LSTM recalculates its hidden state from the new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure Tree-stack LSTM is ready to predict the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
Final overview of Tree-stack LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1 Introduction
  Overview of Dependency Parsing
  Transition Based Dependency Parsing
2 Related Work
  Linear Models and their Drawbacks
  Neural Network Models
3 Model
  Language Model
  MLP Parser
  Tree-stack LSTM Parser
4 Results
  MLP vs Tree-stack LSTM
  Morphological Feature Embeddings
  Static vs Dynamic Oracle Training
  Transfer Learning
5 Conclusion
6 Future Work & Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results & Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results & Comparisons
Dataset
CoNLL17
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18
Dependency parsing of 82 treebanks in 57 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1 Train/test split 2 Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
2 possible problems with the official comparison:
1 If the annotation of the treebank has been improved, the older parser is handicapped
2 If the train-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
MLP
Figure Initial model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
LSTM LSTM
Figure Only action LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
LSTM LSTM LSTM
LSTM LSTM MLP
Figure Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
LSTM LSTM
LSTM LSTM MLP
Figure Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code  MLP    Only Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67

Table Comparison between MLP and "Only" models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.16
t-RNN provides a comparative advantage for low-resource languages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNN's performance contribution increases as the training size decreases
σ-LSTM provides more useful information, independent of dataset size
Interconnecting the model's components with the t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental Settings
We divide the CoNLL18 UD 2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens
Languages having more than 20k and less than 50k tokens
Languages having more than 50k and less than 100k tokens
Languages having 100k tokens or more
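The four-way split can be expressed as a simple bucketing function; the token counts below are taken from the per-language tables that follow:

```python
# Bucketing of languages by training-token count, matching the four-way
# split described above.
def bucket(n_tokens: int) -> str:
    if n_tokens < 20_000:
        return "<20k"
    if n_tokens < 50_000:
        return "20k-50k"
    if n_tokens < 100_000:
        return "50k-100k"
    return ">=100k"

counts = {"no_nynorsklia": 3_583, "id_gsd": 97_531, "cs_pdt": 1_173_282}
print({lang: bucket(n) for lang, n in counts.items()})
```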
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.60           18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having tokens between 50k and 100k

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48325
fr sequoia     84.36        82.17           50543
en gum         76.44        75.34           53686
ko gsd         73.74        72.54           56687
eu bdt         74.55        73.32           72974
nl lassysmall  76.7         75.8            75134
gl ctg         79.02        79.018          79327
lv lvtb        72.33        72.24           80666
id gsd         75.76        73.97           97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121064
bg btb     84.53        84.55           124336
en ewt     75.77        75.682          204585
ar padt    68.02        68.14           223881
de gsd     71.59        71.32           263804
ca ancora  85.89        85.874          417587
es ancora  84.99        84.78           444617
cs cac     83.57        83.63           472608
cs pdt     81.43        82.12           1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle: transitions follow gold moves
Dynamic oracle: transitions follow predicted moves
In both cases, the log-probability of the gold moves is maximized
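The contrast between the two regimes can be sketched as a toy training loop: both score the gold move, but only the dynamic oracle sometimes executes the model's own prediction. Everything here (a "state" is just a step counter) is a stand-in, not the thesis implementation:

```python
import random

# Schematic static vs dynamic oracle training. Both maximize the
# log-probability of the gold move; they differ in which move is
# *executed* to reach the next state.
def train_sentence(n_steps, gold_move, model_move, log_prob, dynamic,
                   p_explore=0.1, rng=random.Random(0)):
    loss, trace = 0.0, []
    for step in range(n_steps):
        gold = gold_move(step)
        loss += -log_prob(step, gold)             # always score the gold move
        if dynamic and rng.random() < p_explore:
            trace.append(model_move(step))        # follow the model's prediction
        else:
            trace.append(gold)                    # follow the gold transition
    return loss, trace

# Static training executes exactly the gold sequence:
_, static_trace = train_sentence(
    4, gold_move=lambda s: "shift", model_move=lambda s: "left",
    log_prob=lambda s, m: -1.0, dynamic=False)
print(static_trace)  # ['shift', 'shift', 'shift', 'shift']
```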
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning:
1 Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2 Using Facebook's word vectors to train a parser [Bojanowski et al 2017]
3 Using my own word and context vectors trained on a different language from the same language family
4 Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table LAS values for strategies (1), (2), (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial
Training an LM from scratch on limited data does not produce useful word and context vectors
Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
A transition based parser can only build projective trees 6
6Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
Projective vs Non-projective
We compared our model with the best model for different projectivity ratios
Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)
Table Our model's performance gap decreases as the projectivity ratio increases
7
7From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusion
We introduced "Context, Word and Morph-feat" embeddings and showed their contribution to transition based dependency parsing
Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering
Tree-stack LSTM performed better with low resource languages
When the training dataset size increases, the tree-stack LSTM loses its advantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.
Attention Mechanism
Applying attention between the σ-LSTM, β-LSTM or Action-LSTM states may bring performance improvements
Morphological Features
Finding a different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap, Berkay Furkan Onder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann, Carlos Gomez-Rodriguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1. Association for Computational Linguistics, pages 673-682.
S. Kübler, R. McDonald and J. Nivre. 2009. Dependency parsing. Morgan & Claypool, US.
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
Introduction
Why do we need dependency parsing
Dependencies resolve ambiguity
Useful for some down-stream tasks in NLP
2
2Figure from httpwwwphontroncomslidesnlp-programming-en-11-dependpdfOmer Kırnap (Koc University) MSc Thesis September 27 2018 5 123
Introduction
Dependency Parsing Categorization
Grammar BasedRelying on a formal grammardefining a formal languageasking whether a given inputsentence is in the languagedefined by the grammar or not
Data-drivenMaking essential use of machinelearning from linguistic data in orderto parse new sentences
3
3From S Kbler R McDonald and J Nivre 2009 Dependency parsing Morgan ampClaypool US
Omer Kırnap (Koc University) MSc Thesis September 27 2018 6 123
Introduction
Data-driven Dependency Parsing
Graph Based Algorithms
Using maximum spanning tree algorithms from graph theory
Transition Based Algorithms
Capitalizing on greedy stack based algorithms to build dependency treewith incremental steps in linear time
4
4From S Kbler R McDonald and J Nivre 2009 Dependency parsing Morgan ampClaypool US
Omer Kırnap (Koc University) MSc Thesis September 27 2018 7 123
Introduction
Transition Based Dependency Parsing
Transition System Abstract machine with a set of configurations(states) and transitions We use the ArcHybrid transition system[Kuhlmann et al 2011]
Configurations (σ β A)bull σ Stack of tree fragments initially emptybull β Buffer of words initially containing the whole sentencebull A Set of dependency arcs (head relation modifier) initially empty
Transitionsbull shift(σ b|βA) = (σ|b βA)bull leftd(σ|s b|βA) = (σ b|βA cup (b d s))bull rightd(σ|s|t βA) = (σ|s βA cup (s d t))
Omer Kırnap (Koc University) MSc Thesis September 27 2018 8 123
An example parsing of a sentence
Omer Kırnap (Koc University) MSc Thesis September 27 2018 9 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 10 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 11 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 12 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 13 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 14 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 15 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 16 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 17 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 18 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 19 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 20 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 21 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 22 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 23 123
Problem Definition
Find a model learning to decide correct transition from current state
Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123
2 Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123
Related Work
Neural Networks for Feature Conjunctions
Neural networks can handle feature conjunctions and nonlinearityHoweverImpractical for high dimensional inputs they scale linearly in inputdimensions (in both time and space assuming fixed number of hiddenunits)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123
Related Work
Solution Using dense embeddings for input features
Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123
Overview
1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing
2 Related WorkLinear Models and their DrawbacksNeural Network Models
3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser
4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning
5 Conclusion6 Future Work amp Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123
3 Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies
CoNLL17bull Koc-University team with MLP Parser using Context Embeddings
CoNLL18bull KParse team with Tree-stack LSTM Parser using Context and
Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies
CoNLL17
bull Koc-University team with MLP Parser using ContextEmbeddings
CoNLL18
bull KParse team with Tree-stack LSTM Parser using Context and Morph-featEmbeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123
a Language Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123
Language Model (LM)
LM is used to obtain Context and Word embeddings with twocomponents
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123
Language Model - Word vectors
Character based LSTM generates word Vectors
Figure Character LSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123
Language Model - Context Vectors
Word based BiLSTM generates Context Vectors
Figure Word BiLSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123
b MLP Parser (CoNLL17)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123
MLP Parser
MLP Parser consists of 4 components
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Feature extractor describes current state
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123
MLP Parser - Feature Extraction
Feature extractor describes current state
Figure Kırnap et al 2017Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123
MLP Parser - Decision Module
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123
Experiments amp Dataset (MLP) CoNLL17
CoNLL17 Dataset
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123
Experiments - Evaluation Metric
Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label
Economic news hadGold Tree LAS 1
SBJATT
Economic news had
PREDOBJPred 1 LAS 0
Economic news had
OBJATTPred 2 LAS (frac12)100
Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
Experiments (MLP)
CoNLL 2017 Results (all treebanks LAS)
Ranked 1st among transition based parsers 5
5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123
Contributions in CoNLL17
Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123
Context and Word Embeddings
Relative contributions of part-of-speech (p) word vector (v)context vector (c)
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123
Context and Word Embeddings
Relative contributions of part-of-speech (p) word vector (v)context vector (c)
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Context vectors provide independent contribution on top ofPOS tags
Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123
Context and Word embeddings
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677p-fb 747 797 663
p-v-c 793 832 742
Our BiLSTM language model word vectors perform betterthan FB vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123
Context and Word embeddings
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Both POS tags and context vectors have significantcontributions on top of word vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123
Issues with MLP
However
Choosing correct state of parser still remains critical
We are unable to represent whole parsing history with featureextracting
Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123
Solution
Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack
Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies
CoNLL17
bull Koc-University team with MLP Parser using Context Embeddings
CoNLL18
bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123
c Tree-stack LSTM Parser (CoNLL18)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123
Related Work - Stack LSTM
Figure Stack LSTM [Dyer et al 2015]
Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123
Problems with Stack LSTM
They only modify stackrsquos word embeddings
Hidden states of LSTMS are not updated unless reduce
Actions are not explicitly represented
They only used word2vec embeddings [Mikolov et al 2013]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123
Tree-stack LSTM Overview
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
We propose Tree-stack LSTM model with 4 components
β-LSTMσ-LSTMAction-LSTMTree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123
Tree-stack LSTM
Input Representation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with continuous vector
Every dependency relation is represented with continuous vector
Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123
Input Representation
We do not include explicit feature extractor We initiated wordrepresentation by concatenating
Character Based LSTMrsquos word vectors
Word Based BiLSTMrsquos context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
Input Representation
Morp-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
Tree-stack LSTM
Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123
β-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
LSTM LSTM LSTM
wi+2wi+1wi
Figure Bufferrsquos β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
LSTM LSTM LSTM
si si+1 si+2
Figure Stackrsquos σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
LSTM LSTM LSTM
Figure Action-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How do components of tree-stack LSTM are connected
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
t-RNN
Dependent word
Dependency Relation
Head word
Figure t-RNN
whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
Tree-RNN with
1 Left Transition2 Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM LSTM
Left transition
t-RNN
Dependency Relation
HeadDependent
Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure β-LSTM recalculates its hidden based on new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left

left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: The tree-stack LSTM is ready to predict the next transition
Right Transition
Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings
Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: The stack's top LSTM is reduced
Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding
Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input
Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: The tree-stack LSTM is ready to predict the next transition
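The shift, left, and right transitions above can be sketched as plain list operations. This is a minimal illustration of the arc-hybrid system under stated assumptions, not the thesis code; arcs follow the (head, relation, dependent) order used in the slides.

```python
def shift(stack, buffer, arcs):
    # shift(σ, b|β, A) = (σ|b, β, A): move the buffer front onto the stack
    stack.append(buffer.pop(0))

def left(stack, buffer, arcs, d):
    # left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    # stack top s becomes a d-dependent of the buffer front b
    s = stack.pop()
    arcs.add((buffer[0], d, s))

def right(stack, buffer, arcs, d):
    # right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    # stack top t becomes a d-dependent of the word s below it
    t = stack.pop()
    arcs.add((stack[-1], d, t))

# "Economic news had ...": attach "Economic" to "news" with a left arc.
stack, buffer, arcs = [], ["Economic", "news", "had"], set()
shift(stack, buffer, arcs)          # stack: ["Economic"]
left(stack, buffer, arcs, "ATT")    # news -ATT-> Economic
assert arcs == {("news", "ATT", "Economic")}
assert stack == [] and buffer == ["news", "had"]
```

The parser's job, as in the problem definition, is to learn which of these three operations to apply at each configuration.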
Final overview of Tree-stack LSTM

Figure: Tree-stack LSTM (t-RNN over head word, dependent word, and dependency relation; β-, σ-, and Action-LSTM outputs concatenated and fed to an MLP)
Overview

1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2. Related Work: Linear Models and their Drawbacks; Neural Network Models
3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5. Conclusion
6. Future Work & Discussions
4. Results & Comparisons
Results & Comparisons

Dataset

CoNLL17:
- Dependency parsing of 81 treebanks in 49 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
- Dependency parsing of 82 treebanks in 57 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1. train/test split change, 2. annotation
MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets.
MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1. If the annotation of the treebank is improved, the older parser is handicapped.
2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.
MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models:

Lang Code       | MLP   | Tree-stack
ru taiga (10k)  | 58.89 | 60.55
hu szeged (20k) | 66.21 | 68.18
tr imst (50k)   | 56.78 | 58.75
ar padt (120k)  | 67.83 | 68.14
en ewt (205k)   | 74.87 | 75.77
cs cac (473k)   | 83.39 | 83.57

Tree-stack LSTM outperforms MLP.
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
MLP Parser

Figure: Initial model (MLP only)
Only Action LSTM

Figure: Only action LSTM
Only β-LSTM

Figure: Only β-LSTM
Only σ-LSTM

Figure: Only σ-LSTM
Ablation Analysis Results

Lang Code | MLP   | Only Action | Only-β | Only-σ
hu szeged | 66.21 | 66.87       | 66.94  | 67.03
sv lines  | 71.12 | 72.05       | 72.17  | 72.45
tr imst   | 57.12 | 56.87       | 57.02  | 57.12
ar padt   | 67.83 | 66.67       | 66.89  | 66.92
cs cac    | 83.89 | 82.23       | 83.13  | 83.17
en ewt    | 75.54 | 75.43       | 75.56  | 75.67

Table: Comparison between MLP and "Only" models
Ablation of t-RNN

Figure: Tree-stack LSTM overview with the t-RNN component highlighted
Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN:

Lang Code          | without t-RNN | with t-RNN
no nynorsklia (3k) | 51.78 | 53.33
ru taiga (11k)     | 59.13 | 60.55
gl treegal (15k)   | 69.76 | 70.45
hu szeged (20k)    | 66.12 | 68.18
sv lines (49k)     | 74.04 | 75.46
tr imst (50k)      | 58.12 | 58.75
ar padt (120k)     | 68.04 | 68.14
en ewt (204k)      | 74.87 | 75.77
cs cac (473k)      | 82.89 | 83.57
cs pdt (1M)        | 81.17 | 81.164

t-RNN provides a comparative advantage for low-resource languages.
Ablation Analysis

Overall results of the ablation analysis:

Lang      | MLP   | Only A | Only-β | Only-σ | w/o t-RNN | all
hu szeged | 66.21 | 66.87  | 66.94  | 67.03  | 66.12     | 68.18
sv lines  | 71.12 | 72.05  | 72.17  | 74.04  | 72.17     | 75.46
tr imst   | 57.12 | 56.87  | 57.02  | 57.12  | 58.12     | 58.75
ar padt   | 67.83 | 66.67  | 66.89  | 66.92  | 68.04     | 68.14
cs cac    | 83.89 | 82.23  | 83.13  | 83.17  | 82.89     | 83.57
en ewt    | 75.54 | 75.43  | 75.56  | 75.67  | 74.87     | 75.77

Tree-stack LSTM beats the other model variations.
Ablation Analysis

Conclusions of the ablation experiments:

t-RNN's performance contribution increases as the training size decreases.

σ-LSTM provides more useful information independent of dataset size.

Interconnecting the model's components with t-RNN makes the tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers).
What does the Morphological Feature Embedding provide?
Contribution of Morph-feat Embeddings

Experimental Settings
We divide the CoNLL18 UD v2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k, less than 50k tokens

Languages having more than 50k, less than 100k tokens

Languages having 100k tokens or more
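The four-way split can be sketched as a simple threshold function; this is an illustrative sketch, and the token counts in the comments come from the result tables that follow.

```python
def bucket(n_tokens):
    # Assign a treebank to one of the four training-size groups
    # used in the morph-feat experiments.
    if n_tokens < 20_000:
        return "<20k"
    if n_tokens < 50_000:
        return "20k-50k"
    if n_tokens < 100_000:
        return "50k-100k"
    return ">=100k"

# Token counts taken from the result tables on the following slides.
assert bucket(10_479) == "<20k"       # ru taiga
assert bucket(48_325) == "20k-50k"    # sv lines
assert bucket(97_531) == "50k-100k"   # id gsd
assert bucket(204_585) == ">=100k"    # en ewt
```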
Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang code     | Morph-Feats | no Morph-Feats | # of tokens
no nynorsklia | 51.13 | 53.33 | 3,583
ru taiga      | 58.32 | 60.55 | 10,479
sme giella    | 52.78 | 53.39 | 16,385
la perseus    | 49.93 | 51.60 | 18,184
ug udt        | 52.78 | 53.39 | 19,262
sl sst        | 46.72 | 48.77 | 19,473
hu szeged     | 66.23 | 68.18 | 20,166

Not useful for languages having less than 20k training tokens.
Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code     | Morph-Feats | no Morph-Feats | # of tokens
sv lines      | 72.18 | 74.81  | 48,325
fr sequoia    | 84.36 | 82.17  | 50,543
en gum        | 76.44 | 75.34  | 53,686
ko gsd        | 73.74 | 72.54  | 56,687
eu bdt        | 74.55 | 73.32  | 72,974
nl lassysmall | 76.70 | 75.80  | 75,134
gl ctg        | 79.02 | 79.018 | 79,327
lv lvtb       | 72.33 | 72.24  | 80,666
id gsd        | 75.76 | 73.97  | 97,531

Beneficial for languages with 50k-100k training tokens.
Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code | Morph-Feats | no Morph-Feats | # of tokens
fa seraji | 81.18 | 81.12  | 121,064
bg btb    | 84.53 | 84.55  | 124,336
en ewt    | 75.77 | 75.682 | 204,585
ar padt   | 68.02 | 68.14  | 223,881
de gsd    | 71.59 | 71.32  | 263,804
ca ancora | 85.89 | 85.874 | 417,587
es ancora | 84.99 | 84.78  | 444,617
cs cac    | 83.57 | 83.63  | 472,608
cs pdt    | 81.43 | 82.12  | 1,173,282

Neutral for languages having more than 100k training tokens.
Static vs Dynamic Oracle Training

Static oracle: transitions using gold moves.
Dynamic oracle: transitions using predicted moves.

In both cases, the log-probability of the gold moves is maximized.

Figure: Tree-stack LSTM overview (t-RNN over head word, dependent word, and dependency relation; β-, σ-, and Action-LSTM outputs concatenated and fed to an MLP)
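The contrast between the two regimes can be sketched as a single flag in the training loop; the state, oracle, and scorer interfaces below are assumptions for illustration, not the thesis code. In both regimes the loss maximizes log p of the gold move — only the move that is executed differs.

```python
import math

def train_sentence(state, oracle, score_moves, follow_predictions):
    """Accumulate -log p(gold move) over one sentence.
    Static oracle:  follow_predictions=False (execute gold moves).
    Dynamic oracle: follow_predictions=True  (execute predicted moves)."""
    loss = 0.0
    while not state.is_final():
        probs = score_moves(state)          # hypothetical scorer: move -> prob
        gold = oracle(state)                # oracle's best move for this state
        loss -= math.log(probs[gold])       # both regimes maximize log p(gold)
        move = max(probs, key=probs.get) if follow_predictions else gold
        state = state.apply(move)
    return loss

# Toy state: finishes after 2 moves; "shift" is always the gold move.
class Toy:
    def __init__(self, n): self.n = n
    def is_final(self): return self.n == 0
    def apply(self, move): return Toy(self.n - 1)

scorer = lambda s: {"shift": 0.5, "left": 0.5}
static_loss = train_sentence(Toy(2), lambda s: "shift", scorer, False)
dynamic_loss = train_sentence(Toy(2), lambda s: "shift", scorer, True)
assert abs(static_loss - 2 * math.log(2)) < 1e-9
assert abs(dynamic_loss - 2 * math.log(2)) < 1e-9
```

In the dynamic regime the model visits its own (possibly wrong) states during training, which is the point of the comparison on the next slides.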
Static vs Dynamic Oracle Training

Figure: Results are very close for languages with fewer than 20k training tokens
Static vs Dynamic Oracle Training

Figure: Results are very close for training sizes between 20k and 50k tokens
Static vs Dynamic Oracle Training

Figure: Results are very close for training sizes above 50k tokens
How about languages with less than 20k training tokens?
Transfer Learning

There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language     | (1)          | (2)   | (3)   | (4)
af afribooms | not provided | 75.46 | 77.43 | 78.12
kk ktb       | 20.19        | 22.31 | 21.96 | 23.86
bxr bdt      | 7.64         | 9.76  | 9.93  | 8.98
kmr mg       | 20.12        | 22.57 | 22.78 | 23.39

Table: LAS values for strategies (1), (2), (3), and (4)
Transfer Learning

Conclusions of the transfer learning experiments:

Applying transfer learning with a pre-trained parser is the most beneficial.

Training the LM from scratch does not yield useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017].
Projectivity

A transition based parser can only build projective trees. [6]

[6] Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
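A tree is projective when no two arcs cross; this can be checked directly. A minimal sketch, with the arc format assumed to be (head, dependent) word positions:

```python
def is_projective(arcs):
    """arcs: set of (head, dep) word positions. A tree is projective
    iff no two arcs cross when drawn above the sentence."""
    for (h1, d1) in arcs:
        lo1, hi1 = sorted((h1, d1))
        for (h2, d2) in arcs:
            lo2, hi2 = sorted((h2, d2))
            # Crossing: the second arc starts strictly inside the first
            # span and ends strictly outside it.
            if lo1 < lo2 < hi1 < hi2:
                return False
    return True

assert is_projective({(2, 1), (3, 2)})      # nested/adjacent arcs: fine
assert not is_projective({(1, 3), (2, 4)})  # arcs 1-3 and 2-4 cross
```

This is why the projectivity ratio of a treebank (next slide) bounds what an arc-hybrid parser can recover.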
Projective vs Non-projective

We compared our model with the best model for different projectivity ratios:

Language    | Projectivity | Best (LAS) | Our (LAS)
grc perseus | 90.7  | 79.39 | 55.03 (20)
eu bdt      | 95.13 | 84.22 | 74.13 (17)
hu szeged   | 97.8  | 82.66 | 68.18 (14)
da ddt      | 98.26 | 86.28 | 76.40 (17)
en gum      | 99.6  | 85.05 | 76.44 (15)
gl treegal  | 100   | 74.25 | 70.45 (10)
gl ctg      | 100   | 82.12 | 79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases. [7]

[7] From the official results page and our projectivity table
Conclusions
Conclusion

In conclusion:
We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition based dependency parsing.

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

As the training dataset size increases, the tree-stack LSTM loses its advantage.
Future Research Directions

End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism
Applying attention between σ-LSTM states, or within the β-LSTM or Action-LSTM, may bring a performance improvement.

Morphological Features
Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.
Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Thank you for your attention
Questions
Introduction
Dependency Parsing Categorization
Grammar BasedRelying on a formal grammardefining a formal languageasking whether a given inputsentence is in the languagedefined by the grammar or not
Data-drivenMaking essential use of machinelearning from linguistic data in orderto parse new sentences
3
3From S Kbler R McDonald and J Nivre 2009 Dependency parsing Morgan ampClaypool US
Omer Kırnap (Koc University) MSc Thesis September 27 2018 6 123
Introduction
Data-driven Dependency Parsing
Graph Based Algorithms
Using maximum spanning tree algorithms from graph theory
Transition Based Algorithms
Capitalizing on greedy stack based algorithms to build dependency treewith incremental steps in linear time
4
4From S Kbler R McDonald and J Nivre 2009 Dependency parsing Morgan ampClaypool US
Omer Kırnap (Koc University) MSc Thesis September 27 2018 7 123
Introduction
Transition Based Dependency Parsing
Transition System Abstract machine with a set of configurations(states) and transitions We use the ArcHybrid transition system[Kuhlmann et al 2011]
Configurations (σ β A)bull σ Stack of tree fragments initially emptybull β Buffer of words initially containing the whole sentencebull A Set of dependency arcs (head relation modifier) initially empty
Transitionsbull shift(σ b|βA) = (σ|b βA)bull leftd(σ|s b|βA) = (σ b|βA cup (b d s))bull rightd(σ|s|t βA) = (σ|s βA cup (s d t))
Omer Kırnap (Koc University) MSc Thesis September 27 2018 8 123
An example parsing of a sentence
Omer Kırnap (Koc University) MSc Thesis September 27 2018 9 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 10 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 11 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 12 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 13 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 14 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 15 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 16 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 17 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 18 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 19 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 20 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 21 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 22 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 23 123
Problem Definition
Find a model learning to decide correct transition from current state
Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123
2 Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123
Related Work
Neural Networks for Feature Conjunctions
Neural networks can handle feature conjunctions and nonlinearityHoweverImpractical for high dimensional inputs they scale linearly in inputdimensions (in both time and space assuming fixed number of hiddenunits)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123
Related Work
Solution Using dense embeddings for input features
Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123
Overview
1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing
2 Related WorkLinear Models and their DrawbacksNeural Network Models
3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser
4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning
5 Conclusion6 Future Work amp Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123
3 Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies
CoNLL17bull Koc-University team with MLP Parser using Context Embeddings
CoNLL18bull KParse team with Tree-stack LSTM Parser using Context and
Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies
CoNLL17
bull Koc-University team with MLP Parser using ContextEmbeddings
CoNLL18
bull KParse team with Tree-stack LSTM Parser using Context and Morph-featEmbeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123
a Language Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123
Language Model (LM)
LM is used to obtain Context and Word embeddings with twocomponents
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123
Language Model - Word vectors
Character based LSTM generates word Vectors
Figure Character LSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123
Language Model - Context Vectors
Word based BiLSTM generates Context Vectors
Figure Word BiLSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123
b MLP Parser (CoNLL17)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123
MLP Parser
MLP Parser consists of 4 components
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Feature extractor describes current state
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123
MLP Parser - Feature Extraction
Feature extractor describes current state
Figure Kırnap et al 2017Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123
MLP Parser - Decision Module
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123
Experiments amp Dataset (MLP) CoNLL17
CoNLL17 Dataset
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123
Experiments - Evaluation Metric
Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label
Economic news hadGold Tree LAS 1
SBJATT
Economic news had
PREDOBJPred 1 LAS 0
Economic news had
OBJATTPred 2 LAS (frac12)100
Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
Experiments (MLP)
CoNLL 2017 Results (all treebanks LAS)
Ranked 1st among transition based parsers 5
5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123
Contributions in CoNLL17
Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123
Context and Word Embeddings
Relative contributions of part-of-speech (p) word vector (v)context vector (c)
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123
Context and Word Embeddings
Relative contributions of part-of-speech (p) word vector (v)context vector (c)
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Context vectors provide independent contribution on top ofPOS tags
Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123
Context and Word embeddings
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677p-fb 747 797 663
p-v-c 793 832 742
Our BiLSTM language model word vectors perform betterthan FB vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123
Context and Word embeddings
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Both POS tags and context vectors have significantcontributions on top of word vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123
Issues with MLP
However
Choosing correct state of parser still remains critical
We are unable to represent whole parsing history with featureextracting
Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123
Solution
Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack
Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies
CoNLL17
bull Koc-University team with MLP Parser using Context Embeddings
CoNLL18
bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123
c Tree-stack LSTM Parser (CoNLL18)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123
Related Work - Stack LSTM
Figure Stack LSTM [Dyer et al 2015]
Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123
Problems with Stack LSTM
They only modify stackrsquos word embeddings
Hidden states of LSTMS are not updated unless reduce
Actions are not explicitly represented
They only used word2vec embeddings [Mikolov et al 2013]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123
Tree-stack LSTM Overview
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
We propose Tree-stack LSTM model with 4 components
β-LSTMσ-LSTMAction-LSTMTree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123
Tree-stack LSTM
Input Representation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with continuous vector
Every dependency relation is represented with continuous vector
Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123
Input Representation
We do not include explicit feature extractor We initiated wordrepresentation by concatenating
Character Based LSTMrsquos word vectors
Word Based BiLSTMrsquos context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
Input Representation
Morp-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
Tree-stack LSTM
Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123
β-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
LSTM LSTM LSTM
wi+2wi+1wi
Figure Bufferrsquos β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
LSTM LSTM LSTM
si si+1 si+2
Figure Stackrsquos σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
LSTM LSTM LSTM
Figure Action-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How do components of tree-stack LSTM are connected
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
t-RNN
Dependent word
Dependency Relation
Head word
Figure t-RNN
whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
Tree-RNN with
1 Left Transition2 Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM LSTM
Left transition
t-RNN
Dependency Relation
HeadDependent
Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure β-LSTM recalculates its hidden based on new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Stackrsquos top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure σ-LSTM recalculates its hidden from new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
Final overview of Tree-stack LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing
2 Related WorkLinear Models and their DrawbacksNeural Network Models
3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser
4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning
5 Conclusion6 Future Work amp Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results amp Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results amp Comparisons
Dataset
Dependency parsing of 81
treebanks in 49 languages
All treebanks use standardized
annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 7th out
of 33 participants (1st among
transition based parsers)
Dependency parsing of 82
treebanks in 57 languages
All treebanks use
standardized annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 16th
out of 30 participants (2nd
among transition based
parsers)
CoNLL17 CoNLL181 Traintest split change 2 Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
2 possible problems of official comparison
1 If the annotation of the tree bank is improved the older parser ishandicapped
2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
MLP
Figure Initial model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
LSTM LSTM
Figure Only action LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
LSTM LSTM LSTM
LSTM LSTM MLP
Figure Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
LSTM LSTM
LSTM LSTM MLP
Figure Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code | MLP   | Only Action | Only-β | Only-σ
hu szeged | 66.21 | 66.87       | 66.94  | 67.03
sv lines  | 71.12 | 72.05       | 72.17  | 72.45
tr imst   | 57.12 | 56.87       | 57.02  | 57.12
ar padt   | 67.83 | 66.67       | 66.89  | 66.92
cs cac    | 83.89 | 82.23       | 83.13  | 83.17
en ewt    | 75.54 | 75.43       | 75.56  | 75.67

Table Comparison between MLP and "Only" models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
Figure Tree-stack LSTM: the t-RNN combines the head word, dependent word, and dependency relation; σ-, β-, and Action-LSTM outputs are concatenated and fed to the MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code          | without t-RNN | with t-RNN
no nynorsklia (3k) | 51.78         | 53.33
ru taiga (11k)     | 59.13         | 60.55
gl treegal (15k)   | 69.76         | 70.45
hu szeged (20k)    | 66.12         | 68.18
sv lines (49k)     | 74.04         | 75.46
tr imst (50k)      | 58.12         | 58.75
ar padt (120k)     | 68.04         | 68.14
en ewt (204k)      | 74.87         | 75.77
cs cac (473k)      | 82.89         | 83.57
cs pdt (1M)        | 81.17         | 81.164
t-RNN provides a comparative advantage for low-resource languages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
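The t-RNN composes a new head embedding from the old head embedding, the dependency-relation embedding, and the dependent embedding: w_head_new = tanh(W_rnn · [w_head_old; d_l; w_dep] + b_rnn). A minimal numpy sketch of this composition (dimensions and random parameters are illustrative, not the trained model's):

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, REL_DIM = 8, 4                      # illustrative embedding sizes
W_rnn = rng.normal(size=(DIM, DIM + REL_DIM + DIM)) * 0.1
b_rnn = np.zeros(DIM)

def t_rnn(w_head, d_rel, w_dep):
    """New head embedding after attaching a dependent:
    tanh(W_rnn @ [w_head; d_rel; w_dep] + b_rnn)."""
    return np.tanh(W_rnn @ np.concatenate([w_head, d_rel, w_dep]) + b_rnn)

w_new = t_rnn(rng.normal(size=DIM), rng.normal(size=REL_DIM), rng.normal(size=DIM))
print(w_new.shape)  # (8,)
```

Because each attachment rewrites the head's vector in place, repeated attachments accumulate subtree information into the head representation.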
Ablation Analysis
Overall results of ablation analysis
Lang      | MLP   | Only A | Only-β | Only-σ | w/o t-RNN | all
hu szeged | 66.21 | 66.87  | 66.94  | 67.03  | 66.12     | 68.18
sv lines  | 71.12 | 72.05  | 72.17  | 74.04  | 72.17     | 75.46
tr imst   | 57.12 | 56.87  | 57.02  | 57.12  | 58.12     | 58.75
ar padt   | 67.83 | 66.67  | 66.89  | 66.92  | 68.04     | 68.14
cs cac    | 83.89 | 82.23  | 83.13  | 83.17  | 82.89     | 83.57
en ewt    | 75.54 | 75.43  | 75.56  | 75.67  | 74.87     | 75.77
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNN's performance contribution increases as the training size decreases
σ-LSTM provides useful information independently of dataset size
Interconnecting the model's components with the t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental Settings: We divide the CoNLL18 UD v2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k and less than 50k tokens
Languages having more than 50k and less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
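The morph-feats embedded here are Universal Dependencies FEATS strings such as Case=Nom|Gender=Neut|Number=Sing. One common way to embed such a variable-length feature set is to sum one vector per feature-value pair; this is a hedged sketch of that general idea (the thesis implementation may differ in detail, and the lazy random initialization stands in for trained parameters):

```python
import numpy as np

DIM = 8                                   # illustrative embedding size
rng = np.random.default_rng(0)
emb = {}                                  # one vector per "Feat=Value" pair

def morph_feat_vector(feats):
    """Embed a UD FEATS string like 'Case=Nom|Number=Sing' by summing
    one vector per feature-value pair ('_' means no features)."""
    v = np.zeros(DIM)
    if feats == "_":
        return v
    for fv in feats.split("|"):
        if fv not in emb:                 # lazily create (stands in for training)
            emb[fv] = rng.normal(size=DIM)
        v += emb[fv]
    return v

v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing")
print(v.shape)  # (8,)
```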
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens:

Lang code     | Morph-Feats | no Morph-Feats | # of tokens
no nynorsklia | 51.13       | 53.33          | 3,583
ru taiga      | 58.32       | 60.55          | 10,479
sme giella    | 52.78       | 53.39          | 16,385
la perseus    | 49.93       | 51.6           | 18,184
ug udt        | 52.78       | 53.39          | 19,262
sl sst        | 46.72       | 48.77          | 19,473
hu szeged     | 66.23       | 68.18          | 20,166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code     | Morph-Feats | no Morph-Feats | # of tokens
sv lines      | 72.18       | 74.81          | 48,325
fr sequoia    | 84.36       | 82.17          | 50,543
en gum        | 76.44       | 75.34          | 53,686
ko gsd        | 73.74       | 72.54          | 56,687
eu bdt        | 74.55       | 73.32          | 72,974
nl lassysmall | 76.7        | 75.8           | 75,134
gl ctg        | 79.02       | 79.018         | 79,327
lv lvtb       | 72.33       | 72.24          | 80,666
id gsd        | 75.76       | 73.97          | 97,531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens:

Lang code | Morph-Feats | no Morph-Feats | # of tokens
fa seraji | 81.18       | 81.12          | 121,064
bg btb    | 84.53       | 84.55          | 124,336
en ewt    | 75.77       | 75.682         | 204,585
ar padt   | 68.02       | 68.14          | 223,881
de gsd    | 71.59       | 71.32          | 263,804
ca ancora | 85.89       | 85.874         | 417,587
es ancora | 84.99       | 84.78          | 444,617
cs cac    | 83.57       | 83.63          | 472,608
cs pdt    | 81.43       | 82.12          | 1,173,282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle: transitions follow gold moves. Dynamic oracle: transitions follow predicted moves.
In both cases the log-probability of the gold moves is maximized.
Figure Tree-stack LSTM: the t-RNN combines the head word, dependent word, and dependency relation; σ-, β-, and Action-LSTM outputs are concatenated and fed to the MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
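The two regimes differ only in which move the parser actually executes during training; in both, the loss is the negative log-probability of the gold move. A toy sketch (the softmax over transition scores stands in for the parser's MLP; names are illustrative):

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def training_step(scores, gold, oracle="static"):
    """Return (loss, executed_move). Static: execute the gold move.
    Dynamic: execute the model's predicted move, while still
    maximizing log p(gold)."""
    p = softmax(scores)
    loss = -math.log(p[gold])             # same objective in both regimes
    if oracle == "static":
        move = gold                        # follow gold transitions
    else:
        move = max(range(len(scores)), key=lambda i: scores[i])  # follow prediction
    return loss, move

scores = [0.1, 2.0, -1.0]                 # e.g. shift / left / right (illustrative)
print(training_step(scores, gold=0, oracle="static"))   # executes move 0
print(training_step(scores, gold=0, oracle="dynamic"))  # executes move 1
```

Under the dynamic oracle the model is exposed to states it will actually reach at test time, which is why it can help when predictions diverge from the gold sequence.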
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language     | (1)          | (2)   | (3)   | (4)
af afribooms | not provided | 75.46 | 77.43 | 78.12
kk ktb       | 20.19        | 22.31 | 21.96 | 23.86
bxr bdt      | 7.64         | 9.76  | 9.93  | 8.98
kmr mg       | 20.12        | 22.57 | 22.78 | 23.39

Table LAS values for strategies (1), (2), (3), and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
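Strategy (4) amounts to initializing the low-resource parser from a trained parser's weights and fine-tuning, instead of starting from random initialization. A sketch with plain parameter dictionaries (parameter names and shapes are illustrative, not the thesis model's actual ones):

```python
import numpy as np

def transfer_init(pretrained, target):
    """Copy every pretrained parameter whose shape matches into the
    target parser; shape mismatches (e.g. language-specific vocabulary
    embeddings) keep their fresh random initialization."""
    for name, w in pretrained.items():
        if name in target and target[name].shape == w.shape:
            target[name] = w.copy()
    return target

rng = np.random.default_rng(0)
pretrained = {"mlp.W": np.ones((4, 4)), "word_emb": np.ones((100, 8))}
target = {"mlp.W": rng.normal(size=(4, 4)), "word_emb": rng.normal(size=(50, 8))}
target = transfer_init(pretrained, target)
print(np.allclose(target["mlp.W"], 1.0), target["word_emb"].shape)  # True (50, 8)
```

Training then continues on the target language's small treebank from this warm start.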
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the mostbeneficial
From scratch LM training does not bring useful word and contextvectors
Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
A transition-based parser can only build projective trees 6
6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
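Projectivity can be checked directly: a dependency tree is projective iff no two arcs cross when drawn above the sentence. A small sketch (heads[i] is the 1-indexed head of word i+1, with 0 denoting the root):

```python
def is_projective(heads):
    """True iff no two arcs (head, dependent) cross;
    heads are 1-indexed, 0 denotes the root."""
    arcs = [tuple(sorted((h, d + 1))) for d, h in enumerate(heads)]
    for i, (a, b) in enumerate(arcs):
        for c, d in arcs[i + 1:]:
            if a < c < b < d or c < a < d < b:   # strictly interleaved spans
                return False
    return True

# "Economic news had little effect": Economic->news, news->had, ...
print(is_projective([2, 3, 0, 5, 3]))  # True
print(is_projective([3, 4, 0, 2]))     # False: arcs (1,3) and (2,4) cross
```

Non-projective gold trees therefore put an upper bound on what an arc-hybrid parser can recover, which motivates the comparison on the next slide.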
Projective vs Non-projective
We compared our model with the best model for different projectivity ratios:

Language    | Projectivity | Best (LAS) | Our (LAS)
grc perseus | 90.7         | 79.39      | 55.03 (20)
eu bdt      | 95.13        | 84.22      | 74.13 (17)
hu szeged   | 97.8         | 82.66      | 68.18 (14)
da ddt      | 98.26        | 86.28      | 76.40 (17)
en gum      | 99.6         | 85.05      | 76.44 (15)
gl treegal  | 100          | 74.25      | 70.45 (10)
gl ctg      | 100          | 82.12      | 79.45 (14)

Table Our model's performance gap decreases as the projectivity ratio increases
7 From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusion: We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition-based dependency parsing.
Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering.
Tree-stack LSTM performed better on low-resource languages.
When the training dataset size increases, the tree-stack LSTM loses its advantage.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.
Attention Mechanism
Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.
Morphological Features
Finding a different way to represent morphological features.
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap, Berkay Furkan Önder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, Association for Computational Linguistics, pages 673-682.
S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
Results amp Comparisons
Dataset
Dependency parsing of 81
treebanks in 49 languages
All treebanks use standardized
annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 7th out
of 33 participants (1st among
transition based parsers)
Dependency parsing of 82
treebanks in 57 languages
All treebanks use
standardized annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 16th
out of 30 participants (2nd
among transition based
parsers)
CoNLL17 CoNLL181 Traintest split change 2 Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
2 possible problems of official comparison
1 If the annotation of the tree bank is improved the older parser ishandicapped
2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
MLP
Figure Initial model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
LSTM LSTM
Figure Only action LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
LSTM LSTM LSTM
LSTM LSTM MLP
Figure Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
LSTM LSTM
LSTM LSTM MLP
Figure Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692
cs cac 8389 8223 8313 8317
en ewt 7554 7543 7556 7567
Table Comparison between MLP and rdquoOnlyrdquo models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875
ar padt (120k) 6804 6814
en ewt (204k) 7487 7577
cs cac (473k) 8289 8357
cs pdt (1M) 8117 81164
t-RNN provides comparative advantage for low-resourcelanguages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNNrsquos performance contribution increases when the training sizedecreases
σ-LSTM provides more useful information independent from datasetsize
Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k less than 50k tokens
Languages having more than 50k less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having less than 20k training tokens
Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583
ru taiga 5832 6055 10479
sme giella 5278 5339 16385
la perseus 4993 516 18184
ug udt 5278 5339 19262
sl sst 4672 4877 19473
hu szeged 6623 6818 20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having tokens in between 50k and100k
Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325
fr sequoia 8436 8217 50543
en gum 7644 7534 53686
ko gsd 7374 7254 56687
eu bdt 7455 7332 72974
nl lassymal 767 758 75134
gl ctg 7902 79018 79327
lv lvtb 7233 7224 80666
id gsd 7576 7397 97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having more than 100k trainingtokens
Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064
bg btb 8453 8455 124336
en ewt 7577 75682 204585
ar padt 6802 6814 223881
de gsd 7159 7132 263804
ca ancora 8589 85874 417587
es ancora 8499 8478 444617
cs cac 8357 8363 472608
cs pdt 8143 8212 1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle transitions using gold movesDynamic oracle transitions using predicted moves
In both cases logp of gold moves maximized
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language       (1)            (2)     (3)     (4)
af afribooms   not provided   75.46   77.43   78.12
kk ktb         20.19          22.31   21.96   23.86
bxr bdt        7.64           9.76    9.93    8.98
kmr mg         20.12          22.57   22.78   23.39

Table: LAS values for strategies (1), (2), (3) and (4)
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial.

From-scratch LM training does not yield useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017].
Projectivity
A transition-based parser can only build projective trees. 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
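Projectivity can be checked mechanically: a tree is projective iff no two dependency arcs cross. A small sketch (1-based token indices, head 0 for the root; an illustration, not the thesis code):

```python
def is_projective(heads):
    """heads[i-1] is the head of token i (0 means root).
    A tree is projective iff no two arcs strictly cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for i, (l1, r1) in enumerate(arcs):
        for l2, r2 in arcs[i + 1:]:
            if l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1:   # interleaving spans
                return False
    return True

assert is_projective([2, 0, 2])        # 3-token tree rooted at token 2
assert not is_projective([3, 4, 0, 3]) # arcs (1,3) and (2,4) cross
```

The ArcHybrid system can never produce the second tree, which is why the non-projective treebanks below are harder for our parser.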
Projective vs Non-projective
We compared our model with the best model for different projectivity ratios:

Language      Projectivity   Best (LAS)   Our (LAS)
grc perseus   90.7           79.39        55.03 (20)
eu bdt        95.13          84.22        74.13 (17)
hu szeged     97.8           82.66        68.18 (14)
da ddt        98.26          86.28        76.40 (17)
en gum        99.6           85.05        76.44 (15)
gl treegal    100            74.25        70.45 (10)
gl ctg        100            82.12        79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases. 7

7 From the official results page and our projectivity table.
Conclusions
Conclusion
In conclusion:
We introduced "Context, Word and Morph-feat" embeddings and showed their contribution to transition-based dependency parsing.

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

When the training dataset size increases, tree-stack LSTM loses its advantage.
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.
Attention Mechanism
Applying attention between the σ-LSTM, β-LSTM or Action-LSTM states may bring performance improvements.
Morphological Features
Finding different ways to represent morphological features.
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.
Publications
Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
References
Marco Kuhlmann, Carlos Gomez-Rodriguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Thank you for your attention
Questions
Introduction
Transition Based Dependency Parsing
Transition System: an abstract machine with a set of configurations (states) and transitions. We use the ArcHybrid transition system [Kuhlmann et al 2011].

Configurations (σ, β, A):
• σ: stack of tree fragments, initially empty
• β: buffer of words, initially containing the whole sentence
• A: set of dependency arcs (head, relation, modifier), initially empty

Transitions:
• shift(σ, b|β, A) = (σ|b, β, A)
• left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
• right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
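The three transitions can be written out directly (a minimal sketch of ArcHybrid, not the thesis implementation; words are represented by their 1-based indices and arcs are (head, relation, dependent) triples):

```python
def shift(stack, buffer, arcs):
    # shift: (σ, b|β, A) -> (σ|b, β, A)
    return stack + [buffer[0]], buffer[1:], arcs

def left(stack, buffer, arcs, d):
    # left_d: (σ|s, b|β, A) -> (σ, b|β, A ∪ {(b, d, s)})
    return stack[:-1], buffer, arcs | {(buffer[0], d, stack[-1])}

def right(stack, buffer, arcs, d):
    # right_d: (σ|s|t, β, A) -> (σ|s, β, A ∪ {(s, d, t)})
    return stack[:-1], buffer, arcs | {(stack[-2], d, stack[-1])}

# Toy parse of "economic news had effect" (tokens 1..4; relation names assumed):
stack, buffer, arcs = [], [1, 2, 3, 4], set()
stack, buffer, arcs = shift(stack, buffer, arcs)          # σ = [1]
stack, buffer, arcs = left(stack, buffer, arcs, "amod")   # 2 -> 1
stack, buffer, arcs = shift(stack, buffer, arcs)          # σ = [2]
stack, buffer, arcs = left(stack, buffer, arcs, "nsubj")  # 3 -> 2
stack, buffer, arcs = shift(stack, buffer, arcs)          # σ = [3]
stack, buffer, arcs = shift(stack, buffer, arcs)          # σ = [3, 4]
stack, buffer, arcs = right(stack, buffer, arcs, "obj")   # 3 -> 4
assert arcs == {(2, "amod", 1), (3, "nsubj", 2), (3, "obj", 4)}
```

After the final transition, only the root word (3, "had") remains on the stack and the buffer is empty.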
An example parsing of a sentence
[Figures: step-by-step ArcHybrid parse of an example sentence]
Problem Definition
Find a model that learns to decide the correct transition from the current state.
2 Related Work
Related Work
Neural Networks for Feature Conjunctions
Neural networks can handle feature conjunctions and nonlinearity. However, they are impractical for high-dimensional inputs: they scale linearly in the input dimensions (in both time and space, assuming a fixed number of hidden units).
Related Work
Solution: using dense embeddings for input features.
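A sketch of that solution in the style of embedding-based MLP parsers: each discrete feature indexes a row of a dense embedding matrix, and the concatenation feeds a small MLP that scores the transitions. All sizes and weights below are toy assumptions, not the thesis' actual hyperparameters:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM, HIDDEN, MOVES = 100, 8, 16, 3    # toy sizes (assumptions)
E = rng.normal(size=(VOCAB, DIM))            # dense feature embeddings
W1 = rng.normal(size=(HIDDEN, 4 * DIM)); b1 = np.zeros(HIDDEN)
W2 = rng.normal(size=(MOVES, HIDDEN));   b2 = np.zeros(MOVES)

def score_transitions(feature_ids):
    """Embed 4 discrete features, concatenate, score shift/left/right."""
    x = np.concatenate([E[i] for i in feature_ids])   # dense input, size 4*DIM
    h = np.tanh(W1 @ x + b1)
    return W2 @ h + b2

scores = score_transitions([5, 17, 42, 99])
assert scores.shape == (MOVES,)
```

The hidden layer size no longer depends on the number of possible feature conjunctions, which is what makes this practical where sparse one-hot inputs are not.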
3 Model
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:
• CoNLL17: Koc-University team with MLP Parser using Context Embeddings
• CoNLL18: KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings
a Language Model
Language Model (LM)
The LM is used to obtain Context and Word embeddings with two components:
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Language Model - Word vectors
Character based LSTM generates word Vectors
Figure Character LSTM from Kırnap et al 2017
Language Model - Context Vectors
Word based BiLSTM generates Context Vectors
Figure Word BiLSTM from Kırnap et al 2017
b MLP Parser (CoNLL17)
MLP Parser
MLP Parser consists of 4 components
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Feature extractor describes current state
Decision module (MLP) decides the next transition
MLP Parser - Feature Extraction
Feature extractor describes current state
Figure: Kırnap et al 2017
MLP Parser - Decision Module
Decision module (MLP) decides the next transition
Experiments amp Dataset (MLP) CoNLL17
CoNLL17 Dataset
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations
Experiments - Evaluation Metric
Labeled Attachment Score (LAS): the percentage of words correctly assigned both the correct syntactic head and the correct dependency label.
Gold tree for "Economic news had": Economic is the ATT of news, news is the SBJ of had. LAS = 1.
Pred 1: Economic is the OBJ of news, news is the PRED of had; both wrong, LAS = 0.
Pred 2: Economic is the ATT of news, news is the OBJ of had; one of two correct, LAS = (1/2) × 100 = 50.
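The worked example can be reproduced in a few lines (scoring only the two non-root words, as in the slide; arc labels are from the figure):

```python
def las(gold, pred):
    """Labeled Attachment Score: % of words whose predicted
    (head, label) pair matches the gold pair."""
    correct = sum(pred[w] == arc for w, arc in gold.items())
    return 100.0 * correct / len(gold)

# "Economic news had": gold attaches Economic->news (ATT), news->had (SBJ).
gold  = {"Economic": ("news", "ATT"), "news": ("had", "SBJ")}
pred1 = {"Economic": ("news", "OBJ"), "news": ("had", "PRED")}  # both wrong
pred2 = {"Economic": ("news", "ATT"), "news": ("had", "OBJ")}   # one correct
assert las(gold, pred1) == 0.0
assert las(gold, pred2) == 50.0
```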
Experiments (MLP)
CoNLL 2017 Results (all treebanks LAS)
Ranked 1st among transition-based parsers. 5

5 Source: CoNLL17 official results page.
Contributions in CoNLL17
Context and Word Embeddings
Relative contributions of part-of-speech (p), word vector (v), context vector (c):

Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63
c       72.2        76          63.5
v-c     76          79          67.6
p-c     78          82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2
Context and Word Embeddings
Context vectors provide an independent contribution on top of POS tags.
Context and Word embeddings
Our BiLSTM language model word vectors perform better than Facebook's (fb) word vectors.
Context and Word embeddings
Both POS tags and context vectors provide significant contributions on top of word vectors.
Issues with MLP
However
Choosing the correct state representation of the parser still remains critical.

We are unable to represent the whole parsing history with feature extraction.
Solution
Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack.
c Tree-stack LSTM Parser (CoNLL18)
Related Work - Stack LSTM
Figure Stack LSTM [Dyer et al 2015]
Represent each component (σ, β, A) with an LSTM.
Modify the head word's embedding with the dependent's embedding.
Problems with Stack LSTM
They only modify the stack's word embeddings.

Hidden states of the LSTMs are not updated unless a reduce occurs.
Actions are not explicitly represented
They only used word2vec embeddings [Mikolov et al 2013]
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Tree-stack LSTM Overview
[Figure: Tree-stack LSTM overview: σ-LSTM, β-LSTM, Action-LSTM and t-RNN outputs concatenated and fed into an MLP]

We propose the Tree-stack LSTM model with 4 components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN
Tree-stack LSTM
Input Representation
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with a continuous vector.

Every dependency relation is represented with a continuous vector.
Input Representation
We do not include an explicit feature extractor. We initialize each word representation by concatenating:
Character Based LSTMrsquos word vectors
Word Based BiLSTMrsquos context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
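The concatenation itself is straightforward (dimensions below are illustrative assumptions, not the thesis' actual sizes):

```python
import numpy as np

def word_representation(word_vec, context_vec, pos_vec, morph_vec):
    """Tree-stack LSTM input: concatenation of the four embeddings above."""
    return np.concatenate([word_vec, context_vec, pos_vec, morph_vec])

# Toy dimensions (assumptions): 50 + 100 + 16 + 32 = 198
x = word_representation(np.zeros(50), np.zeros(100), np.zeros(16), np.zeros(32))
assert x.shape == (198,)
```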
Input Representation
Morph-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure Morph-feat Embeddings
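One plausible reading of a morph-feat embedding, as a sketch: embed each `Key=Value` pair of the CoNLL-U FEATS string and pool the embeddings. The sum pooling and the embedding creation below are assumptions for illustration, not the thesis' exact scheme:

```python
import numpy as np

DIM = 8                              # toy embedding size (assumption)
rng = np.random.default_rng(1)
feat_emb = {}                        # one learned vector per "Key=Value" pair

def morph_feat_vector(feats):
    """Embed a FEATS string such as the one above by summing a vector
    per Key=Value pair (pooling choice is an assumption)."""
    vec = np.zeros(DIM)
    for kv in feats.split("|"):
        vec += feat_emb.setdefault(kv, rng.normal(size=DIM))
    return vec

v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
assert v.shape == (DIM,) and len(feat_emb) == 5
```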
Tree-stack LSTM
Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN
β-LSTM
Figure: Buffer's β-LSTM over words w_i, w_i+1, w_i+2.
σ-LSTM
Figure: Stack's σ-LSTM over stack items s_i, s_i+1, s_i+2.
Action-LSTM
Figure: Action-LSTM.
How are the components of the tree-stack LSTM connected?
Tree-RNN
Tree-RNN (t-RNN)
Figure: t-RNN combines the dependent word, dependency relation and head word embeddings.
w_head_new = tanh(W_rnn * [w_head_old ; d_l ; w_dep] + b_rnn)   (1)
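Equation (1) as a runnable sketch (toy dimensions; the weight initialization is an assumption for illustration):

```python
import numpy as np

D = 8                                     # toy embedding size (assumption)
rng = np.random.default_rng(2)
W_rnn = rng.normal(size=(D, 3 * D)) / np.sqrt(3 * D)
b_rnn = np.zeros(D)

def t_rnn(w_head, d_label, w_dep):
    """Equation (1): compose old head, relation and dependent embeddings
    into the new head embedding."""
    return np.tanh(W_rnn @ np.concatenate([w_head, d_label, w_dep]) + b_rnn)

new_head = t_rnn(np.ones(D), np.zeros(D), np.ones(D))
assert new_head.shape == (D,)
```

The tanh keeps the composed head embedding in the same range as the other embeddings, so it can re-enter the σ-LSTM or β-LSTM after a transition.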
Tree-RNN with
1. Left Transition
2. Right Transition
Left Transition
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initiated by concatenating POS, language and morph-feat embeddings.
Figure: Stack's top LSTM is reduced.
Figure: t-RNN calculates the new head embedding.
Figure: β-LSTM recalculates its hidden state based on the new input.
Figure: Tree-stack LSTM is ready to make the next transition.
Right Transition
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initiated by concatenating POS, language and morph-feat embeddings.
Figure: Stack's top LSTM is reduced.
Figure: t-RNN calculates the new head embedding.
Figure: σ-LSTM recalculates its hidden state from the new input.
Figure: Tree-stack LSTM is ready to make the next transition.
Final overview of Tree-stack LSTM
[Figure: final overview of the Tree-stack LSTM: σ-LSTM, β-LSTM, Action-LSTM and t-RNN outputs concatenated and fed into an MLP]
4 Results & Comparisons
Dataset

CoNLL17:
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 7th out of 33 participants (1st among transition-based parsers)

CoNLL18:
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 16th out of 30 participants (2nd among transition-based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split changes 2. Annotation changes
MLP vs Tree-stack LSTM
The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets.
MLP vs Tree-stack LSTM
Two possible problems with the official comparison:
1. If the annotation of the treebank has improved, the older parser is handicapped.
2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models:

Lang Code         MLP     Tree-stack
ru taiga (10k)    58.89   60.55
hu szeged (20k)   66.21   68.18
tr imst (50k)     56.78   58.75
ar padt (120k)    67.83   68.14
en ewt (205k)     74.87   75.77
cs cac (473k)     83.39   83.57

Tree-stack LSTM outperforms MLP.
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
MLP Parser
Figure: Initial model (MLP).
Only Action LSTM
Figure: Only action LSTM.
Only β-LSTM
Figure: Only β-LSTM.
Only σ-LSTM
Figure: Only σ-LSTM.
Ablation Analysis Results
Lang Code   MLP     Only Action   Only-β   Only-σ
hu szeged   66.21   66.87         66.94    67.03
sv lines    71.12   72.05         72.17    72.45
tr imst     57.12   56.87         57.02    57.12
ar padt     67.83   66.67         66.89    66.92
cs cac      83.89   82.23         83.13    83.17
en ewt      75.54   75.43         75.56    75.67

Table: Comparison between MLP and "Only" models.
Ablation of t-RNN
[Figure: Tree-stack LSTM overview diagram]
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN:

Lang Code            without t-RNN   with t-RNN
no nynorsklia (3k)   51.78           53.33
ru taiga (11k)       59.13           60.55
gl treegal (15k)     69.76           70.45
hu szeged (20k)      66.12           68.18
sv lines (49k)       74.04           75.46
tr imst (50k)        58.12           58.75
ar padt (120k)       68.04           68.14
en ewt (204k)        74.87           75.77
cs cac (473k)        82.89           83.57
cs pdt (1M)          81.17           81.164

t-RNN provides a comparative advantage for low-resource languages.
Ablation Analysis
Overall results of the ablation analysis:

Lang        MLP     Only A   Only-β   Only-σ   w/o t-RNN   all
hu szeged   66.21   66.87    66.94    67.03    66.12       68.18
sv lines    71.12   72.05    72.17    74.04    72.17       75.46
tr imst     57.12   56.87    57.02    57.12    58.12       58.75
ar padt     67.83   66.67    66.89    66.92    68.04       68.14
cs cac      83.89   82.23    83.13    83.17    82.89       83.57
en ewt      75.54   75.43    75.56    75.67    74.87       75.77

Tree-stack LSTM beats the other model variations.
Ablation Analysis
Conclusions of Ablation Experiments
t-RNN's performance contribution increases as the training size decreases.

σ-LSTM provides more useful information, independent of dataset size.

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers).
What does Morphological Feature Embedding provide?
Contribution of Morph-feat Embeddings
Experimental settings: we divide the CoNLL18 UD 2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:
Languages having less than 20k tokens
Languages having more than 20k less than 50k tokens
Languages having more than 50k less than 100k tokens
Languages having 100k tokens or more
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens:

Lang code       Morph-Feats   no Morph-Feats   # of tokens
no nynorsklia   51.13         53.33            3583
ru taiga        58.32         60.55            10479
sme giella      52.78         53.39            16385
la perseus      49.93         51.6             18184
ug udt          52.78         53.39            19262
sl sst          46.72         48.77            19473
hu szeged       66.23         68.18            20166

Not useful for languages having less than 20k training tokens.
An example parsing of a sentence
[Image-only slides: step-by-step transitions of an example parse]
Problem Definition
Find a model that learns to decide the correct transition from the current parser state.
2 Related Work
Related Work
Neural Networks for Feature Conjunctions
Neural networks can handle feature conjunctions and nonlinearity. However, they are impractical for high-dimensional inputs: they scale linearly in the input dimension (in both time and space, assuming a fixed number of hidden units).
Related Work
Solution: use dense, low-dimensional embeddings for input features.
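To make the scaling argument concrete, here is a rough back-of-the-envelope comparison; the vocabulary, tagset and embedding sizes are illustrative, not the thesis's actual hyperparameters:

```python
# A one-hot encoding of a word-POS feature conjunction needs |V| * |P| input
# dimensions, and a linear layer's cost grows with that input size.
V, P = 50_000, 17            # illustrative vocabulary / POS tagset sizes
onehot_dims = V * P          # 850,000 dims for a single conjoined feature

# Dense embeddings sidestep the blow-up: concatenate two small vectors
# and let the hidden layer learn the conjunction.
d_word, d_pos = 100, 16
dense_dims = d_word + d_pos  # 116 input dims instead
```

The hidden layer then models interactions between the two embedded features, so the explicit conjunction feature is no longer needed.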
Overview
1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2. Related Work: Linear Models and their Drawbacks; Neural Network Models
3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5. Conclusion
6. Future Work & Discussions
3 Model
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:
- CoNLL17: Koc-University team with MLP Parser using Context Embeddings
- CoNLL18: KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings
a Language Model
Language Model (LM)
The LM is used to obtain context and word embeddings with two components:
- A character-based LSTM extracts word vectors.
- A word-based BiLSTM extracts context vectors.
Language Model - Word Vectors
The character-based LSTM generates word vectors.
Figure: Character LSTM, from Kırnap et al. 2017
Language Model - Context Vectors
The word-based BiLSTM generates context vectors.
Figure: Word BiLSTM, from Kırnap et al. 2017
b MLP Parser (CoNLL17)
MLP Parser
The MLP Parser consists of 4 components:
- Character-based LSTM: extracts word vectors
- Word-based BiLSTM: extracts context vectors
- Feature extractor: describes the current state
- Decision module (MLP): decides the next transition
MLP Parser - Feature Extraction
The feature extractor describes the current state.
Figure: Kırnap et al. 2017
MLP Parser - Decision Module
The decision module (MLP) decides the next transition.
Experiments & Dataset (MLP) - CoNLL17
CoNLL17 Dataset:
- Dependency parsing of 81 treebanks in 49 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Experiments - Evaluation Metric
Labeled Attachment Score (LAS): the percentage of words assigned both the correct syntactic head and the correct dependency label.

Example with "Economic news had":
- Gold tree: Economic -ATT-> news, news -SBJ-> had
- Pred 1: Economic -OBJ-> news, news -PRED-> had; LAS = 0
- Pred 2: Economic -ATT-> news, news -OBJ-> had; LAS = (1/2) * 100 = 50
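The metric can be sketched as a comparison of (head, label) pairs per dependent word; the toy trees below mirror the example above:

```python
def las(gold, pred):
    """Labeled Attachment Score: percentage of words whose predicted
    head AND dependency label both match the gold annotation."""
    assert len(gold) == len(pred)
    return 100 * sum(g == p for g, p in zip(gold, pred)) / len(gold)

# "Economic news had": one (head, label) pair per dependent word.
gold  = [("news", "ATT"), ("had", "SBJ")]   # gold tree
pred1 = [("news", "OBJ"), ("had", "PRED")]  # both labels wrong -> LAS 0
pred2 = [("news", "ATT"), ("had", "OBJ")]   # 1 of 2 correct  -> LAS 50
```

Note that a correct head with a wrong label still counts as an error, which is what distinguishes LAS from the unlabeled attachment score (UAS).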
Experiments (MLP)
CoNLL 2017 Results (all treebanks, LAS): ranked 1st among transition-based parsers.
(Source: CoNLL17 official results page)
Contributions in CoNLL17
Context and Word Embeddings
Relative contributions of part-of-speech (p), word vector (v), context vector (c):

Feats    Hungarian  En-ParTUT  Latvian
p        63.6       76.6       55.9
v        73.5       75.9       63.0
c        72.2       76.0       63.5
v-c      76.0       79.0       67.6
p-c      78.0       82.5       70.6
p-v      76.6       80.8       67.7
p-fb     74.7       79.7       66.3
p-v-c    79.3       83.2       74.2
Context and Word Embeddings
Context vectors provide an independent contribution on top of POS tags (p-c > p).
Context and Word embeddings
Our BiLSTM language model word vectors perform better than Facebook's pretrained vectors (p-v > p-fb).
Context and Word embeddings
Both POS tags and context vectors make significant contributions on top of word vectors.
Issues with MLP
However:
- Choosing the right features to describe the parser state remains critical.
- Feature extraction cannot represent the whole parsing history.
Solution
Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and the stack.
c Tree-stack LSTM Parser (CoNLL18)
Related Work - Stack LSTM
Figure: Stack LSTM [Dyer et al. 2015]
- Represents each component (σ, β, A) with an LSTM
- Modifies the head word's embedding with the dependent's embedding
Problems with Stack LSTM
- They only modify the stack's word embeddings.
- Hidden states of the LSTMs are not updated unless a reduce occurs.
- Actions are not explicitly represented.
- They only used word2vec embeddings [Mikolov et al. 2013].
Our solution
We propose:
- Context embeddings should improve parsing accuracy.
- Dependency relations should be explicitly represented.
- Morphological features of a word may enhance parsing accuracy.
Tree-stack LSTM Overview
[Figure: Tree-stack LSTM architecture - σ-LSTM, β-LSTM, Action-LSTM and t-RNN, whose outputs are concatenated and fed to an MLP]
We propose the Tree-stack LSTM model with 4 components:
- β-LSTM
- σ-LSTM
- Action-LSTM
- Tree-RNN
Tree-stack LSTM - Input Representation
Input Representation
Action and Dependency Relation Embeddings:
- Every action is represented with a continuous vector.
- Every dependency relation is represented with a continuous vector.
Input Representation
We do not include an explicit feature extractor. We initialize each word representation by concatenating:
- the character-based LSTM's word vector
- the word-based BiLSTM's context vector
- the part-of-speech (POS) vector
- the morph-feat vector
Input Representation
Morph-feat Vectors, e.g. Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs for the word "It".
Figure: Morph-feat Embeddings
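A morph-feat string like the one above can be split into attribute-value pairs before any embedding lookup; a minimal sketch (the helper name is ours, not the thesis code):

```python
def parse_feats(feat_string):
    """Split a CoNLL-U FEATS string into attribute -> value pairs;
    '_' marks an empty feature set."""
    if feat_string == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feat_string.split("|"))

feats = parse_feats("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
# Each attribute=value pair would then be looked up in an embedding table
# and the resulting vectors combined into a single morph-feat vector.
```

Treating each pair as its own embedded unit lets rare feature bundles share statistics with common ones, instead of embedding the whole bundle string atomically.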
Tree-stack LSTM
Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN
β-LSTM
[Figure: architecture overview with the β-LSTM highlighted]
Figure: The buffer's β-LSTM runs over the upcoming words w_i, w_i+1, w_i+2.
σ-LSTM
[Figure: architecture overview with the σ-LSTM highlighted]
Figure: The stack's σ-LSTM runs over the stack words s_i, s_i+1, s_i+2.
Action-LSTM
[Figure: architecture overview with the Action-LSTM highlighted]
Figure: The Action-LSTM runs over the sequence of past transitions.
How are the components of the tree-stack LSTM connected?
Tree-RNN
Tree-RNN (t-RNN)
[Figure: t-RNN combining the head word, the dependency relation and the dependent word]
w_head_new = tanh(W_rnn * [w_head_old; d_l; w_dep] + b_rnn)   (1)
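Equation (1) as a toy computation, in pure Python with tiny illustrative dimensions (the real model uses learned parameters):

```python
import math

def trnn_update(W, b, w_head, d_rel, w_dep):
    """New head embedding: tanh(W_rnn * [w_head; d_rel; w_dep] + b_rnn)."""
    x = w_head + d_rel + w_dep                    # vector concatenation
    return [math.tanh(sum(wij * xj for wij, xj in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

# 2-dim head/relation/dependent vectors -> 6-dim input, 2-dim output
W = [[0.5, 0, 0, 0, 0, 0],
     [0, 0, 0.5, 0, 0, 0]]
b = [0.0, 0.0]
new_head = trnn_update(W, b, [1.0, 0.0], [1.0, 0.0], [0.0, 0.0])
```

The updated head vector replaces the old one on the stack or buffer, so later decisions see a head representation that already encodes its attached dependents.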
Tree-RNN with:
1. Left Transition
2. Right Transition
Left Transition
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
[Figure sequence:]
- Each embedding is initialized by concatenating POS, language and morph-feat embeddings.
- The stack's top LSTM entry is reduced.
- The t-RNN calculates the new head embedding.
- The β-LSTM recalculates its hidden state from the new input.
- The tree-stack LSTM is ready for the next transition.
Right Transition
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
[Figure sequence:]
- Each embedding is initialized by concatenating POS, language and morph-feat embeddings.
- The stack's top LSTM entry is reduced.
- The t-RNN calculates the new head embedding.
- The σ-LSTM recalculates its hidden state from the new input.
- The tree-stack LSTM is ready for the next transition.
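The two transitions can be sketched as operations on the configuration (σ, β, A); the list layout, word indices and labels below are illustrative, not the thesis implementation:

```python
def left_arc(stack, buffer, arcs, d):
    """left_d(σ|s, b|β, A) -> (σ, b|β, A ∪ {(b, d, s)}):
    the buffer front b becomes the head of the stack top s."""
    s = stack.pop()
    arcs.add((buffer[0], d, s))

def right_arc(stack, buffer, arcs, d):
    """right_d(σ|s|t, β, A) -> (σ|s, β, A ∪ {(s, d, t)}):
    the second stack item s becomes the head of the stack top t."""
    t = stack.pop()
    arcs.add((stack[-1], d, t))

def shift(stack, buffer):
    """Move the buffer front onto the stack."""
    stack.append(buffer.pop(0))

# "Economic news had" as word indices 1, 2, 3: both heads lie to the right.
stack, buffer, arcs = [], [1, 2, 3], set()
shift(stack, buffer)                  # σ=[1], β=[2, 3]
left_arc(stack, buffer, arcs, "ATT")  # news heads Economic
shift(stack, buffer)                  # σ=[2], β=[3]
left_arc(stack, buffer, arcs, "SBJ")  # had heads news

# A right arc: the stack top becomes a dependent of the item below it.
stack2, arcs2 = [3, 4], set()
right_arc(stack2, [], arcs2, "OBJ")
```

Arcs are stored as (head, label, dependent) triples, matching the (b, d, s) / (s, d, t) notation above.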
Final overview of Tree-stack LSTM
[Figure: full Tree-stack LSTM architecture - σ-LSTM, β-LSTM, Action-LSTM and t-RNN outputs concatenated and fed to an MLP]
Overview
1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2. Related Work: Linear Models and their Drawbacks; Neural Network Models
3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5. Conclusion
6. Future Work & Discussions
4 Results & Comparisons
Results & Comparisons

Dataset:

CoNLL17:
- Dependency parsing of 81 treebanks in 49 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koc-University ranked 7th out of 33 participants (1st among transition-based parsers)

CoNLL18:
- Dependency parsing of 82 treebanks in 57 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koc-University ranked 16th out of 30 participants (2nd among transition-based parsers)

Changes from CoNLL17 to CoNLL18: 1. train/test split change, 2. annotation
MLP vs Tree-stack LSTM
The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets.
MLP vs Tree-stack LSTM
Two possible problems with the official comparison:
1. If the annotation of the treebank was improved, the older parser is handicapped.
2. If the training-test split changed and old training data are now in the test data, the old parser is favored undeservedly.
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare the models:

Lang code        MLP    Tree-stack
ru_taiga (10k)   58.89  60.55
hu_szeged (20k)  66.21  68.18
tr_imst (50k)    56.78  58.75
ar_padt (120k)   67.83  68.14
en_ewt (205k)    74.87  75.77
cs_cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP.
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
MLP Parser
Figure: Initial model
Only Action LSTM
Figure: Only the Action-LSTM
Only β-LSTM
Figure: Only the β-LSTM
Only σ-LSTM
Figure: Only the σ-LSTM
Ablation Analysis Results

Lang code  MLP    Only Action  Only-β  Only-σ
hu_szeged  66.21  66.87        66.94   67.03
sv_lines   71.12  72.05        72.17   72.45
tr_imst    57.12  56.87        57.02   57.12
ar_padt    67.83  66.67        66.89   66.92
cs_cac     83.89  82.23        83.13   83.17
en_ewt     75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models
Ablation of t-RNN
[Figure: Tree-stack LSTM architecture with the t-RNN highlighted]
Ablation of t-RNN
Comparison of tree-stack LSTMs with and without the t-RNN:

Lang code           without t-RNN  with t-RNN
no_nynorsklia (3k)  51.78          53.33
ru_taiga (11k)      59.13          60.55
gl_treegal (15k)    69.76          70.45
hu_szeged (20k)     66.12          68.18
sv_lines (49k)      74.04          75.46
tr_imst (50k)       58.12          58.75
ar_padt (120k)      68.04          68.14
en_ewt (204k)       74.87          75.77
cs_cac (473k)       82.89          83.57
cs_pdt (1M)         81.17          81.16

The t-RNN provides a comparative advantage for low-resource languages.
Ablation Analysis
Overall results of the ablation analysis:

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu_szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv_lines   71.12  72.05   72.17   74.04   72.17      75.46
tr_imst    57.12  56.87   57.02   57.12   58.12      58.75
ar_padt    67.83  66.67   66.89   66.92   68.04      68.14
cs_cac     83.89  82.23   83.13   83.17   82.89      83.57
en_ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats the other model variations.
Ablation Analysis
Conclusions of the ablation experiments:
- The t-RNN's performance contribution increases as the training set size decreases.
- The σ-LSTM provides useful information independent of dataset size.
- Interconnecting the model's components with the t-RNN makes the tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers).
What do Morphological Feature Embeddings provide?
Contribution of Morph-feat Embeddings
Experimental settings: we divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens per language, to better understand our contributions:
- Languages with less than 20k tokens
- Languages with more than 20k but less than 50k tokens
- Languages with more than 50k but less than 100k tokens
- Languages with 100k tokens or more
Contribution of Morph-feat embeddings
Morph-feat experiments for languages with less than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no_nynorsklia  51.13        53.33           3583
ru_taiga       58.32        60.55           10479
sme_giella     52.78        53.39           16385
la_perseus     49.93        51.6            18184
ug_udt         52.78        53.39           19262
sl_sst         46.72        48.77           19473
hu_szeged      66.23        68.18           20166

Not useful for languages with less than 20k training tokens.
Contribution of Morph-feat embeddings
Morph-feat experiments for languages with 50k-100k training tokens:

Lang code     Morph-Feats  no Morph-Feats  # of tokens
sv_lines      72.18        74.81           48325
fr_sequoia    84.36        82.17           50543
en_gum        76.44        75.34           53686
ko_gsd        73.74        72.54           56687
eu_bdt        74.55        73.32           72974
nl_lassysmall 76.7         75.8            75134
gl_ctg        79.02        79.018          79327
lv_lvtb       72.33        72.24           80666
id_gsd        75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens.
Contribution of Morph-feat embeddings
Morph-feat experiments for languages with more than 100k training tokens:

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa_seraji  81.18        81.12           121064
bg_btb     84.53        84.55           124336
en_ewt     75.77        75.682          204585
ar_padt    68.02        68.14           223881
de_gsd     71.59        71.32           263804
ca_ancora  85.89        85.874          417587
es_ancora  84.99        84.78           444617
cs_cac     83.57        83.63           472608
cs_pdt     81.43        82.12           1173282

Neutral for languages with more than 100k training tokens.
Static vs Dynamic Oracle Training
- Static oracle: transitions follow the gold moves.
- Dynamic oracle: transitions follow the predicted moves.
In both cases the log-probability of the gold move is maximized.
[Figure: Tree-stack LSTM architecture]
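The difference between the two regimes can be sketched as a single training step; the score dictionary and helper below are simplified stand-ins, not the thesis code:

```python
def oracle_step(scores, gold_move, dynamic=False):
    """One oracle-training step. Both regimes maximize log p(gold move);
    they differ in which move is *followed* to reach the next parser
    state: the gold move (static) or the model's argmax move (dynamic),
    so dynamic training exposes the model to its own mistakes."""
    loss = -scores[gold_move]                       # stand-in for -log p(gold)
    followed = max(scores, key=scores.get) if dynamic else gold_move
    return loss, followed

scores = {"shift": 0.1, "left": 2.0, "right": -0.3}
loss_s, next_s = oracle_step(scores, "shift")                # static
loss_d, next_d = oracle_step(scores, "shift", dynamic=True)  # dynamic
```

Under dynamic training the parser can reach states never visited by the gold transition sequence, which is exactly where a dynamic oracle's supervision is meant to help.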
Static vs Dynamic Oracle Training
Figure: Results are very close for training sets with fewer than 20k tokens.
Figure: Results are very close for training sets with between 20k and 50k tokens.
Figure: Results are very close for training sets with more than 50k tokens.
How about languages with less than 20k training tokens?
Transfer Learning
There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af_afribooms  not provided  75.46  77.43  78.12
kk_ktb        20.19         22.31  21.96  23.86
bxr_bdt       7.64          9.76   9.93   8.98
kmr_mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3) and (4)
Transfer Learning
Conclusions of the transfer learning experiments:
- Applying transfer learning with a pre-trained parser is the most beneficial.
- Training an LM from scratch on very limited data does not yield useful word and context vectors.
- Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017].
Projectivity
Transition-based parsers can only build projective trees.
(Figure from the Uppsala University 5LN455-2014 lecture slides, stp.lingfil.uu.se)
Projective vs Non-projective
We compared our model with the best model at different projectivity ratios:

Language     Projectivity  Best (LAS)  Ours (LAS)
grc_perseus  90.7          79.39       55.03 (20)
eu_bdt       95.13         84.22       74.13 (17)
hu_szeged    97.8          82.66       68.18 (14)
da_ddt       98.26         86.28       76.40 (17)
en_gum       99.6          85.05       76.44 (15)
gl_treegal   100           74.25       70.45 (10)
gl_ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases.
(From the official results page and our projectivity table)
Conclusions
Conclusion
In conclusion, we introduced context, word and morph-feat embeddings and showed their contribution to transition-based dependency parsing.
- Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.
- The Tree-stack LSTM performed better on low-resource languages.
- As the training dataset size increases, the Tree-stack LSTM loses its advantage.
Future Research Directions
End-to-End Training: Systems jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.
Attention Mechanism: Applying attention between the states of the σ-LSTM, β-LSTM or Action-LSTM may bring performance improvements.
Morphological Features: Finding a different way to represent morphological features.
Dynamic Oracle vs Beam Training: Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function; other losses (e.g. CRF) may solve this problem.
Publications
- Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
- Omer Kırnap, Berkay Furkan Önder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
References
Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682
S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US
Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
Omer Kırnap (Koc University) MSc Thesis September 27 2018 10 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 11 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 12 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 13 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 14 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 15 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 16 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 17 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 18 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 19 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 20 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 21 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 22 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 23 123
Problem Definition
Find a model learning to decide correct transition from current state
Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123
2 Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123
Related Work
Neural Networks for Feature Conjunctions
Neural networks can handle feature conjunctions and nonlinearityHoweverImpractical for high dimensional inputs they scale linearly in inputdimensions (in both time and space assuming fixed number of hiddenunits)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123
Related Work
Solution Using dense embeddings for input features
Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123
Overview
1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing
2 Related WorkLinear Models and their DrawbacksNeural Network Models
3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser
4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning
5 Conclusion6 Future Work amp Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123
3 Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies
CoNLL17bull Koc-University team with MLP Parser using Context Embeddings
CoNLL18bull KParse team with Tree-stack LSTM Parser using Context and
Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies
CoNLL17
bull Koc-University team with MLP Parser using ContextEmbeddings
CoNLL18
bull KParse team with Tree-stack LSTM Parser using Context and Morph-featEmbeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123
a Language Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123
Language Model (LM)
LM is used to obtain Context and Word embeddings with twocomponents
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123
Language Model - Word vectors
Character based LSTM generates word Vectors
Figure Character LSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123
Language Model - Context Vectors
Word based BiLSTM generates Context Vectors
Figure Word BiLSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123
b MLP Parser (CoNLL17)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123
MLP Parser
MLP Parser consists of 4 components
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Feature extractor describes current state
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123
MLP Parser - Feature Extraction
Feature extractor describes current state
Figure Kırnap et al 2017Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123
MLP Parser - Decision Module
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123
Experiments amp Dataset (MLP) CoNLL17
CoNLL17 Dataset
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123
Experiments - Evaluation Metric
Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label
Economic news hadGold Tree LAS 1
SBJATT
Economic news had
PREDOBJPred 1 LAS 0
Economic news had
OBJATTPred 2 LAS (frac12)100
Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
Experiments (MLP)
CoNLL 2017 Results (all treebanks LAS)
Ranked 1st among transition based parsers 5
5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123
Contributions in CoNLL17
Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123
Context and Word Embeddings
Relative contributions of part-of-speech (p) word vector (v)context vector (c)
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123
Context and Word Embeddings
Relative contributions of part-of-speech (p) word vector (v)context vector (c)
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Context vectors provide independent contribution on top ofPOS tags
Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123
Context and Word embeddings
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677p-fb 747 797 663
p-v-c 793 832 742
Our BiLSTM language model word vectors perform betterthan FB vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123
Context and Word embeddings
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Both POS tags and context vectors have significantcontributions on top of word vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123
Issues with MLP
However
Choosing correct state of parser still remains critical
We are unable to represent whole parsing history with featureextracting
Solution
Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:
CoNLL17
• Koc-University team with MLP Parser using Context Embeddings
CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings
c Tree-stack LSTM Parser (CoNLL18)
Related Work - Stack LSTM
Figure: Stack LSTM [Dyer et al. 2015]
Represent each component (σ, β, A) with an LSTM. Modify the head word's embedding with the dependent's embedding.
Problems with Stack LSTM
They only modify the stack's word embeddings
Hidden states of the LSTMs are not updated unless a reduce occurs
Actions are not explicitly represented
They only used word2vec embeddings [Mikolov et al. 2013]
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Tree-stack LSTM Overview
Figure: Tree-stack LSTM — the σ-, β-, and Action-LSTM hidden states are concatenated and fed to an MLP; the t-RNN combines head word, dependent word, and dependency relation.
We propose the Tree-stack LSTM model with 4 components:
β-LSTM, σ-LSTM, Action-LSTM, Tree-RNN
Tree-stack LSTM
Input Representation
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with a continuous vector
Every dependency relation is represented with a continuous vector
Input Representation
We do not include an explicit feature extractor. We initialize the word representation by concatenating:
Character-based LSTM's word vectors
Word-based BiLSTM's context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
Input Representation
Morph-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure: Morph-feat Embeddings
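A morph-feat string can be turned into one vector by embedding each key=value pair; how the thesis combines the per-feature vectors (sum, average, or concatenation) is not shown on this slide, so the sum below is only an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8          # illustrative embedding size
feat_emb = {}    # lazily created key=value embeddings

def morph_feat_vector(feats):
    """Embed a UD morphology string such as
    'Case=Nom|Gender=Neut|Number=Sing' by summing one vector per
    key=value feature (the sum is an assumption here)."""
    vec = np.zeros(DIM)
    for f in feats.split("|"):
        if f not in feat_emb:
            feat_emb[f] = rng.standard_normal(DIM)
        vec += feat_emb[f]
    return vec
```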
Tree-stack LSTM
Model Components:
1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN
β-LSTM
Figure: Buffer's β-LSTM over the upcoming words wi, wi+1, wi+2
σ-LSTM
Figure: Stack's σ-LSTM over the stack words si, si+1, si+2
Action-LSTM
Figure: Action-LSTM
How are the components of the tree-stack LSTM connected?
Tree-RNN
Tree-RNN (t-RNN)
Figure: t-RNN combining the head word, dependent word, and dependency relation
w_head_new = tanh(W_rnn · [w_head_old ; d_l ; w_dep] + b_rnn)    (1)
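Equation (1) can be written out directly; the dimensions and random parameters below are illustrative stand-ins for the trained W_rnn and b_rnn:

```python
import numpy as np

rng = np.random.default_rng(1)
D = 4                                   # illustrative embedding size
W_rnn = rng.standard_normal((D, 3 * D))
b_rnn = rng.standard_normal(D)

def trnn_update(w_head_old, d_l, w_dep):
    """Eq. (1): fold a dependent's embedding and the relation-label
    embedding into the head word's embedding."""
    x = np.concatenate([w_head_old, d_l, w_dep])
    return np.tanh(W_rnn @ x + b_rnn)
```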
Tree-RNN with
1 Left Transition
2 Right Transition
Left Transition
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings
The left transition proceeds in steps:
1 Stack's top LSTM is reduced
2 t-RNN calculates the new head embedding
3 β-LSTM recalculates its hidden state based on the new input
4 Tree-stack LSTM is ready to give the next transition
Right Transition
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings
The right transition proceeds in steps:
1 Stack's top LSTM is reduced
2 t-RNN calculates the new head embedding
3 σ-LSTM recalculates its hidden state from the new input
4 Tree-stack LSTM is ready to give the next transition
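Both transitions reduce to simple stack/buffer operations. A minimal sketch with Python lists (word indices stand in for the full embeddings; the t-RNN and LSTM updates are omitted):

```python
def shift(stack, buffer):
    """Move the buffer front onto the stack."""
    stack.append(buffer.pop(0))

def left(stack, buffer, arcs, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    pop stack top s, attach it as a d-dependent of buffer front b."""
    s = stack.pop()
    arcs.append((buffer[0], d, s))

def right(stack, buffer, arcs, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    pop stack top t, attach it as a d-dependent of the new top s."""
    t = stack.pop()
    arcs.append((stack[-1], d, t))

# Tiny run: 0 is the root, 1-2 are words.
stack, buffer, arcs = [0], [1, 2], []
shift(stack, buffer)                # stack=[0,1] buffer=[2]
left(stack, buffer, arcs, "ATT")    # arc (2, ATT, 1)
shift(stack, buffer)                # stack=[0,2] buffer=[]
right(stack, buffer, arcs, "OBJ")   # arc (0, OBJ, 2)
```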
Final overview of Tree-stack LSTM
Figure: Tree-stack LSTM — the σ-, β-, and Action-LSTM hidden states are concatenated and fed to an MLP; the t-RNN combines head word, dependent word, and dependency relation.
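The decision step in the final overview — concatenate the three sequence summaries and feed them to an MLP over transitions — can be sketched with numpy; all sizes and random parameters below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
H, HID, N_MOVES = 6, 16, 4    # illustrative sizes

W1, b1 = rng.standard_normal((HID, 3 * H)), np.zeros(HID)
W2, b2 = rng.standard_normal((N_MOVES, HID)), np.zeros(N_MOVES)

def decide(h_sigma, h_beta, h_action):
    """Concatenate σ-, β-, and Action-LSTM hidden states, apply a
    one-hidden-layer MLP, and softmax over the possible transitions."""
    x = np.concatenate([h_sigma, h_beta, h_action])
    h = np.tanh(W1 @ x + b1)
    z = W2 @ h + b2
    p = np.exp(z - z.max())
    return p / p.sum()
```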
4 Results & Comparisons
Results & Comparisons
Dataset
CoNLL17
  Dependency parsing of 81 treebanks in 49 languages
  All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
  Koc-University ranked 7th out of 33 participants (1st among transition-based parsers)
CoNLL18
  Dependency parsing of 82 treebanks in 57 languages
  All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
  Koc-University ranked 16th out of 30 participants (2nd among transition-based parsers)
Changes from CoNLL17 to CoNLL18: 1 Train/test split 2 Annotation
MLP vs Tree-stack LSTM
The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets.
MLP vs Tree-stack LSTM
Two possible problems with the official comparison:
1 If the annotation of the treebank is improved, the older parser is handicapped
2 If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57
Tree-stack LSTM outperforms MLP
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
MLP Parser
Figure: Initial model (MLP)
Only Action LSTM
Figure: Only the Action-LSTM
Only β-LSTM
Figure: Only the β-LSTM
Only σ-LSTM
Figure: Only the σ-LSTM
Ablation Analysis Results
Lang Code  MLP    Only Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models
Ablation of t-RNN
Ablation of t-RNN
Comparison of tree-stack LSTMs with and without the t-RNN
Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.164
t-RNN provides a comparative advantage for low-resource languages
Ablation Analysis
Overall results of ablation analysis
Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77
Tree-stack LSTM beats other model variations
Ablation Analysis
Conclusions of Ablation Experiments
t-RNN's performance contribution increases as the training size decreases
σ-LSTM provides more useful information independent of dataset size
Interconnecting the model's components with the t-RNN makes the tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers)
What does the Morphological Feature Embedding provide?
Contribution of Morph-feat Embeddings
Experimental Settings: we divide the CoNLL18 UD (v2.2) dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k less than 50k tokens
Languages having more than 50k less than 100k tokens
Languages having 100k tokens or more
Contribution of Morph-feat embeddings
Morph-feat experiments for languages with fewer than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3,583
ru taiga       58.32        60.55           10,479
sme giella     52.78        53.39           16,385
la perseus     49.93        51.6            18,184
ug udt         52.78        53.39           19,262
sl sst         46.72        48.77           19,473
hu szeged      66.23        68.18           20,166
Not useful for languages having less than 20k training tokens
Contribution of Morph-feat embeddings
Morph-feat experiments for languages with between 50k and 100k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48,325
fr sequoia     84.36        82.17           50,543
en gum         76.44        75.34           53,686
ko gsd         73.74        72.54           56,687
eu bdt         74.55        73.32           72,974
nl lassysmall  76.7         75.8            75,134
gl ctg         79.02        79.018          79,327
lv lvtb        72.33        72.24           80,666
id gsd         75.76        73.97           97,531
Beneficial for languages with 50k-100k training tokens
Contribution of Morph-feat embeddings
Morph-feat experiments for languages with more than 100k training tokens:

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121,064
bg btb     84.53        84.55           124,336
en ewt     75.77        75.682          204,585
ar padt    68.02        68.14           223,881
de gsd     71.59        71.32           263,804
ca ancora  85.89        85.874          417,587
es ancora  84.99        84.78           444,617
cs cac     83.57        83.63           472,608
cs pdt     81.43        82.12           1,173,282
Neutral for languages having more than 100k training tokens
Static vs Dynamic Oracle Training
Static oracle: transitions follow the gold moves. Dynamic oracle: transitions follow the predicted moves.
In both cases, the log-probability of the gold moves is maximized.
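The difference can be made concrete: both regimes minimize -log p(gold move), but the dynamic oracle executes the model's own prediction to reach the next state. A sketch (the raw transition scores stand in for the model's output):

```python
import numpy as np

def train_step(scores, gold_move, dynamic):
    """One parsing decision during training. `scores` are the model's
    raw scores over transitions. The loss is -log p(gold move) either
    way; the regimes differ only in which move is *executed*."""
    p = np.exp(scores - scores.max())
    p /= p.sum()
    loss = -np.log(p[gold_move])
    executed = int(np.argmax(scores)) if dynamic else gold_move
    return loss, executed
```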
Static vs Dynamic Oracle Training
Figure: Results are very close for languages with fewer than 20k training tokens
Static vs Dynamic Oracle Training
Figure: Results are very close for training-token counts between 20k and 50k
Static vs Dynamic Oracle Training
Figure: Results are very close for training-token counts above 50k
How about languages with fewer than 20k training tokens?
Transfer Learning
There are 4 possible types of transfer learning:
1 Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2 Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3 Using my own word and context vectors trained on a different language from the same language family
4 Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4)
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial
Training the LM from scratch on very limited data does not yield useful word and context vectors
Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]
Projectivity
A transition-based parser can only build projective trees 6
6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
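Projectivity means no two dependency arcs cross when drawn above the sentence. A small illustrative checker (O(n²) pairwise test):

```python
def is_projective(heads):
    """heads[i-1] is the head of word i (1-based words, 0 = root).
    A tree is projective iff no two arcs cross."""
    arcs = [(min(h, i), max(h, i)) for i, h in enumerate(heads, start=1)]
    for a, b in arcs:
        for c, d in arcs:
            if a < c < b < d:   # arcs (a,b) and (c,d) cross
                return False
    return True
```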
Projective vs Non-projective
We compared our model with the best model at different projectivity ratios.

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases 7
7 From the official results page and our projectivity table
Conclusions
Conclusion
In conclusion: we introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition-based dependency parsing.
Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering
Tree-stack LSTM performed better on low-resource languages
As the training dataset size increases, the tree-stack LSTM loses its advantage
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.
Attention Mechanism
Applying attention over σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.
Morphological Features
Finding a different way to represent morphological features.
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.
Publications
Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
References
Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, pages 673-682. Association for Computational Linguistics.
S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Thank you for your attention
Questions
Problem Definition
Find a model that learns to decide the correct transition from the current state
2 Related Work
Related Work
Neural Networks for Feature Conjunctions
Neural networks can handle feature conjunctions and nonlinearity. However, they are impractical for high-dimensional inputs: they scale linearly in input dimension (in both time and space, assuming a fixed number of hidden units).
Related Work
Solution: use dense embeddings for input features
3 Model
a Language Model
Language Model (LM)
The LM is used to obtain context and word embeddings, with two components:
Character-based LSTM extracts word vectors
Word-based BiLSTM extracts context vectors
Language Model - Word vectors
A character-based LSTM generates word vectors
Figure: Character LSTM, from Kırnap et al. 2017
Language Model - Context Vectors
A word-based BiLSTM generates context vectors
Figure: Word BiLSTM, from Kırnap et al. 2017
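The context vector of word i concatenates a forward pass over the words up to i with a backward pass from i onward. A numpy sketch in which a plain tanh RNN stands in for the LSTM cells (sizes and weights are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
D = 4                                   # illustrative size
Wx, Wh = rng.standard_normal((D, D)), rng.standard_normal((D, D))

def rnn(seq):
    """Simple tanh RNN (stand-in for an LSTM); returns all states."""
    h, states = np.zeros(D), []
    for x in seq:
        h = np.tanh(Wx @ x + Wh @ h)
        states.append(h)
    return states

def context_vectors(word_vecs):
    """BiLSTM-style context vectors: the forward state at position i
    concatenated with the backward state at position i."""
    fwd = rnn(word_vecs)
    bwd = rnn(word_vecs[::-1])[::-1]
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]
```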
b MLP Parser (CoNLL17)
MLP Parser
The MLP Parser consists of 4 components:
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Feature extractor describes current state
Decision module (MLP) decides the next transition
MLP Parser - Feature Extraction
Feature extractor describes current state
Figure: Kırnap et al. 2017
MLP Parser - Decision Module
Decision module (MLP) decides the next transition
Experiments & Dataset (MLP) CoNLL17
CoNLL17 Dataset
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations
Experiments - Evaluation Metric
Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label
Economic news hadGold Tree LAS 1
SBJATT
Economic news had
PREDOBJPred 1 LAS 0
Economic news had
OBJATTPred 2 LAS (frac12)100
Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
Experiments (MLP)
CoNLL 2017 Results (all treebanks LAS)
Ranked 1st among transition based parsers 5
5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123
Contributions in CoNLL17
Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123
Context and Word Embeddings
Relative contributions of part-of-speech (p) word vector (v)context vector (c)
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123
Context and Word Embeddings
Relative contributions of part-of-speech (p) word vector (v)context vector (c)
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Context vectors provide independent contribution on top ofPOS tags
Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123
Context and Word embeddings
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677p-fb 747 797 663
p-v-c 793 832 742
Our BiLSTM language model word vectors perform betterthan FB vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123
Context and Word embeddings
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Both POS tags and context vectors have significantcontributions on top of word vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123
Issues with MLP
However
Choosing correct state of parser still remains critical
We are unable to represent whole parsing history with featureextracting
Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123
Solution
Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack
Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies
CoNLL17
bull Koc-University team with MLP Parser using Context Embeddings
CoNLL18
bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123
c Tree-stack LSTM Parser (CoNLL18)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123
Related Work - Stack LSTM
Figure Stack LSTM [Dyer et al 2015]
Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123
Problems with Stack LSTM
They only modify stackrsquos word embeddings
Hidden states of LSTMS are not updated unless reduce
Actions are not explicitly represented
They only used word2vec embeddings [Mikolov et al 2013]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123
Tree-stack LSTM Overview
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
We propose Tree-stack LSTM model with 4 components
β-LSTMσ-LSTMAction-LSTMTree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123
Tree-stack LSTM
Input Representation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with continuous vector
Every dependency relation is represented with continuous vector
Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123
Input Representation
We do not include explicit feature extractor We initiated wordrepresentation by concatenating
Character Based LSTMrsquos word vectors
Word Based BiLSTMrsquos context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
Input Representation
Morp-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
Tree-stack LSTM
Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123
β-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
LSTM LSTM LSTM
wi+2wi+1wi
Figure Bufferrsquos β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
LSTM LSTM LSTM
si si+1 si+2
Figure Stackrsquos σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
LSTM LSTM LSTM
Figure Action-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How do components of tree-stack LSTM are connected
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
t-RNN
Dependent word
Dependency Relation
Head word
Figure t-RNN
whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
Tree-RNN with
1 Left Transition2 Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM LSTM
Left transition
t-RNN
Dependency Relation
HeadDependent
Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure β-LSTM recalculates its hidden based on new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Stackrsquos top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure: σ-LSTM recalculates its hidden state from the new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure: Tree-stack LSTM is ready to predict the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
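Symmetrically, the right transition can be sketched the same way (illustrative names, not the thesis code):

```python
# Minimal sketch of right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
# the stack top t is popped and attached as a dependent of the new
# stack top s with label d; the buffer is untouched.
def right(stack, buffer, arcs, d):
    t = stack.pop()         # σ|s|t -> σ|s: t leaves as a finished dependent
    s = stack[-1]           # s stays on the stack as the head
    arcs.add((s, d, t))     # A ∪ {(s, d, t)}
    return stack, buffer, arcs

stack, buffer, arcs = right([0, 2, 3], [4], set(), "obj")
# stack == [0, 2], arcs == {(2, "obj", 3)}
```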
Final overview of Tree-stack LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2 Related Work: Linear Models and their Drawbacks; Neural Network Models
3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5 Conclusion
6 Future Work & Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results & Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results & Comparisons
Dataset

CoNLL17:
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Differences: 1 Train/test split change; 2 Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
2 possible problems of the official comparison:
1 If the annotation of the treebank is improved, the older parser is handicapped
2 If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code        MLP    Tree-stack
ru_taiga (10k)   58.89  60.55
hu_szeged (20k)  66.21  68.18
tr_imst (50k)    56.78  58.75
ar_padt (120k)   67.83  68.14
en_ewt (205k)    74.87  75.77
cs_cac (473k)    83.39  83.57
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
MLP
Figure Initial model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
LSTM LSTM
Figure Only action LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
LSTM LSTM LSTM
LSTM LSTM MLP
Figure Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
LSTM LSTM
LSTM LSTM MLP
Figure Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code  MLP    Only Action  Only-β  Only-σ
hu_szeged  66.21  66.87        66.94   67.03
sv_lines   71.12  72.05        72.17   72.45
tr_imst    57.12  56.87        57.02   57.12
ar_padt    67.83  66.67        66.89   66.92
cs_cac     83.89  82.23        83.13   83.17
en_ewt     75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code           without t-RNN  with t-RNN
no_nynorsklia (3k)  51.78          53.33
ru_taiga (11k)      59.13          60.55
gl_treegal (15k)    69.76          70.45
hu_szeged (20k)     66.12          68.18
sv_lines (49k)      74.04          75.46
tr_imst (50k)       58.12          58.75
ar_padt (120k)      68.04          68.14
en_ewt (204k)       74.87          75.77
cs_cac (473k)       82.89          83.57
cs_pdt (1M)         81.17          81.16
t-RNN provides a comparative advantage for low-resource languages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu_szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv_lines   71.12  72.05   72.17   74.04   72.17      75.46
tr_imst    57.12  56.87   57.02   57.12   58.12      58.75
ar_padt    67.83  66.67   66.89   66.92   68.04      68.14
cs_cac     83.89  82.23   83.13   83.17   82.89      83.57
en_ewt     75.54  75.43   75.56   75.67   74.87      75.77
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNN's performance contribution increases when the training size decreases
σ-LSTM provides more useful information independent of dataset size
Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental Settings: We divide the CoNLL18 UD 2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k, less than 50k tokens
Languages having more than 50k, less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no_nynorsklia  51.13        53.33           3,583
ru_taiga       58.32        60.55           10,479
sme_giella     52.78        53.39           16,385
la_perseus     49.93        51.60           18,184
ug_udt         52.78        53.39           19,262
sl_sst         46.72        48.77           19,473
hu_szeged      66.23        68.18           20,166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv_lines       72.18        74.81           48,325
fr_sequoia     84.36        82.17           50,543
en_gum         76.44        75.34           53,686
ko_gsd         73.74        72.54           56,687
eu_bdt         74.55        73.32           72,974
nl_lassysmall  76.7         75.8            75,134
gl_ctg         79.02        79.018          79,327
lv_lvtb        72.33        72.24           80,666
id_gsd         75.76        73.97           97,531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa_seraji  81.18        81.12           121,064
bg_btb     84.53        84.55           124,336
en_ewt     75.77        75.682          204,585
ar_padt    68.02        68.14           223,881
de_gsd     71.59        71.32           263,804
ca_ancora  85.89        85.874          417,587
es_ancora  84.99        84.78           444,617
cs_cac     83.57        83.63           472,608
cs_pdt     81.43        82.12           1,173,282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle: transitions using gold moves. Dynamic oracle: transitions using predicted moves
In both cases, the log-probability of gold moves is maximized
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
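The static/dynamic distinction above can be sketched in a toy training step, where `predict` and `apply_move` are hypothetical stand-ins for the model and the parser state update:

```python
# Illustrative contrast of static vs dynamic oracle training, not the
# thesis implementation: the loss always targets the gold move, but the
# state advances by the gold move (static) or the predicted move (dynamic).
def train_step(state, gold_move, predict, apply_move, oracle="static"):
    probs = predict(state)                      # model scores over moves
    loss = -probs[gold_move]                    # stand-in for -log p(gold)
    move = gold_move if oracle == "static" else max(probs, key=probs.get)
    return apply_move(state, move), loss

predict = lambda state: {"left": 0.3, "shift": 0.7}   # toy model
apply_move = lambda state, m: state + [m]
s1, _ = train_step([], "left", predict, apply_move, oracle="static")
s2, _ = train_step([], "left", predict, apply_move, oracle="dynamic")
# s1 == ["left"] (followed gold); s2 == ["shift"] (followed prediction)
```

Training on predicted states exposes the model to configurations it will actually reach at test time.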
Static vs Dynamic Oracle Training
Figure: Results are very close for fewer than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure: Results are very close for between 20k and 50k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure: Results are very close for more than 50k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning:
1 Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2 Using Facebook's word vectors to train a parser [Bojanowski et al 2017]
3 Using my own word and context vectors trained on a different language from the same language family
4 Applying transfer learning with a pre-trained parser
Language      (1)           (2)    (3)    (4)
af_afribooms  not provided  75.46  77.43  78.12
kk_ktb        20.19         22.31  21.96  23.86
bxr_bdt       7.64          9.76   9.93   8.98
kmr_mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial
From-scratch LM training does not bring useful word and context vectors
Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
A Transition Based Parser can only build projective trees 6
6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
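The projectivity constraint can be made concrete with a small check; the head-array encoding and `is_projective` are illustrative, not from the slides:

```python
# An arc (h, d) is projective if every word strictly between h and d is
# a descendant of h; a tree is projective if all its arcs are.
# heads[i-1] gives the head of 1-based word i (0 = artificial root).
def is_projective(heads):
    for d, h in enumerate(heads, start=1):      # one arc h -> d per word d
        lo, hi = sorted((h, d))
        for k in range(lo + 1, hi):             # words strictly between
            a = k
            while a not in (0, h):              # climb toward the root
                a = heads[a - 1]
            if a != h:                          # k escaped h's subtree
                return False
    return True

# "Economic news had ...": 1->2, 2->root, 3->2 is projective
print(is_projective([2, 0, 2]))      # True
# crossing arcs (1,3) and (2,4) make the tree non-projective
print(is_projective([2, 0, 1, 2]))   # False
```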
Projective vs Non-projective
We compared our model with the best model for different projectivity ratios

Language     Projectivity  Best (LAS)  Our (LAS)
grc_perseus  90.7          79.39       55.03 (20)
eu_bdt       95.13         84.22       74.13 (17)
hu_szeged    97.8          82.66       68.18 (14)
da_ddt       98.26         86.28       76.40 (17)
en_gum       99.6          85.05       76.44 (15)
gl_treegal   100           74.25       70.45 (10)
gl_ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases
7 From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusion: We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition based dependency parsing
Our Tree-stack LSTM outperformed MLP by removing hand-crafted feature engineering
Tree-stack LSTM performed better with low-resource languages
When the training dataset size increases, tree-stack LSTM loses its advantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings
Attention Mechanism
Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement
Morphological Features
Finding a different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann, Carlos Gomez-Rodriguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682
S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
Problem Definition
Find a model that learns to decide the correct transition from the current state
Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123
2 Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123
Related Work
Neural Networks for Feature Conjunctions
Neural networks can handle feature conjunctions and nonlinearity. However, they are impractical for high-dimensional inputs: they scale linearly in input dimensions (in both time and space, assuming a fixed number of hidden units)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123
Related Work
Solution: Using dense embeddings for input features
Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123
Overview
1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2 Related Work: Linear Models and their Drawbacks; Neural Network Models
3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5 Conclusion
6 Future Work & Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123
3 Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies
CoNLL17
• Koc-University team with MLP Parser using Context Embeddings
CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies
CoNLL17
• Koc-University team with MLP Parser using Context Embeddings
CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123
a Language Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123
Language Model (LM)
The LM is used to obtain Context and Word embeddings with two components:
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123
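The two components can be sketched schematically; here mean pooling and running sums are toy stand-ins for the character LSTM and the word BiLSTM, so only the interfaces (one vector per word, one context vector per position) match the model:

```python
import numpy as np

rng = np.random.default_rng(0)
# toy character embedding table; 4 dims is an illustrative size
CHAR_EMB = {c: rng.normal(size=4) for c in "abcdefghijklmnopqrstuvwxyz"}

def word_vector(word):
    # character-level encoder: a fixed-size vector for a word of any length
    return np.mean([CHAR_EMB[c] for c in word.lower()], axis=0)

def context_vectors(words):
    # word-level bidirectional encoder: concatenate a left-to-right and a
    # right-to-left summary, one row per position (as with BiLSTM states)
    vecs = np.stack([word_vector(w) for w in words])
    fwd = np.cumsum(vecs, axis=0)
    bwd = np.cumsum(vecs[::-1], axis=0)[::-1]
    return np.concatenate([fwd, bwd], axis=1)

ctx = context_vectors(["economic", "news", "had"])
# ctx.shape == (3, 8): one context vector per word, twice the word-vector size
```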
Language Model - Word vectors
Character based LSTM generates word Vectors
Figure Character LSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123
Language Model - Context Vectors
Word based BiLSTM generates Context Vectors
Figure Word BiLSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123
b MLP Parser (CoNLL17)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123
MLP Parser
MLP Parser consists of 4 components
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Feature extractor describes current state
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123
MLP Parser - Feature Extraction
Feature extractor describes current state
Figure: Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123
MLP Parser - Decision Module
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123
Experiments & Dataset (MLP) CoNLL17
CoNLL17 Dataset
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation
• 17 universal part-of-speech tags
• 37 universal dependency relations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123
Experiments - Evaluation Metric
Labeled Attachment Score (LAS): the percentage of words correctly assigned both the correct syntactic head and the correct dependency label
Figure: Three parses of "Economic news had ...": the gold tree (SBJ, ATT arcs; LAS = 1), Pred 1 (PRED, OBJ arcs; LAS = 0), and Pred 2 (OBJ, ATT arcs; LAS = (1/2)·100)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
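The LAS definition above can be written out directly (a sketch; `las` is an illustrative name):

```python
# Sketch of the LAS metric: a word counts as correct only if both its
# predicted head and its dependency label match the gold annotation.
def las(gold, pred):
    """gold, pred: lists of (head, label) pairs, one per word."""
    correct = sum(g == p for g, p in zip(gold, pred))
    return 100.0 * correct / len(gold)

gold  = [(2, "SBJ"), (0, "PRED"), (2, "OBJ")]
pred1 = [(2, "ATT"), (0, "PRED"), (2, "OBJ")]   # one wrong label
print(las(gold, gold))              # 100.0
print(round(las(gold, pred1), 2))   # 66.67
```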
Experiments (MLP)
CoNLL 2017 Results (all treebanks LAS)
Ranked 1st among transition based parsers 5
5 Source: CoNLL17 official results page
Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123
Contributions in CoNLL17
Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123
Context and Word Embeddings
Relative contributions of part-of-speech (p), word vector (v), context vector (c)
Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63.0
c      72.2       76.0       63.5
v-c    76.0       79.0       67.6
p-c    78.0       82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123
Context and Word Embeddings
Relative contributions of part-of-speech (p), word vector (v), context vector (c)
Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63.0
c      72.2       76.0       63.5
v-c    76.0       79.0       67.6
p-c    78.0       82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2
Context vectors provide an independent contribution on top of POS tags
Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123
Context and Word embeddings
Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63.0
c      72.2       76.0       63.5
v-c    76.0       79.0       67.6
p-c    78.0       82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2
Our BiLSTM language model word vectors perform better than FB vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123
Context and Word embeddings
Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63.0
c      72.2       76.0       63.5
v-c    76.0       79.0       67.6
p-c    78.0       82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2
Both POS tags and context vectors have significant contributions on top of word vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123
Issues with MLP
However
Choosing the correct state representation of the parser still remains critical
We are unable to represent the whole parsing history with feature extraction
Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123
Solution
Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack
Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies
CoNLL17
• Koc-University team with MLP Parser using Context Embeddings
CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123
c Tree-stack LSTM Parser (CoNLL18)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123
Related Work - Stack LSTM
Figure Stack LSTM [Dyer et al 2015]
Represent each component (σ, β, A) with an LSTM; modify the head word's embedding with the dependent's embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123
Problems with Stack LSTM
They only modify the stack's word embeddings
Hidden states of the LSTMs are not updated unless a reduce occurs
Actions are not explicitly represented
They only used word2vec embeddings [Mikolov et al 2013]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123
Tree-stack LSTM Overview
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
We propose the Tree-stack LSTM model with 4 components:
β-LSTM, σ-LSTM, Action-LSTM, Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123
Tree-stack LSTM
Input Representation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with a continuous vector
Every dependency relation is represented with a continuous vector
Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123
Input Representation
We do not include an explicit feature extractor. We initialize the word representation by concatenating:
Character Based LSTMrsquos word vectors
Word Based BiLSTMrsquos context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
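The concatenation above can be sketched directly; the dimensions are illustrative placeholders, not the thesis hyperparameters:

```python
import numpy as np

# Sketch of the word representation: concatenate the character-LSTM word
# vector, the BiLSTM context vector, the POS embedding, and the morph-feat
# vector into a single dense parser input, with no hand-crafted features.
def word_input(word_vec, context_vec, pos_vec, morph_feat_vec):
    return np.concatenate([word_vec, context_vec, pos_vec, morph_feat_vec])

x = word_input(np.zeros(300), np.zeros(600), np.zeros(64), np.zeros(64))
# x.shape == (1028,): one dense input vector per word
```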
Input Representation
Morph-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
Tree-stack LSTM
Model Components: 1 β-LSTM, 2 σ-LSTM, 3 Action-LSTM, 4 Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123
β-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
LSTM LSTM LSTM
wi+2wi+1wi
Figure Bufferrsquos β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
LSTM LSTM LSTM
si si+1 si+2
Figure Stackrsquos σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
LSTM LSTM LSTM
Figure Action-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How are the components of tree-stack LSTM connected?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
t-RNN
Dependent word
Dependency Relation
Head word
Figure t-RNN
w_head_new = tanh(W_rnn · [w_head_old ; d_l ; w_dep] + b_rnn)    (1)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
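Equation (1) can be sketched directly in Python (illustrative dimensions, not the thesis settings):

```python
import numpy as np

# Direct sketch of Eq. (1): the new head embedding is a tanh projection
# of the concatenated old head, dependency-label, and dependent embeddings.
def t_rnn(w_head, d_label, w_dep, W_rnn, b_rnn):
    x = np.concatenate([w_head, d_label, w_dep])   # [w_head ; d_l ; w_dep]
    return np.tanh(W_rnn @ x + b_rnn)

dim, ldim = 8, 4                                   # toy embedding sizes
rng = np.random.default_rng(1)
W_rnn = rng.normal(size=(dim, 2 * dim + ldim))
b_rnn = np.zeros(dim)
new_head = t_rnn(rng.normal(size=dim), rng.normal(size=ldim),
                 rng.normal(size=dim), W_rnn, b_rnn)
# new_head.shape == (8,): same size as the old head, so compositions nest
```

Keeping the output the same size as the head embedding is what lets repeated left/right transitions feed the composed head back into the stack and buffer LSTMs.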
Tree-RNN with
1 Left Transition 2 Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM LSTM
Left transition
t-RNN
Dependency Relation
HeadDependent
Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure β-LSTM recalculates its hidden based on new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Stackrsquos top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure σ-LSTM recalculates its hidden from new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
Final overview of Tree-stack LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing
2 Related WorkLinear Models and their DrawbacksNeural Network Models
3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser
4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning
5 Conclusion6 Future Work amp Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results amp Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results amp Comparisons
Dataset
Dependency parsing of 81
treebanks in 49 languages
All treebanks use standardized
annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 7th out
of 33 participants (1st among
transition based parsers)
Dependency parsing of 82
treebanks in 57 languages
All treebanks use
standardized annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 16th
out of 30 participants (2nd
among transition based
parsers)
CoNLL17 CoNLL181 Traintest split change 2 Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
2 possible problems of official comparison
1 If the annotation of the tree bank is improved the older parser ishandicapped
2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
MLP
Figure Initial model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
LSTM LSTM
Figure Only action LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
LSTM LSTM LSTM
LSTM LSTM MLP
Figure Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
LSTM LSTM
LSTM LSTM MLP
Figure Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692
cs cac 8389 8223 8313 8317
en ewt 7554 7543 7556 7567
Table Comparison between MLP and rdquoOnlyrdquo models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875
ar padt (120k) 6804 6814
en ewt (204k) 7487 7577
cs cac (473k) 8289 8357
cs pdt (1M) 8117 81164
t-RNN provides comparative advantage for low-resourcelanguages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNNrsquos performance contribution increases when the training sizedecreases
σ-LSTM provides more useful information independent from datasetsize
Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k less than 50k tokens
Languages having more than 50k less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having less than 20k training tokens
Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3,583
ru taiga       58.32        60.55           10,479
sme giella     52.78        53.39           16,385
la perseus     49.93        51.60           18,184
ug udt         52.78        53.39           19,262
sl sst         46.72        48.77           19,473
hu szeged      66.23        68.18           20,166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having between 50k and 100k training tokens
Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48,325
fr sequoia     84.36        82.17           50,543
en gum         76.44        75.34           53,686
ko gsd         73.74        72.54           56,687
eu bdt         74.55        73.32           72,974
nl lassysmall  76.7         75.8            75,134
gl ctg         79.02        79.018          79,327
lv lvtb        72.33        72.24           80,666
id gsd         75.76        73.97           97,531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens
Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121,064
bg btb     84.53        84.55           124,336
en ewt     75.77        75.682          204,585
ar padt    68.02        68.14           223,881
de gsd     71.59        71.32           263,804
ca ancora  85.89        85.874          417,587
es ancora  84.99        84.78           444,617
cs cac     83.57        83.63           472,608
cs pdt     81.43        82.12           1,173,282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle: transitions follow the gold moves
Dynamic oracle: transitions follow the model's predicted moves
In both cases the log probability of the gold moves is maximized
Figure: Tree-stack LSTM architecture - the σ-LSTM, β-LSTM and Action-LSTM states and the t-RNN (head word, dependent word, dependency relation) are concatenated and fed to an MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
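The distinction above can be sketched in a few lines. This is a toy illustration, not the thesis implementation: the state, move names and the 0.9 exploration rate are my own choices; the point is that the loss always scores a gold move, while only the *applied* move differs between the two oracles.

```python
import math
import random

def training_step(state, gold_moves, probs, dynamic, rng):
    """One training transition.

    gold_moves: the zero-cost (gold) moves for this state, from the oracle
    probs:      the classifier's distribution over moves in this state
    Returns (loss_term, move_to_apply). The loss maximizes log p of a gold
    move in both regimes; the applied move is:
      static  -> a gold move (the parser only ever sees gold states)
      dynamic -> usually the model's own prediction, so the parser is
                 trained on states reached by its mistakes
    """
    best_gold = max(gold_moves, key=lambda m: probs[m])
    loss = -math.log(probs[best_gold])
    if dynamic and rng.random() < 0.9:  # exploration rate is a free choice
        applied = max(probs, key=probs.get)
    else:
        applied = best_gold
    return loss, applied
```

With a seeded generator the dynamic oracle follows the (wrong) model argmax while still computing the loss against the gold move.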
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning:
1 Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2 Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3 Using my own word and context vectors trained on a different language from the same language family
4 Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial
Training an LM from scratch on very limited data does not produce useful word and context vectors
Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition-based parsers can only build projective trees 6
6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
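Projectivity itself is easy to test: an arc (h, d) is projective iff every word strictly between h and d is a descendant of h. A small sketch (the 1-based word indices with 0 as the root are my own convention, not from the slides):

```python
def is_projective(heads):
    """heads[i] is the head index of word i+1 (1-based); 0 denotes the root.

    A tree is projective iff no two arcs cross, i.e. for every arc (h, d)
    every word strictly between h and d is a descendant of h.
    """
    n = len(heads)
    for d in range(1, n + 1):
        h = heads[d - 1]
        lo, hi = min(h, d), max(h, d)
        for w in range(lo + 1, hi):
            # climb from w towards the root; the path must pass through h
            a = w
            while a != 0 and a != h:
                a = heads[a - 1]
            if a != h:
                return False
    return True
```

A transition-based parser of the kind described here can only produce trees for which this check succeeds.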
Projective vs Non-projective
We compared our model with the best model at different projectivity ratios
Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)
Table: Our model's performance gap decreases as the projectivity ratio increases
7
7 From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusion: we introduced "Context, Word and Morph-feat" embeddings and showed their contribution to transition-based dependency parsing
Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering
Tree-stack LSTM performed better on low-resource languages
As the training dataset size increases, tree-stack LSTM loses its advantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.
Attention Mechanism
Applying attention over the σ-LSTM, β-LSTM or Action-LSTM states may bring a performance improvement
Morphological Features
Finding a different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap, Berkay Furkan Onder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann, Carlos Gómez-Rodríguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1. Association for Computational Linguistics, pages 673-682.
S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work & Discussions
-
Omer Kırnap (Koc University) MSc Thesis September 27 2018 13 123
Problem Definition
Find a model that learns to decide the correct transition from the current state
Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123
2 Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123
Related Work
Neural Networks for Feature Conjunctions
Neural networks can handle feature conjunctions and nonlinearity. However, they are impractical for high-dimensional inputs: they scale linearly in the input dimensions (in both time and space, assuming a fixed number of hidden units).
Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123
Related Work
Solution: use dense embeddings for input features
Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123
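The contrast above can be made concrete: with one-hot inputs the first layer multiplies by a matrix whose width is the vocabulary size, whereas a dense-embedding layer just looks up and concatenates small vectors. A minimal sketch (feature names and dimensions are illustrative only):

```python
def embed(features, table):
    """Dense-embedding input layer: concatenate looked-up vectors instead of
    multiplying a huge one-hot vector by a weight matrix.

    `table` maps a feature id to its d-dimensional vector (plain lists here);
    the output size depends on the number of *active* features, not on the
    vocabulary size.
    """
    out = []
    for f in features:
        out.extend(table[f])
    return out
```

The embedding table is trained jointly with the rest of the network, so similar features end up with similar vectors.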
Overview
1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2 Related Work: Linear Models and their Drawbacks; Neural Network Models
3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5 Conclusion
6 Future Work & Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123
3 Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies
CoNLL17: Koc-University team with an MLP Parser using Context Embeddings
CoNLL18: KParse team with a Tree-stack LSTM Parser using Context and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies
CoNLL17: Koc-University team with an MLP Parser using Context Embeddings
CoNLL18: KParse team with a Tree-stack LSTM Parser using Context and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123
a Language Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123
Language Model (LM)
The LM is used to obtain Context and Word embeddings with two components:
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123
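The two components can be sketched as follows. This is a toy stand-in, not the thesis model: the real system uses trained LSTMs with high-dimensional states, while here a scalar tanh fold plays the role of a recurrent cell, just to show the data flow (characters -> word vector; word vectors -> forward/backward context at each position).

```python
import math

def fold(vecs, step):
    """Run a recurrent cell (a stand-in for an LSTM) over a sequence."""
    h = 0.0
    for v in vecs:
        h = step(h, v)
    return h

def word_vector(word):
    """Character-level 'LSTM': fold over character codes -> a word vector."""
    return fold([ord(c) / 1000.0 for c in word],
                lambda h, x: math.tanh(0.5 * h + x))

def context_vectors(words):
    """Word-level 'BiLSTM': for each position, pair a forward pass over the
    left context with a backward pass over the right context."""
    wvecs = [word_vector(w) for w in words]
    ctx = []
    for i in range(len(words)):
        fwd = fold(wvecs[: i + 1], lambda h, x: math.tanh(0.5 * h + x))
        bwd = fold(wvecs[i:][::-1], lambda h, x: math.tanh(0.5 * h + x))
        ctx.append((fwd, bwd))
    return ctx
```

The character-level pass makes the word vectors open-vocabulary, and the bidirectional pass gives each token a representation of its whole sentence context.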
Language Model - Word vectors
Character based LSTM generates word Vectors
Figure Character LSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123
Language Model - Context Vectors
Word based BiLSTM generates Context Vectors
Figure Word BiLSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123
b MLP Parser (CoNLL17)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123
MLP Parser
MLP Parser consists of 4 components
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Feature extractor describes current state
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123
MLP Parser - Feature Extraction
Feature extractor describes current state
Figure: Kırnap et al. 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123
MLP Parser - Decision Module
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123
Experiments & Dataset (MLP) - CoNLL17
CoNLL17 Dataset
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags and 37 universal dependency relations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123
Experiments - Evaluation Metric
Labeled Attachment Score (LAS): the percentage of words assigned both the correct syntactic head and the correct dependency label
Example with "Economic news had": the gold tree (arcs labeled ATT and SBJ) scores LAS = 1; Pred 1 (arcs labeled PRED, OBJ) gets neither head-label pair right, LAS = 0; Pred 2 (arcs labeled OBJ, ATT) gets one of the two right, LAS = (1/2) * 100 = 50
Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
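The metric on this slide is just an exact-match count over (head, label) pairs. A minimal sketch (the encoding of gold/predicted trees as pair lists is my own choice):

```python
def las(gold, pred):
    """Labeled Attachment Score: percentage of words whose predicted head
    AND dependency label both match the gold annotation.

    gold / pred: lists of (head_index, label) pairs, one per word.
    """
    assert len(gold) == len(pred)
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return 100.0 * correct / len(gold)
```

Note that a correct head with a wrong label counts as an error, which is what separates LAS from the unlabeled score (UAS).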
Experiments (MLP)
CoNLL 2017 Results (all treebanks LAS)
Ranked 1st among transition based parsers 5
5 Source: CoNLL17 official results page
Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123
Contributions in CoNLL17
Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123
Context and Word Embeddings
Relative contributions of part-of-speech (p), word vector (v) and context vector (c):

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63.0
c      72.2       76.0       63.5
v-c    76.0       79.0       67.6
p-c    78.0       82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123
Context and Word Embeddings
Context vectors provide an independent contribution on top of POS tags
Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123
Context and Word embeddings
Our BiLSTM language model word vectors perform better than the FB vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123
Context and Word embeddings
Both POS tags and context vectors have significant contributions on top of word vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123
Issues with MLP
However
Choosing the correct state features of the parser still remains critical
We are unable to represent the whole parsing history with feature extraction
Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123
Solution
Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack
Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies
CoNLL17: Koc-University team with an MLP Parser using Context Embeddings
CoNLL18: KParse team with a Tree-stack LSTM Parser using Context and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123
c Tree-stack LSTM Parser (CoNLL18)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123
Related Work - Stack LSTM
Figure: Stack LSTM [Dyer et al. 2015]
Represent each component (σ, β, A) with an LSTM; modify the head word's embedding with the dependent's embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123
Problems with Stack LSTM
They only modify the stack's word embeddings
Hidden states of the LSTMs are not updated unless a reduce occurs
Actions are not explicitly represented
They only used word2vec embeddings [Mikolov et al 2013]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123
Tree-stack LSTM Overview
Figure: Tree-stack LSTM architecture - the σ-LSTM, β-LSTM and Action-LSTM states and the t-RNN (head word, dependent word, dependency relation) are concatenated and fed to an MLP
We propose the Tree-stack LSTM model with 4 components:
β-LSTM, σ-LSTM, Action-LSTM, Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123
Tree-stack LSTM
Input Representation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with a continuous vector
Every dependency relation is represented with a continuous vector
Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123
Input Representation
We do not include an explicit feature extractor. We initialize the word representation by concatenating:
Character Based LSTM's word vectors
Word Based BiLSTM's context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
Input Representation
Morph-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
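A UD FEATS string like the one above is a `|`-separated list of `attribute=value` pairs, so it can be turned into a fixed-size vector by embedding each pair and combining them. A minimal sketch; summing the pair vectors is one reasonable choice and an assumption on my part, not necessarily how the thesis combines them:

```python
def morph_feat_vector(feats, table, dim=2):
    """Embed a UD FEATS string such as 'Case=Nom|Number=Sing'.

    Each attribute=value pair gets its own vector from `table`; pairs are
    summed so that any subset of features yields a vector of size `dim`.
    '_' (no features) maps to the zero vector.
    """
    vec = [0.0] * dim
    if feats and feats != "_":
        for pair in feats.split("|"):
            v = table.get(pair, [0.0] * dim)  # unseen features -> zero vector
            vec = [a + b for a, b in zip(vec, v)]
    return vec
```

Because the pair vocabulary (e.g. `Case=Nom`, `Person=3`) is small and shared across words, these embeddings can be trained even for rarely seen word forms.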
Tree-stack LSTM
Model Components: 1 β-LSTM, 2 σ-LSTM, 3 Action-LSTM, 4 Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123
β-LSTM
Figure: Tree-stack LSTM architecture with the β-LSTM highlighted
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
Figure: Buffer's β-LSTM - an LSTM running over the buffer words wi, wi+1, wi+2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
Figure: Tree-stack LSTM architecture with the σ-LSTM highlighted
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
Figure: Stack's σ-LSTM - an LSTM running over the stack words si, si+1, si+2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
Figure: Tree-stack LSTM architecture with the Action-LSTM highlighted
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
Figure: Action-LSTM - an LSTM running over the sequence of past transitions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How are the components of tree-stack LSTM connected?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
Figure: t-RNN combines the head word, dependent word and dependency relation embeddings

w_head_new = tanh(W_rnn · [w_head_old; d_l; w_dep] + b_rnn)   (1)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
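Eq. (1) as a runnable sketch, with plain Python lists standing in for the trained parameters W_rnn and b_rnn (the tiny dimensions in the test are illustrative only):

```python
import math

def trnn_compose(w_head, d_rel, w_dep, W, b):
    """Eq. (1): new head embedding = tanh(W · [head; relation; dependent] + b).

    w_head, d_rel, w_dep: embedding vectors (lists of floats)
    W: weight matrix as a list of rows; b: bias vector.
    """
    x = w_head + d_rel + w_dep  # vector concatenation [head; rel; dep]
    return [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]
```

Because the output has the same size as the head embedding, the composed vector can replace the head's entry in the stack/buffer and be composed again when that head itself gets attached, which is what lets the t-RNN build up subtree representations.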
Tree-RNN with
1 Left Transition, 2 Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: Each embedding is initialized by concatenating POS, language and morph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: t-RNN calculates the new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: β-LSTM recalculates its hidden state based on the new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: Tree-stack LSTM is ready for the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: Each embedding is initialized by concatenating POS, language and morph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: t-RNN calculates the new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: σ-LSTM recalculates its hidden state from the new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: Tree-stack LSTM is ready for the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
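The two transition definitions on these slides translate directly into state updates. A minimal sketch: `shift`, the standard third transition that moves the buffer front onto the stack, is assumed here (it is not spelled out on these slides), and word indices stand in for full embeddings:

```python
def left_arc(stack, buffer, arcs, d):
    """left_d(σ|s, b|β, A) -> (σ, b|β, A ∪ {(b, d, s)}):
    the buffer front becomes head of the stack top, which is popped."""
    s = stack.pop()
    arcs.add((buffer[0], d, s))

def right_arc(stack, buffer, arcs, d):
    """right_d(σ|s|t, β, A) -> (σ|s, β, A ∪ {(s, d, t)}):
    the second stack item becomes head of the top, which is popped."""
    t = stack.pop()
    arcs.add((stack[-1], d, t))

def shift(stack, buffer):
    """shift(σ, b|β, A) -> (σ|b, β, A): move the buffer front onto the stack."""
    stack.append(buffer.pop(0))
```

In the full model, each popped or attached word would additionally trigger the t-RNN composition and the corresponding σ-/β-LSTM state updates shown on the previous slides.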
Final overview of Tree-stack LSTM
Figure: Final Tree-stack LSTM architecture - the σ-LSTM, β-LSTM and Action-LSTM states and the t-RNN output are concatenated and fed to an MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2 Related Work: Linear Models and their Drawbacks; Neural Network Models
3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5 Conclusion
6 Future Work & Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results & Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results amp Comparisons
Dataset

CoNLL17: Dependency parsing of 81 treebanks in 49 languages. All treebanks use standardized annotation: 17 universal part-of-speech tags and 37 universal dependency relations. Koc-University ranked 7th out of 33 participants (1st among transition-based parsers).

CoNLL18: Dependency parsing of 82 treebanks in 57 languages. All treebanks use standardized annotation: 17 universal part-of-speech tags and 37 universal dependency relations. Koc-University ranked 16th out of 30 participants (2nd among transition-based parsers).

Changes from CoNLL17 to CoNLL18: 1 Train/test split, 2 Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
2 possible problems with the official comparison:
1 If the annotation of the treebank has improved, the older parser is handicapped
2 If the training-test split has changed and old training data are now in the test data, the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
MLP
Figure Initial model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
Figure: Only Action-LSTM model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
Figure: Only β-LSTM model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
Figure: Only σ-LSTM model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code  MLP    Only-Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67

Table: Comparison between the MLP and the "Only" models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
Figure: Tree-stack LSTM architecture with the t-RNN component highlighted
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875
ar padt (120k) 6804 6814
en ewt (204k) 7487 7577
cs cac (473k) 8289 8357
cs pdt (1M) 8117 81164
t-RNN provides comparative advantage for low-resourcelanguages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNNrsquos performance contribution increases when the training sizedecreases
σ-LSTM provides more useful information independent from datasetsize
Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k less than 50k tokens
Languages having more than 50k less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having less than 20k training tokens
Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583
ru taiga 5832 6055 10479
sme giella 5278 5339 16385
la perseus 4993 516 18184
ug udt 5278 5339 19262
sl sst 4672 4877 19473
hu szeged 6623 6818 20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having tokens in between 50k and100k
Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325
fr sequoia 8436 8217 50543
en gum 7644 7534 53686
ko gsd 7374 7254 56687
eu bdt 7455 7332 72974
nl lassymal 767 758 75134
gl ctg 7902 79018 79327
lv lvtb 7233 7224 80666
id gsd 7576 7397 97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having more than 100k trainingtokens
Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064
bg btb 8453 8455 124336
en ewt 7577 75682 204585
ar padt 6802 6814 223881
de gsd 7159 7132 263804
ca ancora 8589 85874 417587
es ancora 8499 8478 444617
cs cac 8357 8363 472608
cs pdt 8143 8212 1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle transitions using gold movesDynamic oracle transitions using predicted moves
In both cases logp of gold moves maximized
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and
use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al
2017]3 Using my own word and context vectors trained with different
language but from the same language family4 Applying transfer learning with a pre-trained parser
Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898
kmr mg 2012 2257 2278 2339
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the mostbeneficial
From scratch LM training does not bring useful word and contextvectors
Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition Based Parser can only build projective trees 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
Projective vs Non-projective
We compared our model with the best model for different projectivityratios
Language Projectiviy Best (LAS) Our (LAS)
grc perseus 907 7939 5503 (20)
eu bdt 9513 8422 7413 (17)
hu szeged 978 8266 6818 (14)
da ddt 9826 8628 7640 (17)
en gum 996 8505 7644 (15)
gl treegal 100 7425 7045 (10)
gl ctg 100 8212 7945 (14)
Table Our models performance gap decreases as the projectivity ratio increases
7
7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing
Problem Definition
Find a model that learns to decide the correct transition from the current state
Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123
2 Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123
Related Work
Neural Networks for Feature Conjunctions
Neural networks can handle feature conjunctions and nonlinearity. However, they are impractical for high-dimensional inputs: they scale linearly in the input dimension (in both time and space, assuming a fixed number of hidden units)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123
Related Work
Solution: use dense embeddings for input features
Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123
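The shift from sparse feature conjunctions to dense embeddings can be sketched with a minimal lookup table (all names, values, and dimensions here are illustrative, not from the thesis):

```python
# Hypothetical sketch: each feature value maps to a small dense vector that
# a real parser would learn; concatenation replaces explicit conjunctions.
import random

random.seed(0)
EMB_DIM = 4
vocab = ["NOUN", "VERB", "ADJ"]
emb = {f: [random.uniform(-0.1, 0.1) for _ in range(EMB_DIM)] for f in vocab}

def featurize(pos_tags):
    """Concatenate dense embeddings of the extracted features."""
    out = []
    for tag in pos_tags:
        out.extend(emb[tag])
    return out

x = featurize(["NOUN", "VERB"])  # 2 features -> 2 * EMB_DIM dimensions
```

The nonlinearity of the network on top of these inputs then captures feature interactions without enumerating conjunctions.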
Overview
1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2 Related Work: Linear Models and their Drawbacks; Neural Network Models
3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5 Conclusion
6 Future Work & Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123
3 Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies
CoNLL17
• Koc-University team with MLP Parser using Context Embeddings
CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123
a Language Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123
Language Model (LM)
The LM is used to obtain context and word embeddings with two components:
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123
Language Model - Word vectors
Character based LSTM generates word Vectors
Figure Character LSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123
Language Model - Context Vectors
Word based BiLSTM generates Context Vectors
Figure Word BiLSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123
b MLP Parser (CoNLL17)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123
MLP Parser
MLP Parser consists of 4 components
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Feature extractor describes current state
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123
MLP Parser - Feature Extraction
Feature extractor describes current state
Figure: Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123
MLP Parser - Decision Module
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123
Experiments amp Dataset (MLP) CoNLL17
CoNLL17 Dataset
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123
Experiments - Evaluation Metric
Labeled Attachment Score (LAS): the percentage of words assigned both the correct syntactic head and the correct dependency label
Sentence: "Economic news had ..."
Gold tree: Economic —ATT→ news, news —SBJ→ had
Pred 1: Economic —OBJ→ news, news —PRED→ had (both labels wrong) → LAS 0
Pred 2: Economic —ATT→ news (correct), news —OBJ→ had (wrong label) → LAS (1/2)·100 = 50
Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
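The LAS computation shown above can be sketched directly (a toy illustration; the dictionary-based tree representation is my simplification, not the thesis code):

```python
def las(gold, pred):
    """Labeled Attachment Score: percentage of words whose predicted
    (head, label) pair matches the gold pair exactly."""
    correct = sum(1 for w in gold if pred.get(w) == gold[w])
    return 100.0 * correct / len(gold)

# The slide's example: gold arcs vs two predictions.
gold  = {"Economic": ("news", "ATT"), "news": ("had", "SBJ")}
pred1 = {"Economic": ("news", "OBJ"), "news": ("had", "PRED")}  # both labels wrong
pred2 = {"Economic": ("news", "ATT"), "news": ("had", "OBJ")}   # one of two correct
```

las(gold, pred1) gives 0.0 and las(gold, pred2) gives 50.0, matching the slide.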
Experiments (MLP)
CoNLL 2017 Results (all treebanks LAS)
Ranked 1st among transition based parsers 5
5 Source: CoNLL17 official results page
Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123
Contributions in CoNLL17
Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123
Context and Word Embeddings
Relative contributions of part-of-speech (p), word vector (v), and context vector (c)
Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123
Context and Word Embeddings
Relative contributions of part-of-speech (p), word vector (v), and context vector (c)
Context vectors provide an independent contribution on top of POS tags
Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123
Context and Word embeddings
Our BiLSTM language model word vectors perform better than the FB vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123
Context and Word embeddings
Both POS tags and context vectors make significant contributions on top of word vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123
Issues with MLP
However
Choosing the correct state representation of the parser remains critical
We are unable to represent the whole parsing history with feature extraction
Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123
Solution
Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack
Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies
CoNLL17
• Koc-University team with MLP Parser using Context Embeddings
CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123
c Tree-stack LSTM Parser (CoNLL18)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123
Related Work - Stack LSTM
Figure Stack LSTM [Dyer et al 2015]
Represent each component (σ, β, A) with an LSTM. Modify the head word's embedding with the dependent's embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123
Problems with Stack LSTM
They only modify the stack's word embeddings
Hidden states of the LSTMs are not updated unless a reduce occurs
Actions are not explicitly represented
They only used word2vec embeddings [Mikolov et al 2013]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123
Tree-stack LSTM Overview
Figure: Tree-stack LSTM overview (the β-, σ-, and Action-LSTMs and the t-RNN feed a concatenation into an MLP)
We propose the Tree-stack LSTM model with 4 components:
β-LSTM, σ-LSTM, Action-LSTM, Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123
Tree-stack LSTM
Input Representation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with a continuous vector
Every dependency relation is represented with a continuous vector
Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123
Input Representation
We do not include an explicit feature extractor. We initialize the word representation by concatenating:
Character Based LSTMrsquos word vectors
Word Based BiLSTMrsquos context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
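A minimal sketch of this concatenation, with made-up two- and one-dimensional vectors standing in for the real learned embeddings:

```python
# Illustrative sketch of the input representation (vector values and sizes
# are stand-ins, not the thesis's actual dimensions).
def word_representation(word_vec, context_vec, pos_vec, morph_vec):
    """Concatenate the four embedding sources into one input vector."""
    return word_vec + context_vec + pos_vec + morph_vec

repr_it = word_representation([0.1, 0.2],   # character-based LSTM word vector
                              [0.3, 0.4],   # word-based BiLSTM context vector
                              [0.5],        # POS embedding
                              [0.6])        # morph-feat embedding
```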
Input Representation
Morph-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
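The morph-feat string shown above can be split into attribute-value pairs before any embedding lookup; a small sketch (the function name is mine, not from the thesis):

```python
def parse_morph_feats(feats):
    """Split a UD morphological-feature string into (attribute, value) pairs.
    "_" marks the absence of features in CoNLL-U."""
    if feats == "_":
        return []
    return [tuple(kv.split("=", 1)) for kv in feats.split("|")]

pairs = parse_morph_feats("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
# each pair would then be mapped to a learned embedding and combined
```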
Tree-stack LSTM
Model Components:
1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123
β-LSTM
Figure: Tree-stack LSTM overview with the β-LSTM highlighted
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
Figure: Buffer's β-LSTM reading words wi, wi+1, wi+2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
Figure: Tree-stack LSTM overview with the σ-LSTM highlighted
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
Figure: Stack's σ-LSTM reading stack items si, si+1, si+2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
Figure: Tree-stack LSTM overview with the Action-LSTM highlighted
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
Figure: Action-LSTM over the transition history
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How are the components of the tree-stack LSTM connected?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
t-RNN
Dependent word
Dependency Relation
Head word
Figure t-RNN
w_head_new = tanh(W_rnn · [w_head_old ; d_l ; w_dep] + b_rnn)   (1)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
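Equation (1) can be written out as a small pure-Python sketch (W_rnn and b_rnn here are toy values with illustrative dimensions, not trained parameters):

```python
import math

def trnn_update(W, b, head, dep_label, dep):
    """New head embedding: tanh(W · [head; label; dep] + b), as in Eq. (1)."""
    x = head + dep_label + dep                      # concatenation
    return [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

head, label, dep = [0.5, -0.5], [1.0], [0.25, 0.0]
W = [[0.1] * 5 for _ in range(2)]   # output dim matches the head embedding dim
b = [0.0, 0.0]
new_head = trnn_update(W, b, head, label, dep)
```

Note the output has the same dimensionality as the old head embedding, so the updated head can flow back into the σ- or β-LSTM.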
Tree-RNN with
1 Left Transition
2 Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready for the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready for the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
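The left and right transitions above can be sketched as plain list operations (a minimal illustration under my own naming; the real parser operates on LSTM states, not bare lists):

```python
def left(stack, buffer, arcs, d):
    """left_d(σ|s, b|β, A): pop stack top s; add arc (b, d, s),
    with the buffer front b as the head."""
    s = stack.pop()
    arcs.add((buffer[0], d, s))       # (head, label, dependent)

def right(stack, buffer, arcs, d):
    """right_d(σ|s|t, β, A): pop stack top t; add arc (s, d, t),
    with the new stack top s as the head."""
    t = stack.pop()
    arcs.add((stack[-1], d, t))

stack, buffer, arcs = ["ROOT", "news"], ["had"], set()
left(stack, buffer, arcs, "SBJ")      # "news" becomes a dependent of "had"
```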
Final overview of Tree-stack LSTM
Figure: Tree-stack LSTM overview (the β-, σ-, and Action-LSTMs and the t-RNN feed a concatenation into an MLP)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2 Related Work: Linear Models and their Drawbacks; Neural Network Models
3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5 Conclusion
6 Future Work & Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results amp Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results amp Comparisons
Dataset

CoNLL17
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Differences between the two editions: 1. Train/test split change 2. Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
2 possible problems of the official comparison:
1 If the annotation of the treebank has improved, the older parser is handicapped
2 If the training-test split has changed and old training data are now in the test data, the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code        MLP    Tree-stack
ru_taiga (10k)   58.89  60.55
hu_szeged (20k)  66.21  68.18
tr_imst (50k)    56.78  58.75
ar_padt (120k)   67.83  68.14
en_ewt (205k)    74.87  75.77
cs_cac (473k)    83.39  83.57
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
MLP
Figure Initial model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
Figure: Only the Action-LSTM is kept
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
Figure: Only the β-LSTM is kept (feeding the MLP)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
Figure: Only the σ-LSTM is kept (feeding the MLP)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code   MLP    Only Action  Only-β  Only-σ
hu_szeged   66.21  66.87        66.94   67.03
sv_lines    71.12  72.05        72.17   72.45
tr_imst     57.12  56.87        57.02   57.12
ar_padt     67.83  66.67        66.89   66.92
cs_cac      83.89  82.23        83.13   83.17
en_ewt      75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
Figure: Tree-stack LSTM overview with the t-RNN highlighted
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code            without t-RNN  with t-RNN
no_nynorsklia (3k)   51.78          53.33
ru_taiga (11k)       59.13          60.55
gl_treegal (15k)     69.76          70.45
hu_szeged (20k)      66.12          68.18
sv_lines (49k)       74.04          75.46
tr_imst (50k)        58.12          58.75
ar_padt (120k)       68.04          68.14
en_ewt (204k)        74.87          75.77
cs_cac (473k)        82.89          83.57
cs_pdt (1M)          81.17          81.16
t-RNN provides a comparative advantage for low-resource languages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang        MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu_szeged   66.21  66.87   66.94   67.03   66.12      68.18
sv_lines    71.12  72.05   72.17   74.04   72.17      75.46
tr_imst     57.12  56.87   57.02   57.12   58.12      58.75
ar_padt     67.83  66.67   66.89   66.92   68.04      68.14
cs_cac      83.89  82.23   83.13   83.17   82.89      83.57
en_ewt      75.54  75.43   75.56   75.67   74.87      75.77
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNN's performance contribution increases as the training size decreases
σ-LSTM provides more useful information independent of dataset size
Interconnecting the model's components with the t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental Settings: we divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k and less than 50k tokens
Languages having more than 50k and less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no_nynorsklia  51.13        53.33           3,583
ru_taiga       58.32        60.55           10,479
sme_giella     52.78        53.39           16,385
la_perseus     49.93        51.60           18,184
ug_udt         52.78        53.39           19,262
sl_sst         46.72        48.77           19,473
hu_szeged      66.23        68.18           20,166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having tokens between 50k and 100k

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv_lines       72.18        74.81           48,325
fr_sequoia     84.36        82.17           50,543
en_gum         76.44        75.34           53,686
ko_gsd         73.74        72.54           56,687
eu_bdt         74.55        73.32           72,974
nl_lassysmall  76.70        75.80           75,134
gl_ctg         79.02        79.018          79,327
lv_lvtb        72.33        72.24           80,666
id_gsd         75.76        73.97           97,531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens

Lang code   Morph-Feats  no Morph-Feats  # of tokens
fa_seraji   81.18        81.12           121,064
bg_btb      84.53        84.55           124,336
en_ewt      75.77        75.682          204,585
ar_padt     68.02        68.14           223,881
de_gsd      71.59        71.32           263,804
ca_ancora   85.89        85.874          417,587
es_ancora   84.99        84.78           444,617
cs_cac      83.57        83.63           472,608
cs_pdt      81.43        82.12           1,173,282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle: transitions follow gold moves. Dynamic oracle: transitions follow predicted moves.
In both cases, the log-probability of gold moves is maximized
Figure: Tree-stack LSTM overview (the β-, σ-, and Action-LSTMs and the t-RNN feed a concatenation into an MLP)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
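The difference between the two regimes can be shown in a toy training loop (everything here is illustrative: the real model, oracle, and state update are far richer, and the toy state is just a counter):

```python
import random
random.seed(0)

def oracle(state):
    """Toy gold policy: the move the gold tree dictates in this state."""
    return "SHIFT" if state % 2 == 0 else "LEFT"

def model(state):
    """Toy untrained model: random predictions stand in for MLP scores."""
    return random.choice(["SHIFT", "LEFT"])

def apply(state, action):
    """Toy state update (a real parser would modify stack/buffer/arcs)."""
    return state + 1

def run(dynamic, steps=3):
    state, gold_moves = 0, []
    for _ in range(steps):
        gold = oracle(state)           # log p(gold) is maximized in BOTH regimes
        gold_moves.append(gold)
        move = model(state) if dynamic else gold   # which move advances the state
        state = apply(state, move)
    return gold_moves

static_path  = run(dynamic=False)  # visited states follow gold moves
dynamic_path = run(dynamic=True)   # visited states follow predicted moves
```

The only difference is which move drives the state forward; with a dynamic oracle the model is trained on states its own (possibly wrong) predictions reach.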
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning:
1 Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2 Using Facebook's word vectors to train a parser [Bojanowski et al 2017]
3 Using my own word and context vectors trained on a different language from the same language family
4 Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af_afribooms  not provided  75.46  77.43  78.12
kk_ktb        20.19         22.31  21.96  23.86
bxr_bdt       7.64          9.76   9.93   8.98
kmr_mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial
Training the LM from scratch on limited data does not yield useful word and context vectors
Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition Based Parser can only build projective trees 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
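Projectivity can be checked by testing for crossing arcs; a small sketch (assuming heads[i-1] gives the head index of 1-indexed token i, with 0 as the artificial root - my encoding, chosen for brevity):

```python
def is_projective(heads):
    """A tree is projective iff no two dependency arcs cross when drawn
    above the sentence. heads[i-1] = head of token i; 0 is the root."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for (l1, r1) in arcs:
        for (l2, r2) in arcs:
            if l1 < l2 < r1 < r2:   # arcs interleave -> crossing
                return False
    return True

is_projective([2, 0])        # token1 <- token2 <- root: projective
is_projective([0, 4, 1, 1])  # arcs (1,3) and (2,4) cross: non-projective
```

A transition-based parser as defined above can only produce trees for which this check succeeds.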
Projective vs Non-projective
We compared our model with the best model for different projectivity ratios

Language     Projectivity  Best (LAS)  Our (LAS)
grc_perseus  90.7          79.39       55.03 (20)
eu_bdt       95.13         84.22       74.13 (17)
hu_szeged    97.8          82.66       68.18 (14)
da_ddt       98.26         86.28       76.40 (17)
en_gum       99.6          85.05       76.44 (15)
gl_treegal   100           74.25       70.45 (10)
gl_ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases
7
7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusion: we introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition based dependency parsing
Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering
Tree-stack LSTM performed better on low-resource languages
As the training dataset size increases, tree-stack LSTM loses its advantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings
Attention Mechanism
Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring performance improvements
Morphological Features
Finding different ways to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann, Carlos Gomez-Rodriguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, Association for Computational Linguistics, pages 673-682.
S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
Omer Kırnap (Koc University) MSc Thesis September 27 2018 15 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 16 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 17 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 18 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 19 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 20 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 21 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 22 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 23 123
Problem Definition
Find a model learning to decide correct transition from current state
Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123
2 Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123
Related Work
Neural Networks for Feature Conjunctions
Neural networks can handle feature conjunctions and nonlinearityHoweverImpractical for high dimensional inputs they scale linearly in inputdimensions (in both time and space assuming fixed number of hiddenunits)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123
Related Work
Solution Using dense embeddings for input features
Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123
Overview
1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing
2 Related WorkLinear Models and their DrawbacksNeural Network Models
3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser
4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning
5 Conclusion6 Future Work amp Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123
3 Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies
CoNLL17bull Koc-University team with MLP Parser using Context Embeddings
CoNLL18bull KParse team with Tree-stack LSTM Parser using Context and
Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies
CoNLL17
bull Koc-University team with MLP Parser using ContextEmbeddings
CoNLL18
bull KParse team with Tree-stack LSTM Parser using Context and Morph-featEmbeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123
a Language Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123
Language Model (LM)
LM is used to obtain Context and Word embeddings with twocomponents
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123
Language Model - Word vectors
Character based LSTM generates word Vectors
Figure Character LSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123
Language Model - Context Vectors
Word based BiLSTM generates Context Vectors
Figure Word BiLSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123
b MLP Parser (CoNLL17)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123
MLP Parser
MLP Parser consists of 4 components
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Feature extractor describes current state
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123
MLP Parser - Feature Extraction
Feature extractor describes current state
Figure Kırnap et al 2017Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123
MLP Parser - Decision Module
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123
Experiments amp Dataset (MLP) CoNLL17
CoNLL17 Dataset
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123
Experiments - Evaluation Metric
Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label
Economic news hadGold Tree LAS 1
SBJATT
Economic news had
PREDOBJPred 1 LAS 0
Economic news had
OBJATTPred 2 LAS (frac12)100
Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
Experiments (MLP)
CoNLL 2017 Results (all treebanks LAS)
Ranked 1st among transition based parsers 5
5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123
Contributions in CoNLL17
Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123
Context and Word Embeddings
Relative contributions of part-of-speech (p) word vector (v)context vector (c)
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123
Context and Word Embeddings
Relative contributions of part-of-speech (p) word vector (v)context vector (c)
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Context vectors provide independent contribution on top ofPOS tags
Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123
Context and Word embeddings
Feats | Hungarian | En-ParTUT | Latvian
p | 636 | 766 | 559
v | 735 | 759 | 63
c | 722 | 76 | 635
v-c | 76 | 79 | 676
p-c | 78 | 825 | 706
p-v | 766 | 808 | 677
p-fb | 747 | 797 | 663
p-v-c | 793 | 832 | 742
Our BiLSTM language model word vectors perform better than FB vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123
Context and Word embeddings
Feats | Hungarian | En-ParTUT | Latvian
p | 636 | 766 | 559
v | 735 | 759 | 63
c | 722 | 76 | 635
v-c | 76 | 79 | 676
p-c | 78 | 825 | 706
p-v | 766 | 808 | 677
p-fb | 747 | 797 | 663
p-v-c | 793 | 832 | 742
Both POS tags and context vectors have significant contributions on top of word vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123
Issues with MLP
However
Choosing correct state of parser still remains critical
We are unable to represent the whole parsing history with feature extraction
Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123
Solution
Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack
Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies
CoNLL17
• Koc-University team with MLP Parser using Context Embeddings
CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123
c Tree-stack LSTM Parser (CoNLL18)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123
Related Work - Stack LSTM
Figure Stack LSTM [Dyer et al 2015]
Represent each component (σ, β, A) with an LSTM. Modify the head word's embedding with the dependent's embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123
Problems with Stack LSTM
They only modify the stack's word embeddings
Hidden states of the LSTMs are not updated unless a reduce occurs
Actions are not explicitly represented
They only used word2vec embeddings [Mikolov et al 2013]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123
Tree-stack LSTM Overview
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
We propose Tree-stack LSTM model with 4 components
β-LSTM
σ-LSTM
Action-LSTM
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123
Tree-stack LSTM
Input Representation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with a continuous vector
Every dependency relation is represented with a continuous vector
Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123
Input Representation
We do not include an explicit feature extractor. We initialize the word representation by concatenating:
Character-based LSTM's word vectors
Word-based BiLSTM's context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
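The concatenation above is straightforward in code. The dimensions below are illustrative placeholders, not the thesis settings:

```python
import numpy as np

# Each token's input representation is the concatenation of four parts
# (all dimensions here are assumptions for illustration only).
word_vec    = np.zeros(350)   # character-based LSTM word vector
context_vec = np.zeros(300)   # word-based BiLSTM context vector
pos_vec     = np.zeros(128)   # learned POS embedding
feat_vec    = np.zeros(32)    # learned morph-feat embedding

token_repr = np.concatenate([word_vec, context_vec, pos_vec, feat_vec])
```

The resulting vector feeds the buffer and stack LSTMs described next.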
Input Representation
Morph-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
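One simple way to turn a CoNLL-U feature string like the one above into a fixed-size vector is to learn one embedding per `Feature=Value` pair and sum them. This is a hedged sketch under that assumption; the embedding table, its dimension, and the summation scheme are ours, not necessarily the thesis implementation:

```python
import numpy as np

DIM = 32                      # morph-feat embedding size (assumption)
rng = np.random.default_rng(1)
table = {}                    # maps "Case=Nom" etc. to its embedding

def morph_feat_vector(feats):
    """Sum one learned embedding per Feature=Value pair."""
    vec = np.zeros(DIM)
    if feats == "_":          # CoNLL-U convention for "no features"
        return vec
    for fv in feats.split("|"):
        if fv not in table:   # lazily create an embedding per pair
            table[fv] = rng.normal(scale=0.1, size=DIM)
        vec += table[fv]
    return vec

v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
```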
Tree-stack LSTM
Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123
β-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
LSTM LSTM LSTM
wi+2wi+1wi
Figure Buffer's β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
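The buffer encoder reads the token vectors w_i, w_{i+1}, ... recurrently. For brevity this sketch uses a plain tanh-RNN update in place of a full LSTM cell (the thesis uses an LSTM), and all sizes and weights are illustrative:

```python
import numpy as np

d = 16                                  # state size (assumption)
rng = np.random.default_rng(3)
Wx = rng.normal(scale=0.1, size=(d, d)) # input weights
Wh = rng.normal(scale=0.1, size=(d, d)) # recurrent weights
b  = np.zeros(d)

def encode(words):
    """Fold the buffer's token vectors into one summary state."""
    h = np.zeros(d)
    for w in words:                     # w_i, w_{i+1}, w_{i+2}, ...
        h = np.tanh(Wx @ w + Wh @ h + b)
    return h

buf = [rng.normal(size=d) for _ in range(3)]
h_beta = encode(buf)                    # the β summary fed to the MLP
```

The σ-LSTM over the stack works the same way, read over s_i, s_{i+1}, ...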
σ-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
LSTM LSTM LSTM
si si+1 si+2
Figure Stack's σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
LSTM LSTM LSTM
Figure Action-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How are the components of tree-stack LSTM connected?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
t-RNN
Dependent word
Dependency Relation
Head word
Figure t-RNN
w_head_new = tanh(W_rnn · [w_head_old; d_l; w_dep] + b_rnn) (1)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
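Equation (1) transcribes directly into code: a tanh layer over the concatenation of the old head embedding, the dependency-label embedding and the dependent embedding. The dimension and random weights below are illustrative only:

```python
import numpy as np

d = 8                                     # embedding size (assumption)
rng = np.random.default_rng(2)
W_rnn = rng.normal(scale=0.1, size=(d, 3 * d))
b_rnn = np.zeros(d)

def t_rnn(w_head_old, d_l, w_dep):
    """Eq. (1): compose head, label and dependent into a new head vector."""
    x = np.concatenate([w_head_old, d_l, w_dep])
    return np.tanh(W_rnn @ x + b_rnn)

w_new = t_rnn(rng.normal(size=d), rng.normal(size=d), rng.normal(size=d))
```

The new head vector then replaces the old one in the stack or buffer.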
Tree-RNN with
1. Left Transition
2. Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
LSTM LSTM LSTM LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
Figure Each embedding is initialized by concatenating POS, language and morph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
Figure Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure β-LSTM recalculates its hidden state based on the new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure Tree-stack LSTM is ready to predict the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
LSTM LSTM LSTM LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Each embedding is initialized by concatenating POS, language and morph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure σ-LSTM recalculates its hidden state from the new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure Tree-stack LSTM is ready to predict the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
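The two transitions can be sketched directly from the formulas above. Here `sigma`, `beta` and `A` are plain Python stand-ins for the stack, buffer and arc set, word indices replace embeddings, and `shift` is the standard move-buffer-front-to-stack operation; the helper names are ours, not the thesis code:

```python
def shift(sigma, beta):
    sigma.append(beta.pop(0))    # move buffer front onto the stack

def left(sigma, beta, A, d):
    # left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
    s = sigma.pop()              # stack top s ...
    A.add((beta[0], d, s))       # ... becomes a dependent of buffer front b

def right(sigma, beta, A, d):
    # right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
    t = sigma.pop()              # stack top t ...
    A.add((sigma[-1], d, t))     # ... becomes a dependent of s below it

sigma, beta, A = [1, 2], [3, 4], set()
left(sigma, beta, A, "amod")     # adds arc (3, amod, 2)
shift(sigma, beta)               # sigma = [1, 3], beta = [4]
right(sigma, beta, A, "obj")     # adds arc (1, obj, 3)
```

In the full model, each executed transition also triggers the t-RNN composition and the corresponding LSTM recalculation shown on the preceding slides.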
Final overview of Tree-stack LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2 Related Work: Linear Models and their Drawbacks; Neural Network Models
3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5 Conclusion
6 Future Work & Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results & Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results amp Comparisons
Dataset
CoNLL17: Dependency parsing of 81 treebanks in 49 languages. All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations. Koc-University ranked 7th out of 33 participants (1st among transition based parsers).
CoNLL18: Dependency parsing of 82 treebanks in 57 languages. All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations. Koc-University ranked 16th out of 30 participants (2nd among transition based parsers).
Changes from CoNLL17 to CoNLL18: 1. Train/test split change 2. Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
2 possible problems of official comparison
1 If the annotation of the treebank is improved, the older parser is handicapped
2 If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code | MLP | Tree-stack
ru taiga (10k) | 5889 | 6055
hu szeged (20k) | 6621 | 6818
tr imst (50k) | 5678 | 5875
ar padt (120k) | 6783 | 6814
en ewt (205k) | 7487 | 7577
cs cac (473k) | 8339 | 8357
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
MLP
Figure Initial model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
LSTM LSTM
Figure Only action LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
LSTM LSTM LSTM
LSTM LSTM MLP
Figure Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
LSTM LSTM
LSTM LSTM MLP
Figure Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code | MLP | Only Action | Only-β | Only-σ
hu szeged | 6621 | 6687 | 6694 | 6703
sv lines | 7112 | 7205 | 7217 | 7245
tr imst | 5712 | 5687 | 5702 | 5712
ar padt | 6783 | 6667 | 6689 | 6692
cs cac | 8389 | 8223 | 8313 | 8317
en ewt | 7554 | 7543 | 7556 | 7567
Table Comparison between MLP and "Only" models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code | without t-RNN | with t-RNN
no nynorsklia (3k) | 5178 | 5333
ru taiga (11k) | 5913 | 6055
gl treegal (15k) | 6976 | 7045
hu szeged (20k) | 6612 | 6818
sv lines (49k) | 7404 | 7546
tr imst (50k) | 5812 | 5875
ar padt (120k) | 6804 | 6814
en ewt (204k) | 7487 | 7577
cs cac (473k) | 8289 | 8357
cs pdt (1M) | 8117 | 81164
t-RNN provides a comparative advantage for low-resource languages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang | MLP | Only A | Only-β | Only-σ | w/o t-RNN | all
hu szeged | 6621 | 6687 | 6694 | 6703 | 6612 | 6818
sv lines | 7112 | 7205 | 7217 | 7404 | 7217 | 7546
tr imst | 5712 | 5687 | 5702 | 5712 | 5812 | 5875
ar padt | 6783 | 6667 | 6689 | 6692 | 6804 | 6814
cs cac | 8389 | 8223 | 8313 | 8317 | 8289 | 8357
en ewt | 7554 | 7543 | 7556 | 7567 | 7487 | 7577
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNN's performance contribution increases when the training size decreases
σ-LSTM provides more useful information independent from datasetsize
Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th of all and 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental Settings: We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k but less than 50k tokens
Languages having more than 50k but less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens
Lang code | Morph-Feats | no Morph-Feats | # of tokens
no nynorsklia | 5113 | 5333 | 3583
ru taiga | 5832 | 6055 | 10479
sme giella | 5278 | 5339 | 16385
la perseus | 4993 | 516 | 18184
ug udt | 5278 | 5339 | 19262
sl sst | 4672 | 4877 | 19473
hu szeged | 6623 | 6818 | 20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having between 50k and 100k tokens
Lang code | Morph-Feats | no Morph-Feats | # of tokens
sv lines | 7218 | 7481 | 48325
fr sequoia | 8436 | 8217 | 50543
en gum | 7644 | 7534 | 53686
ko gsd | 7374 | 7254 | 56687
eu bdt | 7455 | 7332 | 72974
nl lassysmall | 767 | 758 | 75134
gl ctg | 7902 | 79018 | 79327
lv lvtb | 7233 | 7224 | 80666
id gsd | 7576 | 7397 | 97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens
Lang code | Morph-Feats | no Morph-Feats | # of tokens
fa seraji | 8118 | 8112 | 121064
bg btb | 8453 | 8455 | 124336
en ewt | 7577 | 75682 | 204585
ar padt | 6802 | 6814 | 223881
de gsd | 7159 | 7132 | 263804
ca ancora | 8589 | 85874 | 417587
es ancora | 8499 | 8478 | 444617
cs cac | 8357 | 8363 | 472608
cs pdt | 8143 | 8212 | 1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle: transitions using gold moves. Dynamic oracle: transitions using predicted moves.
In both cases, log p of gold moves is maximized
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
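The distinction can be made concrete with a stub training loop: in both regimes the loss is -log p(gold move), but the move that actually advances the parser state differs. Everything below (the function names, the deliberately wrong stub predictor) is illustrative; a real dynamic oracle would also recompute the best reachable gold move after an error:

```python
def run_oracle(gold_moves, predict, dynamic):
    """Return the sequence of moves executed during training.
    The training loss is -log p(gold move) either way; only the
    executed, state-advancing move differs between the regimes."""
    executed = []
    for t, gold in enumerate(gold_moves):
        pred = predict(t, executed)          # model's current guess
        executed.append(pred if dynamic else gold)
    return executed

gold = ["shift", "left", "shift", "right"]
bad_predict = lambda t, hist: "shift"        # a deliberately wrong model
static_path  = run_oracle(gold, bad_predict, dynamic=False)
dynamic_path = run_oracle(gold, bad_predict, dynamic=True)
```

Under the static oracle the parser only ever visits gold states; under the dynamic oracle it visits the (possibly wrong) states its own predictions produce, which is what lets it learn to recover from errors.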
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors and use them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al 2017]
3. Using my own word and context vectors trained on a different language but from the same language family
4. Applying transfer learning with a pre-trained parser
Language | (1) | (2) | (3) | (4)
af afribooms | not provided | 7546 | 7743 | 7812
kk ktb | 2019 | 2231 | 2196 | 2386
bxr bdt | 764 | 976 | 993 | 898
kmr mg | 2012 | 2257 | 2278 | 2339
Table LAS values for strategies (1), (2), (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the mostbeneficial
From-scratch LM training does not yield useful word and context vectors
Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
A transition based parser can only build projective trees 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
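Projectivity is easy to test: a tree is projective iff no two dependency arcs cross when drawn above the sentence. A quadratic sketch (1-based word indices, 0 for the root; the helper is ours, written for illustration):

```python
def is_projective(heads):
    """heads[i-1] = head of word i (0 denotes the root).
    Projective iff no two arc spans strictly interleave."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for (l1, r1) in arcs:
        for (l2, r2) in arcs:
            if l1 < l2 < r1 < r2:    # the two spans cross
                return False
    return True

ok  = is_projective([2, 0, 2])       # simple chain: projective
bad = is_projective([3, 4, 0, 3])    # arcs (1,3) and (2,4) cross
```

Non-projective gold trees therefore put an upper bound on what these transition systems can recover, which is what the comparison on the next slide measures.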
Projective vs Non-projective
We compared our model with the best model for different projectivity ratios
Language | Projectivity | Best (LAS) | Our (LAS)
grc perseus | 907 | 7939 | 5503 (20)
eu bdt | 9513 | 8422 | 7413 (17)
hu szeged | 978 | 8266 | 6818 (14)
da ddt | 9826 | 8628 | 7640 (17)
en gum | 996 | 8505 | 7644 (15)
gl treegal | 100 | 7425 | 7045 (10)
gl ctg | 100 | 8212 | 7945 (14)
Table Our model's performance gap decreases as the projectivity ratio increases
7
7 From official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusion: We introduced "Context, Word and Morph-feat" embeddings and showed their contribution in transition based dependency parsing
Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering
Tree-stack LSTM performed better with low resource languages
When the training dataset size increases, tree-stack LSTM loses its advantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.
Attention Mechanism
Applying attention between σ-LSTM, β-LSTM or Action-LSTM states may bring a performance improvement
Morphological Features
Finding a different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies
Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann, Carlos Gómez-Rodríguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.
S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
MLP Parser
MLP Parser consists of 4 components
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Feature extractor describes current state
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123
MLP Parser - Feature Extraction
Feature extractor describes current state
Figure Kırnap et al 2017Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123
MLP Parser - Decision Module
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123
Experiments amp Dataset (MLP) CoNLL17
CoNLL17 Dataset
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123
Experiments - Evaluation Metric
Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label
Economic news hadGold Tree LAS 1
SBJATT
Economic news had
PREDOBJPred 1 LAS 0
Economic news had
OBJATTPred 2 LAS (frac12)100
Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
Experiments (MLP)
CoNLL 2017 Results (all treebanks LAS)
Ranked 1st among transition based parsers 5
5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123
Contributions in CoNLL17
Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123
Context and Word Embeddings
Relative contributions of part-of-speech (p) word vector (v)context vector (c)
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123
Context and Word Embeddings
Relative contributions of part-of-speech (p) word vector (v)context vector (c)
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Context vectors provide independent contribution on top ofPOS tags
Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123
Context and Word embeddings
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677p-fb 747 797 663
p-v-c 793 832 742
Our BiLSTM language model word vectors perform betterthan FB vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123
Context and Word embeddings
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Both POS tags and context vectors have significantcontributions on top of word vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123
Issues with MLP
However
Choosing correct state of parser still remains critical
We are unable to represent whole parsing history with featureextracting
Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123
Solution
Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack
Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies
CoNLL17
bull Koc-University team with MLP Parser using Context Embeddings
CoNLL18
bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123
c Tree-stack LSTM Parser (CoNLL18)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123
Related Work - Stack LSTM
Figure Stack LSTM [Dyer et al 2015]
Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123
Problems with Stack LSTM
They only modify stackrsquos word embeddings
Hidden states of LSTMS are not updated unless reduce
Actions are not explicitly represented
They only used word2vec embeddings [Mikolov et al 2013]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123
Tree-stack LSTM Overview
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
We propose Tree-stack LSTM model with 4 components
β-LSTMσ-LSTMAction-LSTMTree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123
Tree-stack LSTM
Input Representation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with continuous vector
Every dependency relation is represented with continuous vector
Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123
Input Representation
We do not include explicit feature extractor We initiated wordrepresentation by concatenating
Character Based LSTMrsquos word vectors
Word Based BiLSTMrsquos context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
Input Representation
Morp-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
Tree-stack LSTM
Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123
β-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
LSTM LSTM LSTM
wi+2wi+1wi
Figure Bufferrsquos β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
LSTM LSTM LSTM
si si+1 si+2
Figure Stackrsquos σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
LSTM LSTM LSTM
Figure Action-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How do components of tree-stack LSTM are connected
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
t-RNN
Dependent word
Dependency Relation
Head word
Figure t-RNN
whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
Tree-RNN with
1 Left Transition2 Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM LSTM
Left transition
t-RNN
Dependency Relation
HeadDependent
Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure β-LSTM recalculates its hidden based on new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initiated by concatenating POS, language, and morph-feat embeddings.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready to predict the next transition.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
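The two transitions above can be sketched on a bare (stack, buffer, arcs) state. The data structures, function names, and example words are illustrative, not the thesis implementation.

```python
# left_d:  head = buffer front b, dependent = stack top s (s is popped).
# right_d: head = the element s below the top, dependent = stack top t (t is popped).
def left_arc(stack, buffer, arcs, d):
    s = stack.pop()                # dependent s leaves the stack
    arcs.add((buffer[0], d, s))    # head is the buffer front b: A ∪ {(b, d, s)}

def right_arc(stack, arcs, d):
    t = stack.pop()                # dependent t leaves the stack
    arcs.add((stack[-1], d, t))    # head is the new stack top s: A ∪ {(s, d, t)}

arcs = set()
stack, buffer = ["news"], ["had"]
left_arc(stack, buffer, arcs, "nsubj")   # "had" becomes the head of "news"
stack = ["had", "effect"]
right_arc(stack, arcs, "obj")            # "had" becomes the head of "effect"
```

Note the asymmetry: the left transition takes the head from the buffer, the right transition from the stack, matching the two formulas.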
Final overview of Tree-stack LSTM
Figure: Full Tree-stack LSTM: the β-, σ-, and Action-LSTM states are concatenated and fed to an MLP, while the t-RNN composes head word, dependent word, and dependency relation embeddings.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
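The final decision step in this overview can be sketched as follows. The two-layer MLP shape, dimensions, and weights are assumptions for illustration, not the thesis configuration.

```python
import numpy as np

# The β-, σ-, and Action-LSTM hidden states are concatenated (Concat) and
# an MLP produces one score per candidate transition.
def score_transitions(h_beta, h_sigma, h_action, W1, b1, W2, b2):
    h = np.concatenate([h_beta, h_sigma, h_action])   # Concat
    hidden = np.tanh(W1 @ h + b1)                     # MLP hidden layer
    return W2 @ hidden + b2                           # one score per transition

rng = np.random.default_rng(1)
H, D, T = 8, 3 * 8, 5          # toy hidden size, concat size, #transitions
scores = score_transitions(rng.normal(size=H), rng.normal(size=H),
                           rng.normal(size=H),
                           rng.normal(size=(H, D)), np.zeros(H),
                           rng.normal(size=(T, H)), np.zeros(T))
best = int(np.argmax(scores))  # index of the predicted transition
```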
4 Results & Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results & Comparisons

Dataset

CoNLL17:
Dependency parsing of 81 treebanks in 49 languages.
All treebanks use standardized annotation: 17 universal part-of-speech tags and 37 universal dependency relations.
Koc-University ranked 7th out of 33 participants (1st among transition based parsers).

CoNLL18:
Dependency parsing of 82 treebanks in 57 languages.
All treebanks use standardized annotation: 17 universal part-of-speech tags and 37 universal dependency relations.
Koc-University ranked 16th out of 30 participants (2nd among transition based parsers).

Changes from CoNLL17 to CoNLL18: 1. Train/test split change 2. Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
Two possible problems with the official comparison:

1. If the annotation of the treebank has improved, the older parser is handicapped.
2. If the train-test split has changed and old training data are now in the test data, the old parser is favored undeservedly.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code       | MLP   | Tree-stack
ru taiga (10k)  | 58.89 | 60.55
hu szeged (20k) | 66.21 | 68.18
tr imst (50k)   | 56.78 | 58.75
ar padt (120k)  | 67.83 | 68.14
en ewt (205k)   | 74.87 | 75.77
cs cac (473k)   | 83.39 | 83.57
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
Figure: Initial model (MLP only).
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
Figure: Only Action-LSTM.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
Figure: Only β-LSTM.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
Figure: Only σ-LSTM.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code | MLP   | Only Action | Only-β | Only-σ
hu szeged | 66.21 | 66.87       | 66.94  | 67.03
sv lines  | 71.12 | 72.05       | 72.17  | 72.45
tr imst   | 57.12 | 56.87       | 57.02  | 57.12
ar padt   | 67.83 | 66.67       | 66.89  | 66.92
cs cac    | 83.89 | 82.23       | 83.13  | 83.17
en ewt    | 75.54 | 75.43       | 75.56  | 75.67

Table: Comparison between MLP and "Only" models.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
Figure: The full Tree-stack LSTM architecture; the t-RNN component is removed in this ablation.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code          | without t-RNN | with t-RNN
no nynorsklia (3k) | 51.78         | 53.33
ru taiga (11k)     | 59.13         | 60.55
gl treegal (15k)   | 69.76         | 70.45
hu szeged (20k)    | 66.12         | 68.18
sv lines (49k)     | 74.04         | 75.46
tr imst (50k)      | 58.12         | 58.75
ar padt (120k)     | 68.04         | 68.14
en ewt (204k)      | 74.87         | 75.77
cs cac (473k)      | 82.89         | 83.57
cs pdt (1M)        | 81.17         | 81.16
t-RNN provides a comparative advantage for low-resource languages.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang      | MLP   | Only A | Only-β | Only-σ | w/o t-RNN | all
hu szeged | 66.21 | 66.87  | 66.94  | 67.03  | 66.12     | 68.18
sv lines  | 71.12 | 72.05  | 72.17  | 74.04  | 72.17     | 75.46
tr imst   | 57.12 | 56.87  | 57.02  | 57.12  | 58.12     | 58.75
ar padt   | 67.83 | 66.67  | 66.89  | 66.92  | 68.04     | 68.14
cs cac    | 83.89 | 82.23  | 83.13  | 83.17  | 82.89     | 83.57
en ewt    | 75.54 | 75.43  | 75.56  | 75.67  | 74.87     | 75.77
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNN's performance contribution increases as the training size decreases.

σ-LSTM provides more useful information independent of dataset size.

Interconnecting the model's components with t-RNN makes Tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers).
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does the Morphological Feature Embedding provide?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental Settings

We divide the CoNLL18 UD v2.2 dataset into 4 parts based on the number of training tokens per language, to better understand our contributions:
Languages having less than 20k tokens
Languages having more than 20k less than 50k tokens
Languages having more than 50k less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
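The four-way split above can be sketched directly; the example treebank sizes are taken from the tables in this section, while the function and bucket labels are illustrative.

```python
# Bucket a language by its training-token count, following the four
# ranges defined in the experimental settings.
def bucket(tokens):
    if tokens < 20_000:
        return "<20k"
    if tokens < 50_000:
        return "20k-50k"
    if tokens < 100_000:
        return "50k-100k"
    return ">=100k"

sizes = {"no_nynorsklia": 3_583, "fr_sequoia": 50_543, "ar_padt": 223_881}
buckets = {tb: bucket(n) for tb, n in sizes.items()}
```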
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens:

Lang code     | Morph-Feats | no Morph-Feats | # of tokens
no nynorsklia | 51.13       | 53.33          |   3583
ru taiga      | 58.32       | 60.55          |  10479
sme giella    | 52.78       | 53.39          |  16385
la perseus    | 49.93       | 51.6           |  18184
ug udt        | 52.78       | 53.39          |  19262
sl sst        | 46.72       | 48.77          |  19473
hu szeged     | 66.23       | 68.18          |  20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code     | Morph-Feats | no Morph-Feats | # of tokens
sv lines      | 72.18       | 74.81          | 48325
fr sequoia    | 84.36       | 82.17          | 50543
en gum        | 76.44       | 75.34          | 53686
ko gsd        | 73.74       | 72.54          | 56687
eu bdt        | 74.55       | 73.32          | 72974
nl lassysmall | 76.7        | 75.8           | 75134
gl ctg        | 79.02       | 79.018         | 79327
lv lvtb       | 72.33       | 72.24          | 80666
id gsd        | 75.76       | 73.97          | 97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens:

Lang code | Morph-Feats | no Morph-Feats | # of tokens
fa seraji | 81.18       | 81.12          |  121064
bg btb    | 84.53       | 84.55          |  124336
en ewt    | 75.77       | 75.682         |  204585
ar padt   | 68.02       | 68.14          |  223881
de gsd    | 71.59       | 71.32          |  263804
ca ancora | 85.89       | 85.874         |  417587
es ancora | 84.99       | 84.78          |  444617
cs cac    | 83.57       | 83.63          |  472608
cs pdt    | 81.43       | 82.12          | 1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle: trains on transitions that follow gold moves.
Dynamic oracle: trains on transitions that follow predicted moves.

In both cases, the log probability of the gold moves is maximized.
Figure: The full Tree-stack LSTM architecture used in both training regimes.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
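The distinction between the two oracles can be sketched with toy stand-ins (not the thesis model): both regimes maximize log p(gold move), but they differ in which move is applied to reach the next parser state.

```python
# `predict` and the integer "state" are toy stand-ins for the real model
# and parser configuration.
def training_trajectory(gold_moves, predict, dynamic):
    state, applied = 0, []
    for gold in gold_moves:
        move = predict(state) if dynamic else gold  # dynamic follows the model
        applied.append(move)
        state += 1  # stands in for applying the chosen transition
    return applied

predict = lambda state: "shift"  # toy model that always predicts "shift"

static_path = training_trajectory(["left", "shift"], predict, dynamic=False)
dynamic_path = training_trajectory(["left", "shift"], predict, dynamic=True)
```

Under the static oracle the parser only ever sees gold states; under the dynamic oracle it learns to recover from its own mistaken states.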
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
What about languages with fewer than 20k training tokens?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, then using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser
Language     | (1)          | (2)   | (3)   | (4)
af afribooms | not provided | 75.46 | 77.43 | 78.12
kk ktb       | 20.19        | 22.31 | 21.96 | 23.86
bxr bdt      |  7.64        |  9.76 |  9.93 |  8.98
kmr mg       | 20.12        | 22.57 | 22.78 | 23.39
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
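Strategy (4), warm-starting from a pre-trained parser, can be sketched as copying every parameter whose name and shape match and fine-tuning the rest. The parameter names and shapes below are hypothetical, not from the thesis code.

```python
import numpy as np

# Copy overlapping parameters from a pre-trained (source) parser into a
# new (target) parser; mismatched shapes (e.g. a different label set)
# keep their fresh initialization.
def warm_start(target, source):
    for name, value in source.items():
        if name in target and target[name].shape == value.shape:
            target[name] = value.copy()
    return target

source = {"W_mlp": np.ones((4, 4)), "W_out": np.ones((3, 4))}
target = {"W_mlp": np.zeros((4, 4)), "W_out": np.zeros((5, 4))}  # new label set
target = warm_start(target, source)  # W_mlp copied; W_out kept (shape differs)
```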
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial.

Training an LM from scratch on very limited data does not yield useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017].
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
A transition based parser can only build projective trees. 6

6Figure from http://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
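Projectivity can be checked directly: a dependency tree is projective iff no two arcs cross when drawn above the sentence. This is the standard definition sketched in code, not code from the thesis.

```python
# Arcs are (head, dependent) pairs of 1-based word positions.
def is_projective(arcs):
    spans = [tuple(sorted(a)) for a in arcs]
    for a1, b1 in spans:
        for a2, b2 in spans:
            if a1 < a2 < b1 < b2:  # arcs overlap without nesting: crossing
                return False
    return True

nested = {(2, 1), (2, 3)}     # projective: arcs nest / share an endpoint
crossing = {(1, 3), (2, 4)}   # non-projective: arc 1-3 crosses arc 2-4
```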
Projective vs Non-projective
We compared our model with the best model at different projectivity ratios.
Language    | Projectivity | Best (LAS) | Our (LAS)
grc perseus |  90.7        | 79.39      | 55.03 (20)
eu bdt      |  95.13       | 84.22      | 74.13 (17)
hu szeged   |  97.8        | 82.66      | 68.18 (14)
da ddt      |  98.26       | 86.28      | 76.40 (17)
en gum      |  99.6        | 85.05      | 76.44 (15)
gl treegal  | 100          | 74.25      | 70.45 (10)
gl ctg      | 100          | 82.12      | 79.45 (14)
Table: Our model's performance gap decreases as the projectivity ratio increases.
7From the official results page and our projectivity table.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusion:

We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition based dependency parsing.

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

As the training dataset size increases, Tree-stack LSTM loses its advantage.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.
Attention Mechanism
Applying attention between the σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.
Morphological Features
Finding a different way to represent morphological features.
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann, Carlos Gomez-Rodriguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
How do components of tree-stack LSTM are connected
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
t-RNN
Dependent word
Dependency Relation
Head word
Figure t-RNN
whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
Tree-RNN with
1 Left Transition2 Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM LSTM
Left transition
t-RNN
Dependency Relation
HeadDependent
Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure β-LSTM recalculates its hidden based on new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Stackrsquos top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure σ-LSTM recalculates its hidden from new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
Final overview of Tree-stack LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing
2 Related WorkLinear Models and their DrawbacksNeural Network Models
3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser
4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning
5 Conclusion6 Future Work amp Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results amp Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results amp Comparisons
Dataset
Dependency parsing of 81
treebanks in 49 languages
All treebanks use standardized
annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 7th out
of 33 participants (1st among
transition based parsers)
Dependency parsing of 82
treebanks in 57 languages
All treebanks use
standardized annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 16th
out of 30 participants (2nd
among transition based
parsers)
CoNLL17 CoNLL181 Traintest split change 2 Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
2 possible problems of official comparison
1 If the annotation of the tree bank is improved the older parser ishandicapped
2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
MLP
Figure Initial model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
LSTM LSTM
Figure Only action LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
LSTM LSTM LSTM
LSTM LSTM MLP
Figure Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
LSTM LSTM
LSTM LSTM MLP
Figure Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692
cs cac 8389 8223 8313 8317
en ewt 7554 7543 7556 7567
Table Comparison between MLP and rdquoOnlyrdquo models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875
ar padt (120k) 6804 6814
en ewt (204k) 7487 7577
cs cac (473k) 8289 8357
cs pdt (1M) 8117 81164
t-RNN provides comparative advantage for low-resourcelanguages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNNrsquos performance contribution increases when the training sizedecreases
σ-LSTM provides more useful information independent from datasetsize
Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k less than 50k tokens
Languages having more than 50k less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens

Lang code       Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia      51.13        53.33           3,583
ru taiga           58.32        60.55          10,479
sme giella         52.78        53.39          16,385
la perseus         49.93        51.60          18,184
ug udt             52.78        53.39          19,262
sl sst             46.72        48.77          19,473
hu szeged          66.23        68.18          20,166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code       Morph-Feats  no Morph-Feats  # of tokens
sv lines           72.18        74.81          48,325
fr sequoia         84.36        82.17          50,543
en gum             76.44        75.34          53,686
ko gsd             73.74        72.54          56,687
eu bdt             74.55        73.32          72,974
nl lassysmall      76.70        75.80          75,134
gl ctg             79.02        79.02          79,327
lv lvtb            72.33        72.24          80,666
id gsd             75.76        73.97          97,531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens

Lang code       Morph-Feats  no Morph-Feats  # of tokens
fa seraji          81.18        81.12           121,064
bg btb             84.53        84.55           124,336
en ewt             75.77        75.68           204,585
ar padt            68.02        68.14           223,881
de gsd             71.59        71.32           263,804
ca ancora          85.89        85.87           417,587
es ancora          84.99        84.78           444,617
cs cac             83.57        83.63           472,608
cs pdt             81.43        82.12         1,173,282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle: transitions follow the gold moves. Dynamic oracle: transitions follow the predicted moves.
In both cases, the log-probability of the gold moves is maximized.
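The distinction can be sketched as a toy training step: the loss always targets the gold move, but only the dynamic oracle lets the parser advance along its own (possibly wrong) prediction. The scoring function and move names below are hypothetical stand-ins, not the thesis implementation.

```python
def train_step(score_fn, state, gold_oracle, dynamic=False):
    """One toy step: the loss targets the gold move in both regimes,
    but a dynamic oracle follows the model's own prediction."""
    gold_move = gold_oracle(state)
    scores = score_fn(state)                      # dict: move -> score
    loss = -scores[gold_move]                     # stand-in for -log p(gold)
    predicted = max(scores, key=scores.get)
    next_move = predicted if dynamic else gold_move
    return loss, next_move

# Toy setup: the model prefers "shift" everywhere; the oracle wants "left".
scores = lambda state: {"shift": 0.9, "left": 0.1}
oracle = lambda state: "left"

_, static_next = train_step(scores, None, oracle, dynamic=False)
_, dynamic_next = train_step(scores, None, oracle, dynamic=True)
# static training replays gold moves; dynamic training explores predicted ones
```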
Figure: Tree-stack LSTM architecture (t-RNN over head word, dependent word, and dependency relation embeddings; σ-, β-, and Action-LSTM outputs concatenated into an MLP)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al., 2017]
3. Using our own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser
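Strategy (4) can be sketched as parameter reuse between parsers: copy every pre-trained parameter whose name and shape match the new model, and initialize the rest fresh. The parameter names and shapes below are hypothetical, not the thesis implementation.

```python
def transfer_init(target_shapes, donor_params):
    """Initialize a new parser from a pre-trained one: reuse matching
    parameters, fall back to fresh initialization for the rest."""
    params = {}
    for name, size in target_shapes.items():
        donor = donor_params.get(name)
        if donor is not None and len(donor) == size:
            params[name] = list(donor)        # reuse pre-trained weights
        else:
            params[name] = [0.0] * size       # fresh (here: zero) init
    return params

# Hypothetical donor parser: the output layer size differs, so only the
# word embeddings transfer.
donor  = {"word_emb": [0.5, 0.5], "mlp_out": [1.0, 2.0, 3.0]}
shapes = {"word_emb": 2, "mlp_out": 4}
p = transfer_init(shapes, donor)
```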
Language        (1)           (2)    (3)    (4)
af afribooms    not provided  75.46  77.43  78.12
kk ktb          20.19         22.31  21.96  23.86
bxr bdt          7.64          9.76   9.93   8.98
kmr mg          20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial
From-scratch LM training does not yield useful word and context vectors
Our word and context vectors are still more useful than Facebook's [Bojanowski et al., 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition-based parsers can only build projective trees 6
6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
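Projectivity can be checked directly: a tree is projective iff no two dependency arcs cross. A minimal sketch (tokens 1-based, head 0 meaning the root; this helper is illustrative, not from the thesis):

```python
def is_projective(heads):
    """heads[d] is the head index of token d (index 0 is a root placeholder).
    A dependency tree is projective iff no two arcs cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads[1:], start=1)]
    for (l1, r1) in arcs:
        for (l2, r2) in arcs:
            if l1 < l2 < r1 < r2:   # arcs (l1, r1) and (l2, r2) cross
                return False
    return True
```

For example, `[0, 2, 0, 2]` (a 3-token tree rooted at token 2) is projective, while a tree containing arcs (1, 3) and (2, 4) is not.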
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
Projective vs Non-projective
We compared our model with the best model for different projectivity ratios
Language      Projectivity  Best (LAS)  Ours (LAS, rank)
grc perseus      90.7%        79.39       55.03 (20)
eu bdt           95.13%       84.22       74.13 (17)
hu szeged        97.8%        82.66       68.18 (14)
da ddt           98.26%       86.28       76.40 (17)
en gum           99.6%        85.05       76.44 (15)
gl treegal      100%          74.25       70.45 (10)
gl ctg          100%          82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases
7
7 From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusion: We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition-based dependency parsing.
Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering
Tree-stack LSTM performed better on low-resource languages
As the training dataset size increases, tree-stack LSTM loses its advantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.
Attention Mechanism
Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring performance improvements.
Morphological Features
Finding different ways to represent morphological features.
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap, Berkay Furkan Önder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673–682.
S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
Problem Definition
Find a model that learns to decide the correct transition from the current state
Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123
2 Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123
Related Work
Neural Networks for Feature Conjunctions
Neural networks can handle feature conjunctions and nonlinearity. However, they are impractical for high-dimensional inputs: they scale linearly in the input dimensions (in both time and space, assuming a fixed number of hidden units).
Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123
Related Work
Solution: Using dense embeddings for input features
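The point of dense embeddings is that each discrete feature value gets a small learned vector, and the network learns conjunctions from the concatenated input, so the input size grows linearly in the number of features rather than in the number of feature conjunctions. A minimal sketch with hypothetical vocabularies and randomly initialized tables:

```python
import random
random.seed(0)

EMB_DIM = 4

def make_embedding(vocab, dim=EMB_DIM):
    """Dense embedding table: one small vector per discrete feature value,
    instead of one indicator dimension per feature *conjunction*."""
    return {v: [random.uniform(-0.1, 0.1) for _ in range(dim)] for v in vocab}

pos_emb  = make_embedding(["NOUN", "VERB", "ADJ"])
word_emb = make_embedding(["news", "had", "economic"])

def features(word, pos):
    # Concatenation of dense vectors; the hidden layer can learn the
    # word-POS conjunction without an explicit combined feature.
    return word_emb[word] + pos_emb[pos]

x = features("news", "NOUN")   # length 2 * EMB_DIM, not |words| * |tags|
```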
Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123
Overview
1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing
2 Related WorkLinear Models and their DrawbacksNeural Network Models
3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser
4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning
5 Conclusion6 Future Work amp Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123
3 Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies
CoNLL17
• Koc-University team with MLP Parser using Context Embeddings
CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123
a Language Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123
Language Model (LM)
The LM is used to obtain Context and Word embeddings with two components:
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
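The character-based half can be sketched as running an LSTM over a word's characters and taking the final hidden state as the word vector (the word-level BiLSTM for context vectors works analogously over word vectors). Everything below is a toy stand-in: the gate weights are random, there are no bias terms, and the character embedding is a hash-based placeholder, not the thesis LM.

```python
import math, random
random.seed(4)

D = 8  # hidden/embedding size

def lstm_step(x, h, c, W):
    """One simplified LSTM step (input/forget/output/candidate gates, no bias)."""
    def gate(name, act):
        s = [sum(w * v for w, v in zip(W[name][i], x + h)) for i in range(D)]
        return [act(v) for v in s]
    sig = lambda v: 1.0 / (1.0 + math.exp(-v))
    i, f, o = gate("i", sig), gate("f", sig), gate("o", sig)
    g = gate("g", math.tanh)
    c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
    return h, c

def rand_weights():
    return {k: [[random.uniform(-0.3, 0.3) for _ in range(2 * D)]
                for _ in range(D)] for k in "ifog"}

def word_vector(word, char_emb, W):
    """Character-based LSTM: run over the characters, return the final h."""
    h, c = [0.0] * D, [0.0] * D
    for ch in word:
        h, c = lstm_step(char_emb(ch), h, c, W)
    return h

# Hypothetical deterministic character embedding, for illustration only.
char_emb = lambda ch: [((ord(ch) * 31 + j) % 97) / 97.0 - 0.5 for j in range(D)]
W = rand_weights()
v = word_vector("news", char_emb, W)
```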
Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123
Language Model - Word vectors
Character based LSTM generates word Vectors
Figure Character LSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123
Language Model - Context Vectors
Word based BiLSTM generates Context Vectors
Figure Word BiLSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123
b MLP Parser (CoNLL17)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123
MLP Parser
MLP Parser consists of 4 components
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Feature extractor describes current state
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123
MLP Parser - Feature Extraction
Feature extractor describes current state
Figure: from Kırnap et al., 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123
MLP Parser - Decision Module
Decision module (MLP) decides the next transition
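The decision module can be sketched as one hidden layer over the extracted state representation followed by a softmax over the candidate transitions. The weights, dimensions, and transition inventory below are toy assumptions, not the trained parser.

```python
import math, random
random.seed(1)

TRANSITIONS = ["shift", "left-arc", "right-arc"]

def mlp_decide(x, W1, b1, W2, b2):
    """tanh hidden layer over the state vector x, softmax over transitions."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    logits = [sum(w * hi for w, hi in zip(row, h)) + b
              for row, b in zip(W2, b2)]
    exps = [math.exp(l - max(logits)) for l in logits]   # stable softmax
    probs = [e / sum(exps) for e in exps]
    return TRANSITIONS[probs.index(max(probs))], probs

# Toy sizes: state vector 4, hidden 3, 3 transitions.
x  = [0.5, -0.2, 0.1, 0.9]
W1 = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
b1 = [0.0] * 3
W2 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
b2 = [0.0] * 3
move, probs = mlp_decide(x, W1, b1, W2, b2)
```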
Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123
Experiments amp Dataset (MLP) CoNLL17
CoNLL17 Dataset
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123
Experiments - Evaluation Metric
Labeled Attachment Score (LAS): the percentage of words assigned both the correct syntactic head and the correct dependency label
Example for "Economic news had": the gold tree has arcs ATT (Economic → news) and SBJ (news → had), so it scores LAS 1. A prediction with both arcs wrong (PRED, OBJ) gets LAS 0; a prediction with one correct arc and one wrong label (ATT, OBJ) gets LAS (1/2) × 100 = 50.
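The metric can be sketched directly from the definition; the arc encoding below (word → (head, label)) is a simplification for illustration:

```python
def las(gold, pred):
    """Labeled Attachment Score: fraction of words whose predicted head
    AND dependency label both match the gold tree."""
    assert gold.keys() == pred.keys()
    correct = sum(1 for w in gold if pred[w] == gold[w])
    return correct / len(gold)

# Gold arcs for "Economic news had": word -> (head, label)
gold  = {"Economic": ("news", "ATT"), "news": ("had", "SBJ")}
pred1 = {"Economic": ("news", "OBJ"), "news": ("had", "PRED")}  # both wrong
pred2 = {"Economic": ("news", "ATT"), "news": ("had", "OBJ")}   # one label wrong
```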
Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
Experiments (MLP)
CoNLL 2017 Results (all treebanks LAS)
Ranked 1st among transition based parsers 5
5 Source: CoNLL17 official results page
Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123
Contributions in CoNLL17
Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123
Context and Word Embeddings
Relative contributions of part-of-speech (p) word vector (v)context vector (c)
Feats   Hungarian  En-ParTUT  Latvian
p         63.6       76.6       55.9
v         73.5       75.9       63.0
c         72.2       76.0       63.5
v-c       76.0       79.0       67.6
p-c       78.0       82.5       70.6
p-v       76.6       80.8       67.7
p-fb      74.7       79.7       66.3
p-v-c     79.3       83.2       74.2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123
Context and Word Embeddings
Relative contributions of part-of-speech (p) word vector (v)context vector (c)
Context vectors provide an independent contribution on top of POS tags
Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123
Context and Word embeddings
Our BiLSTM language model word vectors perform better than FB (Facebook) vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123
Context and Word embeddings
Both POS tags and context vectors have significant contributions on top of word vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123
Issues with MLP
However
Choosing the correct representation of the parser state remains critical
We are unable to represent the whole parsing history with feature extraction
Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123
Solution
Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack
Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies
CoNLL17
• Koc-University team with MLP Parser using Context Embeddings
CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123
c Tree-stack LSTM Parser (CoNLL18)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123
Related Work - Stack LSTM
Figure Stack LSTM [Dyer et al 2015]
Represent each component (σ, β, A) with an LSTM. Modify the head word's embedding with the dependent's embedding.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123
Problems with Stack LSTM
They only modify the stack's word embeddings
Hidden states of the LSTMs are not updated except on reduce
Actions are not explicitly represented
They only use word2vec embeddings [Mikolov et al., 2013]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123
Tree-stack LSTM Overview
Figure: Tree-stack LSTM overview (t-RNN over head word, dependent word, and dependency relation embeddings; σ-, β-, and Action-LSTM outputs concatenated into an MLP)
We propose the Tree-stack LSTM model with 4 components:
β-LSTM, σ-LSTM, Action-LSTM, and Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123
Tree-stack LSTM
Input Representation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with a continuous vector
Every dependency relation is represented with a continuous vector
Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123
Input Representation
We do not include an explicit feature extractor. We initialize each word representation by concatenating:
Character Based LSTMrsquos word vectors
Word Based BiLSTMrsquos context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
Input Representation
Morph-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure Morph-feat Embeddings
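A morph-feat vector can be sketched by embedding each `Feature=Value` pair of the UD feature string and combining them; summing is one simple choice (concatenation is another). The lazily created random embedding table below is a toy stand-in for the learned one.

```python
import random
random.seed(2)
DIM = 4

feat_emb = {}  # lazily created embedding per "Feature=Value" pair

def vec(feat):
    if feat not in feat_emb:
        feat_emb[feat] = [random.uniform(-0.1, 0.1) for _ in range(DIM)]
    return feat_emb[feat]

def morph_feat_vector(feats):
    """Embed a UD feature string like 'Case=Nom|Number=Sing' by summing
    the embeddings of its Feature=Value pairs."""
    total = [0.0] * DIM
    for f in feats.split("|"):
        total = [t + v for t, v in zip(total, vec(f))]
    return total

m = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
```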
Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
Tree-stack LSTM
Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123
β-LSTM
Figure: Tree-stack LSTM overview with the β-LSTM component highlighted
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
Figure: Buffer's β-LSTM running over the upcoming words w_i, w_i+1, w_i+2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
Figure: Tree-stack LSTM overview with the σ-LSTM component highlighted
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
Figure: Stack's σ-LSTM running over the stack items s_i, s_i+1, s_i+2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
Figure: Tree-stack LSTM overview with the Action-LSTM component highlighted
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
Figure: Action-LSTM running over the transition history
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How are the components of tree-stack LSTM connected?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
t-RNN
Dependent word
Dependency Relation
Head word
Figure t-RNN
w_head_new = tanh(W_rnn · [w_head_old ; d_l ; w_dep] + b_rnn)   (1)
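Equation (1) is a single affine map over the concatenation of the old head embedding, the dependency-relation embedding d_l, and the dependent embedding, squashed by tanh. A minimal sketch with toy dimensions and random weights (the real W_rnn and b_rnn are learned):

```python
import math, random
random.seed(3)
D = 4  # embedding size

def trnn(w_head, w_dep, d_rel, W, b):
    """Eq. (1): new head embedding = tanh(W · [w_head; d; w_dep] + b)."""
    x = w_head + d_rel + w_dep               # concatenation, length 3*D
    return [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

W = [[random.uniform(-0.5, 0.5) for _ in range(3 * D)] for _ in range(D)]
b = [0.0] * D
head, dep, rel = [0.1] * D, [0.2] * D, [0.3] * D
new_head = trnn(head, dep, rel, W, b)   # same size as the old head embedding
```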
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
Tree-RNN with:
1. Left Transition
2. Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: Each embedding is initiated by concatenating POS, language, and morph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: t-RNN calculates the new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: β-LSTM recalculates its hidden state based on the new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: Tree-stack LSTM is ready to predict the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: Each embedding is initiated by concatenating POS, language, and morph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: t-RNN calculates the new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: σ-LSTM recalculates its hidden state from the new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: Tree-stack LSTM is ready to predict the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
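The left/right transitions above can be sketched directly from their definitions: left_d pops s from the stack and attaches it to the buffer front b; right_d pops t and attaches it to the next stack item s. The shift operation and the tiny two-word run below are illustrative assumptions.

```python
def left(stack, buffer, arcs, d):
    """left_d(σ|s, b|β, A): pop s; the buffer front b becomes its head."""
    s = stack.pop()
    arcs.append((buffer[0], d, s))        # arc as (head, label, dependent)

def right(stack, buffer, arcs, d):
    """right_d(σ|s|t, β, A): pop t; the next stack item s becomes its head."""
    t = stack.pop()
    arcs.append((stack[-1], d, t))

def shift(stack, buffer):
    stack.append(buffer.pop(0))

# Hypothetical run over "news had": ROOT=0, news=1, had=2.
stack, buffer, arcs = [0], [1, 2], []
shift(stack, buffer)                 # stack [0, 1], buffer [2]
left(stack, buffer, arcs, "SBJ")     # had becomes the head of news
shift(stack, buffer)                 # stack [0, 2]
right(stack, buffer, arcs, "ROOT")   # ROOT becomes the head of had
```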
Final overview of Tree-stack LSTM
Figure: Tree-stack LSTM architecture (t-RNN over head word, dependent word, and dependency relation embeddings; σ-, β-, and Action-LSTM outputs concatenated into an MLP)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing
2 Related WorkLinear Models and their DrawbacksNeural Network Models
3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser
4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning
5 Conclusion6 Future Work amp Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results amp Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results amp Comparisons
Dataset

CoNLL17:
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 7th out of 33 participants (1st among transition-based parsers)

CoNLL18:
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 16th out of 30 participants (2nd among transition-based parsers)

Differences between CoNLL17 and CoNLL18: 1. Train/test split change 2. Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
2 possible problems of the official comparison:
1. If the annotation of the treebank is improved, the older parser is handicapped
2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models:

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
Figure: Initial model (MLP only)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
Figure: Only Action-LSTM feeding the MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
Figure: Only β-LSTM feeding the MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
Figure: Only σ-LSTM feeding the MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code   MLP    Only-Action  Only-β  Only-σ
hu szeged   66.21  66.87        66.94   67.03
sv lines    71.12  72.05        72.17   72.45
tr imst     57.12  56.87        57.02   57.12
ar padt     67.83  66.67        66.89   66.92
cs cac      83.89  82.23        83.13   83.17
en ewt      75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Problem Definition
Find a model that learns to decide the correct transition from the current state
Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123
2 Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123
Related Work
Neural Networks for Feature Conjunctions
Neural networks can handle feature conjunctions and nonlinearity. However, they are impractical for high-dimensional inputs: they scale linearly in the input dimension (in both time and space, assuming a fixed number of hidden units)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123
Related Work
Solution Using dense embeddings for input features
Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123
Overview
1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2 Related Work: Linear Models and their Drawbacks; Neural Network Models
3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5 Conclusion
6 Future Work & Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123
3 Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies
CoNLL17
• Koc-University team with MLP Parser using Context Embeddings
CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies
CoNLL17
• Koc-University team with MLP Parser using Context Embeddings
CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123
a Language Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123
Language Model (LM)
LM is used to obtain Context and Word embeddings, with two components:
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123
Language Model - Word vectors
Character based LSTM generates word Vectors
Figure Character LSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123
Language Model - Context Vectors
Word based BiLSTM generates Context Vectors
Figure Word BiLSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123
b MLP Parser (CoNLL17)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123
MLP Parser
MLP Parser consists of 4 components
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Feature extractor describes current state
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123
MLP Parser - Feature Extraction
Feature extractor describes current state
Figure: Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123
MLP Parser - Decision Module
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123
Experiments amp Dataset (MLP) CoNLL17
CoNLL17 Dataset
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation
• 17 universal part-of-speech tags
• 37 universal dependency relations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123
Experiments - Evaluation Metric
Labeled Attachment Score (LAS): the percentage of words assigned both the correct syntactic head and the correct dependency label
Example on "Economic news had":
Gold tree (Economic ←ATT news ←SBJ had): LAS = 1
Prediction 1 (Economic ←OBJ news ←PRED had): LAS = 0
Prediction 2 (Economic ←ATT news ←OBJ had): LAS = (1/2) × 100 = 50
Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
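As a concrete illustration, the metric above can be computed in a few lines of Python. This is a sketch; the data layout (word → (head, label) pairs, root not scored) is my own convention, not from the thesis:

```python
def las(gold, pred):
    """Labeled Attachment Score: percentage of words whose predicted
    (head, label) pair exactly matches the gold annotation."""
    correct = sum(1 for w, arc in gold.items() if pred.get(w) == arc)
    return 100.0 * correct / len(gold)

# Slide example "Economic news had"; "had" is the root and is not scored here.
gold  = {"Economic": ("news", "ATT"), "news": ("had", "SBJ")}
pred1 = {"Economic": ("news", "OBJ"), "news": ("had", "PRED")}  # both arcs wrong
pred2 = {"Economic": ("news", "ATT"), "news": ("had", "OBJ")}   # one of two correct

print(las(gold, pred1))  # 0.0
print(las(gold, pred2))  # 50.0
```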
Experiments (MLP)
CoNLL 2017 Results (all treebanks LAS)
Ranked 1st among transition based parsers 5
5 Source: CoNLL17 official results page
Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123
Contributions in CoNLL17
Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123
Context and Word Embeddings
Relative contributions of part-of-speech (p), word vector (v), context vector (c)

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123
Context and Word Embeddings
Relative contributions of part-of-speech (p), word vector (v), context vector (c)

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2
Context vectors provide an independent contribution on top of POS tags
Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123
Context and Word embeddings
Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2
Our BiLSTM language model word vectors perform better than FB vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123
Context and Word embeddings
Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2
Both POS tags and context vectors make significant contributions on top of word vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123
Issues with MLP
However
Choosing the correct state representation for the parser remains critical
We are unable to represent the whole parsing history with feature extraction
Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123
Solution
Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack
Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies
CoNLL17
• Koc-University team with MLP Parser using Context Embeddings
CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123
c Tree-stack LSTM Parser (CoNLL18)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123
Related Work - Stack LSTM
Figure Stack LSTM [Dyer et al 2015]
Represent each component (σ, β, A) with an LSTM. Modify the head word's embedding with the dependent's embedding.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123
Problems with Stack LSTM
They only modify the stack's word embeddings
Hidden states of the LSTMs are not updated unless a reduce occurs
Actions are not explicitly represented
They only used word2vec embeddings [Mikolov et al 2013]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123
Tree-stack LSTM Overview
Figure: Tree-stack LSTM — the t-RNN combines head word, dependent word and dependency relation; the β-, σ- and Action-LSTM states are concatenated and fed to an MLP
We propose the Tree-stack LSTM model with 4 components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123
Tree-stack LSTM
Input Representation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with a continuous vector
Every dependency relation is represented with a continuous vector
Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123
Input Representation
We do not include an explicit feature extractor. We initialize the word representation by concatenating:
Character-based LSTM's word vectors
Word-based BiLSTM's context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
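A minimal sketch of this input representation follows; the vector dimensions are invented for illustration and are not the thesis's actual sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
word_vec    = rng.standard_normal(350)  # character-based LSTM word vector
context_vec = rng.standard_normal(300)  # word-based BiLSTM context vector
pos_vec     = rng.standard_normal(128)  # POS embedding
morph_vec   = rng.standard_normal(128)  # morph-feat embedding

# No hand-crafted feature extractor: the word representation is simply
# the concatenation of the four learned vectors.
x = np.concatenate([word_vec, context_vec, pos_vec, morph_vec])
print(x.shape)  # (906,)
```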
Input Representation
Morph-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
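A sketch of turning a FEATS string like the one above into a single vector. Summing one embedding per Feature=Value pair is my own illustrative choice; the thesis may combine them differently:

```python
import numpy as np

DIM = 64                      # illustrative embedding size
rng = np.random.default_rng(1)
table = {}                    # one embedding per "Feature=Value" pair

def morph_feat_vector(feat_string):
    """Map e.g. 'Case=Nom|Gender=Neut|...' to one vector by summing the
    embeddings of its Feature=Value pairs (summing is an assumption here)."""
    total = np.zeros(DIM)
    for pair in feat_string.split("|"):
        if pair not in table:
            table[pair] = rng.standard_normal(DIM)
        total += table[pair]
    return total

v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
print(v.shape, len(table))  # (64,) 5
```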
Tree-stack LSTM
Model Components
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123
β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
Figure: Buffer's β-LSTM over buffer words wi, wi+1, wi+2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
Figure: Stack's σ-LSTM over stack items si, si+1, si+2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
Figure: Action-LSTM over the sequence of past transitions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How are the components of the tree-stack LSTM connected?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
Figure: t-RNN combines the head word, dependent word and dependency relation

w_head_new = tanh(W_rnn · [w_head_old; d_l; w_dep] + b_rnn)   (1)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
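Equation (1) can be sketched directly in NumPy; the toy dimension and random parameters below are for illustration only:

```python
import numpy as np

d = 8                                     # toy embedding size (illustrative)
rng = np.random.default_rng(2)
W_rnn = rng.standard_normal((d, 3 * d))   # weights over [head; label; dep]
b_rnn = rng.standard_normal(d)

def t_rnn(w_head_old, d_label, w_dep):
    """Eq. (1): the head's embedding is updated from its old embedding,
    the dependency-label embedding and the dependent's embedding."""
    return np.tanh(W_rnn @ np.concatenate([w_head_old, d_label, w_dep]) + b_rnn)

w_head_new = t_rnn(rng.standard_normal(d), rng.standard_normal(d), rng.standard_normal(d))
print(w_head_new.shape)  # (8,)
```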
Tree-RNN with
1. Left Transition
2. Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initiated by concatenating POS, language and morph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state from the new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready to predict the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initiated by concatenating POS, language and morph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready to predict the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
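The two transitions, plus the usual shift, can be sketched as pure functions on a (stack, buffer, arcs) state, where an arc (h, d, c) makes c a dependent of head h with label d. This is an illustrative reimplementation of the slide notation, not the thesis code:

```python
def shift(state):
    stack, buffer, arcs = state
    return stack + [buffer[0]], buffer[1:], arcs

def left(state, d):
    # left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}): pop s; head is buffer front b
    stack, buffer, arcs = state
    return stack[:-1], buffer, arcs | {(buffer[0], d, stack[-1])}

def right(state, d):
    # right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}): pop t; head is s below it
    stack, buffer, arcs = state
    return stack[:-1], buffer, arcs | {(stack[-2], d, stack[-1])}

# Parse "Economic news had" (ATT: news -> Economic, SBJ: had -> news)
state = ([], ["Economic", "news", "had"], set())
state = shift(state)
state = left(state, "ATT")
state = shift(state)
state = left(state, "SBJ")
print(sorted(state[2]))  # [('had', 'SBJ', 'news'), ('news', 'ATT', 'Economic')]
```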
Final overview of Tree-stack LSTM
Figure: Final Tree-stack LSTM — t-RNN, β-LSTM, σ-LSTM and Action-LSTM outputs are concatenated and fed to an MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2 Related Work: Linear Models and their Drawbacks; Neural Network Models
3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5 Conclusion
6 Future Work & Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results & Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results & Comparisons
Dataset

CoNLL17
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split change 2. Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
2 possible problems of the official comparison:
1. If the annotation of the treebank is improved, the older parser is handicapped
2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
Figure Initial model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
Figure Only action LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
Figure Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
Figure Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code  MLP    Only Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.16
t-RNN provides comparative advantage for low-resourcelanguages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNN's performance contribution increases as the training size decreases
σ-LSTM provides more useful information, independent of dataset size
Interconnecting the model's components with the t-RNN makes the tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental Settings: We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k and less than 50k tokens
Languages having more than 50k and less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.60           18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166

Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48325
fr sequoia     84.36        82.17           50543
en gum         76.44        75.34           53686
ko gsd         73.74        72.54           56687
eu bdt         74.55        73.32           72974
nl lassysmall  76.70        75.80           75134
gl ctg         79.02        79.018          79327
lv lvtb        72.33        72.24           80666
id gsd         75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121064
bg btb     84.53        84.55           124336
en ewt     75.77        75.682          204585
ar padt    68.02        68.14           223881
de gsd     71.59        71.32           263804
ca ancora  85.89        85.874          417587
es ancora  84.99        84.78           444617
cs cac     83.57        83.63           472608
cs pdt     81.43        82.12           1173282

Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle: transitions follow gold moves
Dynamic oracle: transitions follow predicted moves
In both cases, log p of the gold moves is maximized
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
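The distinction can be sketched as follows. This is a toy illustration with made-up scores, not the thesis training loop: both oracles compute the loss against the gold move and differ only in which move is executed to reach the next training state.

```python
import random

def train_step(gold_move, log_probs, dynamic, p_explore=0.1, rng=random):
    """log_probs: model's log p per move. The loss is -log p(gold) in both
    regimes; only the *executed* move differs."""
    loss = -log_probs[gold_move]
    if dynamic and rng.random() < p_explore:
        executed = max(log_probs, key=log_probs.get)  # follow the model
    else:
        executed = gold_move                          # follow the gold oracle
    return loss, executed

loss, move = train_step("left", {"left": -0.2, "shift": -1.8}, dynamic=False)
print(round(loss, 1), move)  # 0.2 left
```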
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial
Training an LM from scratch does not yield useful word and context vectors
Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition based parsers can only build projective trees 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
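A tree is projective iff no two of its arcs cross. A small checker (my own sketch, not from the thesis) makes the restriction concrete:

```python
def is_projective(heads):
    """heads[i] is the head index of word i; index 0 is the artificial root
    (heads[0] is ignored). Returns True iff no two arcs cross."""
    arcs = [(min(i, h), max(i, h)) for i, h in enumerate(heads) if i > 0]
    return not any(l1 < l2 < r1 < r2          # strictly interleaved spans cross
                   for l1, r1 in arcs for l2, r2 in arcs)

print(is_projective([0, 2, 0, 2]))     # True: arcs are nested
print(is_projective([0, 3, 4, 0, 0]))  # False: arcs (1,3) and (2,4) cross
```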
Projective vs Non-projective
We compared our model with the best model for different projectivity ratios

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases
7
7 From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusion: We introduced "Context, Word and Morph-feat" embeddings and showed their contribution in transition based dependency parsing
Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering
Tree-stack LSTM performed better with low resource languages
When the training dataset size increases, the tree-stack LSTM loses its advantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.
Attention Mechanism
Applying attention between the σ-LSTM, β-LSTM or Action-LSTM states may bring a performance improvement
Morphological Features
Finding different ways to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap, Berkay Furkan Onder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann, Carlos Gómez-Rodríguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, pages 673–682. Association for Computational Linguistics.
S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
Problem Definition
Find a model learning to decide correct transition from current state
Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123
2 Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123
Related Work
Neural Networks for Feature Conjunctions
Neural networks can handle feature conjunctions and nonlinearityHoweverImpractical for high dimensional inputs they scale linearly in inputdimensions (in both time and space assuming fixed number of hiddenunits)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123
Related Work
Solution Using dense embeddings for input features
Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123
Overview
1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing
2 Related WorkLinear Models and their DrawbacksNeural Network Models
3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser
4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning
5 Conclusion6 Future Work amp Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123
3 Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies
CoNLL17bull Koc-University team with MLP Parser using Context Embeddings
CoNLL18bull KParse team with Tree-stack LSTM Parser using Context and
Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies
CoNLL17
bull Koc-University team with MLP Parser using ContextEmbeddings
CoNLL18
bull KParse team with Tree-stack LSTM Parser using Context and Morph-featEmbeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123
a Language Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123
Language Model (LM)
LM is used to obtain Context and Word embeddings with twocomponents
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123
Language Model - Word vectors
Character based LSTM generates word Vectors
Figure Character LSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123
Language Model - Context Vectors
Word based BiLSTM generates Context Vectors
Figure Word BiLSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123
b MLP Parser (CoNLL17)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123
MLP Parser
MLP Parser consists of 4 components
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Feature extractor describes current state
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123
MLP Parser - Feature Extraction
Feature extractor describes current state
Figure Kırnap et al 2017Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123
MLP Parser - Decision Module
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123
Experiments amp Dataset (MLP) CoNLL17
CoNLL17 Dataset
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123
Experiments - Evaluation Metric
Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label
Economic news hadGold Tree LAS 1
SBJATT
Economic news had
PREDOBJPred 1 LAS 0
Economic news had
OBJATTPred 2 LAS (frac12)100
Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
Experiments (MLP)
CoNLL 2017 Results (all treebanks LAS)
Ranked 1st among transition based parsers 5
5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123
Contributions in CoNLL17
Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123
Context and Word Embeddings
Relative contributions of part-of-speech (p) word vector (v)context vector (c)
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123
Context and Word Embeddings
Relative contributions of part-of-speech (p) word vector (v)context vector (c)
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Context vectors provide independent contribution on top ofPOS tags
Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123
Context and Word embeddings
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677p-fb 747 797 663
p-v-c 793 832 742
Our BiLSTM language model word vectors perform betterthan FB vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123
Context and Word embeddings
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Both POS tags and context vectors have significantcontributions on top of word vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123
Issues with MLP
However
Choosing the right representation of the parser state remains critical
We are unable to represent the whole parsing history with feature extraction
Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123
Solution
Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and the stack
Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies
CoNLL17
• Koc-University team with MLP Parser using Context Embeddings
CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123
c Tree-stack LSTM Parser (CoNLL18)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123
Related Work - Stack LSTM
Figure Stack LSTM [Dyer et al 2015]
Represent each component (σ, β, A) with an LSTM. Modify the head word's embedding with the dependent's embedding.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123
Problems with Stack LSTM
They only modify the stack's word embeddings
Hidden states of the LSTMs are not updated unless a reduce occurs
Actions are not explicitly represented
They only used word2vec embeddings [Mikolov et al 2013]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123
Tree-stack LSTM Overview
[Figure: Tree-stack LSTM overview: β-LSTM, σ-LSTM and Action-LSTM outputs are concatenated and fed to an MLP; the t-RNN combines head word, dependent word and dependency relation]
We propose Tree-stack LSTM model with 4 components
1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123
Tree-stack LSTM
Input Representation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with a continuous vector
Every dependency relation is represented with a continuous vector
Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123
Input Representation
We do not include an explicit feature extractor. We initialize the word representation by concatenating:
Character Based LSTMrsquos word vectors
Word Based BiLSTMrsquos context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
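The concatenation above can be sketched as follows. The vector dimensions are illustrative placeholders, not the sizes used in the thesis:

```python
import numpy as np

# Illustrative dimensions; actual sizes in the thesis may differ.
rng = np.random.default_rng(0)
word_vec    = rng.normal(size=350)   # from the character-based LSTM
context_vec = rng.normal(size=300)   # from the word-based BiLSTM
pos_vec     = rng.normal(size=128)   # learned POS embedding
morph_vec   = rng.normal(size=128)   # morph-feat embedding

# The parser's input representation for one word is the concatenation:
x = np.concatenate([word_vec, context_vec, pos_vec, morph_vec])
```

In a real parser the four pieces come from trained models rather than random draws; only the concatenation step is the point here.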
Input Representation
Morph-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
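One plausible way to embed such a feature string is to look up one vector per Feature=Value pair and pool them. The summation pooling used here is an assumption for illustration, not necessarily the thesis' exact scheme:

```python
import numpy as np

DIM = 16                 # illustrative embedding size
rng = np.random.default_rng(1)
table = {}               # one learned vector per "Feature=Value" pair

def feat_vector(morph_string):
    """Embed a UD morph-feat string such as 'Case=Nom|Gender=Neut|Number=Sing'
    by looking up one vector per Feature=Value pair.
    Summation as pooling is an assumption here."""
    vec = np.zeros(DIM)
    for pair in morph_string.split("|"):
        if pair not in table:
            table[pair] = rng.normal(size=DIM)  # stand-in for a trained embedding
        vec += table[pair]
    return vec

v = feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
```

Sharing the per-pair vectors across words lets rare full feature bundles still receive informative representations.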
Tree-stack LSTM
Model Components:
1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123
β-LSTM
[Figure: Tree-stack LSTM overview (β-LSTM, σ-LSTM, Action-LSTM, t-RNN, concat, MLP)]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
[Figure: Buffer's β-LSTM running over the buffer words wi, wi+1, wi+2]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
[Figure: Tree-stack LSTM overview (β-LSTM, σ-LSTM, Action-LSTM, t-RNN, concat, MLP)]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
[Figure: Stack's σ-LSTM running over the stack items si, si+1, si+2]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
[Figure: Tree-stack LSTM overview (β-LSTM, σ-LSTM, Action-LSTM, t-RNN, concat, MLP)]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
[Figure: Action-LSTM]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How are the components of the tree-stack LSTM connected?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
[Figure: t-RNN combining the head word, dependency relation and dependent word]

w_{head}^{new} = \tanh(W_{rnn} \cdot [w_{head}^{old}; d_l; w_{dep}] + b_{rnn})   (1)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
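Equation (1) can be sketched directly. The dimensions and random initialization are illustrative; in the parser these parameters are trained:

```python
import numpy as np

d = 8  # embedding size (illustrative)
rng = np.random.default_rng(2)
W_rnn = rng.normal(size=(d, 3 * d)) * 0.1   # composition weights
b_rnn = np.zeros(d)

def trnn(w_head, d_label, w_dep):
    """Eq. (1): blend the dependency-relation embedding and the dependent's
    embedding into a new embedding for the head word."""
    return np.tanh(W_rnn @ np.concatenate([w_head, d_label, w_dep]) + b_rnn)

new_head = trnn(rng.normal(size=d), rng.normal(size=d), rng.normal(size=d))
```

Because the output has the same size as a word embedding, the new head vector can feed straight back into the σ-LSTM or β-LSTM.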
Tree-RNN with
1 Left Transition
2 Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

[Figure: Each embedding is initialized by concatenating POS, language and morph-feat embeddings]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

[Figure: Stack's top LSTM is reduced]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

[Figure: t-RNN calculates the new head embedding]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

[Figure: β-LSTM recalculates its hidden state based on the new input]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

[Figure: Tree-stack LSTM is ready for the next transition]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

[Figure: Each embedding is initialized by concatenating POS, language and morph-feat embeddings]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

[Figure: Stack's top LSTM is reduced]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

[Figure: t-RNN calculates the new head embedding]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

[Figure: σ-LSTM recalculates its hidden state from the new input]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

[Figure: Tree-stack LSTM is ready for the next transition]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
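The two transition definitions can be sketched as operations on a (stack, buffer, arcs) triple. This is a minimal sketch: the shift transition and the preconditions (non-empty stack/buffer) are omitted, and arcs are stored as (head, label, dependent) triples:

```python
def left(stack, buffer, arcs, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    pop s from the stack and attach it to the buffer front b."""
    s = stack.pop()
    arcs.add((buffer[0], d, s))

def right(stack, buffer, arcs, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    pop t from the stack and attach it to the new stack top s."""
    t = stack.pop()
    arcs.add((stack[-1], d, t))

# Tiny demo with word indices: stack [0, 1, 2], buffer front 3.
stack, buffer, arcs = [0, 1, 2], [3], set()
left(stack, buffer, arcs, "amod")    # attaches 2 to buffer front 3
right(stack, buffer, arcs, "nsubj")  # attaches 1 to new stack top 0
```

In the full model, each executed transition additionally triggers the t-RNN update of the head embedding and a recomputation of the affected LSTM states, as the slides above describe.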
Final overview of Tree-stack LSTM
[Figure: Tree-stack LSTM overview: β-LSTM, σ-LSTM and Action-LSTM outputs are concatenated and fed to an MLP; the t-RNN combines head word, dependent word and dependency relation]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2 Related Work: Linear Models and their Drawbacks; Neural Network Models
3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5 Conclusion
6 Future Work & Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results & Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results & Comparisons

Dataset

CoNLL17:
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Differences from CoNLL17 to CoNLL18: 1. Train/test split change 2. Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
2 possible problems of official comparison
1 If the annotation of the treebank has been improved, the older parser is handicapped
2 If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code        MLP     Tree-stack
ru taiga (10k)   58.89   60.55
hu szeged (20k)  66.21   68.18
tr imst (50k)    56.78   58.75
ar padt (120k)   67.83   68.14
en ewt (205k)    74.87   75.77
cs cac (473k)    83.39   83.57
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
[Figure: Initial model (MLP)]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
[Figure: Only Action-LSTM]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
[Figure: Only β-LSTM]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
[Figure: Only σ-LSTM]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code   MLP     Only Action  Only-β  Only-σ
hu szeged   66.21   66.87        66.94   67.03
sv lines    71.12   72.05        72.17   72.45
tr imst     57.12   56.87        57.02   57.12
ar padt     67.83   66.67        66.89   66.92
cs cac      83.89   82.23        83.13   83.17
en ewt      75.54   75.43        75.56   75.67

Table: Comparison between MLP and "Only" models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
[Figure: Tree-stack LSTM overview (β-LSTM, σ-LSTM, Action-LSTM, t-RNN, concat, MLP)]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN:

Lang Code            without t-RNN   with t-RNN
no nynorsklia (3k)   51.78           53.33
ru taiga (11k)       59.13           60.55
gl treegal (15k)     69.76           70.45
hu szeged (20k)      66.12           68.18
sv lines (49k)       74.04           75.46
tr imst (50k)        58.12           58.75
ar padt (120k)       68.04           68.14
en ewt (204k)        74.87           75.77
cs cac (473k)        82.89           83.57
cs pdt (1M)          81.17           81.164
t-RNN provides a comparative advantage for low-resource languages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang        MLP     Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged   66.21   66.87   66.94   67.03   66.12      68.18
sv lines    71.12   72.05   72.17   74.04   72.17      75.46
tr imst     57.12   56.87   57.02   57.12   58.12      58.75
ar padt     67.83   66.67   66.89   66.92   68.04      68.14
cs cac      83.89   82.23   83.13   83.17   82.89      83.57
en ewt      75.54   75.43   75.56   75.67   74.87      75.77
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNN's performance contribution increases as the training size decreases
σ-LSTM provides more useful information independent of dataset size
Interconnecting the model's components with the t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental Settings: We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens
Languages having more than 20k, less than 50k tokens
Languages having more than 50k, less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens:

Lang code       Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia   51.13        53.33            3,583
ru taiga        58.32        60.55           10,479
sme giella      52.78        53.39           16,385
la perseus      49.93        51.60           18,184
ug udt          52.78        53.39           19,262
sl sst          46.72        48.77           19,473
hu szeged       66.23        68.18           20,166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48,325
fr sequoia     84.36        82.17           50,543
en gum         76.44        75.34           53,686
ko gsd         73.74        72.54           56,687
eu bdt         74.55        73.32           72,974
nl lassysmall  76.70        75.80           75,134
gl ctg         79.02        79.018          79,327
lv lvtb        72.33        72.24           80,666
id gsd         75.76        73.97           97,531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens:

Lang code   Morph-Feats  no Morph-Feats  # of tokens
fa seraji   81.18        81.12             121,064
bg btb      84.53        84.55             124,336
en ewt      75.77        75.682            204,585
ar padt     68.02        68.14             223,881
de gsd      71.59        71.32             263,804
ca ancora   85.89        85.874            417,587
es ancora   84.99        84.78             444,617
cs cac      83.57        83.63             472,608
cs pdt      81.43        82.12           1,173,282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle: transitions follow gold moves. Dynamic oracle: transitions follow predicted moves.
In both cases, the log-probability of the gold moves is maximized.
[Figure: Tree-stack LSTM overview (β-LSTM, σ-LSTM, Action-LSTM, t-RNN, concat, MLP)]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
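The difference between the two regimes can be sketched as a toy training loop. `ToyState`, `toy_oracle` and `toy_model` are hypothetical stand-ins; only the choice of which move is *executed* differs between static and dynamic training, while the loss always scores the gold move:

```python
import math
import random

class ToyState:
    """Hypothetical parser state that finishes after 3 transitions."""
    def __init__(self, steps=0):
        self.steps = steps
    def is_final(self):
        return self.steps >= 3
    def apply(self, move):
        return ToyState(self.steps + 1)

def toy_oracle(state):            # stand-in: the gold move at every step
    return "a"

def toy_model(state):             # stand-in: a fixed predicted distribution
    return {"a": 0.4, "b": 0.6}

def train_sentence(state, gold_oracle, predict, dynamic, explore=1.0):
    """One pass over a sentence. Static training executes gold moves;
    dynamic training (sometimes) executes the model's predicted moves.
    Either way, the loss maximizes log p(gold move)."""
    loss = 0.0
    while not state.is_final():
        gold = gold_oracle(state)
        probs = predict(state)
        loss -= math.log(probs[gold])             # -log p(gold move)
        if dynamic and random.random() < explore:
            move = max(probs, key=probs.get)      # follow the prediction
        else:
            move = gold                           # follow the gold move
        state = state.apply(move)
    return loss
```

Dynamic training exposes the model to states it will actually reach at test time, which is the usual motivation for dynamic oracles.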
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning:
1 Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2 Using Facebook's word vectors to train a parser [Bojanowski et al 2017]
3 Using my own word and context vectors trained on a different language from the same language family
4 Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt        7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial
Training the LM from scratch on very limited data does not yield useful word and context vectors
Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition based parsers can only build projective trees. 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
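Projectivity can be checked by testing whether any two dependency arcs cross. This is a standard O(n²) sketch; the 1-based head encoding (0 = root) is ours:

```python
def is_projective(heads):
    """heads[i-1] is the head index of word i (1-based words, 0 = root).
    A tree is projective iff no two dependency arcs cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for (l1, r1) in arcs:
        for (l2, r2) in arcs:
            if l1 < l2 < r1 < r2:   # arcs (l1,r1) and (l2,r2) cross
                return False
    return True

is_projective([2, 0, 2])      # simple chain: projective
is_projective([3, 4, 0, 3])   # arcs (1,3) and (2,4) cross: non-projective
```

Transition systems like the one above cannot produce the second tree, which is why the performance gap in the table depends on the projectivity ratio.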
Projective vs Non-projective
We compared our model with the best model for different projectivity ratios:

Language      Projectivity  Best (LAS)  Our (LAS)
grc perseus    90.7         79.39       55.03 (20)
eu bdt         95.13        84.22       74.13 (17)
hu szeged      97.8         82.66       68.18 (14)
da ddt         98.26        86.28       76.40 (17)
en gum         99.6         85.05       76.44 (15)
gl treegal    100           74.25       70.45 (10)
gl ctg        100           82.12       79.45 (14)
Table: Our model's performance gap decreases as the projectivity ratio increases
7 From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusion: We introduced "Context, Word and Morph-feat" embeddings and showed their contribution to transition based dependency parsing.
Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering.
Tree-stack LSTM performed better on low-resource languages.
When the training dataset size increases, tree-stack LSTM loses its advantage.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.
Attention Mechanism
Applying attention between the states of the σ-LSTM, β-LSTM or Action-LSTM may bring a performance improvement.
Morphological Features
Finding different ways to represent morphological features.
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap, Berkay Furkan Onder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann, Carlos Gomez-Rodriguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.
S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
Experiments - Evaluation Metric
Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label
Economic news hadGold Tree LAS 1
SBJATT
Economic news had
PREDOBJPred 1 LAS 0
Economic news had
OBJATTPred 2 LAS (frac12)100
Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
Experiments (MLP)
CoNLL 2017 Results (all treebanks LAS)
Ranked 1st among transition based parsers 5
5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123
Contributions in CoNLL17
Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123
Context and Word Embeddings
Relative contributions of part-of-speech (p) word vector (v)context vector (c)
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123
Context and Word Embeddings
Relative contributions of part-of-speech (p) word vector (v)context vector (c)
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Context vectors provide independent contribution on top ofPOS tags
Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123
Context and Word embeddings
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677p-fb 747 797 663
p-v-c 793 832 742
Our BiLSTM language model word vectors perform betterthan FB vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123
Context and Word embeddings
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Both POS tags and context vectors have significantcontributions on top of word vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123
Issues with MLP
However
Choosing correct state of parser still remains critical
We are unable to represent whole parsing history with featureextracting
Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123
Solution
Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack
Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies
CoNLL17
bull Koc-University team with MLP Parser using Context Embeddings
CoNLL18
bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123
c Tree-stack LSTM Parser (CoNLL18)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123
Related Work - Stack LSTM
Figure Stack LSTM [Dyer et al 2015]
Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123
Problems with Stack LSTM
They only modify stackrsquos word embeddings
Hidden states of LSTMS are not updated unless reduce
Actions are not explicitly represented
They only used word2vec embeddings [Mikolov et al 2013]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123
Tree-stack LSTM Overview
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
We propose Tree-stack LSTM model with 4 components
β-LSTMσ-LSTMAction-LSTMTree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123
Tree-stack LSTM
Input Representation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with continuous vector
Every dependency relation is represented with continuous vector
Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123
Input Representation
We do not include explicit feature extractor We initiated wordrepresentation by concatenating
Character Based LSTMrsquos word vectors
Word Based BiLSTMrsquos context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
Input Representation
Morp-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
Tree-stack LSTM
Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123
β-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
LSTM LSTM LSTM
wi+2wi+1wi
Figure Bufferrsquos β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
LSTM LSTM LSTM
si si+1 si+2
Figure Stackrsquos σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
LSTM LSTM LSTM
Figure Action-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How do components of tree-stack LSTM are connected
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
t-RNN
Dependent word
Dependency Relation
Head word
Figure t-RNN
whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
Tree-RNN with
1 Left Transition2 Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM LSTM
Left transition
t-RNN
Dependency Relation
HeadDependent
Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure β-LSTM recalculates its hidden based on new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Stackrsquos top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure σ-LSTM recalculates its hidden from new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: Tree-stack LSTM is ready to predict the next transition.
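The left and right transitions above can be sketched as plain operations on a parser configuration (stack σ, buffer β, arc set A). This is an illustrative toy in the arc-hybrid style of the formulas (cf. Kuhlmann et al. 2011), not the thesis implementation; arcs are stored as (head, relation, dependent) triples, matching the (b, d, s) and (s, d, t) notation.

```python
# Toy sketch of the transitions shown above. Words are integer ids;
# arcs are (head, relation, dependent) triples.

def shift(stack, buffer):
    stack.append(buffer.pop(0))

def left_arc(stack, buffer, arcs, d):
    # left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
    s = stack.pop()       # dependent: top of the stack
    b = buffer[0]         # head: front of the buffer
    arcs.add((b, d, s))

def right_arc(stack, buffer, arcs, d):
    # right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
    t = stack.pop()       # dependent: top of the stack
    s = stack[-1]         # head: the item below it
    arcs.add((s, d, t))
```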
Final overview of Tree-stack LSTM
Figure: Full model. The σ-LSTM, β-LSTM, and Action-LSTM hidden states are concatenated and fed to an MLP that predicts the next transition; the t-RNN composes the head word, dependent word, and dependency relation embeddings.
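The t-RNN head update shown earlier in the deck, w_head_new = tanh(W_rnn · [w_head_old; d_l; w_dep] + b_rnn), can be sketched with NumPy. The dimensions and random initialization below are illustrative assumptions, not the thesis settings.

```python
import numpy as np

# Sketch of the t-RNN composition: a new head embedding from the old
# head, the dependency-relation embedding, and the dependent embedding.
rng = np.random.default_rng(0)
D_WORD, D_REL = 8, 4                     # illustrative embedding sizes
W_rnn = 0.1 * rng.normal(size=(D_WORD, 2 * D_WORD + D_REL))
b_rnn = np.zeros(D_WORD)

def t_rnn(w_head, d_rel, w_dep):
    """w_head_new = tanh(W_rnn @ [w_head; d_rel; w_dep] + b_rnn)"""
    x = np.concatenate([w_head, d_rel, w_dep])
    return np.tanh(W_rnn @ x + b_rnn)
```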
4 Results & Comparisons
Results & Comparisons
Dataset

CoNLL17:
Dependency parsing of 81 treebanks in 49 languages.
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations.
Koc-University ranked 7th out of 33 participants (1st among transition based parsers).

CoNLL18:
Dependency parsing of 82 treebanks in 57 languages.
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations.
Koc-University ranked 16th out of 30 participants (2nd among transition based parsers).

Changes from CoNLL17 to CoNLL18: 1. Train/test split change 2. Annotation
MLP vs Tree-stack LSTM
The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets.
MLP vs Tree-stack LSTM
Two possible problems with the official comparison:
1. If the annotation of the treebank is improved, the older parser is handicapped.
2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models:

Lang_Code        MLP    Tree-stack
ru_taiga (10k)   58.89  60.55
hu_szeged (20k)  66.21  68.18
tr_imst (50k)    56.78  58.75
ar_padt (120k)   67.83  68.14
en_ewt (205k)    74.87  75.77
cs_cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP.
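The scores in these tables are LAS (Labeled Attachment Score): the percentage of words assigned both the correct head and the correct dependency label. A minimal computation:

```python
# LAS: percentage of words whose predicted head AND dependency label
# both match the gold annotation.

def las(gold, pred):
    """gold, pred: one (head_index, label) pair per word."""
    assert len(gold) == len(pred)
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return 100.0 * correct / len(gold)
```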
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
MLP Parser
Figure: Initial model (MLP only).
Only Action LSTM
Figure: Only the Action-LSTM added.
Only β-LSTM
Figure: Only the β-LSTM added.
Only σ-LSTM
Figure: Only the σ-LSTM added.
Ablation Analysis Results
Lang_Code   MLP    Only Action  Only-β  Only-σ
hu_szeged   66.21  66.87        66.94   67.03
sv_lines    71.12  72.05        72.17   72.45
tr_imst     57.12  56.87        57.02   57.12
ar_padt     67.83  66.67        66.89   66.92
cs_cac      83.89  82.23        83.13   83.17
en_ewt      75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models.
Ablation of t-RNN
Figure: Full Tree-stack LSTM (t-RNN highlighted).
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN:

Lang_Code           without t-RNN  with t-RNN
no_nynorsklia (3k)  51.78          53.33
ru_taiga (11k)      59.13          60.55
gl_treegal (15k)    69.76          70.45
hu_szeged (20k)     66.12          68.18
sv_lines (49k)      74.04          75.46
tr_imst (50k)       58.12          58.75
ar_padt (120k)      68.04          68.14
en_ewt (204k)       74.87          75.77
cs_cac (473k)       82.89          83.57
cs_pdt (1M)         81.17          81.164

t-RNN provides a comparative advantage for low-resource languages.
Ablation Analysis
Overall results of the ablation analysis:

Lang_Code  MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu_szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv_lines   71.12  72.05   72.17   74.04   72.17      75.46
tr_imst    57.12  56.87   57.02   57.12   58.12      58.75
ar_padt    67.83  66.67   66.89   66.92   68.04      68.14
cs_cac     83.89  82.23   83.13   83.17   82.89      83.57
en_ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats the other model variations.
Ablation Analysis
Conclusions of the ablation experiments:
t-RNN's performance contribution increases as the training size decreases.
σ-LSTM provides more useful information, independent of dataset size.
Interconnecting the model's components with t-RNN makes Tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers).
What does Morphological Feature Embedding provide?
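A minimal sketch of how a UD FEATS string (e.g. `Case=Nom|Number=Sing`) can be mapped to a single morph-feat vector. The per-feature embedding lookup plus mean pooling is an assumption for illustration, not necessarily the thesis configuration.

```python
import numpy as np

# Illustrative: map a UD FEATS string (Feature=Value pairs joined by
# "|") to one vector by averaging per-feature embeddings.
rng = np.random.default_rng(1)
D = 6                       # illustrative embedding dimension
emb = {}                    # one embedding per "Feature=Value" string

def morph_feat_vector(feats):
    if feats == "_":        # UD uses "_" for "no features"
        return np.zeros(D)
    vecs = []
    for fv in feats.split("|"):
        if fv not in emb:   # create embeddings lazily
            emb[fv] = rng.normal(size=D)
        vecs.append(emb[fv])
    return np.mean(vecs, axis=0)
```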
Contribution of Morph-feat Embeddings
Experimental Settings: We divide the CoNLL18 UD 2.2 dataset into 4 parts based on the number of training tokens per language, to better understand our contributions:
Languages having less than 20k tokens
Languages having more than 20k and less than 50k tokens
Languages having more than 50k and less than 100k tokens
Languages having 100k tokens or more
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens:

Lang_code      Morph-Feats  no Morph-Feats  # of tokens
no_nynorsklia  51.13        53.33             3,583
ru_taiga       58.32        60.55            10,479
sme_giella     52.78        53.39            16,385
la_perseus     49.93        51.6             18,184
ug_udt         52.78        53.39            19,262
sl_sst         46.72        48.77            19,473
hu_szeged      66.23        68.18            20,166

Not useful for languages having less than 20k training tokens.
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang_code      Morph-Feats  no Morph-Feats  # of tokens
sv_lines       72.18        74.81           48,325
fr_sequoia     84.36        82.17           50,543
en_gum         76.44        75.34           53,686
ko_gsd         73.74        72.54           56,687
eu_bdt         74.55        73.32           72,974
nl_lassysmall  76.7         75.8            75,134
gl_ctg         79.02        79.018          79,327
lv_lvtb        72.33        72.24           80,666
id_gsd         75.76        73.97           97,531

Beneficial for languages with 50k-100k training tokens.
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens:

Lang_code  Morph-Feats  no Morph-Feats  # of tokens
fa_seraji  81.18        81.12             121,064
bg_btb     84.53        84.55             124,336
en_ewt     75.77        75.682            204,585
ar_padt    68.02        68.14             223,881
de_gsd     71.59        71.32             263,804
ca_ancora  85.89        85.874            417,587
es_ancora  84.99        84.78             444,617
cs_cac     83.57        83.63             472,608
cs_pdt     81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens.
Static vs Dynamic Oracle Training
Static oracle: transitions follow the gold moves.
Dynamic oracle: transitions follow the predicted moves.
In both cases, the log-probability of the gold moves is maximized.
Figure: Full Tree-stack LSTM model.
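The difference between the two regimes can be sketched with a toy training episode: the loss term is the same (negative log-probability of the gold move); only the move actually followed differs. The per-step score dicts below are hypothetical stand-ins for the parser's state-dependent predictions.

```python
# Toy contrast of static vs dynamic oracle training.

def train_episode(gold_moves, step_scores, dynamic):
    """step_scores[t]: dict move -> log-probability at step t."""
    loss, followed = 0.0, []
    for gold, scores in zip(gold_moves, step_scores):
        loss -= scores[gold]                    # same objective either way
        if dynamic:
            move = max(scores, key=scores.get)  # follow the model's argmax
        else:
            move = gold                         # follow the gold oracle
        followed.append(move)
    return loss, followed
```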
Static vs Dynamic Oracle Training
Figure: Results are very close for training sets of fewer than 20k tokens.
Static vs Dynamic Oracle Training
Figure: Results are very close for training sets between 20k and 50k tokens.
Static vs Dynamic Oracle Training
Figure: Results are very close for training sets of more than 50k tokens.
How about languages with less than 20k training tokens?
Transfer Learning
There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors, trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af_afribooms  not provided  75.46  77.43  78.12
kk_ktb        20.19         22.31  21.96  23.86
bxr_bdt        7.64          9.76   9.93   8.98
kmr_mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4).
Transfer Learning
Conclusions of the transfer learning experiments:
Applying transfer learning with a pre-trained parser is the most beneficial.
Training an LM from scratch does not yield useful word and context vectors.
Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017].
Projectivity
Transition based parsers can only build projective trees. [6]
[6] Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
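Projectivity can be checked directly from the head indices: a tree is projective iff no two arcs cross when drawn above the sentence. A simple quadratic sketch (0 is the artificial root; words are numbered from 1):

```python
# heads[i-1] is the head of word i; 0 denotes the root.
# O(n^2) check over arc spans.

def is_projective(heads):
    spans = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for i, j in spans:
        for k, l in spans:
            if i < k < j < l:   # (k, l) starts inside (i, j), ends outside
                return False
    return True
```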
Projective vs Non-projective
We compared our model with the best model at different projectivity ratios.

Language     Projectivity  Best (LAS)  Our (LAS)
grc_perseus   90.7         79.39       55.03 (20)
eu_bdt        95.13        84.22       74.13 (17)
hu_szeged     97.8         82.66       68.18 (14)
da_ddt        98.26        86.28       76.40 (17)
en_gum        99.6         85.05       76.44 (15)
gl_treegal   100           74.25       70.45 (10)
gl_ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases. [7]
[7] From the official results page and our projectivity table.
Conclusions
Conclusion
In conclusion: We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition based dependency parsing.
Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering.
Tree-stack LSTM performed better on low-resource languages.
When the training dataset size increases, Tree-stack LSTM loses its advantage.
Future Research Directions

End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism
Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring performance improvements.

Morphological Features
Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function; other losses (e.g. CRF) may solve this problem.
Publications
Ömer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Ömer Kırnap, Berkay Furkan Önder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
References
Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, pages 673-682. Association for Computational Linguistics.
S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Thank you for your attention
Questions?
Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure β-LSTM recalculates its hidden based on new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Stackrsquos top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure σ-LSTM recalculates its hidden from new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
Final overview of Tree-stack LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing
2 Related WorkLinear Models and their DrawbacksNeural Network Models
3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser
4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning
5 Conclusion6 Future Work amp Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results amp Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results amp Comparisons
Dataset
Dependency parsing of 81
treebanks in 49 languages
All treebanks use standardized
annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 7th out
of 33 participants (1st among
transition based parsers)
Dependency parsing of 82
treebanks in 57 languages
All treebanks use
standardized annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 16th
out of 30 participants (2nd
among transition based
parsers)
CoNLL17 CoNLL181 Traintest split change 2 Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
2 possible problems of official comparison
1 If the annotation of the tree bank is improved the older parser ishandicapped
2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
MLP
Figure Initial model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
LSTM LSTM
Figure Only action LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
LSTM LSTM LSTM
LSTM LSTM MLP
Figure Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
LSTM LSTM
LSTM LSTM MLP
Figure Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692
cs cac 8389 8223 8313 8317
en ewt 7554 7543 7556 7567
Table Comparison between MLP and rdquoOnlyrdquo models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875
ar padt (120k) 6804 6814
en ewt (204k) 7487 7577
cs cac (473k) 8289 8357
cs pdt (1M) 8117 81164
t-RNN provides comparative advantage for low-resourcelanguages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNNrsquos performance contribution increases when the training sizedecreases
σ-LSTM provides more useful information independent from datasetsize
Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k less than 50k tokens
Languages having more than 50k less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having less than 20k training tokens
Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583
ru taiga 5832 6055 10479
sme giella 5278 5339 16385
la perseus 4993 516 18184
ug udt 5278 5339 19262
sl sst 4672 4877 19473
hu szeged 6623 6818 20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having tokens in between 50k and100k
Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325
fr sequoia 8436 8217 50543
en gum 7644 7534 53686
ko gsd 7374 7254 56687
eu bdt 7455 7332 72974
nl lassymal 767 758 75134
gl ctg 7902 79018 79327
lv lvtb 7233 7224 80666
id gsd 7576 7397 97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having more than 100k trainingtokens
Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064
bg btb 8453 8455 124336
en ewt 7577 75682 204585
ar padt 6802 6814 223881
de gsd 7159 7132 263804
ca ancora 8589 85874 417587
es ancora 8499 8478 444617
cs cac 8357 8363 472608
cs pdt 8143 8212 1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle: transitions follow the gold moves. Dynamic oracle: transitions follow the model's predicted moves.
In both cases, the log-probability of the gold moves is maximized.
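A minimal sketch of the difference (not the thesis implementation; the transition inventory and `scores` map are illustrative): both oracles compute the loss from the gold move, but only the dynamic oracle advances the parser with the predicted move.

```python
import math

def oracle_step(scores, gold_move, dynamic=False):
    """One training step: return (-log p(gold), move used to advance the parser).

    scores: dict mapping each transition to its model probability.
    The static oracle follows the gold move; the dynamic oracle follows
    the model's argmax. Both maximize log p(gold move)."""
    loss = -math.log(scores[gold_move])
    move = max(scores, key=scores.get) if dynamic else gold_move
    return loss, move

scores = {"shift": 0.7, "left": 0.2, "right": 0.1}
print(oracle_step(scores, "left", dynamic=False)[1])  # left
print(oracle_step(scores, "left", dynamic=True)[1])   # shift
```

Note that the loss term is identical under both oracles; only the state the parser trains on next differs, which is what exposes the model to its own mistakes.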
[Figure: Tree-stack LSTM architecture: β-, σ-, and Action-LSTM outputs and the t-RNN embeddings (head word, dependent word, dependency relation) are concatenated and fed to an MLP.]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language       (1)            (2)     (3)     (4)
af afribooms   not provided   75.46   77.43   78.12
kk ktb         20.19          22.31   21.96   23.86
bxr bdt        7.64           9.76    9.93    8.98
kmr mg         20.12          22.57   22.78   23.39

Table: LAS values for strategies (1), (2), (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial.
From-scratch LM training does not produce useful word and context vectors.
Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017].
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition Based Parsers can only build projective trees. 6
6 Figure from http://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
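A tree is projective iff no two dependency arcs cross. A small checker (a sketch; `heads[i]` gives the head of 1-based word i+1, with 0 denoting the root):

```python
def is_projective(heads):
    """heads[i] = head of word i+1 (words are 1-based; 0 is the root).
    Returns True iff no two dependency arcs cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for lo1, hi1 in arcs:
        for lo2, hi2 in arcs:
            if lo1 < lo2 < hi1 < hi2:   # strictly interleaved spans cross
                return False
    return True

print(is_projective([2, 0, 2]))      # True: both dependents attach to word 2
print(is_projective([3, 4, 0, 3]))   # False: arcs (1,3) and (2,4) cross
```

A transition based parser of the kind above can only produce trees for which this check returns True.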
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
Projective vs Non-projective
We compared our model with the best model for different projectivity ratios.

Language      Projectivity %   Best (LAS)   Our (LAS)
grc perseus   90.7             79.39        55.03 (20)
eu bdt        95.13            84.22        74.13 (17)
hu szeged     97.8             82.66        68.18 (14)
da ddt        98.26            86.28        76.40 (17)
en gum        99.6             85.05        76.44 (15)
gl treegal    100              74.25        70.45 (10)
gl ctg        100              82.12        79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases.
7 From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusion:
We introduced "Context, Word and Morph-feat" embeddings and showed their contribution in transition based dependency parsing.
Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering.
Tree-stack LSTM performed better on low-resource languages.
When the training dataset size increases, tree-stack LSTM loses its advantage.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.
Attention Mechanism
Applying attention between the σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.
Morphological Features
Finding different ways to represent morphological features.
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap, Berkay Furkan Onder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann, Carlos Gómez-Rodríguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1. Association for Computational Linguistics, pages 673-682.
S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
  - Overview of Dependency Parsing
  - Transition Based Dependency Parsing
- Related Work
  - Linear Models and their Drawbacks
  - Neural Network Models
- Model
  - Language Model
  - MLP Parser
  - Tree-stack LSTM Parser
- Results
  - MLP vs Tree-stack LSTM
  - Morphological Feature Embeddings
  - Static vs Dynamic Oracle Training
  - Transfer Learning
- Conclusion
- Future Work & Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 23 123
Problem Definition
Find a model that learns to decide the correct transition from the current state.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123
2 Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123
Related Work
Neural Networks for Feature Conjunctions
Neural networks can handle feature conjunctions and nonlinearity. However, they are impractical for high-dimensional inputs: they scale linearly in the input dimension (in both time and space, assuming a fixed number of hidden units).
Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123
Related Work
Solution Using dense embeddings for input features
Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123
Overview
1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2 Related Work: Linear Models and their Drawbacks; Neural Network Models
3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5 Conclusion
6 Future Work & Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123
3 Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:
CoNLL17: Koc-University team with MLP Parser using Context Embeddings
CoNLL18: KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:
CoNLL17: Koc-University team with MLP Parser using Context Embeddings
CoNLL18: KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123
a Language Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123
Language Model (LM)
The LM is used to obtain context and word embeddings, with two components:
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
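A toy sketch of the character-based component (a single hand-rolled LSTM cell with made-up dimensions, not the thesis architecture): the final hidden state over a word's characters serves as its word vector.

```python
import numpy as np

rng = np.random.default_rng(0)
d_char, d_hid = 5, 7                               # illustrative sizes
E = {c: rng.standard_normal(d_char) for c in "abcdefghijklmnopqrstuvwxyz"}
W = rng.standard_normal((4 * d_hid, d_char + d_hid)) * 0.1
b = np.zeros(4 * d_hid)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def word_vector(word):
    """Run the character LSTM; the last hidden state is the word vector."""
    h = np.zeros(d_hid)
    c = np.zeros(d_hid)
    for ch in word:
        z = W @ np.concatenate([E[ch], h]) + b
        i, f, o, g = np.split(z, 4)                # input, forget, output, cell gates
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h

print(word_vector("news").shape)  # (7,)
```

The word-based BiLSTM then runs over these word vectors in both directions to produce the context vectors.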
Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123
Language Model - Word vectors
Character based LSTM generates word Vectors
Figure Character LSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123
Language Model - Context Vectors
Word based BiLSTM generates Context Vectors
Figure Word BiLSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123
b MLP Parser (CoNLL17)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123
MLP Parser
MLP Parser consists of 4 components
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Feature extractor describes current state
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123
MLP Parser - Feature Extraction
Feature extractor describes current state
Figure: Kırnap et al. 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123
MLP Parser - Decision Module
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123
Experiments amp Dataset (MLP) CoNLL17
CoNLL17 Dataset
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation:
- 17 universal part-of-speech tags
- 37 universal dependency relations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123
Experiments - Evaluation Metric
Labeled Attachment Score (LAS): the percentage of words correctly assigned both the correct syntactic head and the correct dependency label.
[Figure: Toy example on "Economic news had": the gold tree (SBJ, ATT arcs) has LAS 1; Pred 1 (PRED, OBJ) has LAS 0; Pred 2 (OBJ, ATT) has LAS (1/2) * 100 = 50.]
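The metric itself is a one-liner (a sketch; each tree is given as per-word (head, label) pairs, with toy indices):

```python
def las(gold, pred):
    """Labeled Attachment Score: % of words with BOTH head and label correct."""
    assert len(gold) == len(pred)
    hits = sum(g == p for g, p in zip(gold, pred))
    return 100.0 * hits / len(gold)

# Toy version of the example above: two scored words, one fully correct.
gold = [(3, "SBJ"), (1, "ATT")]
pred = [(3, "OBJ"), (1, "ATT")]   # head right but label wrong on word 1
print(las(gold, pred))  # 50.0
```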
Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
Experiments (MLP)
CoNLL 2017 Results (all treebanks LAS)
Ranked 1st among transition based parsers 5
5 Source: CoNLL17 official results page
Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123
Contributions in CoNLL17
Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123
Context and Word Embeddings
Relative contributions of part-of-speech (p) word vector (v)context vector (c)
Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63
c       72.2        76          63.5
v-c     76          79          67.6
p-c     78          82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123
Context and Word Embeddings
Relative contributions of part-of-speech (p) word vector (v)context vector (c)
Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63
c       72.2        76          63.5
v-c     76          79          67.6
p-c     78          82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2
Context vectors provide an independent contribution on top of POS tags.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123
Context and Word embeddings
Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63
c       72.2        76          63.5
v-c     76          79          67.6
p-c     78          82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2
Our BiLSTM language model word vectors perform better than the FB vectors.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123
Context and Word embeddings
Feats   Hungarian   En-ParTUT   Latvian
p       63.6        76.6        55.9
v       73.5        75.9        63
c       72.2        76          63.5
v-c     76          79          67.6
p-c     78          82.5        70.6
p-v     76.6        80.8        67.7
p-fb    74.7        79.7        66.3
p-v-c   79.3        83.2        74.2
Both POS tags and context vectors have significant contributions on top of word vectors.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123
Issues with MLP
However
Choosing the correct parser state features still remains critical.
We are unable to represent the whole parsing history with feature extraction.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123
Solution
Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:
CoNLL17: Koc-University team with MLP Parser using Context Embeddings
CoNLL18: KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123
c Tree-stack LSTM Parser (CoNLL18)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123
Related Work - Stack LSTM
Figure Stack LSTM [Dyer et al 2015]
Represent each component (σ, β, A) with an LSTM. Modify the head word's embedding with the dependent's embedding.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123
Problems with Stack LSTM
They only modify the stack's word embeddings.
Hidden states of the LSTMs are not updated unless a reduce occurs.
Actions are not explicitly represented
They only used word2vec embeddings [Mikolov et al 2013]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123
Tree-stack LSTM Overview
[Figure: Tree-stack LSTM architecture: β-, σ-, and Action-LSTM outputs and the t-RNN embeddings (head word, dependent word, dependency relation) are concatenated and fed to an MLP.]
We propose the Tree-stack LSTM model with 4 components:
β-LSTM, σ-LSTM, Action-LSTM, Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123
Tree-stack LSTM
Input Representation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with a continuous vector.
Every dependency relation is represented with a continuous vector.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123
Input Representation
We do not include an explicit feature extractor. We initialize the word representation by concatenating:
Character-based LSTM's word vectors
Word-based BiLSTM's context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
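So a word's input vector is simply a concatenation of the four vectors above (the sizes below are illustrative, not the thesis settings):

```python
import numpy as np

def word_input(word_vec, context_vec, pos_vec, morph_vec):
    """Initial word representation:
    [char-LSTM word; BiLSTM context; POS; morph-feat]."""
    return np.concatenate([word_vec, context_vec, pos_vec, morph_vec])

x = word_input(np.zeros(4), np.zeros(6), np.zeros(2), np.zeros(3))
print(x.shape)  # (15,)
```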
Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
Input Representation
Morph-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
Tree-stack LSTM
Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123
β-LSTM
[Figure: Tree-stack LSTM architecture, as above.]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
Figure: Buffer's β-LSTM over the upcoming words wi, wi+1, wi+2.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
[Figure: Tree-stack LSTM architecture, as above.]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
Figure: Stack's σ-LSTM over the stack items si, si+1, si+2.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
[Figure: Tree-stack LSTM architecture, as above.]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
Figure: Action-LSTM.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How are the components of tree-stack LSTM connected?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
[Figure: t-RNN combines the head word and dependent word embeddings with the dependency relation embedding.]

w_head_new = tanh(W_rnn · [w_head_old; d_l; w_dep] + b_rnn)    (1)
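Eq. (1) in code (the embedding sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n, r = 4, 2                                        # word / relation embedding sizes
W_rnn = rng.standard_normal((n, 2 * n + r)) * 0.1  # maps [head; rel; dep] -> head
b_rnn = np.zeros(n)

def trnn(w_head_old, d_l, w_dep):
    """Eq. (1): w_head_new = tanh(W_rnn . [w_head_old; d_l; w_dep] + b_rnn)."""
    return np.tanh(W_rnn @ np.concatenate([w_head_old, d_l, w_dep]) + b_rnn)

w_new = trnn(rng.standard_normal(n), rng.standard_normal(r), rng.standard_normal(n))
print(w_new.shape)  # (4,)
```

Each arc-building transition below calls this update once, folding the dependent into its head's embedding.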
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
Tree-RNN with
1. Left Transition
2. Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

[Figure: Left transition. Each embedding is initialized by concatenating POS, language and morph-feat embeddings.]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

[Figure: Stack's top LSTM is reduced.]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

[Figure: t-RNN calculates the new head embedding.]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

[Figure: β-LSTM recalculates its hidden state based on the new input.]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

[Figure: Tree-stack LSTM is ready for the next transition.]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

[Figure: Right transition. Each embedding is initialized by concatenating POS, language and morph-feat embeddings.]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

[Figure: Stack's top LSTM is reduced.]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

[Figure: t-RNN calculates the new head embedding.]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

[Figure: σ-LSTM recalculates its hidden state from the new input.]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

[Figure: Tree-stack LSTM is ready for the next transition.]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
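The two transitions above, plus shift, written as pure state updates on (σ, β, A); this sketch tracks only word indices and arcs, omitting the t-RNN and LSTM hidden-state updates that accompany them:

```python
def shift(sigma, beta, arcs):
    """Move the buffer front onto the stack."""
    return sigma + [beta[0]], beta[1:], arcs

def left(sigma, beta, arcs, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}): buffer front b heads s."""
    s, b = sigma[-1], beta[0]
    return sigma[:-1], beta, arcs | {(b, d, s)}

def right(sigma, beta, arcs, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}): s heads the stack top t."""
    s, t = sigma[-2], sigma[-1]
    return sigma[:-1], beta, arcs | {(s, d, t)}

# Tiny run over word indices 1..3 (labels are illustrative):
state = ([], [1, 2, 3], set())
state = shift(*state)            # σ=[1],   β=[2,3]
state = left(*state, "ATT")      # adds arc (2, ATT, 1); σ=[]
state = shift(*state)            # σ=[2],   β=[3]
state = shift(*state)            # σ=[2,3], β=[]
state = right(*state, "OBJ")     # adds arc (2, OBJ, 3); σ=[2]
print(sorted(state[2]))          # [(2, 'ATT', 1), (2, 'OBJ', 3)]
```

In the full model, `left` also triggers the t-RNN update of b's embedding and a recomputation of the β-LSTM, while `right` updates s's embedding and the σ-LSTM, as the figures above show.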
Final overview of Tree-stack LSTM
[Figure: Tree-stack LSTM architecture: β-, σ-, and Action-LSTM outputs and the t-RNN embeddings (head word, dependent word, dependency relation) are concatenated and fed to an MLP.]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2 Related Work: Linear Models and their Drawbacks; Neural Network Models
3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5 Conclusion
6 Future Work & Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results amp Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results amp Comparisons
Dataset

CoNLL17: Dependency parsing of 81 treebanks in 49 languages. All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations. Koc-University ranked 7th out of 33 participants (1st among transition based parsers).

CoNLL18: Dependency parsing of 82 treebanks in 57 languages. All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations. Koc-University ranked 16th out of 30 participants (2nd among transition based parsers).

Changes from CoNLL17 to CoNLL18: 1. Train/test split change, 2. Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
2 possible problems with the official comparison:
1. If the annotation of the treebank is improved, the older parser is handicapped.
2. If the training-test split has changed and old training data are now in the test data, the old parser is favored undeservedly.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code         MLP     Tree-stack
ru taiga (10k)    58.89   60.55
hu szeged (20k)   66.21   68.18
tr imst (50k)     56.78   58.75
ar padt (120k)    67.83   68.14
en ewt (205k)     74.87   75.77
cs cac (473k)     83.39   83.57
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
Figure: Initial model (MLP only).
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
Figure: Only action LSTM.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
Figure: Only β-LSTM.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
Figure: Only σ-LSTM.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code   MLP     Only Action   Only-β   Only-σ
hu szeged   66.21   66.87         66.94    67.03
sv lines    71.12   72.05         72.17    72.45
tr imst     57.12   56.87         57.02    57.12
ar padt     67.83   66.67         66.89    66.92
cs cac      83.89   82.23         83.13    83.17
en ewt      75.54   75.43         75.56    75.67

Table: Comparison between MLP and "Only" models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code            without t-RNN   with t-RNN
no nynorsklia (3k)   51.78           53.33
ru taiga (11k)       59.13           60.55
gl treegal (15k)     69.76           70.45
hu szeged (20k)      66.12           68.18
sv lines (49k)       74.04           75.46
tr imst (50k)        58.12           58.75
ar padt (120k)       68.04           68.14
en ewt (204k)        74.87           75.77
cs cac (473k)        82.89           83.57
cs pdt (1M)          81.17           81.164
t-RNN provides a comparative advantage for low-resource languages.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang        MLP     Only A   Only-β   Only-σ   w/o t-RNN   all
hu szeged   66.21   66.87    66.94    67.03    66.12       68.18
sv lines    71.12   72.05    72.17    74.04    72.17       75.46
tr imst     57.12   56.87    57.02    57.12    58.12       58.75
ar padt     67.83   66.67    66.89    66.92    68.04       68.14
cs cac      83.89   82.23    83.13    83.17    82.89       83.57
en ewt      75.54   75.43    75.56    75.67    74.87       75.77
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k less than 50k tokens
Languages having more than 50k less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having less than 20k training tokens
Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583
ru taiga 5832 6055 10479
sme giella 5278 5339 16385
la perseus 4993 516 18184
ug udt 5278 5339 19262
sl sst 4672 4877 19473
hu szeged 6623 6818 20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having tokens in between 50k and100k
Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325
fr sequoia 8436 8217 50543
en gum 7644 7534 53686
ko gsd 7374 7254 56687
eu bdt 7455 7332 72974
nl lassymal 767 758 75134
gl ctg 7902 79018 79327
lv lvtb 7233 7224 80666
id gsd 7576 7397 97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having more than 100k trainingtokens
Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064
bg btb 8453 8455 124336
en ewt 7577 75682 204585
ar padt 6802 6814 223881
de gsd 7159 7132 263804
ca ancora 8589 85874 417587
es ancora 8499 8478 444617
cs cac 8357 8363 472608
cs pdt 8143 8212 1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle transitions using gold movesDynamic oracle transitions using predicted moves
In both cases logp of gold moves maximized
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and
use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al
2017]3 Using my own word and context vectors trained with different
language but from the same language family4 Applying transfer learning with a pre-trained parser
Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898
kmr mg 2012 2257 2278 2339
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the mostbeneficial
From scratch LM training does not bring useful word and contextvectors
Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition Based Parser can only build projective trees 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
Projective vs Non-projective
We compared our model with the best model for different projectivityratios
Language Projectiviy Best (LAS) Our (LAS)
grc perseus 907 7939 5503 (20)
eu bdt 9513 8422 7413 (17)
hu szeged 978 8266 6818 (14)
da ddt 9826 8628 7640 (17)
en gum 996 8505 7644 (15)
gl treegal 100 7425 7045 (10)
gl ctg 100 8212 7945 (14)
Table Our models performance gap decreases as the projectivity ratio increases
7
7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing
Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering
Tree-stack LSTM performed better with low resource languages
When the training dataset size increases tree-stack LSTM losses itsadvantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings
Attention Mechanism
Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement
Morphological Features
Finding different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies
Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682
S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US
Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
Problem Definition
Find a model learning to decide correct transition from current state
Omer Kırnap (Koc University) MSc Thesis September 27 2018 24 123
2 Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123
Related Work
Neural Networks for Feature Conjunctions
Neural networks can handle feature conjunctions and nonlinearityHoweverImpractical for high dimensional inputs they scale linearly in inputdimensions (in both time and space assuming fixed number of hiddenunits)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123
Related Work
Solution Using dense embeddings for input features
Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123
Overview
1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2 Related Work: Linear Models and their Drawbacks; Neural Network Models
3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5 Conclusion
6 Future Work & Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123
3 Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123
Model Overview
Two shared tasks for Multilingual Parsing from Raw Text to Universal Dependencies:
CoNLL17: Koc-University team with MLP Parser using Context Embeddings
CoNLL18: KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123
a Language Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123
Language Model (LM)
The LM is used to obtain Context and Word embeddings with two components:
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123
Language Model - Word vectors
Character based LSTM generates word Vectors
Figure Character LSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123
Language Model - Context Vectors
Word based BiLSTM generates Context Vectors
Figure Word BiLSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123
b MLP Parser (CoNLL17)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123
MLP Parser
MLP Parser consists of 4 components
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Feature extractor describes current state
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123
MLP Parser - Feature Extraction
Feature extractor describes current state
Figure: Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123
MLP Parser - Decision Module
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123
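The decision module above can be sketched as a tiny feed-forward scorer over extracted features. The `mlp_decide` helper, layer sizes, and weights below are illustrative assumptions, not the thesis implementation:

```python
import math

def mlp_decide(features, W1, b1, W2, b2):
    """Minimal MLP decision module: one tanh hidden layer, then linear
    scores over the candidate transitions; returns the argmax transition id.
    (Sketch only; the actual layer sizes/nonlinearities are not shown here.)"""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(W1, b1)]
    scores = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(W2, b2)]
    return max(range(len(scores)), key=scores.__getitem__)

# toy state with 2 features, 2 hidden units, 2 transitions
best = mlp_decide([1.0, 0.0],
                  W1=[[1.0, 0.0], [0.0, 1.0]], b1=[0.0, 0.0],
                  W2=[[1.0, 0.0], [-1.0, 0.0]], b2=[0.0, 0.0])
```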
Experiments amp Dataset (MLP) CoNLL17
CoNLL17 Dataset
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123
Experiments - Evaluation Metric
Labeled Attachment Score (LAS): the percentage of words assigned both the correct syntactic head and the correct dependency label
Gold tree for "Economic news had": ATT(news → Economic), SBJ(had → news); LAS = 1
Pred 1: OBJ(news → Economic), PRED(had → news); LAS = 0
Pred 2: ATT(news → Economic), OBJ(had → news); LAS = (1/2) · 100 = 50
Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
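The LAS computation above can be written directly; encoding each dependent as a (head index, label) pair is an illustrative assumption:

```python
def las(gold, pred):
    """Labeled Attachment Score: percentage of words whose predicted
    head AND dependency label both match the gold tree."""
    assert len(gold) == len(pred)
    correct = sum(1 for (gh, gl), (ph, pl) in zip(gold, pred)
                  if gh == ph and gl == pl)
    return 100.0 * correct / len(gold)

# "Economic news had": one (head, label) pair per dependent word
gold  = [(1, "ATT"), (2, "SBJ")]
pred1 = [(2, "OBJ"), (2, "PRED")]   # both arcs wrong
pred2 = [(1, "ATT"), (2, "OBJ")]    # one of two arcs fully correct
```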
Experiments (MLP)
CoNLL 2017 Results (all treebanks LAS)
Ranked 1st among transition based parsers 5
5 Source: CoNLL17 official results page
Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123
Contributions in CoNLL17
Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123
Context and Word Embeddings
Relative contributions of part-of-speech (p), word vector (v), context vector (c)

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123
Context and Word Embeddings
Relative contributions of part-of-speech (p), word vector (v), context vector (c)

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2
Context vectors provide an independent contribution on top of POS tags
Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123
Context and Word embeddings
Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2
Our BiLSTM language model word vectors perform better than the Facebook (fb) word vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123
Context and Word embeddings
Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2
Both POS tags and context vectors have significant contributions on top of word vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123
Issues with MLP
However
Choosing the correct parser state representation remains critical
We are unable to represent the whole parsing history with feature extraction
Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123
Solution
Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack
Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123
Model Overview
Two shared tasks for Multilingual Parsing from Raw Text to Universal Dependencies:
CoNLL17: Koc-University team with MLP Parser using Context Embeddings
CoNLL18: KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123
c Tree-stack LSTM Parser (CoNLL18)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123
Related Work - Stack LSTM
Figure: Stack LSTM [Dyer et al 2015]
Represent each component (σ, β, A) with an LSTM
Modify the head word's embedding with the dependent's embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123
Problems with Stack LSTM
They only modify the stack's word embeddings
Hidden states of the LSTMs are not updated unless a reduce occurs
Actions are not explicitly represented
They only used word2vec embeddings [Mikolov et al 2013]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123
Tree-stack LSTM Overview
[Figure: Tree-stack LSTM architecture; the β-LSTM, σ-LSTM and Action-LSTM summaries are concatenated and fed into an MLP, with the t-RNN composing head word, dependent word and dependency relation]
We propose the Tree-stack LSTM model with 4 components:
1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123
Tree-stack LSTM
Input Representation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with a continuous vector
Every dependency relation is represented with a continuous vector
Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123
Input Representation
We do not include an explicit feature extractor. We initialize each word representation by concatenating:
Character-based LSTM's word vectors
Word-based BiLSTM's context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
Input Representation
Morph-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
Tree-stack LSTM
Model Components:
1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123
β-LSTM
[Figure: the β-LSTM highlighted within the full Tree-stack LSTM architecture]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
Figure: Buffer's β-LSTM over the buffer words w_i, w_i+1, w_i+2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
[Figure: the σ-LSTM highlighted within the full Tree-stack LSTM architecture]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
Figure: Stack's σ-LSTM over the stack entries s_i, s_i+1, s_i+2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
[Figure: the Action-LSTM highlighted within the full Tree-stack LSTM architecture]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
Figure: Action-LSTM over the transition history
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How are the components of the tree-stack LSTM connected?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
[Figure: t-RNN combining the head word, dependent word, and dependency relation]
w_head_new = tanh(W_rnn · [w_head_old; d_l; w_dep] + b_rnn)   (1)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
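Equation (1) can be executed directly: concatenate the old head, relation, and dependent vectors, apply the affine map, and squash with tanh. The toy dimensions below are assumptions for illustration:

```python
import math

def trnn_update(w_head, d_rel, w_dep, W_rnn, b_rnn):
    """Eq. (1): w_head_new = tanh(W_rnn · [w_head; d_rel; w_dep] + b_rnn).
    W_rnn is a list of rows, one row per output dimension."""
    x = w_head + d_rel + w_dep                      # vector concatenation
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W_rnn, b_rnn)]

# toy sizes: 2-d head, 1-d relation, 2-d dependent -> 2-d new head
new_head = trnn_update([0.1, 0.2], [1.0], [0.3, 0.4],
                       W_rnn=[[1, 0, 0, 0, 0], [0, 0, 1, 0, 0]],
                       b_rnn=[0.0, 0.0])
```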
Tree-RNN with
1 Left Transition
2 Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: t-RNN calculates the new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: β-LSTM recalculates its hidden state from the new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: Tree-stack LSTM is ready to predict the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: t-RNN calculates the new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: σ-LSTM recalculates its hidden state from the new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: Tree-stack LSTM is ready to predict the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
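The left_d and right_d definitions above can be sketched on a plain (stack, buffer, arcs) state; encoding each arc as a (head, label, dependent) tuple is an illustrative assumption:

```python
def left_arc(state, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    pop s from the stack and attach it to buffer front b as a d-dependent."""
    stack, buffer, arcs = state
    s, b = stack[-1], buffer[0]
    return stack[:-1], buffer, arcs | {(b, d, s)}

def right_arc(state, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    pop t from the stack and attach it to the new top s as a d-dependent."""
    stack, buffer, arcs = state
    s, t = stack[-2], stack[-1]
    return stack[:-1], buffer, arcs | {(s, d, t)}

# "news" becomes an SBJ dependent of the buffer front "had"
state = left_arc((["ROOT", "news"], ["had"], set()), "SBJ")
```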
Final overview of Tree-stack LSTM
[Figure: full Tree-stack LSTM architecture; the β-LSTM, σ-LSTM, Action-LSTM, and t-RNN outputs are concatenated and fed into an MLP]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2 Related Work: Linear Models and their Drawbacks; Neural Network Models
3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5 Conclusion
6 Future Work & Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results amp Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results & Comparisons

Dataset

CoNLL17:
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Differences between CoNLL17 and CoNLL18: 1 Train/test split change, 2 Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
Two possible problems with the official comparison:
1 If the annotation of the treebank has improved, the older parser is handicapped
2 If the training-test split has changed and old training data are now in the test data, the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
Figure: Initial model (MLP only)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
Figure: Only action LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
Figure: Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
Figure: Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code   MLP    Only Action  Only-β  Only-σ
hu szeged   66.21  66.87        66.94   67.03
sv lines    71.12  72.05        72.17   72.45
tr imst     57.12  56.87        57.02   57.12
ar padt     67.83  66.67        66.89   66.92
cs cac      83.89  82.23        83.13   83.17
en ewt      75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
[Figure: the t-RNN highlighted within the full Tree-stack LSTM architecture]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.164
t-RNN provides a comparative advantage for low-resource languages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNN's performance contribution increases as the training size decreases
σ-LSTM provides more useful information independent of dataset size
Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental Settings
We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:
Languages having less than 20k tokens
Languages having more than 20k and less than 50k tokens
Languages having more than 50k and less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3,583
ru taiga       58.32        60.55           10,479
sme giella     52.78        53.39           16,385
la perseus     49.93        51.6            18,184
ug udt         52.78        53.39           19,262
sl sst         46.72        48.77           19,473
hu szeged      66.23        68.18           20,166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having tokens between 50k and 100k

Lang code    Morph-Feats  no Morph-Feats  # of tokens
sv lines     72.18        74.81           48,325
fr sequoia   84.36        82.17           50,543
en gum       76.44        75.34           53,686
ko gsd       73.74        72.54           56,687
eu bdt       74.55        73.32           72,974
nl lassymal  76.7         75.8            75,134
gl ctg       79.02        79.018          79,327
lv lvtb      72.33        72.24           80,666
id gsd       75.76        73.97           97,531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121,064
bg btb     84.53        84.55           124,336
en ewt     75.77        75.682          204,585
ar padt    68.02        68.14           223,881
de gsd     71.59        71.32           263,804
ca ancora  85.89        85.874          417,587
es ancora  84.99        84.78           444,617
cs cac     83.57        83.63           472,608
cs pdt     81.43        82.12           1,173,282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle: training transitions follow gold moves
Dynamic oracle: training transitions follow predicted moves
In both cases, the log-probability of the gold moves is maximized
[Figure: Tree-stack LSTM architecture]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
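The difference between the two regimes can be sketched for a single decision: the loss is -log p(gold) either way; only the move actually followed afterwards differs. The `train_step` helper and the toy scores are illustrative assumptions:

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def train_step(scores, gold, dynamic):
    """One parser decision. The loss maximizes log p(gold move) in both
    regimes; static training then follows the gold move, dynamic training
    follows the model's predicted (argmax) move."""
    probs = softmax(scores)
    loss = -math.log(probs[gold])
    predicted = max(range(len(scores)), key=scores.__getitem__)
    followed = predicted if dynamic else gold
    return loss, followed

loss_dyn, followed_dyn = train_step([2.0, 0.5, 0.1], gold=1, dynamic=True)
loss_sta, followed_sta = train_step([2.0, 0.5, 0.1], gold=1, dynamic=False)
```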
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
What about languages with less than 20k training tokens?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning:
1 Using very limited data to train the LM for word and context vectors, and using them to train a parser from scratch
2 Using Facebook's word vectors to train a parser [Bojanowski et al 2017]
3 Using my own word and context vectors trained on a different language from the same language family
4 Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial
Training the LM from scratch on very limited data does not yield useful word and context vectors
Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition based parsers can only build projective trees 6
6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
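Projectivity can be checked by testing whether any two dependency arcs cross; a minimal sketch (encoding the tree as a head-index list, with 0 as the artificial root, is an assumption):

```python
def is_projective(heads):
    """heads[i] = head index of word i (index 0 is the artificial root).
    A dependency tree is projective iff no two arcs cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads[1:], start=1)]
    for a, b in arcs:
        for c, e in arcs:
            if a < c < b < e:     # arc (a,b) crosses arc (c,e)
                return False
    return True
```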
Projective vs Non-projective
We compared our model with the best model for different projectivity ratios
Language     Projectivity  Best (LAS)  Our (LAS)  (rank)
grc perseus  90.7          79.39       55.03      (20)
eu bdt       95.13         84.22       74.13      (17)
hu szeged    97.8          82.66       68.18      (14)
da ddt       98.26         86.28       76.40      (17)
en gum       99.6          85.05       76.44      (15)
gl treegal   100           74.25       70.45      (10)
gl ctg       100           82.12       79.45      (14)

Table: Our model's performance gap decreases as the projectivity ratio increases 7
7 From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusion:
We introduced "Context, Word and Morph-feat" embeddings and showed their contribution to transition based dependency parsing
Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering
Tree-stack LSTM performed better with low resource languages
When the training dataset size increases, tree-stack LSTM loses its advantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.
Attention Mechanism
Applying attention between the σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement
Morphological Features
Finding a different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.
S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
2 Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 25 123
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123
Related Work
Neural Networks for Feature Conjunctions
Neural networks can handle feature conjunctions and nonlinearityHoweverImpractical for high dimensional inputs they scale linearly in inputdimensions (in both time and space assuming fixed number of hiddenunits)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123
Related Work
Solution Using dense embeddings for input features
Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123
Overview
1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing
2 Related WorkLinear Models and their DrawbacksNeural Network Models
3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser
4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning
5 Conclusion6 Future Work amp Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123
3 Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies
CoNLL17bull Koc-University team with MLP Parser using Context Embeddings
CoNLL18bull KParse team with Tree-stack LSTM Parser using Context and
Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies
CoNLL17
bull Koc-University team with MLP Parser using ContextEmbeddings
CoNLL18
bull KParse team with Tree-stack LSTM Parser using Context and Morph-featEmbeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123
a Language Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123
Language Model (LM)
LM is used to obtain Context and Word embeddings with twocomponents
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123
Language Model - Word vectors
Character based LSTM generates word Vectors
Figure Character LSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123
Language Model - Context Vectors
Word based BiLSTM generates Context Vectors
Figure Word BiLSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123
b MLP Parser (CoNLL17)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123
MLP Parser
MLP Parser consists of 4 components
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Feature extractor describes current state
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123
MLP Parser - Feature Extraction
Feature extractor describes current state
Figure Kırnap et al 2017Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123
MLP Parser - Decision Module
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123
Experiments amp Dataset (MLP) CoNLL17
CoNLL17 Dataset
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123
Experiments - Evaluation Metric
Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label
Economic news hadGold Tree LAS 1
SBJATT
Economic news had
PREDOBJPred 1 LAS 0
Economic news had
OBJATTPred 2 LAS (frac12)100
Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
Experiments (MLP)
CoNLL 2017 Results (all treebanks LAS)
Ranked 1st among transition based parsers 5
5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123
Contributions in CoNLL17
Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123
Context and Word Embeddings
Relative contributions of part-of-speech (p) word vector (v)context vector (c)
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123
Context and Word Embeddings
Relative contributions of part-of-speech (p) word vector (v)context vector (c)
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Context vectors provide independent contribution on top ofPOS tags
Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123
Context and Word embeddings
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677p-fb 747 797 663
p-v-c 793 832 742
Our BiLSTM language model word vectors perform betterthan FB vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123
Context and Word embeddings
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Both POS tags and context vectors have significantcontributions on top of word vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123
Issues with MLP
However
Choosing correct state of parser still remains critical
We are unable to represent whole parsing history with featureextracting
Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123
Solution
Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack
Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies
CoNLL17
bull Koc-University team with MLP Parser using Context Embeddings
CoNLL18
bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123
c Tree-stack LSTM Parser (CoNLL18)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123
Related Work - Stack LSTM
Figure Stack LSTM [Dyer et al 2015]
Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123
Problems with Stack LSTM
They only modify stackrsquos word embeddings
Hidden states of LSTMS are not updated unless reduce
Actions are not explicitly represented
They only used word2vec embeddings [Mikolov et al 2013]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123
Tree-stack LSTM Overview
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
We propose Tree-stack LSTM model with 4 components
β-LSTMσ-LSTMAction-LSTMTree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123
Tree-stack LSTM
Input Representation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with continuous vector
Every dependency relation is represented with continuous vector
Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123
Input Representation
We do not include explicit feature extractor We initiated wordrepresentation by concatenating
Character Based LSTMrsquos word vectors
Word Based BiLSTMrsquos context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
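The concatenation above can be sketched directly; plain lists stand in for learned vectors, and the dimensions are illustrative:

```python
def word_representation(char_vec, context_vec, pos_vec, morph_vec):
    """Concatenate the four embedding sources into one input vector:
    character-LSTM word vector, BiLSTM context vector, POS vector,
    and morph-feat vector. Dimensions below are illustrative."""
    return char_vec + context_vec + pos_vec + morph_vec

x = word_representation([0.1] * 4, [0.2] * 6, [0.3] * 2, [0.4] * 3)
```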
Input Representation
Morph-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
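One simple way to compose a single morph-feat vector from a UD FEATS string like the one above is to embed each Key=Value pair and average the embeddings. This is an illustrative sketch under that assumption, not necessarily the exact composition used in the thesis:

```python
import random

random.seed(0)
DIM = 8
_emb = {}  # one learned vector per "Key=Value" pair; random stand-ins here

def feat_embedding(feat):
    if feat not in _emb:
        _emb[feat] = [random.uniform(-0.1, 0.1) for _ in range(DIM)]
    return _emb[feat]

def morph_feat_vector(feats_str):
    """Compose a morph-feat vector from a UD FEATS string such as
    'Case=Nom|Gender=Neut|Number=Sing' by averaging pair embeddings.
    The underscore '_' (no features) maps to a zero vector."""
    pairs = feats_str.split("|") if feats_str != "_" else []
    if not pairs:
        return [0.0] * DIM
    vecs = [feat_embedding(p) for p in pairs]
    return [sum(col) / len(vecs) for col in zip(*vecs)]
```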
Tree-stack LSTM
Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123
β-LSTM
(Architecture figure with the buffer's β-LSTM highlighted.)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
Figure: Buffer's β-LSTM over w_i, w_i+1, w_i+2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
(Architecture figure with the stack's σ-LSTM highlighted.)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
Figure: Stack's σ-LSTM over s_i, s_i+1, s_i+2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
(Architecture figure with the Action-LSTM highlighted.)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
Figure: Action-LSTM over the transition history
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How are the components of tree-stack LSTM connected?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
Figure: t-RNN combining the head word, dependency relation, and dependent word
w_head_new = tanh(W_rnn · [w_head_old; d_l; w_dep] + b_rnn)   (1)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
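Equation (1) can be computed directly. In this sketch W_rnn and b_rnn are randomly initialized stand-ins for the learned parameters, and the dimensions are illustrative:

```python
import math
import random

random.seed(0)
HEAD_DIM, LABEL_DIM = 4, 2
IN_DIM = HEAD_DIM + LABEL_DIM + HEAD_DIM  # [head; label; dependent]
# Random stand-ins for the learned parameters W_rnn, b_rnn.
W_rnn = [[random.gauss(0, 0.1) for _ in range(IN_DIM)] for _ in range(HEAD_DIM)]
b_rnn = [0.0] * HEAD_DIM

def trnn_update(w_head, d_label, w_dep):
    """Eq. (1): w_head_new = tanh(W_rnn · [w_head; d_label; w_dep] + b_rnn)."""
    x = w_head + d_label + w_dep  # concatenation
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W_rnn, b_rnn)]

new_head = trnn_update([0.5] * HEAD_DIM, [0.1] * LABEL_DIM, [0.3] * HEAD_DIM)
```

Note that the output keeps the head-embedding dimensionality, so the updated head can be fed back through the t-RNN when it later acquires another dependent.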
Tree-RNN with:
1. Left Transition
2. Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: Each embedding is initiated by concatenating POS, language, and morph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: The stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: t-RNN calculates the new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: β-LSTM recalculates its hidden state from the new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: Tree-stack LSTM is ready for the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: Each embedding is initiated by concatenating POS, language, and morph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: The stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: t-RNN calculates the new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: σ-LSTM recalculates its hidden state from the new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: Tree-stack LSTM is ready for the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
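The two transitions above can be sketched as operations on a configuration of stack σ, buffer β, and arc set A. The embedding updates (t-RNN composition and LSTM recomputation) are omitted; this only mirrors the set-level formulas on the slides:

```python
def left_arc(state, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    pop s from the stack and attach it under the buffer front b."""
    s = state["stack"].pop()
    b = state["buffer"][0]
    state["arcs"].add((b, d, s))

def right_arc(state, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    pop t from the stack and attach it under the item s below it."""
    t = state["stack"].pop()
    s = state["stack"][-1]
    state["arcs"].add((s, d, t))

# Token indices 1..3; 0 is a root placeholder. Labels are illustrative.
state = {"stack": [0, 1], "buffer": [2, 3], "arcs": set()}
left_arc(state, "ATT")                            # 1 becomes dependent of 2
state["stack"].append(state["buffer"].pop(0))     # shift 2
state["stack"].append(state["buffer"].pop(0))     # shift 3
right_arc(state, "OBJ")                           # 3 becomes dependent of 2
```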
Final overview of Tree-stack LSTM
(Figure: complete Tree-stack LSTM — β-LSTM, σ-LSTM, and Action-LSTM summaries are concatenated and fed to the MLP that predicts the next transition; the t-RNN combines head word, dependent word, and dependency relation.)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2. Related Work: Linear Models and their Drawbacks; Neural Network Models
3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5. Conclusion
6. Future Work & Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results & Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results & Comparisons
Dataset

CoNLL17:
- Dependency parsing of 81 treebanks in 49 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
- Dependency parsing of 82 treebanks in 57 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split changes 2. Annotation changes
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
Two possible problems with the official comparison:
1. If the annotation of the treebank is improved, the older parser is handicapped.
2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models:

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
Figure: Initial MLP model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
Figure: Only Action-LSTM model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
Figure: Only β-LSTM model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
Figure: Only σ-LSTM model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code   MLP    Only Action  Only-β  Only-σ
hu szeged   66.21  66.87        66.94   67.03
sv lines    71.12  72.05        72.17   72.45
tr imst     57.12  56.87        57.02   57.12
ar padt     67.83  66.67        66.89   66.92
cs cac      83.89  82.23        83.13   83.17
en ewt      75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
(Architecture figure with the t-RNN highlighted.)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN:

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.16
t-RNN provides a comparative advantage for low-resource languages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNN's performance contribution increases as the training size decreases
σ-LSTM provides more useful information independent of dataset size
Interconnecting the model's components with the t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental Settings: We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:
Languages having less than 20k tokens
Languages having more than 20k and less than 50k tokens
Languages having more than 50k and less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.60           18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48325
fr sequoia     84.36        82.17           50543
en gum         76.44        75.34           53686
ko gsd         73.74        72.54           56687
eu bdt         74.55        73.32           72974
nl lassysmall  76.70        75.80           75134
gl ctg         79.02        79.018          79327
lv lvtb        72.33        72.24           80666
id gsd         75.76        73.97           97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens:

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121064
bg btb     84.53        84.55           124336
en ewt     75.77        75.682          204585
ar padt    68.02        68.14           223881
de gsd     71.59        71.32           263804
ca ancora  85.89        85.874          417587
es ancora  84.99        84.78           444617
cs cac     83.57        83.63           472608
cs pdt     81.43        82.12           1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle: transitions using gold moves
Dynamic oracle: transitions using predicted moves
In both cases, log p of the gold moves is maximized
(Tree-stack LSTM architecture figure.)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
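The distinction can be shown schematically: both regimes compute the same loss on the gold move and differ only in which move advances the configuration during training. The scorer below is a stand-in, not the thesis model:

```python
import math

def predict(state):
    # Stand-in scorer: pretend the model assigns the gold move
    # probability 0.6 but actually prefers "shift".
    return "shift", 0.6

def train_step(state, gold_move, oracle="static"):
    """One schematic training step. The loss is -log p(gold move) in
    both regimes; they differ in which move the parser follows."""
    move, p_gold = predict(state)
    loss = -math.log(p_gold)
    followed = gold_move if oracle == "static" else move
    state.append(followed)  # advance the (schematic) configuration
    return loss, followed

hist_static, hist_dynamic = [], []
loss1, _ = train_step(hist_static, "left", oracle="static")
loss2, _ = train_step(hist_dynamic, "left", oracle="dynamic")
```

Under the dynamic oracle the parser visits configurations produced by its own (possibly wrong) predictions, which is meant to reduce error propagation at test time.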
Static vs Dynamic Oracle Training
Figure: Results are very close for training sets of less than 20k tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure: Results are very close for training sets between 20k and 50k tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure: Results are very close for training sets of more than 50k tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
What about languages with less than 20k training tokens?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al., 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser
Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial
Training the LM from scratch does not yield useful word and context vectors
Our word and context vectors are still more useful than Facebook's [Bojanowski et al., 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition based parsers can only build projective trees.
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
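Projectivity can be checked by testing whether any two arcs cross. A small sketch over a head-index encoding of the tree:

```python
def is_projective(heads):
    """heads[i-1] is the head of token i (tokens numbered 1..n, 0 = root).
    A tree is projective iff no two arcs cross, i.e. there is no pair
    of arcs spanning (l1, r1) and (l2, r2) with l1 < l2 < r1 < r2."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    return not any(l1 < l2 < r1 < r2
                   for l1, r1 in arcs for l2, r2 in arcs)
```

For example, `[2, 0, 2]` ("economic <- news -> had" style attachments) is projective, while `[3, 4, 0, 3]` contains the crossing arcs (1, 3) and (2, 4).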
Projective vs Non-projective
We compared our model with the best model for different projectivity ratios:

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases
From the official results page and our projectivity table.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusion, we introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition based dependency parsing.
Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering.
Tree-stack LSTM performed better on low-resource languages.
When the training dataset size increases, tree-stack LSTM loses its advantage.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.
Attention Mechanism
Applying attention across the σ-LSTM, β-LSTM, or Action-LSTM states may bring performance improvements.
Morphological Features
Finding a different way to represent morphological features.
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, Association for Computational Linguistics, pages 673-682.
S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 26 123
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123
Related Work
Neural Networks for Feature Conjunctions
Neural networks can handle feature conjunctions and nonlinearity. However, they are impractical for high-dimensional inputs: they scale linearly in input dimensions (in both time and space, assuming a fixed number of hidden units).
Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123
Related Work
Solution: Using dense embeddings for input features
Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123
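The scaling argument can be seen by comparing a one-hot input, whose first layer grows with the vocabulary size |V|, against a dense lookup whose cost depends only on the embedding dimension. Vocabulary and dimensions below are illustrative:

```python
import random

random.seed(0)
VOCAB = ["the", "news", "had", "little", "effect"]
DIM = 4  # dense embedding dimension; a one-hot input has dimension |V|

# Dense embedding table: one small learned vector per word (random here).
table = {w: [random.uniform(-1, 1) for _ in range(DIM)] for w in VOCAB}

def one_hot(word):
    # Sparse input: one weight column per vocabulary entry.
    return [1.0 if w == word else 0.0 for w in VOCAB]

def dense(word):
    # Dense input: the layer above now scales with DIM, not |V|.
    return table[word]
```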
Overview
1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2. Related Work: Linear Models and their Drawbacks; Neural Network Models
3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5. Conclusion
6. Future Work & Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123
3 Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies
CoNLL17
• Koc-University team with MLP Parser using Context Embeddings
CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies
CoNLL17
• Koc-University team with MLP Parser using Context Embeddings
CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123
(a) Language Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123
Language Model (LM)
The LM is used to obtain context and word embeddings with two components:
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123
Language Model - Word vectors
Character based LSTM generates word Vectors
Figure Character LSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123
Language Model - Context Vectors
Word based BiLSTM generates Context Vectors
Figure Word BiLSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123
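A much-simplified sketch of the two components: a plain recurrent update stands in for the LSTM/BiLSTM cells, and all parameters are random stand-ins, so this only illustrates the data flow (characters → word vector, word vectors → bidirectional context vectors):

```python
import math
import random

random.seed(0)
DIM = 8
CHAR = {}  # character embeddings, created on demand

def _vec():
    return [random.uniform(-0.5, 0.5) for _ in range(DIM)]

def _step(h, x):
    # A plain recurrent update standing in for the LSTM gates.
    return [math.tanh(0.5 * hi + 0.5 * xi) for hi, xi in zip(h, x)]

def word_vector(word):
    """Character-level encoder: final hidden state over the characters."""
    h = [0.0] * DIM
    for c in word:
        h = _step(h, CHAR.setdefault(c, _vec()))
    return h

def context_vectors(words):
    """Bidirectional pass over word vectors; each word's context vector
    concatenates the forward and backward hidden states."""
    xs = [word_vector(w) for w in words]
    fwd, h = [], [0.0] * DIM
    for x in xs:
        h = _step(h, x)
        fwd.append(h)
    bwd, h = [], [0.0] * DIM
    for x in reversed(xs):
        h = _step(h, x)
        bwd.append(h)
    bwd.reverse()
    return [f + b for f, b in zip(fwd, bwd)]
```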
(b) MLP Parser (CoNLL17)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123
MLP Parser
MLP Parser consists of 4 components
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Feature extractor describes current state
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123
MLP Parser - Feature Extraction
Feature extractor describes current state
Figure: Kırnap et al. 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123
MLP Parser - Decision Module
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123
Experiments & Dataset (MLP) CoNLL17
CoNLL17 Dataset
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123
Experiments - Evaluation Metric
Labeled Attachment Score (LAS): the percentage of words assigned both the correct syntactic head and the correct dependency label.
Example: "Economic news had"
Gold tree: arcs ATT and SBJ
Pred 1: arcs PRED and OBJ → LAS = 0
Pred 2: arcs OBJ and ATT → LAS = (1/2) × 100 = 50
Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
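The metric can be computed directly over (head, label) pairs; the arcs below are an illustrative reconstruction of the slide's example, not the exact gold annotation:

```python
def las(gold, pred):
    """Labeled Attachment Score: percentage of tokens whose predicted
    (head, label) pair exactly matches the gold pair."""
    assert len(gold) == len(pred)
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return 100.0 * correct / len(gold)

# Heads given by token index (0 = root); labels are illustrative.
gold = [(2, "ATT"), (3, "SBJ")]
pred = [(2, "ATT"), (3, "OBJ")]   # one arc mislabeled -> LAS = 50
```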
Experiments (MLP)
CoNLL 2017 Results (all treebanks LAS)
Ranked 1st among transition based parsers
Source: CoNLL17 official results page
Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123
Contributions in CoNLL17
Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123
Context and Word Embeddings
Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63.0
c      72.2       76.0       63.5
v-c    76.0       79.0       67.6
p-c    78.0       82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123
Context and Word Embeddings
Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63.0
c      72.2       76.0       63.5
v-c    76.0       79.0       67.6
p-c    78.0       82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2
Context vectors provide an independent contribution on top of POS tags
Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123
Context and Word embeddings
Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63.0
c      72.2       76.0       63.5
v-c    76.0       79.0       67.6
p-c    78.0       82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2
Our BiLSTM language model word vectors perform better than FB vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123
Context and Word embeddings
Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63.0
c      72.2       76.0       63.5
v-c    76.0       79.0       67.6
p-c    78.0       82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2
Both POS tags and context vectors have significant contributions on top of word vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123
Issues with MLP
However
Choosing the correct state representation for the parser remains critical
We are unable to represent the whole parsing history with feature extraction
Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123
Solution
Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k less than 50k tokens
Languages having more than 50k less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having less than 20k training tokens
Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583
ru taiga 5832 6055 10479
sme giella 5278 5339 16385
la perseus 4993 516 18184
ug udt 5278 5339 19262
sl sst 4672 4877 19473
hu szeged 6623 6818 20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having tokens in between 50k and100k
Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325
fr sequoia 8436 8217 50543
en gum 7644 7534 53686
ko gsd 7374 7254 56687
eu bdt 7455 7332 72974
nl lassymal 767 758 75134
gl ctg 7902 79018 79327
lv lvtb 7233 7224 80666
id gsd 7576 7397 97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having more than 100k trainingtokens
Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064
bg btb 8453 8455 124336
en ewt 7577 75682 204585
ar padt 6802 6814 223881
de gsd 7159 7132 263804
ca ancora 8589 85874 417587
es ancora 8499 8478 444617
cs cac 8357 8363 472608
cs pdt 8143 8212 1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle transitions using gold movesDynamic oracle transitions using predicted moves
In both cases logp of gold moves maximized
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and
use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al
2017]3 Using my own word and context vectors trained with different
language but from the same language family4 Applying transfer learning with a pre-trained parser
Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898
kmr mg 2012 2257 2278 2339
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the mostbeneficial
From scratch LM training does not bring useful word and contextvectors
Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition Based Parser can only build projective trees 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
Projective vs Non-projective
We compared our model with the best model for different projectivityratios
Language Projectiviy Best (LAS) Our (LAS)
grc perseus 907 7939 5503 (20)
eu bdt 9513 8422 7413 (17)
hu szeged 978 8266 6818 (14)
da ddt 9826 8628 7640 (17)
en gum 996 8505 7644 (15)
gl treegal 100 7425 7045 (10)
gl ctg 100 8212 7945 (14)
Table Our models performance gap decreases as the projectivity ratio increases
7
7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing
Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering
Tree-stack LSTM performed better with low resource languages
When the training dataset size increases tree-stack LSTM losses itsadvantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings
Attention Mechanism
Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement
Morphological Features
Finding different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies
Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682
S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US
Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 27 123
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123
Related Work
Neural Networks for Feature Conjunctions

Neural networks can handle feature conjunctions and nonlinearity. However, they are impractical for high-dimensional inputs: they scale linearly in the input dimensions (in both time and space, assuming a fixed number of hidden units).
Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123
Related Work
Solution: using dense embeddings for input features
Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123
Overview
1 Introduction
  Overview of Dependency Parsing
  Transition Based Dependency Parsing
2 Related Work
  Linear Models and their Drawbacks
  Neural Network Models
3 Model
  Language Model
  MLP Parser
  Tree-stack LSTM Parser
4 Results
  MLP vs Tree-stack LSTM
  Morphological Feature Embeddings
  Static vs Dynamic Oracle Training
  Transfer Learning
5 Conclusion
6 Future Work & Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123
3 Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123
a Language Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123
Language Model (LM)
The LM is used to obtain Context and Word embeddings, with two components:
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123
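The character-based component can be sketched as a plain LSTM run over a word's characters, with the final hidden state taken as the word vector. This is a minimal numpy sketch with illustrative sizes and random initialization, not the thesis implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, b):
    """One LSTM step; the four gates are stacked row-wise in W."""
    H = h.size
    z = W @ np.concatenate([x, h]) + b
    i = sigmoid(z[0:H])        # input gate
    f = sigmoid(z[H:2*H])      # forget gate
    o = sigmoid(z[2*H:3*H])    # output gate
    g = np.tanh(z[3*H:4*H])    # candidate cell
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

# toy sizes (assumptions): 16-dim char embeddings, 32-dim word vectors
E, H = 16, 32
char_emb = {ch: rng.normal(size=E) for ch in "abcdefghijklmnopqrstuvwxyz"}
W = rng.normal(scale=0.1, size=(4 * H, E + H))
b = np.zeros(4 * H)

def word_vector(word):
    """Run the char-LSTM over the word; the final hidden state is the word vector."""
    h, c = np.zeros(H), np.zeros(H)
    for ch in word:
        h, c = lstm_step(char_emb[ch], h, c, W, b)
    return h

v = word_vector("economic")
print(v.shape)  # (32,)
```

The word-based BiLSTM would then run over these word vectors in both directions to produce the context vectors.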
Language Model - Word vectors
Character based LSTM generates word Vectors
Figure Character LSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123
Language Model - Context Vectors
Word based BiLSTM generates Context Vectors
Figure Word BiLSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123
b MLP Parser (CoNLL17)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123
MLP Parser
MLP Parser consists of 4 components
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Feature extractor describes current state
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123
MLP Parser - Feature Extraction
Feature extractor describes current state
Figure Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123
MLP Parser - Decision Module
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123
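A minimal sketch of a decision module of this shape: one hidden layer mapping the extracted state features to a softmax over transitions. The unlabeled three-transition set and all layer sizes are illustrative assumptions, not the thesis settings:

```python
import numpy as np

rng = np.random.default_rng(1)

TRANSITIONS = ["shift", "left-arc", "right-arc"]  # simplified, label-free set

D, H = 128, 64                      # assumed feature and hidden sizes
W1 = rng.normal(scale=0.1, size=(H, D)); b1 = np.zeros(H)
W2 = rng.normal(scale=0.1, size=(len(TRANSITIONS), H)); b2 = np.zeros(len(TRANSITIONS))

def decide(features):
    """Score each transition for the current parser state."""
    h = np.tanh(W1 @ features + b1)
    logits = W2 @ h + b2
    p = np.exp(logits - logits.max())
    p /= p.sum()                    # softmax over transitions
    return TRANSITIONS[int(np.argmax(p))], p

state = rng.normal(size=D)          # stand-in for the extracted state features
action, probs = decide(state)
print(action, probs.round(3))
```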
Experiments amp Dataset (MLP) CoNLL17
CoNLL17 Dataset
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation
• 17 universal part-of-speech tags
• 37 universal dependency relations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123
Experiments - Evaluation Metric
Labeled Attachment Score (LAS): the percentage of words correctly assigned both the correct syntactic head and the correct dependency label
Figure: three parses of the fragment "Economic news had" — the gold tree (arcs labeled ATT and SBJ; LAS = 100), Pred 1 with both labels wrong (OBJ, PRED; LAS = 0), and Pred 2 with one of the two arcs fully correct (LAS = (1/2)·100 = 50)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
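The computation can be sketched directly from (head, label) pairs; the example below mirrors the slide's "Economic news had" fragment, assuming Pred 1 gets the heads right but both labels wrong:

```python
def las(gold, pred):
    """Labeled Attachment Score: % of words with the correct head AND label.
    gold/pred: dict word_index -> (head_index, label)."""
    correct = sum(1 for w in gold if pred.get(w) == gold[w])
    return 100.0 * correct / len(gold)

# "Economic news had": Economic attaches to news (ATT), news to had (SBJ)
gold  = {0: (1, "ATT"), 1: (2, "SBJ")}
pred1 = {0: (1, "OBJ"), 1: (2, "PRED")}   # heads right, both labels wrong
pred2 = {0: (1, "ATT"), 1: (2, "OBJ")}    # one of two arcs fully correct

print(las(gold, gold), las(gold, pred1), las(gold, pred2))  # 100.0 0.0 50.0
```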
Experiments (MLP)
CoNLL 2017 Results (all treebanks LAS)
Ranked 1st among transition based parsers 5
5. Source: CoNLL17 official results page
Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123
Contributions in CoNLL17
Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123
Context and Word Embeddings
Relative contributions of part-of-speech (p), word vector (v), and context vector (c)
Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123
Context and Word Embeddings
Relative contributions of part-of-speech (p), word vector (v), and context vector (c)
Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2
Context vectors provide an independent contribution on top of POS tags
Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123
Context and Word embeddings
Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2
Our BiLSTM language model word vectors perform better than FB vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123
Context and Word embeddings
Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2
Both POS tags and context vectors have significant contributions on top of word vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123
Issues with MLP
However
Choosing the correct state features of the parser still remains critical

We are unable to represent the whole parsing history with hand-crafted feature extraction
Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123
Solution
Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and the stack
Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123
c Tree-stack LSTM Parser (CoNLL18)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123
Related Work - Stack LSTM
Figure Stack LSTM [Dyer et al 2015]
Represent each component (σ, β, A) with an LSTM. Modify the head word's embedding with the dependent's embedding.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123
Problems with Stack LSTM
They only modify the stack's word embeddings

Hidden states of the LSTMs are not updated unless a reduce occurs
Actions are not explicitly represented
They only used word2vec embeddings [Mikolov et al 2013]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123
Tree-stack LSTM Overview
Figure: Tree-stack LSTM overview — β-LSTM, σ-LSTM, and Action-LSTM outputs are concatenated and fed to an MLP; the t-RNN combines the head word, dependent word, and dependency relation
We propose the Tree-stack LSTM model with 4 components:

1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123
Tree-stack LSTM
Input Representation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with a continuous vector

Every dependency relation is represented with a continuous vector
Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123
Input Representation
We do not include an explicit feature extractor. We initialize each word representation by concatenating:
Character Based LSTMrsquos word vectors
Word Based BiLSTMrsquos context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
Input Representation
Morph-feat Vectors

Figure: Morph-feat embedding for the word "It", with features Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs
Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
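One simple way to embed a feature string like the one above is to embed each key=value pair separately and sum the pieces; this is a sketch with illustrative dimensions (the thesis may compose them differently):

```python
import numpy as np

rng = np.random.default_rng(2)
DIM = 8                               # assumed morph-feat embedding size
feat_emb = {}                         # one vector per key=value pair, created on demand

def morph_vector(feats):
    """Sum the embeddings of the individual key=value morphological features."""
    if feats == "_":                  # CoNLL-U convention for 'no features'
        return np.zeros(DIM)
    v = np.zeros(DIM)
    for kv in feats.split("|"):
        if kv not in feat_emb:
            feat_emb[kv] = rng.normal(size=DIM)
        v += feat_emb[kv]
    return v

it = morph_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
print(it.shape)  # (8,)
```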
Tree-stack LSTM
Model Components
1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123
β-LSTM
Figure: Tree-stack LSTM architecture with the β-LSTM highlighted
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
Figure: Buffer's β-LSTM running over the buffer words w_i, w_i+1, w_i+2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
Figure: Tree-stack LSTM architecture with the σ-LSTM highlighted
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
Figure: Stack's σ-LSTM running over the stack words s_i, s_i+1, s_i+2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
Figure: Tree-stack LSTM architecture with the Action-LSTM highlighted
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
Figure: Action-LSTM running over the past transitions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How are the components of the tree-stack LSTM connected?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
Figure: t-RNN composing the head word, dependent word, and dependency relation into a new head representation
whead_new = tanh(Wrnn · [whead_old ; dl ; wdep] + brnn)   (1)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
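Equation (1) above is a single tanh layer over the concatenated head, relation, and dependent vectors; a numpy sketch with assumed sizes:

```python
import numpy as np

rng = np.random.default_rng(3)
D, R = 32, 8                       # assumed word and relation embedding sizes
W_rnn = rng.normal(scale=0.1, size=(D, 2 * D + R))
b_rnn = np.zeros(D)

def t_rnn(w_head_old, d_l, w_dep):
    """w_head_new = tanh(W_rnn [w_head_old ; d_l ; w_dep] + b_rnn) -- eq. (1)."""
    return np.tanh(W_rnn @ np.concatenate([w_head_old, d_l, w_dep]) + b_rnn)

head, rel, dep = rng.normal(size=D), rng.normal(size=R), rng.normal(size=D)
new_head = t_rnn(head, rel, dep)
print(new_head.shape)  # (32,)
```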
Tree-RNN with
1 Left Transition
2 Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
leftd(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
leftd(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
leftd(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
leftd(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
leftd(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready to produce a new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
rightd(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
rightd(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
rightd(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
rightd(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
rightd(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready to produce a new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
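The leftd and rightd rules above (plus shift, which moves the buffer front onto the stack) can be sketched as operations on an explicit (stack, buffer, arcs) state. This follows the set notation on the slides, not the thesis code, and omits the LSTM updates that accompany each transition:

```python
def shift(stack, buffer, arcs):
    """Move the buffer front onto the stack."""
    return stack + [buffer[0]], buffer[1:], arcs

def left(stack, buffer, arcs, d):
    """leftd: attach stack top s to buffer front b -> arc (b, d, s)."""
    s, b = stack[-1], buffer[0]
    return stack[:-1], buffer, arcs | {(b, d, s)}

def right(stack, buffer, arcs, d):
    """rightd: attach stack top t to the word s below it -> arc (s, d, t)."""
    s, t = stack[-2], stack[-1]
    return stack[:-1], buffer, arcs | {(s, d, t)}

# parse "Economic news had": Economic <-ATT- news <-SBJ- had
stack, buffer, arcs = [], ["Economic", "news", "had"], set()
stack, buffer, arcs = shift(stack, buffer, arcs)        # push Economic
stack, buffer, arcs = left(stack, buffer, arcs, "ATT")  # news -ATT-> Economic
stack, buffer, arcs = shift(stack, buffer, arcs)        # push news
stack, buffer, arcs = left(stack, buffer, arcs, "SBJ")  # had -SBJ-> news
print(sorted(arcs))  # [('had', 'SBJ', 'news'), ('news', 'ATT', 'Economic')]
```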
Final overview of Tree-stack LSTM
Figure: Tree-stack LSTM — β-LSTM, σ-LSTM, and Action-LSTM outputs are concatenated and fed to an MLP; the t-RNN combines the head word, dependent word, and dependency relation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1 Introduction
  Overview of Dependency Parsing
  Transition Based Dependency Parsing
2 Related Work
  Linear Models and their Drawbacks
  Neural Network Models
3 Model
  Language Model
  MLP Parser
  Tree-stack LSTM Parser
4 Results
  MLP vs Tree-stack LSTM
  Morphological Feature Embeddings
  Static vs Dynamic Oracle Training
  Transfer Learning
5 Conclusion
6 Future Work & Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results & Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results & Comparisons

Dataset

CoNLL17
- Dependency parsing of 81 treebanks in 49 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18
- Dependency parsing of 82 treebanks in 57 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes between CoNLL17 and CoNLL18: 1 Train/test split change, 2 Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
2 possible problems of the official comparison:

1 If the annotation of the treebank is improved, the older parser is handicapped

2 If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
Figure: Initial model — the MLP alone
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
Figure: Only the Action-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
Figure: Only the β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
Figure: Only the σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code   MLP    Only Action  Only-β  Only-σ
hu szeged   66.21  66.87        66.94   67.03
sv lines    71.12  72.05        72.17   72.45
tr imst     57.12  56.87        57.02   57.12
ar padt     67.83  66.67        66.89   66.92
cs cac      83.89  82.23        83.13   83.17
en ewt      75.54  75.43        75.56   75.67

Table: Comparison between the MLP and the "Only" models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
Figure: Tree-stack LSTM architecture, showing the t-RNN that is ablated in this experiment
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code            without t-RNN  with t-RNN
no nynorsklia (3k)   51.78          53.33
ru taiga (11k)       59.13          60.55
gl treegal (15k)     69.76          70.45
hu szeged (20k)      66.12          68.18
sv lines (49k)       74.04          75.46
tr imst (50k)        58.12          58.75
ar padt (120k)       68.04          68.14
en ewt (204k)        74.87          75.77
cs cac (473k)        82.89          83.57
cs pdt (1M)          81.17          81.164
t-RNN provides a comparative advantage for low-resource languages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang        MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged   66.21  66.87   66.94   67.03   66.12      68.18
sv lines    71.12  72.05   72.17   74.04   72.17      75.46
tr imst     57.12  56.87   57.02   57.12   58.12      58.75
ar padt     67.83  66.67   66.89   66.92   68.04      68.14
cs cac      83.89  82.23   83.13   83.17   82.89      83.57
en ewt      75.54  75.43   75.56   75.67   74.87      75.77
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNN's performance contribution increases when the training size decreases

σ-LSTM provides more useful information, independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental Settings
We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:
Languages having less than 20k tokens
Languages having more than 20k less than 50k tokens
Languages having more than 50k less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
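The four-way split above can be written as a small bucketing function (the handling of counts exactly on a boundary is an assumption):

```python
def bucket(tokens):
    """Assign a language to one of the four training-size groups."""
    if tokens < 20_000:
        return "<20k"
    if tokens < 50_000:
        return "20k-50k"
    if tokens < 100_000:
        return "50k-100k"
    return ">=100k"

# token counts taken from the result tables in this section
print(bucket(10_479), bucket(72_974), bucket(204_585))
# <20k 50k-100k >=100k
```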
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3,583
ru taiga       58.32        60.55           10,479
sme giella     52.78        53.39           16,385
la perseus     49.93        51.6            18,184
ug udt         52.78        53.39           19,262
sl sst         46.72        48.77           19,473
hu szeged      66.23        68.18           20,166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code    Morph-Feats  no Morph-Feats  # of tokens
sv lines     72.18        74.81           48,325
fr sequoia   84.36        82.17           50,543
en gum       76.44        75.34           53,686
ko gsd       73.74        72.54           56,687
eu bdt       74.55        73.32           72,974
nl lassymal  76.7         75.8            75,134
gl ctg       79.02        79.018          79,327
lv lvtb      72.33        72.24           80,666
id gsd       75.76        73.97           97,531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121,064
bg btb     84.53        84.55           124,336
en ewt     75.77        75.682          204,585
ar padt    68.02        68.14           223,881
de gsd     71.59        71.32           263,804
ca ancora  85.89        85.874          417,587
es ancora  84.99        84.78           444,617
cs cac     83.57        83.63           472,608
cs pdt     81.43        82.12           1,173,282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle: transitions use gold moves
Dynamic oracle: transitions use predicted moves

In both cases, the log-probability of the gold moves is maximized
Figure: Tree-stack LSTM architecture
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
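A rough sketch of the two regimes: both accumulate -log p(gold move), but the static loop advances the state with the gold move while the dynamic loop advances with the model's prediction. The toy scorer and integer state are illustrative stand-ins for the network and the parser configuration:

```python
import math
import random

MOVES = ["shift", "left", "right"]

def model_probs(state):
    """Stub scorer: a fixed, state-dependent distribution (stands in for the network)."""
    random.seed(state)                 # deterministic per state
    w = [random.random() for _ in MOVES]
    s = sum(w)
    return [x / s for x in w]

def train_episode(gold_moves, dynamic):
    """Accumulate -log p(gold move); advance with gold (static) or predicted (dynamic)."""
    state, loss = 0, 0.0
    for gold in gold_moves:
        p = model_probs(state)
        loss += -math.log(p[MOVES.index(gold)])
        chosen = MOVES[p.index(max(p))] if dynamic else gold
        state = (state * 31 + MOVES.index(chosen) + 1) % 1000  # toy state update
    return loss

gold = ["shift", "shift", "left", "right"]
print(train_episode(gold, dynamic=False), train_episode(gold, dynamic=True))
```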
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning:
1 Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2 Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3 Using my own word and context vectors, trained on a different language from the same language family
4 Applying transfer learning with a pre-trained parser
Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial

Training an LM from scratch on very limited data does not produce useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition Based Parser can only build projective trees 6
6. Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
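Projectivity can be checked directly: a tree is projective iff no two arcs cross when drawn above the sentence. A sketch, assuming heads are given as a 1-based array with 0 denoting the root:

```python
def is_projective(heads):
    """heads[i] = head of word i+1 (1-based positions, 0 denotes the root).
    A tree is projective iff no two arcs cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for a, b in arcs:
        for c, d in arcs:
            if a < c < b < d:      # arc (c, d) crosses arc (a, b)
                return False
    return True

# a projective 3-word chain vs a 4-word tree with crossing arcs (1,3) and (2,4)
print(is_projective([2, 0, 2]), is_projective([3, 4, 0, 3]))  # True False
```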
Projective vs Non-projective
We compared our model with the best model at different projectivity ratios
Language     Projectivity (%)  Best (LAS)  Our (LAS)
grc perseus  90.7              79.39       55.03 (20)
eu bdt       95.13             84.22       74.13 (17)
hu szeged    97.8              82.66       68.18 (14)
da ddt       98.26             86.28       76.40 (17)
en gum       99.6              85.05       76.44 (15)
gl treegal   100               74.25       70.45 (10)
gl ctg       100               82.12       79.45 (14)
Table: Our model's performance gap decreases as the projectivity ratio increases
7. From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusion:
We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition based dependency parsing

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering

Tree-stack LSTM performed better on low-resource languages

As the training dataset size increases, tree-stack LSTM loses its advantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings
Attention Mechanism
Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement
Morphological Features
Finding different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies
Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682
S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US
Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
Related Work
Omer Kırnap (Koc University) MSc Thesis September 27 2018 28 123
Related Work
Neural Networks for Feature Conjunctions
Neural networks can handle feature conjunctions and nonlinearity. However, they are impractical for high-dimensional inputs: they scale linearly in input dimensions (in both time and space, assuming a fixed number of hidden units)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123
Related Work
Solution Using dense embeddings for input features
Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123
Overview
1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2 Related Work: Linear Models and their Drawbacks; Neural Network Models
3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5 Conclusion
6 Future Work & Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123
3 Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:
CoNLL17
bull Koc-University team with MLP Parser using Context Embeddings
CoNLL18
bull KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123
a Language Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123
Language Model (LM)
The LM is used to obtain Context and Word embeddings with two components:
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123
Language Model - Word vectors
Character based LSTM generates word Vectors
Figure Character LSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123
Language Model - Context Vectors
Word based BiLSTM generates Context Vectors
Figure Word BiLSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123
b MLP Parser (CoNLL17)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123
MLP Parser
MLP Parser consists of 4 components
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Feature extractor describes current state
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123
MLP Parser - Feature Extraction
Feature extractor describes current state
Figure: Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123
MLP Parser - Decision Module
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123
Experiments amp Dataset (MLP) CoNLL17
CoNLL17 Dataset
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation:
bull 17 universal part-of-speech tags
bull 37 universal dependency relations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123
Experiments - Evaluation Metric
Labeled Attachment Score (LAS): the percentage of words correctly assigned both the correct syntactic head and the correct dependency label
Example ("Economic news had"): the gold tree attaches Economic (ATT) and news (SBJ), giving LAS 1. Pred 1 gets both labels wrong (OBJ, PRED): LAS 0. Pred 2 recovers one of the two labeled arcs (ATT, OBJ): LAS = (1/2) * 100 = 50
Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
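The metric above can be sketched in a few lines (an illustrative implementation, not the official CoNLL evaluator; the `(head, label)` pair encoding is my own choice):

```python
def las(gold, pred):
    """Labeled Attachment Score: percentage of words whose predicted
    head AND dependency label both match the gold annotation.

    gold, pred: lists of (head_index, label) pairs, one per word.
    """
    assert len(gold) == len(pred)
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return 100.0 * correct / len(gold)

# Two-word example in the spirit of the slide: the second prediction
# gets one of two labeled attachments right, so LAS = 50%.
gold = [(2, "ATT"), (3, "SBJ")]
pred = [(2, "ATT"), (3, "OBJ")]
print(las(gold, pred))  # 50.0
```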
Experiments (MLP)
CoNLL 2017 Results (all treebanks LAS)
Ranked 1st among transition based parsers 5
5 Source: CoNLL17 official results page
Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123
Contributions in CoNLL17
Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123
Context and Word Embeddings
Relative contributions of part-of-speech (p), word vector (v), context vector (c):

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123
Context and Word Embeddings
Context vectors provide an independent contribution on top of POS tags
Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123
Context and Word embeddings
Our BiLSTM language model word vectors perform better than FB vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123
Context and Word embeddings
Both POS tags and context vectors have significant contributions on top of word vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123
Issues with MLP
However
Choosing the correct state representation for the parser remains critical
We are unable to represent the whole parsing history with feature extraction
Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123
Solution
Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack
Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies
CoNLL17
bull Koc-University team with MLP Parser using Context Embeddings
CoNLL18
bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123
c Tree-stack LSTM Parser (CoNLL18)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123
Related Work - Stack LSTM
Figure Stack LSTM [Dyer et al 2015]
Represent each component (σ, β, A) with an LSTM
Modify the head word's embedding with the dependent's embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123
Problems with Stack LSTM
They only modify the stack's word embeddings
Hidden states of the LSTMs are not updated unless a reduce occurs
Actions are not explicitly represented
They only used word2vec embeddings [Mikolov et al 2013]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123
Tree-stack LSTM Overview
Figure: Tree-stack LSTM overview (σ-LSTM over the stack, β-LSTM over the buffer, Action-LSTM over past transitions, and a t-RNN composing head and dependent embeddings; their outputs are concatenated and fed to an MLP)

We propose the Tree-stack LSTM model with 4 components:
β-LSTM
σ-LSTM
Action-LSTM
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123
Tree-stack LSTM
Input Representation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with a continuous vector
Every dependency relation is represented with a continuous vector
Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123
Input Representation
We do not include an explicit feature extractor. We initiate the word representation by concatenating:
Character Based LSTMrsquos word vectors
Word Based BiLSTMrsquos context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
Input Representation
Morph-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
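A minimal sketch of how a morph-feat string could be turned into a vector (the function name, dimension, random initialization, and the sum-of-feature-embeddings composition are illustrative assumptions; the thesis's exact scheme may differ):

```python
import numpy as np

def morph_feat_vector(feats, table, dim=8):
    """Vector for a UD feature string such as "Case=Nom|Number=Sing":
    one embedding per "Feature=Value" pair, summed. `table` caches the
    per-feature embeddings, creating a random one on first use."""
    vec = np.zeros(dim)
    for feat in feats.split("|"):
        if feat not in table:
            seed = abs(hash(feat)) % (2 ** 32)   # deterministic within a run
            table[feat] = np.random.default_rng(seed).normal(size=dim)
        vec = vec + table[feat]
    return vec

table = {}
v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs", table)
print(v.shape, len(table))  # (8,) 5
```

In a trained parser the embeddings in `table` would be learned parameters rather than fixed random draws.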
Tree-stack LSTM
Model Components:
1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123
β-LSTM
Figure: Tree-stack LSTM overview with the β-LSTM (buffer) highlighted
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
Figure: Buffer's β-LSTM over the upcoming words wi, wi+1, wi+2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
Figure: Tree-stack LSTM overview with the σ-LSTM (stack) highlighted
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
Figure: Stack's σ-LSTM over the stack items si, si+1, si+2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
Figure: Tree-stack LSTM overview with the Action-LSTM highlighted
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
Figure: Action-LSTM over the transition history
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How are the components of the tree-stack LSTM connected?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
Figure: t-RNN combining the dependent word, dependency relation, and head word embeddings
w_head^new = tanh(W_rnn · [w_head^old ; d_l ; w_dep] + b_rnn)    (1)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
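Equation (1) can be sketched directly in numpy (the dimensions and random initialization are arbitrary illustrative choices; in the parser W_rnn and b_rnn are learned):

```python
import numpy as np

def trnn_compose(w_head, d_l, w_dep, W_rnn, b_rnn):
    """Eq. (1): new head embedding from the old head embedding, the
    dependency-relation embedding d_l, and the dependent embedding,
    via a single tanh layer over their concatenation."""
    x = np.concatenate([w_head, d_l, w_dep])
    return np.tanh(W_rnn @ x + b_rnn)

dim, rel_dim = 4, 2                      # toy sizes
rng = np.random.default_rng(1)
W_rnn = rng.normal(size=(dim, 2 * dim + rel_dim))
b_rnn = np.zeros(dim)
new_head = trnn_compose(rng.normal(size=dim), rng.normal(size=rel_dim),
                        rng.normal(size=dim), W_rnn, b_rnn)
print(new_head.shape)  # (4,)
```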
Tree-RNN with
1 Left Transition
2 Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initiated by concatenating POS, language, and morph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready to predict the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initiated by concatenating POS, language, and morph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready to predict the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
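The state updates in the two transitions above can be simulated on plain Python lists (a schematic sketch of the stack/buffer/arc-set bookkeeping only; embeddings and LSTM updates are omitted, and the helper names are my own):

```python
def shift(stack, buffer):
    """SHIFT: move the buffer front onto the stack."""
    stack.append(buffer.pop(0))

def left_arc(stack, buffer, arcs, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    stack top s becomes a d-dependent of buffer front b; s is popped."""
    s, b = stack.pop(), buffer[0]
    arcs.append((b, d, s))               # (head, relation, dependent)

def right_arc(stack, buffer, arcs, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    stack top t becomes a d-dependent of the item s below it; t is popped."""
    t = stack.pop()
    arcs.append((stack[-1], d, t))

# "news had": SHIFT news, then a LEFT arc attaches news under had.
stack, buffer, arcs = [], ["news", "had"], []
shift(stack, buffer)
left_arc(stack, buffer, arcs, "nsubj")
print(arcs)          # [('had', 'nsubj', 'news')]

# "had effect" already on the stack: a RIGHT arc attaches effect under had.
stack2, arcs2 = ["had", "effect"], []
right_arc(stack2, [], arcs2, "obj")
print(arcs2)         # [('had', 'obj', 'effect')]
```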
Final overview of Tree-stack LSTM
Figure: Final overview of the Tree-stack LSTM (σ-LSTM, β-LSTM, Action-LSTM, and t-RNN outputs concatenated into an MLP)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2 Related Work: Linear Models and their Drawbacks; Neural Network Models
3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5 Conclusion
6 Future Work & Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results & Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results & Comparisons

Dataset

CoNLL17:
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
Dependency parsing of 82 treebanks in 57 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1 Train/test split change 2 Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
2 possible problems of official comparison
1 If the annotation of the treebank is improved, the older parser is handicapped
2 If the training-test split has changed and the old training data are now in test data, the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code          MLP    Tree-stack
ru taiga (10k)     58.89  60.55
hu szeged (20k)    66.21  68.18
tr imst (50k)      56.78  58.75
ar padt (120k)     67.83  68.14
en ewt (205k)      74.87  75.77
cs cac (473k)      83.39  83.57
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
MLP
Figure Initial model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
Figure Only action LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
Figure Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
Figure Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code   MLP    Only Action  Only-β  Only-σ
hu szeged   66.21  66.87        66.94   67.03
sv lines    71.12  72.05        72.17   72.45
tr imst     57.12  56.87        57.02   57.12
ar padt     67.83  66.67        66.89   66.92
cs cac      83.89  82.23        83.13   83.17
en ewt      75.54  75.43        75.56   75.67
Table: Comparison between MLP and "Only" models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
Figure: Tree-stack LSTM overview with the t-RNN component highlighted
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code            without t-RNN  with t-RNN
no nynorsklia (3k)   51.78          53.33
ru taiga (11k)       59.13          60.55
gl treegal (15k)     69.76          70.45
hu szeged (20k)      66.12          68.18
sv lines (49k)       74.04          75.46
tr imst (50k)        58.12          58.75
ar padt (120k)       68.04          68.14
en ewt (204k)        74.87          75.77
cs cac (473k)        82.89          83.57
cs pdt (1M)          81.17          81.164
t-RNN provides a comparative advantage for low-resource languages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang        MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged   66.21  66.87   66.94   67.03   66.12      68.18
sv lines    71.12  72.05   72.17   74.04   72.17      75.46
tr imst     57.12  56.87   57.02   57.12   58.12      58.75
ar padt     67.83  66.67   66.89   66.92   68.04      68.14
cs cac      83.89  82.23   83.13   83.17   82.89      83.57
en ewt      75.54  75.43   75.56   75.67   74.87      75.77
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNN's performance contribution increases as the training size decreases
σ-LSTM provides more useful information independent of dataset size
Interconnecting the model's components with the t-RNN makes the tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental Settings: We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k and less than 50k tokens
Languages having more than 50k and less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens:

Lang code       Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia   51.13        53.33           3583
ru taiga        58.32        60.55           10479
sme giella      52.78        53.39           16385
la perseus      49.93        51.6            18184
ug udt          52.78        53.39           19262
sl sst          46.72        48.77           19473
hu szeged       66.23        68.18           20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having between 50k and 100k tokens:

Lang code       Morph-Feats  no Morph-Feats  # of tokens
sv lines        72.18        74.81           48325
fr sequoia      84.36        82.17           50543
en gum          76.44        75.34           53686
ko gsd          73.74        72.54           56687
eu bdt          74.55        73.32           72974
nl lassysmall   76.7         75.8            75134
gl ctg          79.02        79.018          79327
lv lvtb         72.33        72.24           80666
id gsd          75.76        73.97           97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens:

Lang code   Morph-Feats  no Morph-Feats  # of tokens
fa seraji   81.18        81.12           121064
bg btb      84.53        84.55           124336
en ewt      75.77        75.682          204585
ar padt     68.02        68.14           223881
de gsd      71.59        71.32           263804
ca ancora   85.89        85.874          417587
es ancora   84.99        84.78           444617
cs cac      83.57        83.63           472608
cs pdt      81.43        82.12           1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle: transitions using gold moves
Dynamic oracle: transitions using predicted moves
In both cases, the log p of gold moves is maximized
Figure: Tree-stack LSTM overview
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
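A static-oracle training step can be sketched as follows (schematic; `predict_probs` and the tuple-based state are hypothetical stand-ins for the real model and parser state, and the loss here is the negative log-likelihood of the gold moves):

```python
import math

def static_oracle_loss(gold_actions, predict_probs, state=()):
    """One sentence of static-oracle training: always follow the gold
    transition and accumulate -log p(gold action | state). `predict_probs`
    is any callable mapping a parser state to action probabilities."""
    loss = 0.0
    for action in gold_actions:
        probs = predict_probs(state)
        loss -= math.log(probs[action])
        state = state + (action,)      # apply the gold move regardless
    return loss

# Toy model that ignores the state and returns fixed probabilities.
uniform = lambda state: {"SHIFT": 0.5, "LEFT": 0.25, "RIGHT": 0.25}
loss = static_oracle_loss(["SHIFT", "LEFT"], uniform)
print(round(loss, 4))  # 2.0794
```

A dynamic oracle would instead sample or follow the model's own predicted moves while still maximizing log p of the gold-optimal actions at each visited state.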
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning:
1 Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2 Using Facebook's word vectors to train a parser [Bojanowski et al 2017]
3 Using my own word and context vectors trained with a different language from the same language family
4 Applying transfer learning with a pre-trained parser
Language       (1)           (2)    (3)    (4)
af afribooms   not provided  75.46  77.43  78.12
kk ktb         20.19         22.31  21.96  23.86
bxr bdt        7.64          9.76   9.93   8.98
kmr mg         20.12         22.57  22.78  23.39
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial
Training an LM from scratch does not bring useful word and context vectors
Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
A transition based parser can only build projective trees 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
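Projectivity can be checked by testing whether any two dependency arcs cross (an illustrative O(n^2) check; `heads[i]` gives the head of word i+1, 1-indexed, with 0 for the root):

```python
def is_projective(heads):
    """A dependency tree is projective iff no two arcs cross when drawn
    above the sentence. heads[i] = head of word i+1 (0 = root)."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for (l1, r1) in arcs:
        for (l2, r2) in arcs:
            if l1 < l2 < r1 < r2:   # arcs (l1,r1) and (l2,r2) cross
                return False
    return True

print(is_projective([2, 0, 2]))     # True: both dependents attach to word 2
print(is_projective([3, 4, 0, 3]))  # False: arc 1-3 crosses arc 2-4
```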
Projective vs Non-projective
We compared our model with the best model for different projectivity ratios
Language      Projectivity  Best (LAS)  Our (LAS)
grc perseus   90.7          79.39       55.03 (20)
eu bdt        95.13         84.22       74.13 (17)
hu szeged     97.8          82.66       68.18 (14)
da ddt        98.26         86.28       76.40 (17)
en gum        99.6          85.05       76.44 (15)
gl treegal    100           74.25       70.45 (10)
gl ctg        100           82.12       79.45 (14)
Table: Our model's performance gap decreases as the projectivity ratio increases
7
7 From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusion: We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition based dependency parsing
Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering
Tree-stack LSTM performed better with low resource languages
When the training dataset size increases, the tree-stack LSTM loses its advantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings
Attention Mechanism
Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement
Morphological Features
Finding a different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann, Carlos Gomez-Rodriguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682
S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
Related Work
Neural Networks for Feature Conjunctions
Neural networks can handle feature conjunctions and nonlinearityHoweverImpractical for high dimensional inputs they scale linearly in inputdimensions (in both time and space assuming fixed number of hiddenunits)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 29 123
Related Work
Solution Using dense embeddings for input features
Omer Kırnap (Koc University) MSc Thesis September 27 2018 30 123
Overview
1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing
2 Related WorkLinear Models and their DrawbacksNeural Network Models
3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser
4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning
5 Conclusion6 Future Work amp Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 31 123
3 Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies
CoNLL17bull Koc-University team with MLP Parser using Context Embeddings
CoNLL18bull KParse team with Tree-stack LSTM Parser using Context and
Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies
CoNLL17
bull Koc-University team with MLP Parser using ContextEmbeddings
CoNLL18
bull KParse team with Tree-stack LSTM Parser using Context and Morph-featEmbeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123
a Language Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123
Language Model (LM)
LM is used to obtain Context and Word embeddings with twocomponents
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123
Language Model - Word vectors
Character based LSTM generates word Vectors
Figure Character LSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123
Language Model - Context Vectors
Word based BiLSTM generates Context Vectors
Figure Word BiLSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123
b MLP Parser (CoNLL17)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123
MLP Parser
MLP Parser consists of 4 components
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Feature extractor describes current state
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123
MLP Parser - Feature Extraction
Feature extractor describes current state
Figure Kırnap et al 2017Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123
MLP Parser - Decision Module
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123
Experiments amp Dataset (MLP) CoNLL17
CoNLL17 Dataset
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123
Experiments - Evaluation Metric
Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label
Economic news hadGold Tree LAS 1
SBJATT
Economic news had
PREDOBJPred 1 LAS 0
Economic news had
OBJATTPred 2 LAS (frac12)100
Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
Experiments (MLP)
CoNLL 2017 Results (all treebanks LAS)
Ranked 1st among transition based parsers 5
5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123
Contributions in CoNLL17
Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123
Context and Word Embeddings
Relative contributions of part-of-speech (p) word vector (v)context vector (c)
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123
Context and Word Embeddings
Relative contributions of part-of-speech (p) word vector (v)context vector (c)
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Context vectors provide independent contribution on top ofPOS tags
Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123
Context and Word embeddings
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677p-fb 747 797 663
p-v-c 793 832 742
Our BiLSTM language model word vectors perform betterthan FB vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123
Context and Word embeddings
Feats | Hungarian | En-ParTUT | Latvian
p     | 63.6      | 76.6      | 55.9
v     | 73.5      | 75.9      | 63.0
c     | 72.2      | 76.0      | 63.5
v-c   | 76.0      | 79.0      | 67.6
p-c   | 78.0      | 82.5      | 70.6
p-v   | 76.6      | 80.8      | 67.7
p-fb  | 74.7      | 79.7      | 66.3
p-v-c | 79.3      | 83.2      | 74.2
Both POS tags and context vectors make significant contributions on top of word vectors
Issues with MLP
However
Choosing the correct state representation for the parser remains critical
We cannot represent the whole parsing history with feature extraction
Solution
Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and the stack
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:
CoNLL17
• Koc-University team with MLP Parser using Context Embeddings
CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings
c Tree-stack LSTM Parser (CoNLL18)
Related Work - Stack LSTM
Figure Stack LSTM [Dyer et al 2015]
Represent each component (σ, β, A) with an LSTM. Modify the head word's embedding with the dependent's embedding.
Problems with Stack LSTM
They only modify the stack's word embeddings
Hidden states of the LSTMs are not updated unless a reduce occurs
Actions are not explicitly represented
They only used word2vec embeddings [Mikolov et al 2013]
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Tree-stack LSTM Overview
Figure: Tree-stack LSTM architecture — the β-LSTM, σ-LSTM, and Action-LSTM outputs and the t-RNN head embeddings are concatenated and fed to an MLP
We propose the Tree-stack LSTM model with 4 components:
β-LSTM, σ-LSTM, Action-LSTM, Tree-RNN
Tree-stack LSTM
Input Representation
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with a continuous vector
Every dependency relation is represented with a continuous vector
Input Representation
We do not include an explicit feature extractor. We initialize the word representation by concatenating:
Character-based LSTM's word vectors
Word-based BiLSTM's context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
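The concatenation above can be sketched as follows; the dimensions are hypothetical and not the thesis's actual sizes:

```python
import numpy as np

# Hypothetical dimensions for the four input vectors.
WORD, CTX, POS, FEAT = 350, 300, 128, 128

def word_representation(char_vec, context_vec, pos_vec, feat_vec):
    """Initial word representation: a plain concatenation of the
    char-LSTM word vector, BiLSTM context vector, POS vector, and
    morph-feat vector — no hand-crafted features."""
    return np.concatenate([char_vec, context_vec, pos_vec, feat_vec])

x = word_representation(np.zeros(WORD), np.zeros(CTX),
                        np.zeros(POS), np.zeros(FEAT))
```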
Input Representation
Morph-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure Morph-feat Embeddings
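A UD feature string like the one above can be embedded per feature=value pair; this sketch sums the pair embeddings, though the thesis may combine them differently (e.g. by concatenation), and the table and dimension here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 64                 # hypothetical embedding size
feat_table = {}          # one embedding per feature=value pair, filled lazily

def morph_feat_vector(feats):
    """Embed a UD FEATS string such as 'Case=Nom|Gender=Neut|Number=Sing'
    by summing the embeddings of its feature=value pairs."""
    vec = np.zeros(DIM)
    if feats == "_":     # UD marks "no features" with an underscore
        return vec
    for pair in feats.split("|"):
        if pair not in feat_table:
            feat_table[pair] = rng.normal(size=DIM)
        vec += feat_table[pair]
    return vec

v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
```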
Tree-stack LSTM
Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN
β-LSTM
Figure: Tree-stack LSTM architecture — β-LSTM, σ-LSTM, Action-LSTM, and t-RNN outputs concatenated into an MLP
β-LSTM
Figure: Buffer's β-LSTM over the buffer words w_i, w_i+1, w_i+2
σ-LSTM
Figure: Tree-stack LSTM architecture — β-LSTM, σ-LSTM, Action-LSTM, and t-RNN outputs concatenated into an MLP
σ-LSTM
Figure: Stack's σ-LSTM over the stack words s_i, s_i+1, s_i+2
Action-LSTM
Figure: Tree-stack LSTM architecture — β-LSTM, σ-LSTM, Action-LSTM, and t-RNN outputs concatenated into an MLP
Action-LSTM
Figure: Action-LSTM over the transition history
How are the components of the tree-stack LSTM connected?
Tree-RNN
Tree-RNN (t-RNN)
Figure: t-RNN combines the head word, the dependency relation, and the dependent word
w_head^new = tanh(W_rnn [w_head^old ; d_l ; w_dep] + b_rnn)   (1)
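Eq. (1) can be sketched directly; the dimensions and the random initialization are hypothetical:

```python
import numpy as np

D, R = 300, 64            # hypothetical word and relation-embedding dims
rng = np.random.default_rng(1)
W_rnn = rng.normal(scale=0.1, size=(D, 2 * D + R))
b_rnn = np.zeros(D)

def t_rnn(w_head, rel_vec, w_dep):
    """Eq. (1): fold a dependent and its relation into the head embedding."""
    x = np.concatenate([w_head, rel_vec, w_dep])   # [w_head ; d_l ; w_dep]
    return np.tanh(W_rnn @ x + b_rnn)

new_head = t_rnn(np.ones(D), np.ones(R), np.ones(D))
```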
Tree-RNN with
1. Left Transition
2. Right Transition
Left Transition
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: Stack's top LSTM is reduced
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: t-RNN calculates the new head embedding
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: β-LSTM recomputes its hidden states from the new input
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: Tree-stack LSTM is ready to predict the next transition
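The left-transition semantics can be sketched as a pure function on a (stack, buffer, arcs) state; the t-RNN and LSTM updates are omitted, and the sample words are illustrative:

```python
def left(state, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    pop s from the stack and attach it as a d-dependent of the
    buffer-front word b, which stays in the buffer."""
    stack, buffer, arcs = state
    s = stack[-1]
    b = buffer[0]
    return (stack[:-1], buffer, arcs | {(b, d, s)})

state = (["ROOT", "news"], ["had", "effect"], frozenset())
state = left(state, "nsubj")   # attaches "news" under "had"
```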
Right Transition
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: Stack's top LSTM is reduced
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: t-RNN calculates the new head embedding
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: σ-LSTM recomputes its hidden states from the new input
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: Tree-stack LSTM is ready to predict the next transition
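The right transition mirrors the left one; again a pure-function sketch with the t-RNN and LSTM updates omitted and illustrative words:

```python
def right(state, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    pop the stack-top t and attach it as a d-dependent of the word s
    below it; s stays on the stack as the head."""
    stack, buffer, arcs = state
    t, s = stack[-1], stack[-2]
    return (stack[:-1], buffer, arcs | {(s, d, t)})

state = (["ROOT", "had", "effect"], [], frozenset())
state = right(state, "obj")   # attaches "effect" under "had"
```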
Final overview of Tree-stack LSTM
Figure: Tree-stack LSTM architecture — β-LSTM, σ-LSTM, Action-LSTM, and t-RNN outputs concatenated into an MLP
Overview
1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2. Related Work: Linear Models and their Drawbacks; Neural Network Models
3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5. Conclusion
6. Future Work & Discussions
4. Results & Comparisons
Results amp Comparisons
Dataset

CoNLL17:
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split change 2. Annotation
MLP vs Tree-stack LSTM
The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets
MLP vs Tree-stack LSTM
Two possible problems with the official comparison:
1. If the annotation of the treebank was improved, the older parser is handicapped
2. If the training-test split changed and the old training data are now in the test data, the old parser is favored undeservedly
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code       | MLP   | Tree-stack
ru taiga (10k)  | 58.89 | 60.55
hu szeged (20k) | 66.21 | 68.18
tr imst (50k)   | 56.78 | 58.75
ar padt (120k)  | 67.83 | 68.14
en ewt (205k)   | 74.87 | 75.77
cs cac (473k)   | 83.39 | 83.57
Tree-stack LSTM outperforms MLP
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
MLP Parser
Figure: Initial model (MLP only)
Only Action LSTM
Figure: Only the Action-LSTM
Only β-LSTM
Figure: Only the β-LSTM
Only σ-LSTM
Figure: Only the σ-LSTM
Ablation Analysis Results
Lang Code | MLP   | Only Action | Only-β | Only-σ
hu szeged | 66.21 | 66.87       | 66.94  | 67.03
sv lines  | 71.12 | 72.05       | 72.17  | 72.45
tr imst   | 57.12 | 56.87       | 57.02  | 57.12
ar padt   | 67.83 | 66.67       | 66.89  | 66.92
cs cac    | 83.89 | 82.23       | 83.13  | 83.17
en ewt    | 75.54 | 75.43       | 75.56  | 75.67

Table: Comparison between the MLP and the "Only" models
Ablation of t-RNN
Figure: Tree-stack LSTM architecture — β-LSTM, σ-LSTM, Action-LSTM, and t-RNN outputs concatenated into an MLP
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code          | without t-RNN | with t-RNN
no nynorsklia (3k) | 51.78         | 53.33
ru taiga (11k)     | 59.13         | 60.55
gl treegal (15k)   | 69.76         | 70.45
hu szeged (20k)    | 66.12         | 68.18
sv lines (49k)     | 74.04         | 75.46
tr imst (50k)      | 58.12         | 58.75
ar padt (120k)     | 68.04         | 68.14
en ewt (204k)      | 74.87         | 75.77
cs cac (473k)      | 82.89         | 83.57
cs pdt (1M)        | 81.17         | 81.16
t-RNN provides a comparative advantage for low-resource languages
Ablation Analysis
Overall results of ablation analysis
Lang      | MLP   | Only A | Only-β | Only-σ | w/o t-RNN | all
hu szeged | 66.21 | 66.87  | 66.94  | 67.03  | 66.12     | 68.18
sv lines  | 71.12 | 72.05  | 72.17  | 74.04  | 72.17     | 75.46
tr imst   | 57.12 | 56.87  | 57.02  | 57.12  | 58.12     | 58.75
ar padt   | 67.83 | 66.67  | 66.89  | 66.92  | 68.04     | 68.14
cs cac    | 83.89 | 82.23  | 83.13  | 83.17  | 82.89     | 83.57
en ewt    | 75.54 | 75.43  | 75.56  | 75.67  | 74.87     | 75.77
Tree-stack LSTM beats other model variations
Ablation Analysis
Conclusions of Ablation Experiments
t-RNN's performance contribution increases as the training size decreases
σ-LSTM provides more useful information independent of dataset size
Interconnecting the model's components with the t-RNN makes the tree-stack LSTM more powerful for low-resource languages (ranked 10th of all parsers and 2nd among transition based parsers)
What do Morphological Feature Embeddings provide?
Contribution of Morph-feat Embeddings
Experimental Settings: We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens per language, to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k but less than 50k tokens
Languages having more than 50k but less than 100k tokens
Languages having 100k tokens or more
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens
Lang code     | Morph-Feats | no Morph-Feats | # of tokens
no nynorsklia | 51.13       | 53.33          | 3583
ru taiga      | 58.32       | 60.55          | 10479
sme giella    | 52.78       | 53.39          | 16385
la perseus    | 49.93       | 51.6           | 18184
ug udt        | 52.78       | 53.39          | 19262
sl sst        | 46.72       | 48.77          | 19473
hu szeged     | 66.23       | 68.18          | 20166
Not useful for languages having less than 20k training tokens
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having between 50k and 100k training tokens
Lang code     | Morph-Feats | no Morph-Feats | # of tokens
sv lines      | 72.18       | 74.81          | 48325
fr sequoia    | 84.36       | 82.17          | 50543
en gum        | 76.44       | 75.34          | 53686
ko gsd        | 73.74       | 72.54          | 56687
eu bdt        | 74.55       | 73.32          | 72974
nl lassysmall | 76.7        | 75.8           | 75134
gl ctg        | 79.02       | 79.018         | 79327
lv lvtb       | 72.33       | 72.24          | 80666
id gsd        | 75.76       | 73.97          | 97531
Beneficial for languages with 50k-100k training tokens
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens
Lang code | Morph-Feats | no Morph-Feats | # of tokens
fa seraji | 81.18       | 81.12          | 121064
bg btb    | 84.53       | 84.55          | 124336
en ewt    | 75.77       | 75.682         | 204585
ar padt   | 68.02       | 68.14          | 223881
de gsd    | 71.59       | 71.32          | 263804
ca ancora | 85.89       | 85.874         | 417587
es ancora | 84.99       | 84.78          | 444617
cs cac    | 83.57       | 83.63          | 472608
cs pdt    | 81.43       | 82.12          | 1173282
Neutral for languages having more than 100k training tokens
Static vs Dynamic Oracle Training
Static oracle: transitions follow the gold moves. Dynamic oracle: transitions follow the predicted moves.
In both cases, the log-probability of the gold moves is maximized
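The difference between the two regimes can be sketched with a toy setup (ToyModel, apply_move, and the exploration probability are illustrative; a real dynamic oracle also recomputes the optimal move for the current state, whereas here the gold sequence is fixed for brevity):

```python
import random

def apply_move(state, move):
    """Toy state update: just record the move taken."""
    return state + [move]

class ToyModel:
    def score(self, state):
        # Fixed toy log-probabilities over three transitions.
        return {"shift": -0.1, "left": -1.0, "right": -2.0}

def train_sentence(gold_moves, model, dynamic, explore=1.0, seed=0):
    """Both oracles accumulate -log p(gold move); the dynamic oracle
    advances the parser with the model's own prediction, so training
    visits states the parser will actually reach at test time."""
    random.seed(seed)
    state, loss = [], 0.0
    for gold in gold_moves:
        scores = model.score(state)
        loss -= scores[gold]                     # maximize log p(gold)
        move = gold
        if dynamic and random.random() < explore:
            move = max(scores, key=scores.get)   # follow the prediction
        state = apply_move(state, move)
    return loss, state

loss_s, path_s = train_sentence(["left", "shift"], ToyModel(), dynamic=False)
loss_d, path_d = train_sentence(["left", "shift"], ToyModel(), dynamic=True)
```

The loss is identical in both runs; only the visited states differ.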
Figure: Tree-stack LSTM architecture — β-LSTM, σ-LSTM, Action-LSTM, and t-RNN outputs concatenated into an MLP
Static vs Dynamic Oracle Training
Figure: Results are very close for languages with fewer than 20k training tokens
Static vs Dynamic Oracle Training
Figure: Results are very close for languages with between 20k and 50k training tokens
Static vs Dynamic Oracle Training
Figure: Results are very close for languages with more than 50k training tokens
How about languages with less than 20k training tokens?
Transfer Learning
There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, then training a parser from scratch with them
2. Using Facebook's word vectors to train a parser [Bojanowski et al 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser
Language     | (1)          | (2)   | (3)   | (4)
af afribooms | not provided | 75.46 | 77.43 | 78.12
kk ktb       | 20.19        | 22.31 | 21.96 | 23.86
bxr bdt      | 7.64         | 9.76  | 9.93  | 8.98
kmr mg       | 20.12        | 22.57 | 22.78 | 23.39

Table: LAS values for strategies (1)-(4)
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial
Training the LM from scratch on limited data does not yield useful word and context vectors
Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017]
Projectivity
A transition based parser can only build projective trees 6
6. Figure from http://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
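Projectivity can be checked directly from the head indices: a tree is projective iff no two dependency arcs cross. A minimal sketch with illustrative head arrays:

```python
def is_projective(heads):
    """heads[i-1] = head of word i (words are 1-based; 0 is the root).
    Projective iff no two arcs, viewed as intervals, properly interleave."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for (a, b) in arcs:
        for (c, e) in arcs:
            if a < c < b < e:      # arcs (a,b) and (c,e) cross
                return False
    return True

proj = is_projective([2, 3, 0])     # "Economic news had": no crossing arcs
nonproj = is_projective([3, 0, 2])  # arc 1->3 crosses arc 0->2
```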
Projective vs Non-projective
We compared our model with the best model at different projectivity ratios
Language    | Projectivity (%) | Best (LAS) | Our (LAS)
grc perseus | 90.7             | 79.39      | 55.03 (20)
eu bdt      | 95.13            | 84.22      | 74.13 (17)
hu szeged   | 97.8             | 82.66      | 68.18 (14)
da ddt      | 98.26            | 86.28      | 76.40 (17)
en gum      | 99.6             | 85.05      | 76.44 (15)
gl treegal  | 100              | 74.25      | 70.45 (10)
gl ctg      | 100              | 82.12      | 79.45 (14)
Table: Our model's performance gap decreases as the projectivity ratio increases
7. From the official results page and our projectivity table
Conclusions
Conclusion
In conclusion: We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition based dependency parsing
Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering
Tree-stack LSTM performed better on low-resource languages
As the training dataset size increases, the tree-stack LSTM loses its advantage
Future Research Direction
End-to-End Training
Systems jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.
Attention Mechanism
Applying attention over the σ-LSTM, β-LSTM, or Action-LSTM states may bring performance improvements
Morphological Features
Finding different ways to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.
Publications
Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
References
Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, Association for Computational Linguistics, pages 673-682.
S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Thank you for your attention
Questions
Experiments amp Dataset (MLP) CoNLL17
CoNLL17 Dataset
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123
Experiments - Evaluation Metric
Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label
Economic news hadGold Tree LAS 1
SBJATT
Economic news had
PREDOBJPred 1 LAS 0
Economic news had
OBJATTPred 2 LAS (frac12)100
Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
Experiments (MLP)
CoNLL 2017 Results (all treebanks LAS)
Ranked 1st among transition based parsers 5
5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123
Contributions in CoNLL17
Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123
Context and Word Embeddings
Relative contributions of part-of-speech (p) word vector (v)context vector (c)
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123
Context and Word Embeddings
Relative contributions of part-of-speech (p) word vector (v)context vector (c)
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Context vectors provide independent contribution on top ofPOS tags
Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123
Context and Word embeddings
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677p-fb 747 797 663
p-v-c 793 832 742
Our BiLSTM language model word vectors perform betterthan FB vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123
Context and Word embeddings
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Both POS tags and context vectors have significantcontributions on top of word vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123
Issues with MLP
However
Choosing correct state of parser still remains critical
We are unable to represent whole parsing history with featureextracting
Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123
Solution
Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack
Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies
CoNLL17
bull Koc-University team with MLP Parser using Context Embeddings
CoNLL18
bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123
c Tree-stack LSTM Parser (CoNLL18)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123
Related Work - Stack LSTM
Figure Stack LSTM [Dyer et al 2015]
Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123
Problems with Stack LSTM
They only modify stackrsquos word embeddings
Hidden states of LSTMS are not updated unless reduce
Actions are not explicitly represented
They only used word2vec embeddings [Mikolov et al 2013]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123
Tree-stack LSTM Overview
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
We propose Tree-stack LSTM model with 4 components
β-LSTMσ-LSTMAction-LSTMTree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123
Tree-stack LSTM
Input Representation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with continuous vector
Every dependency relation is represented with continuous vector
Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123
Input Representation
We do not include explicit feature extractor We initiated wordrepresentation by concatenating
Character Based LSTMrsquos word vectors
Word Based BiLSTMrsquos context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
Input Representation
Morp-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
Tree-stack LSTM
Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123
β-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
LSTM LSTM LSTM
wi+2wi+1wi
Figure Bufferrsquos β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
LSTM LSTM LSTM
si si+1 si+2
Figure Stackrsquos σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
LSTM LSTM LSTM
Figure Action-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How do components of tree-stack LSTM are connected
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
t-RNN
Dependent word
Dependency Relation
Head word
Figure t-RNN
whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
Tree-RNN with
1 Left Transition2 Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM LSTM
Left transition
t-RNN
Dependency Relation
HeadDependent
Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure β-LSTM recalculates its hidden based on new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Stackrsquos top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure σ-LSTM recalculates its hidden from new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
Final overview of Tree-stack LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
4 Results & Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results & Comparisons

Dataset

CoNLL17:
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koç University ranked 7th out of 33 participants (1st among transition-based parsers)

CoNLL18:
Dependency parsing of 82 treebanks in 57 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koç University ranked 16th out of 30 participants (2nd among transition-based parsers)

Differences between the two: 1. Train/test split change 2. Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
Two possible problems with the official comparison:

1. If the annotation of the treebank has improved, the older parser is handicapped
2. If the train-test split has changed and old training data are now in the test data, the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code | MLP | Tree-stack
ru taiga (10k) | 58.89 | 60.55
hu szeged (20k) | 66.21 | 68.18
tr imst (50k) | 56.78 | 58.75
ar padt (120k) | 67.83 | 68.14
en ewt (205k) | 74.87 | 75.77
cs cac (473k) | 83.39 | 83.57
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
Figure Initial model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
Figure Only action LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
Figure Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
Figure Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code | MLP | Only Action | Only-β | Only-σ
hu szeged | 66.21 | 66.87 | 66.94 | 67.03
sv lines | 71.12 | 72.05 | 72.17 | 72.45
tr imst | 57.12 | 56.87 | 57.02 | 57.12
ar padt | 67.83 | 66.67 | 66.89 | 66.92
cs cac | 83.89 | 82.23 | 83.13 | 83.17
en ewt | 75.54 | 75.43 | 75.56 | 75.67

Table Comparison between MLP and "Only" models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
Figure Tree-stack LSTM overview with the t-RNN component
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code | without t-RNN | with t-RNN
no nynorsklia (3k) | 51.78 | 53.33
ru taiga (11k) | 59.13 | 60.55
gl treegal (15k) | 69.76 | 70.45
hu szeged (20k) | 66.12 | 68.18
sv lines (49k) | 74.04 | 75.46
tr imst (50k) | 58.12 | 58.75
ar padt (120k) | 68.04 | 68.14
en ewt (204k) | 74.87 | 75.77
cs cac (473k) | 82.89 | 83.57
cs pdt (1M) | 81.17 | 81.164

t-RNN provides a comparative advantage for low-resource languages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang | MLP | Only A | Only-β | Only-σ | w/o t-RNN | all
hu szeged | 66.21 | 66.87 | 66.94 | 67.03 | 66.12 | 68.18
sv lines | 71.12 | 72.05 | 72.17 | 74.04 | 72.17 | 75.46
tr imst | 57.12 | 56.87 | 57.02 | 57.12 | 58.12 | 58.75
ar padt | 67.83 | 66.67 | 66.89 | 66.92 | 68.04 | 68.14
cs cac | 83.89 | 82.23 | 83.13 | 83.17 | 82.89 | 83.57
en ewt | 75.54 | 75.43 | 75.56 | 75.67 | 74.87 | 75.77
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNN's performance contribution increases as the training size decreases

σ-LSTM provides more useful information, independent of dataset size

Interconnecting the model's components with t-RNN makes Tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental Settings
We divide the CoNLL18 UD v2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k and less than 50k tokens

Languages having more than 50k and less than 100k tokens

Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens

Lang code | Morph-Feats | no Morph-Feats | # of tokens
no nynorsklia | 51.13 | 53.33 | 3,583
ru taiga | 58.32 | 60.55 | 10,479
sme giella | 52.78 | 53.39 | 16,385
la perseus | 49.93 | 51.6 | 18,184
ug udt | 52.78 | 53.39 | 19,262
sl sst | 46.72 | 48.77 | 19,473
hu szeged | 66.23 | 68.18 | 20,166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code | Morph-Feats | no Morph-Feats | # of tokens
sv lines | 72.18 | 74.81 | 48,325
fr sequoia | 84.36 | 82.17 | 50,543
en gum | 76.44 | 75.34 | 53,686
ko gsd | 73.74 | 72.54 | 56,687
eu bdt | 74.55 | 73.32 | 72,974
nl lassysmall | 76.7 | 75.8 | 75,134
gl ctg | 79.02 | 79.018 | 79,327
lv lvtb | 72.33 | 72.24 | 80,666
id gsd | 75.76 | 73.97 | 97,531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens

Lang code | Morph-Feats | no Morph-Feats | # of tokens
fa seraji | 81.18 | 81.12 | 121,064
bg btb | 84.53 | 84.55 | 124,336
en ewt | 75.77 | 75.682 | 204,585
ar padt | 68.02 | 68.14 | 223,881
de gsd | 71.59 | 71.32 | 263,804
ca ancora | 85.89 | 85.874 | 417,587
es ancora | 84.99 | 84.78 | 444,617
cs cac | 83.57 | 83.63 | 472,608
cs pdt | 81.43 | 82.12 | 1,173,282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle: transitions follow gold moves
Dynamic oracle: transitions follow predicted moves

In both cases, the log-probability of gold moves is maximized
Figure Tree-stack LSTM overview (σ-, β-, and Action-LSTM states concatenated into an MLP)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
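The difference between the two regimes is only in which transition is followed during training; both maximize log p of the gold move. A schematic sketch of one training step, where the toy `probs` dictionary stands in for the parser's softmax output (all names and values here are illustrative, not from the thesis code):

```python
import math

def train_step(probs, gold_move, dynamic):
    """Return (loss, next_move). The loss is -log p(gold) in both regimes;
    a dynamic oracle follows the model's prediction, a static oracle
    follows the gold move."""
    loss = -math.log(probs[gold_move])
    predicted = max(probs, key=probs.get)
    next_move = predicted if dynamic else gold_move
    return loss, next_move

probs = {"shift": 0.5, "left": 0.2, "right": 0.3}
print(train_step(probs, "left", dynamic=False))  # static: follows "left"
print(train_step(probs, "left", dynamic=True))   # dynamic: follows "shift"
```

Under dynamic-oracle training the parser is thus exposed to states reached by its own (possibly wrong) moves, while still being supervised toward the gold transition.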
Static vs Dynamic Oracle Training
Figure Results are very close for languages with fewer than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for languages with between 20k and 50k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for languages with more than 50k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with fewer than 20k training tokens?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language | (1) | (2) | (3) | (4)
af afribooms | not provided | 75.46 | 77.43 | 78.12
kk ktb | 20.19 | 22.31 | 21.96 | 23.86
bxr bdt | 7.64 | 9.76 | 9.93 | 8.98
kmr mg | 20.12 | 22.57 | 22.78 | 23.39
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial

Training an LM from scratch on very limited data does not produce useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition-based parsers can only build projective trees 6
6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
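Projectivity can be checked by testing for crossing arcs: a dependency tree is projective iff no two arcs interleave. A minimal sketch (1-indexed word positions, head 0 for the root; the example trees are invented):

```python
def is_projective(heads):
    """heads maps each word index to its head index (0 for the root).
    Returns False if any two arcs cross, i.e. interleave as l1 < l2 < r1 < r2."""
    arcs = [(min(h, d), max(h, d)) for d, h in heads.items()]
    for l1, r1 in arcs:
        for l2, r2 in arcs:
            if l1 < l2 < r1 < r2:   # the two arcs cross
                return False
    return True

print(is_projective({1: 2, 2: 0, 3: 2}))        # projective tree -> True
print(is_projective({1: 3, 2: 4, 3: 0, 4: 3}))  # arcs (1,3) and (2,4) cross -> False
```

Sentences containing such crossing arcs are exactly the ones a transition-based parser of this kind cannot reproduce, which motivates the projectivity comparison on the next slide.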
Projective vs Non-projective
We compared our model with the best model for different projectivity ratios

Language | Projectivity (%) | Best (LAS) | Ours (LAS)
grc perseus | 90.7 | 79.39 | 55.03 (20)
eu bdt | 95.13 | 84.22 | 74.13 (17)
hu szeged | 97.8 | 82.66 | 68.18 (14)
da ddt | 98.26 | 86.28 | 76.40 (17)
en gum | 99.6 | 85.05 | 76.44 (15)
gl treegal | 100 | 74.25 | 70.45 (10)
gl ctg | 100 | 82.12 | 79.45 (14)

Table Our model's performance gap decreases as the projectivity ratio increases
7 From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusion:
We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering

Tree-stack LSTM performed better on low-resource languages

When the training dataset size increases, Tree-stack LSTM loses its advantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.
Attention Mechanism
Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring performance improvements
Morphological Features
Finding different ways to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
3 Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17: Koç University team with MLP Parser using Context Embeddings
CoNLL18: KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123
a Language Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123
Language Model (LM)
The LM is used to obtain context and word embeddings, with two components:
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123
Language Model - Word vectors
Character-based LSTM generates word vectors
Figure Character LSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123
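The character LSTM consumes a word one character at a time and uses the final hidden state as the word vector. A heavily simplified sketch using a plain tanh RNN cell in place of an LSTM (the character embeddings and weight matrices here are toy values, not the model's learned parameters):

```python
import math

def char_word_vector(word, char_emb, W_h, W_x, b, dim=2):
    """Run a tanh RNN over the word's characters; the final hidden
    state serves as the word vector."""
    h = [0.0] * dim
    for ch in word:
        x = char_emb.get(ch, [0.0] * dim)   # unseen characters -> zero vector
        h = [math.tanh(sum(W_h[i][j] * h[j] for j in range(dim)) +
                       sum(W_x[i][j] * x[j] for j in range(dim)) + b[i])
             for i in range(dim)]
    return h

char_emb = {"i": [1.0, 0.0], "t": [0.0, 1.0]}
W_h = [[0.0, 0.0], [0.0, 0.0]]   # toy: ignore the recurrent term
W_x = [[1.0, 0.0], [0.0, 1.0]]   # toy: pass the character embedding through
b = [0.0, 0.0]
print(char_word_vector("it", char_emb, W_h, W_x, b))  # [0.0, tanh(1.0)]
```

Because the vector is built from characters, the same machinery yields representations for out-of-vocabulary words, which is one motivation for this design.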
Language Model - Context Vectors
Word-based BiLSTM generates context vectors
Figure Word BiLSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123
b MLP Parser (CoNLL17)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123
MLP Parser
MLP Parser consists of 4 components
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Feature extractor describes current state
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123
MLP Parser - Feature Extraction
Feature extractor describes current state
Figure from Kırnap et al. 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123
MLP Parser - Decision Module
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123
Experiments & Dataset (MLP) CoNLL17
CoNLL17 Dataset
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123
Experiments - Evaluation Metric
Labeled Attachment Score (LAS): the percentage of words correctly assigned both the correct syntactic head and the correct dependency label
Example with "Economic news had":
Gold tree: ATT(news → Economic), SBJ(had → news)
Pred 1: both arcs carry wrong labels (PRED, OBJ), LAS = 0
Pred 2: ATT(news → Economic) correct, news attached with label OBJ, LAS = (1/2) · 100 = 50
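The LAS computation can be sketched as a small function; the token and label names below follow the slide's example, and the arc dictionaries are a toy encoding chosen for illustration:

```python
def las(gold, pred):
    """Labeled Attachment Score: percentage of words whose predicted
    (head, label) pair exactly matches the gold annotation."""
    correct = sum(1 for word, arc in gold.items() if pred.get(word) == arc)
    return 100.0 * correct / len(gold)

# Gold arcs for "Economic news had ...": word -> (head, label)
gold = {"Economic": ("news", "ATT"), "news": ("had", "SBJ")}

pred1 = {"Economic": ("news", "PRED"), "news": ("had", "OBJ")}  # both labels wrong
pred2 = {"Economic": ("news", "ATT"), "news": ("had", "OBJ")}   # one arc fully correct

print(las(gold, pred1))  # 0.0
print(las(gold, pred2))  # 50.0
```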
Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
Experiments (MLP)
CoNLL 2017 Results (all treebanks LAS)
Ranked 1st among transition-based parsers 5
5 Source: CoNLL17 official results page
Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123
Contributions in CoNLL17
Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123
Context and Word Embeddings
Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats | Hungarian | En-ParTUT | Latvian
p | 63.6 | 76.6 | 55.9
v | 73.5 | 75.9 | 63
c | 72.2 | 76 | 63.5
v-c | 76 | 79 | 67.6
p-c | 78 | 82.5 | 70.6
p-v | 76.6 | 80.8 | 67.7
p-fb | 74.7 | 79.7 | 66.3
p-v-c | 79.3 | 83.2 | 74.2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123
Context and Word Embeddings
Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats | Hungarian | En-ParTUT | Latvian
p | 63.6 | 76.6 | 55.9
v | 73.5 | 75.9 | 63
c | 72.2 | 76 | 63.5
v-c | 76 | 79 | 67.6
p-c | 78 | 82.5 | 70.6
p-v | 76.6 | 80.8 | 67.7
p-fb | 74.7 | 79.7 | 66.3
p-v-c | 79.3 | 83.2 | 74.2

Context vectors provide an independent contribution on top of POS tags
Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123
Context and Word embeddings
Feats | Hungarian | En-ParTUT | Latvian
p | 63.6 | 76.6 | 55.9
v | 73.5 | 75.9 | 63
c | 72.2 | 76 | 63.5
v-c | 76 | 79 | 67.6
p-c | 78 | 82.5 | 70.6
p-v | 76.6 | 80.8 | 67.7
p-fb | 74.7 | 79.7 | 66.3
p-v-c | 79.3 | 83.2 | 74.2
Our BiLSTM language model word vectors perform better than FB vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123
Context and Word embeddings
Feats | Hungarian | En-ParTUT | Latvian
p | 63.6 | 76.6 | 55.9
v | 73.5 | 75.9 | 63
c | 72.2 | 76 | 63.5
v-c | 76 | 79 | 67.6
p-c | 78 | 82.5 | 70.6
p-v | 76.6 | 80.8 | 67.7
p-fb | 74.7 | 79.7 | 66.3
p-v-c | 79.3 | 83.2 | 74.2
Both POS tags and context vectors have significant contributions on top of word vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123
Issues with MLP
However
Choosing the correct parser-state features still remains critical

We are unable to represent the whole parsing history with feature extraction
Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123
Solution
Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack
Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17: Koç University team with MLP Parser using Context Embeddings
CoNLL18: KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123
c Tree-stack LSTM Parser (CoNLL18)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123
Related Work - Stack LSTM
Figure Stack LSTM [Dyer et al 2015]
Represent each component (σ, β, A) with an LSTM
Modify the head word's embedding with the dependent's embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123
Problems with Stack LSTM
They only modify the stack's word embeddings

Hidden states of the LSTMs are not updated except on reduce
Actions are not explicitly represented
They only used word2vec embeddings [Mikolov et al 2013]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123
Tree-stack LSTM Overview
Figure Tree-stack LSTM overview (σ-, β-, and Action-LSTM states concatenated into an MLP; t-RNN links head word, dependent word, and dependency relation)

We propose the Tree-stack LSTM model with 4 components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123
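The four components meet in the decision layer: the current hidden states of the σ-, β-, and Action-LSTMs are concatenated and scored over the possible transitions. A toy sketch of that final step, with a single linear layer plus softmax standing in for the MLP (weights, dimensions, and the transition set are illustrative):

```python
import math

def decide(h_sigma, h_beta, h_action, W, b, transitions):
    """Concatenate component hidden states, apply a linear layer and
    softmax, and return the highest-scoring transition."""
    x = h_sigma + h_beta + h_action   # vector concatenation
    scores = [sum(w * xi for w, xi in zip(row, x)) + bi
              for row, bi in zip(W, b)]
    z = max(scores)                   # stabilize the softmax
    exps = [math.exp(s - z) for s in scores]
    probs = [e / sum(exps) for e in exps]
    return transitions[probs.index(max(probs))]

transitions = ["shift", "left", "right"]
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b = [0.0, 0.0, 0.0]
print(decide([2.0], [0.5], [0.1], W, b, transitions))  # shift
```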
Tree-stack LSTM
Input Representation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with a continuous vector

Every dependency relation is represented with a continuous vector
Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123
Input Representation
We do not include an explicit feature extractor. We initialize the word representation by concatenating:
Character Based LSTMrsquos word vectors
Word Based BiLSTMrsquos context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
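The four embeddings listed above are simply concatenated into one input vector per word; a minimal sketch with invented toy dimensions:

```python
def word_input(word_vec, context_vec, pos_vec, morph_vec):
    """Concatenate the four embeddings into the parser's input representation."""
    return word_vec + context_vec + pos_vec + morph_vec

# Toy vectors: 2-dim word, 2-dim context, 1-dim POS, 2-dim morph-feat.
x = word_input([0.1, 0.2], [0.3, 0.4], [1.0], [0.0, 1.0])
print(len(x))  # 7
```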
Input Representation
Morph-feat Vectors

Example for the word "It": Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs
Figure Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
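The FEATS string above follows the UD `key=value|key=value` convention; a morph-feat vector can be built by looking up one embedding per feature and summing them. A sketch (the tiny embedding table is invented for illustration):

```python
def parse_feats(feats):
    """Split a UD FEATS string ("key=value|key=value") into pairs."""
    if feats == "_":          # UD uses "_" for "no features"
        return []
    return [tuple(p.split("=", 1)) for p in feats.split("|")]

def morph_feat_vector(feats, table, dim=3):
    """Sum the embeddings of the word's individual morphological features."""
    vec = [0.0] * dim
    for pair in parse_feats(feats):
        emb = table.get(pair, [0.0] * dim)   # unseen features -> zero vector
        vec = [a + b for a, b in zip(vec, emb)]
    return vec

feats = "Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs"
table = {("Case", "Nom"): [1.0, 0.0, 0.0], ("Number", "Sing"): [0.0, 1.0, 0.0]}
print(parse_feats(feats)[0])             # ('Case', 'Nom')
print(morph_feat_vector(feats, table))   # [1.0, 1.0, 0.0]
```

Composing the vector per feature, rather than per full FEATS string, lets rare feature combinations share statistics with frequent ones.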
Tree-stack LSTM
Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123
β-LSTM
Figure Tree-stack LSTM overview with the β-LSTM component
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
Figure Buffer's β-LSTM over buffer words wi, wi+1, wi+2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
Figure Tree-stack LSTM overview with the σ-LSTM component
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
Figure Stack's σ-LSTM over stack entries si, si+1, si+2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
Figure Tree-stack LSTM overview with the Action-LSTM component
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
Figure Action-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How are the components of Tree-stack LSTM connected?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
Figure t-RNN composing head word, dependent word, and dependency relation

w_head_new = tanh(W_rnn · [w_head_old ; d_l ; w_dep] + b_rnn)    (1)
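Equation (1) can be sketched in plain Python; the weights `W_rnn` and `b_rnn` below are toy stand-ins for the learned parameters, and the embeddings are one-dimensional for readability:

```python
import math

def t_rnn(w_head_old, d_label, w_dep, W_rnn, b_rnn):
    """New head embedding: tanh(W_rnn * [w_head_old; d_label; w_dep] + b_rnn)."""
    x = w_head_old + d_label + w_dep          # vector concatenation
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W_rnn, b_rnn)]

# Toy 1-dim embeddings, so W_rnn is 1x3: pick out only the old head term.
W_rnn = [[1.0, 0.0, 0.0]]
b_rnn = [0.0]
new_head = t_rnn([0.5], [0.2], [0.9], W_rnn, b_rnn)
print(new_head)  # [tanh(0.5)]
```

Each left or right transition applies this composition once, so the head's embedding accumulates information from all of its dependents.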
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
Tree-RNN with
1. Left Transition
2. Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
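Under the transition definitions on the following slides, left_d pops the stack top s and attaches it to the buffer front b, while right_d pops the stack top t and attaches it to the element s below it. A minimal sketch of the state updates (the state encoding and example words are illustrative):

```python
def left(state, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}): pop stack top s,
    attach it as a d-dependent of the buffer front b."""
    stack, buffer, arcs = state
    *rest, s = stack
    b = buffer[0]
    return (rest, buffer, arcs | {(b, d, s)})

def right(state, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}): pop stack top t,
    attach it as a d-dependent of the element s below it."""
    stack, buffer, arcs = state
    *rest, s, t = stack
    return (rest + [s], buffer, arcs | {(s, d, t)})

state = (["ROOT", "news"], ["had", "effect"], set())
state = left(state, "SBJ")     # arc had -SBJ-> news
print(state)                   # (['ROOT'], ['had', 'effect'], {('had', 'SBJ', 'news')})

state2 = (["ROOT", "had", "effect"], [], set())
state2 = right(state2, "OBJ")  # arc had -OBJ-> effect
print(state2[2])               # {('had', 'OBJ', 'effect')}
```

In the full model each such transition additionally triggers the t-RNN composition and a recomputation of the affected LSTM's hidden state, as the following slides illustrate.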
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure Each embedding is initialized by concatenating POS, language, and morph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure β-LSTM recalculates its hidden state from the new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Stackrsquos top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure σ-LSTM recalculates its hidden from new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
Final overview of Tree-stack LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing
2 Related WorkLinear Models and their DrawbacksNeural Network Models
3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser
4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning
5 Conclusion6 Future Work amp Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results amp Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results amp Comparisons
Dataset
Dependency parsing of 81
treebanks in 49 languages
All treebanks use standardized
annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 7th out
of 33 participants (1st among
transition based parsers)
Dependency parsing of 82
treebanks in 57 languages
All treebanks use
standardized annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 16th
out of 30 participants (2nd
among transition based
parsers)
CoNLL17 CoNLL181 Traintest split change 2 Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
2 possible problems of official comparison
1 If the annotation of the tree bank is improved the older parser ishandicapped
2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
MLP
Figure Initial model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
LSTM LSTM
Figure Only action LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
LSTM LSTM LSTM
LSTM LSTM MLP
Figure Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
LSTM LSTM
LSTM LSTM MLP
Figure Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692
cs cac 8389 8223 8313 8317
en ewt 7554 7543 7556 7567
Table Comparison between MLP and rdquoOnlyrdquo models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875
ar padt (120k) 6804 6814
en ewt (204k) 7487 7577
cs cac (473k) 8289 8357
cs pdt (1M) 8117 81164
t-RNN provides comparative advantage for low-resourcelanguages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNNrsquos performance contribution increases when the training sizedecreases
σ-LSTM provides more useful information independent from datasetsize
Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k less than 50k tokens
Languages having more than 50k less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having less than 20k training tokens
Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583
ru taiga 5832 6055 10479
sme giella 5278 5339 16385
la perseus 4993 516 18184
ug udt 5278 5339 19262
sl sst 4672 4877 19473
hu szeged 6623 6818 20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having tokens in between 50k and100k
Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325
fr sequoia 8436 8217 50543
en gum 7644 7534 53686
ko gsd 7374 7254 56687
eu bdt 7455 7332 72974
nl lassymal 767 758 75134
gl ctg 7902 79018 79327
lv lvtb 7233 7224 80666
id gsd 7576 7397 97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having more than 100k trainingtokens
Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064
bg btb 8453 8455 124336
en ewt 7577 75682 204585
ar padt 6802 6814 223881
de gsd 7159 7132 263804
ca ancora 8589 85874 417587
es ancora 8499 8478 444617
cs cac 8357 8363 472608
cs pdt 8143 8212 1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle transitions using gold movesDynamic oracle transitions using predicted moves
In both cases logp of gold moves maximized
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and
use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al
2017]3 Using my own word and context vectors trained with different
language but from the same language family4 Applying transfer learning with a pre-trained parser
Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898
kmr mg 2012 2257 2278 2339
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial.
Training an LM from scratch on the limited data does not yield useful word and context vectors.
Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017].
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition-based parsers can only build projective trees. 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
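Projectivity can be checked by testing for crossing arcs; a small sketch, assuming a heads array with 0 denoting the root:

```python
def is_projective(heads):
    """heads[i-1] is the head of word i (words numbered 1..n, 0 = root).
    A tree is projective iff no two dependency arcs cross when drawn
    above the sentence."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for i, (l1, r1) in enumerate(arcs):
        for l2, r2 in arcs[i + 1:]:
            # two arcs cross iff exactly one endpoint of one arc
            # lies strictly inside the other arc's span
            if l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1:
                return False
    return True
```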
Projective vs Non-projective
We compared our model with the best model at different projectivity ratios.

Language      Projectivity (%)   Best (LAS)   Our (LAS)
grc_perseus        90.7            79.39      55.03 (20)
eu_bdt             95.13           84.22      74.13 (17)
hu_szeged          97.8            82.66      68.18 (14)
da_ddt             98.26           86.28      76.40 (17)
en_gum             99.6            85.05      76.44 (15)
gl_treegal        100.0            74.25      70.45 (10)
gl_ctg            100.0            82.12      79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases.
7
7 From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusion:
We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition-based dependency parsing.
Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering.
Tree-stack LSTM performed better on low-resource languages.
When the training dataset size increases, tree-stack LSTM loses its advantage.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.
Attention Mechanism
Applying attention between the σ-LSTM, β-LSTM, or Action-LSTM states may bring performance improvements.
Morphological Features
Finding different ways to represent morphological features.
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. a CRF loss) may solve this problem.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
3 Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 32 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:
CoNLL17: Koc-University team, with an MLP Parser using Context Embeddings
CoNLL18: KParse team, with a Tree-stack LSTM Parser using Context and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:
CoNLL17: Koc-University team, with an MLP Parser using Context Embeddings
CoNLL18: KParse team, with a Tree-stack LSTM Parser using Context and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123
a Language Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123
Language Model (LM)
The LM is used to obtain Context and Word embeddings, with two components:
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
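A toy sketch of this data flow, with a simple averaging cell standing in for the real LSTMs; the dimensions and the character "embedding" are illustrative assumptions:

```python
def toy_cell(h, x):
    # stand-in for an LSTM step: fold the input into the hidden state
    return [(hi + xi) * 0.5 for hi, xi in zip(h, x)]

def word_vector(word, dim=4):
    """Character-based recurrence: run the cell over the word's characters."""
    h = [0.0] * dim
    for ch in word:
        x = [(ord(ch) % 7) / 7.0] * dim   # toy character embedding
        h = toy_cell(h, x)
    return h

def context_vectors(words, dim=4):
    """Word-based bidirectional recurrence: a forward and a backward pass
    over the word vectors, concatenated per position."""
    vecs = [word_vector(w, dim) for w in words]
    fwd, h = [], [0.0] * dim
    for v in vecs:
        h = toy_cell(h, v)
        fwd.append(h)
    bwd, h = [], [0.0] * dim
    for v in reversed(vecs):
        h = toy_cell(h, v)
        bwd.append(h)
    bwd.reverse()
    return [f + b for f, b in zip(fwd, bwd)]
```

Each position's context vector thus depends on the words both before and after it, which is what distinguishes it from the word vector alone.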
Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123
Language Model - Word vectors
Character based LSTM generates word Vectors
Figure Character LSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123
Language Model - Context Vectors
Word based BiLSTM generates Context Vectors
Figure Word BiLSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123
b MLP Parser (CoNLL17)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123
MLP Parser
MLP Parser consists of 4 components
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Feature extractor describes current state
Decision module (MLP) decides the next transition
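A minimal sketch of such a decision module: a one-hidden-layer MLP scores the candidate transitions from the extracted state features, and the highest-scoring transition is taken. The weights and the transition inventory below are toy values, not the trained model:

```python
import math

def mlp_decide(features, W1, b1, W2, b2):
    """Hidden tanh layer over the state features, then a linear layer
    of scores over transitions; returns the argmax transition index."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(W1, b1)]
    scores = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(W2, b2)]
    return max(range(len(scores)), key=scores.__getitem__)

# toy state features and weights; transitions: 0=shift, 1=left, 2=right
move = mlp_decide([1.0, -1.0],
                  W1=[[1.0, 0.0], [0.0, 1.0]], b1=[0.0, 0.0],
                  W2=[[1.0, 1.0], [1.0, -1.0], [0.0, 1.0]], b2=[0.0, 0.0, 0.0])
```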
Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123
MLP Parser - Feature Extraction
Feature extractor describes current state
Figure: Kırnap et al. 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123
MLP Parser - Decision Module
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123
Experiments amp Dataset (MLP) CoNLL17
CoNLL17 Dataset
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123
Experiments - Evaluation Metric
Labeled Attachment Score (LAS): the percentage of words correctly assigned both the correct syntactic head and the correct dependency label.
Example: "Economic news had"
Gold tree (LAS = 1): Economic is ATT of news; news is SBJ of had.
Pred 1 (LAS = 0): both words get the wrong head and label (OBJ, PRED).
Pred 2 (LAS = (1/2)·100 = 50): Economic is ATT of news (correct); news is labeled OBJ (wrong).
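The metric itself is a simple count; a sketch matching the slide's example (the exact wrong heads and labels assumed for Pred 1 are illustrative):

```python
def las(gold, pred):
    """Labeled Attachment Score: the percentage of words whose predicted
    head AND dependency label both match the gold tree."""
    correct = sum(1 for w in gold if pred[w] == gold[w])
    return 100.0 * correct / len(gold)

# per-word (head, label) pairs for "Economic news had"
gold  = {"Economic": ("news", "ATT"), "news": ("had", "SBJ")}
pred1 = {"Economic": ("had", "OBJ"), "news": ("had", "PRED")}  # both wrong
pred2 = {"Economic": ("news", "ATT"), "news": ("had", "OBJ")}  # one of two right
```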
Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
Experiments (MLP)
CoNLL 2017 Results (all treebanks LAS)
Ranked 1st among transition based parsers 5
5 Source: CoNLL17 official results page
Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123
Contributions in CoNLL17
Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123
Context and Word Embeddings
Relative contributions of part-of-speech (p), word vector (v), context vector (c):

Feats   Hungarian   En-ParTUT   Latvian
p         63.6        76.6        55.9
v         73.5        75.9        63.0
c         72.2        76.0        63.5
v-c       76.0        79.0        67.6
p-c       78.0        82.5        70.6
p-v       76.6        80.8        67.7
p-fb      74.7        79.7        66.3
p-v-c     79.3        83.2        74.2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123
Context and Word Embeddings
Relative contributions of part-of-speech (p), word vector (v), context vector (c) (table repeated from the previous slide).
Context vectors provide an independent contribution on top of POS tags.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123
Context and Word embeddings
(Table repeated from the previous slides.)
Our BiLSTM language-model word vectors perform better than Facebook's (p-fb) vectors.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123
Context and Word embeddings
(Table repeated from the previous slides.)
Both POS tags and context vectors make significant contributions on top of word vectors.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123
Issues with MLP
However
Choosing the correct state representation of the parser remains critical.
We are unable to represent the whole parsing history with feature extraction.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123
Solution
Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and the stack.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:
CoNLL17: Koc-University team, with an MLP Parser using Context Embeddings
CoNLL18: KParse team, with a Tree-stack LSTM Parser using Context and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123
c Tree-stack LSTM Parser (CoNLL18)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123
Related Work - Stack LSTM
Figure Stack LSTM [Dyer et al 2015]
Represent each component (σ, β, A) with an LSTM.
Modify the head word's embedding with the dependent's embedding.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123
Problems with Stack LSTM
They only modify the stack's word embeddings.
Hidden states of the LSTMs are not updated unless a reduce occurs.
Actions are not explicitly represented.
They only used word2vec embeddings [Mikolov et al. 2013].
Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123
Tree-stack LSTM Overview
Figure: Tree-stack LSTM overview (σ-, β-, and Action-LSTMs plus the t-RNN over head word, dependent word, and dependency relation; their outputs are concatenated and fed to an MLP).

We propose the Tree-stack LSTM model with 4 components: β-LSTM, σ-LSTM, Action-LSTM, and Tree-RNN.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123
Tree-stack LSTM
Input Representation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with a continuous vector.
Every dependency relation is represented with a continuous vector.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123
Input Representation
We do not include an explicit feature extractor. We initialize each word representation by concatenating:
Character Based LSTMrsquos word vectors
Word Based BiLSTMrsquos context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
Input Representation
Morph-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure Morph-feat Embeddings
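A sketch of one plausible way to embed the FEATS string shown above: split on "|" and combine the per-feature vectors. Summing, the random initialization, and the toy dimensions are assumptions for illustration, not necessarily the thesis recipe:

```python
import random
random.seed(0)

DIM = 4
feat_emb = {}  # one learned vector per feature such as "Case=Nom"

def morph_feat_vector(feat_string):
    """Split the UD morphological feature string on '|' and sum the
    per-feature embeddings into one morph-feat vector."""
    total = [0.0] * DIM
    for feat in feat_string.split("|"):
        vec = feat_emb.setdefault(
            feat, [random.uniform(-0.1, 0.1) for _ in range(DIM)])
        total = [t + x for t, x in zip(total, vec)]
    return total

v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
```

The resulting vector is then concatenated with the word, context, and POS vectors from the previous slide to form the token's input representation.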
Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
Tree-stack LSTM
Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123
β-LSTM
Figure: Tree-stack LSTM with the β-LSTM component highlighted.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
Figure: Buffer's β-LSTM, an LSTM over the buffer words w_i, w_i+1, w_i+2.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
Figure: Tree-stack LSTM with the σ-LSTM component highlighted.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
Figure: Stack's σ-LSTM, an LSTM over the stack items s_i, s_i+1, s_i+2.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
Figure: Tree-stack LSTM with the Action-LSTM component highlighted.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
Figure: Action-LSTM, an LSTM over the sequence of past transitions.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How are the components of the tree-stack LSTM connected?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
Figure: t-RNN, combining the head word, dependent word, and dependency relation embeddings into a new head embedding.
w_head_new = tanh(W_rnn * [w_head_old; d_l; w_dep] + b_rnn)   (1)
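Eq. (1) in toy dimensions (pure Python; the weight values and vector sizes are illustrative):

```python
import math

def trnn_new_head(w_head_old, d_rel, w_dep, W_rnn, b_rnn):
    """The head word's new embedding: a tanh layer applied to the
    concatenation [w_head_old; d_rel; w_dep]."""
    x = w_head_old + d_rel + w_dep   # vector concatenation
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W_rnn, b_rnn)]

# head dim 2, relation dim 1, dependent dim 2 -> input dim 5, output dim 2
new_head = trnn_new_head([0.5, -0.5], [1.0], [0.2, 0.1],
                         W_rnn=[[0.1] * 5, [-0.1] * 5], b_rnn=[0.0, 0.0])
```

The updated head embedding then replaces the old one in the stack/buffer, so later decisions see the subtree built so far.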
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
Tree-RNN with
1. Left Transition
2. Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: The stack's top LSTM is reduced.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: The t-RNN computes the new head embedding.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: The β-LSTM recomputes its hidden state from the new input.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: The tree-stack LSTM is ready for the next transition.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: The stack's top LSTM is reduced.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: The t-RNN computes the new head embedding.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: The σ-LSTM recomputes its hidden state from the new input.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: The tree-stack LSTM is ready for the next transition.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
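A minimal sketch of the two arc transitions as defined on the slides, with plain Python lists for σ, β, and A; word strings stand in for the vector representations, and the relation labels in the toy run are illustrative mechanics only, not a gold parse:

```python
def shift(stack, buffer):
    """shift: move the buffer front onto the stack."""
    stack.append(buffer.pop(0))

def left_arc(stack, buffer, arcs, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    the buffer front b becomes the head of the popped stack top s."""
    s = stack.pop()
    arcs.append((buffer[0], d, s))      # (head, relation, dependent)

def right_arc(stack, buffer, arcs, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    the stack's second item s becomes the head of the popped top t."""
    t = stack.pop()
    arcs.append((stack[-1], d, t))

# toy run over "Economic news had"
stack, buffer, arcs = [], ["Economic", "news", "had"], []
shift(stack, buffer)                    # stack: [Economic]
left_arc(stack, buffer, arcs, "ATT")    # news heads Economic
shift(stack, buffer)
shift(stack, buffer)                    # stack: [news, had]
right_arc(stack, buffer, arcs, "REL")   # news heads had (illustrative)
```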
Final overview of Tree-stack LSTM
Figure: Tree-stack LSTM overview (σ-, β-, and Action-LSTMs plus the t-RNN over head word, dependent word, and dependency relation; their outputs are concatenated and fed to an MLP).
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2 Related Work: Linear Models and their Drawbacks; Neural Network Models
3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5 Conclusion
6 Future Work & Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results amp Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results & Comparisons

Dataset

CoNLL17:
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 7th out of 33 participants (1st among transition-based parsers)

CoNLL18:
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 16th out of 30 participants (2nd among transition-based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split change; 2. Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
Two possible problems with the official comparison:
1. If the annotation of the treebank has improved, the older parser is handicapped.
2. If the training-test split has changed and old training data are now in the test data, the old parser is favored undeservedly.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code         MLP     Tree-stack
ru_taiga (10k)    58.89   60.55
hu_szeged (20k)   66.21   68.18
tr_imst (50k)     56.78   58.75
ar_padt (120k)    67.83   68.14
en_ewt (205k)     74.87   75.77
cs_cac (473k)     83.39   83.57

Tree-stack LSTM outperforms the MLP.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
Figure: Initial model (the MLP alone).
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
Figure: Only the action LSTM.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
Figure: Only the β-LSTM.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
Figure: Only the σ-LSTM.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code   MLP     Only Action   Only-β   Only-σ
hu_szeged   66.21   66.87         66.94    67.03
sv_lines    71.12   72.05         72.17    72.45
tr_imst     57.12   56.87         57.02    57.12
ar_padt     67.83   66.67         66.89    66.92
cs_cac      83.89   82.23         83.13    83.17
en_ewt      75.54   75.43         75.56    75.67

Table: Comparison between the MLP and the "Only" models.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
Figure: Tree-stack LSTM with the t-RNN component highlighted.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without the t-RNN:

Lang Code             without t-RNN   with t-RNN
no_nynorsklia (3k)        51.78          53.33
ru_taiga (11k)            59.13          60.55
gl_treegal (15k)          69.76          70.45
hu_szeged (20k)           66.12          68.18
sv_lines (49k)            74.04          75.46
tr_imst (50k)             58.12          58.75
ar_padt (120k)            68.04          68.14
en_ewt (204k)             74.87          75.77
cs_cac (473k)             82.89          83.57
cs_pdt (1M)               81.17          81.16

The t-RNN provides a comparative advantage for low-resource languages.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang        MLP     Only A   Only-β   Only-σ   w/o t-RNN   All
hu_szeged   66.21   66.87    66.94    67.03    66.12       68.18
sv_lines    71.12   72.05    72.17    74.04    72.17       75.46
tr_imst     57.12   56.87    57.02    57.12    58.12       58.75
ar_padt     67.83   66.67    66.89    66.92    68.04       68.14
cs_cac      83.89   82.23    83.13    83.17    82.89       83.57
en_ewt      75.54   75.43    75.56    75.67    74.87       75.77

Tree-stack LSTM beats the other model variations.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
The t-RNN's performance contribution increases as the training size decreases.
The σ-LSTM provides more useful information independent of dataset size.
Interconnecting the model's components with the t-RNN makes the tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers).
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental Settings
We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens per language, to better understand our contributions:
Languages having less than 20k tokens
Languages having more than 20k and less than 50k tokens
Languages having more than 50k and less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens:

Lang code       Morph-Feats   no Morph-Feats   # of tokens
no_nynorsklia      51.13          53.33            3,583
ru_taiga           58.32          60.55           10,479
sme_giella         52.78          53.39           16,385
la_perseus         49.93          51.60           18,184
ug_udt             52.78          53.39           19,262
sl_sst             46.72          48.77           19,473
hu_szeged          66.23          68.18           20,166

Not useful for languages having less than 20k training tokens.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code       Morph-Feats   no Morph-Feats   # of tokens
sv_lines           72.18          74.81           48,325
fr_sequoia         84.36          82.17           50,543
en_gum             76.44          75.34           53,686
ko_gsd             73.74          72.54           56,687
eu_bdt             74.55          73.32           72,974
nl_lassysmall      76.70          75.80           75,134
gl_ctg             79.02          79.018          79,327
lv_lvtb            72.33          72.24           80,666
id_gsd             75.76          73.97           97,531

Beneficial for languages with 50k to 100k training tokens.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies
CoNLL17bull Koc-University team with MLP Parser using Context Embeddings
CoNLL18bull KParse team with Tree-stack LSTM Parser using Context and
Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 33 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies
CoNLL17
bull Koc-University team with MLP Parser using ContextEmbeddings
CoNLL18
bull KParse team with Tree-stack LSTM Parser using Context and Morph-featEmbeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123
a Language Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123
Language Model (LM)
LM is used to obtain Context and Word embeddings with twocomponents
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123
Language Model - Word vectors
Character based LSTM generates word Vectors
Figure Character LSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123
Language Model - Context Vectors
Word based BiLSTM generates Context Vectors
Figure Word BiLSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123
b MLP Parser (CoNLL17)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123
MLP Parser
MLP Parser consists of 4 components
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Feature extractor describes current state
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123
MLP Parser - Feature Extraction
Feature extractor describes current state
Figure Kırnap et al 2017Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123
MLP Parser - Decision Module
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123
Experiments amp Dataset (MLP) CoNLL17
CoNLL17 Dataset
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123
Experiments - Evaluation Metric
Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label
Economic news hadGold Tree LAS 1
SBJATT
Economic news had
PREDOBJPred 1 LAS 0
Economic news had
OBJATTPred 2 LAS (frac12)100
Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
Experiments (MLP)
CoNLL 2017 Results (all treebanks LAS)
Ranked 1st among transition based parsers 5
5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123
Contributions in CoNLL17
Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123
Context and Word Embeddings
Relative contributions of part-of-speech (p) word vector (v)context vector (c)
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123
Context and Word Embeddings
Relative contributions of part-of-speech (p) word vector (v)context vector (c)
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Context vectors provide independent contribution on top ofPOS tags
Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123
Context and Word embeddings
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677p-fb 747 797 663
p-v-c 793 832 742
Our BiLSTM language model word vectors perform betterthan FB vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123
Context and Word embeddings
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Both POS tags and context vectors have significantcontributions on top of word vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123
Issues with MLP
However
Choosing the correct parser-state representation remains critical
We are unable to represent the whole parsing history with feature extraction
Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123
Solution
Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack
Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies
CoNLL17
• Koc-University team with MLP Parser using Context Embeddings
CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123
c Tree-stack LSTM Parser (CoNLL18)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123
Related Work - Stack LSTM
Figure Stack LSTM [Dyer et al 2015]
Represent each component (σ, β, A) with an LSTM; modify the head word's embedding with the dependent's embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123
Problems with Stack LSTM
They only modify the stack's word embeddings
Hidden states of the LSTMs are not updated except on reduce transitions
Actions are not explicitly represented
They only used word2vec embeddings [Mikolov et al 2013]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123
Tree-stack LSTM Overview
Figure: Tree-stack LSTM overview. The β-LSTM, σ-LSTM, and Action-LSTM outputs are concatenated and fed to an MLP; a t-RNN composes the head word, the dependent word, and the dependency relation.
We propose the Tree-stack LSTM model with 4 components:
β-LSTM, σ-LSTM, Action-LSTM, Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123
Tree-stack LSTM
Input Representation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with a continuous vector
Every dependency relation is represented with a continuous vector
Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123
Input Representation
We do not include an explicit feature extractor. We initialize each word representation by concatenating:
Character Based LSTMrsquos word vectors
Word Based BiLSTMrsquos context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
Input Representation
Morph-feat Vectors
Example: "It" -> Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs
Figure: Morph-feat Embeddings
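One plausible way to realize such embeddings is to give each `Key=Value` feature its own vector and combine them; this is a hedged sketch (the summing composition and the dimension are assumptions, not necessarily the thesis implementation):

```python
import numpy as np

DIM = 16
rng = np.random.default_rng(0)
feat_table = {}  # one embedding per "Key=Value" morphological feature

def feat_vec(feat):
    if feat not in feat_table:                 # lazily created; would be learned
        feat_table[feat] = rng.normal(0, 0.1, DIM)
    return feat_table[feat]

def morph_feat_embedding(feats_string):
    """Map e.g. 'Case=Nom|Number=Sing' to a single vector by summing
    the embeddings of its individual features."""
    if feats_string == "_":                    # no morphology annotated
        return np.zeros(DIM)
    return sum(feat_vec(f) for f in feats_string.split("|"))

v = morph_feat_embedding("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
print(v.shape)  # (16,)
```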
Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
Tree-stack LSTM
Model Components
1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123
β-LSTM
Figure: Tree-stack LSTM overview (β-LSTM highlighted)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
Figure: Buffer's β-LSTM running over buffer words w_i, w_i+1, w_i+2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
Figure: Tree-stack LSTM overview (σ-LSTM highlighted)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
Figure: Stack's σ-LSTM running over stack items s_i, s_i+1, s_i+2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
Figure: Tree-stack LSTM overview (Action-LSTM highlighted)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
Figure: Action-LSTM running over previous transitions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How are the components of the tree-stack LSTM connected?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
Figure: t-RNN composing the head word, the dependent word, and the dependency relation
w_head_new = tanh(W_rnn · [w_head_old; d_l; w_dep] + b_rnn)   (1)
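Equation (1) can be written out directly; a numpy sketch with an illustrative embedding size (the real dimensions and trained parameters are assumptions here):

```python
import numpy as np

def trnn_new_head(w_head_old, d_label, w_dep, W_rnn, b_rnn):
    """t-RNN composition as in Eq. (1): the head word's embedding is
    updated from its old embedding, the dependency-label embedding,
    and the dependent's embedding."""
    x = np.concatenate([w_head_old, d_label, w_dep])
    return np.tanh(W_rnn @ x + b_rnn)

d = 8                                    # illustrative embedding size
rng = np.random.default_rng(1)
W = rng.normal(0, 0.1, (d, 3 * d))       # maps the concatenation back to size d
b = np.zeros(d)
new_head = trnn_new_head(rng.normal(size=d), rng.normal(size=d),
                         rng.normal(size=d), W, b)
print(new_head.shape)  # (8,)
```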
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
Tree-RNN with
1 Left Transition
2 Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: t-RNN calculates the new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: β-LSTM recalculates its hidden state based on the new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: Tree-stack LSTM is ready to predict the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: t-RNN calculates the new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: σ-LSTM recalculates its hidden state from the new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: Tree-stack LSTM is ready to predict the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
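The two transition rules can be sketched on an explicit (stack, buffer, arc-set) state. This is a hedged illustration of the formulas as written on the slides, where left_d attaches the stack top to the buffer front and right_d attaches the stack top to the item below it; the thesis implementation may represent state differently:

```python
def left_arc(stack, buffer, arcs, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    pop the stack top s; it becomes a d-dependent of the buffer front b."""
    s = stack.pop()
    b = buffer[0]                      # buffer is left untouched
    arcs.add((b, d, s))

def right_arc(stack, buffer, arcs, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    pop the stack top t; it becomes a d-dependent of the new stack top s."""
    t = stack.pop()
    s = stack[-1]
    arcs.add((s, d, t))

# toy run over word indices 0..3
stack, buffer, arcs = [0, 1, 2], [3], set()
left_arc(stack, buffer, arcs, "ATT")   # adds arc (3, "ATT", 2)
right_arc(stack, buffer, arcs, "OBJ")  # adds arc (0, "OBJ", 1)
```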
Final overview of Tree-stack LSTM
Figure: Tree-stack LSTM overview. The β-LSTM, σ-LSTM, and Action-LSTM outputs are concatenated and fed to an MLP; the t-RNN composes the head word, the dependent word, and the dependency relation.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2 Related Work: Linear Models and their Drawbacks; Neural Network Models
3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5 Conclusion
6 Future Work & Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results & Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results & Comparisons
Dataset

CoNLL17:
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
Dependency parsing of 82 treebanks in 57 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Differences between CoNLL17 and CoNLL18: 1. Train/test split change 2. Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
Two possible problems with the official comparison:
1 If the annotation of the treebank has improved, the older parser is handicapped
2 If the training-test split has changed and old training data are now in the test data, the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models:

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
Figure: Initial model (the MLP parser)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
Figure: Only the Action-LSTM model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
Figure: Only the β-LSTM model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
Figure: Only the σ-LSTM model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code  MLP    Only Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.16
t-RNN provides a comparative advantage for low-resource languages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNN's performance contribution increases as the training size decreases
σ-LSTM provides more useful information independent of dataset size
Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental Settings: we divide the CoNLL18 UD dataset (v2.2) into 4 parts based on the number of training tokens for each language, to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k but less than 50k tokens
Languages having more than 50k but less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.6            18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48325
fr sequoia     84.36        82.17           50543
en gum         76.44        75.34           53686
ko gsd         73.74        72.54           56687
eu bdt         74.55        73.32           72974
nl lassysmall  76.7         75.8            75134
gl ctg         79.02        79.018          79327
lv lvtb        72.33        72.24           80666
id gsd         75.76        73.97           97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens:

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121064
bg btb     84.53        84.55           124336
en ewt     75.77        75.682          204585
ar padt    68.02        68.14           223881
de gsd     71.59        71.32           263804
ca ancora  85.89        85.874          417587
es ancora  84.99        84.78           444617
cs cac     83.57        83.63           472608
cs pdt     81.43        82.12           1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle: transitions follow gold moves
Dynamic oracle: transitions follow predicted moves
In both cases the log-probability of gold moves is maximized
Figure: Tree-stack LSTM overview
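The difference between the two regimes reduces to which move the parser follows during training; a hedged sketch (the function name and exploration scheme are illustrative assumptions, not the thesis code):

```python
import random

def pick_training_move(gold_move, model_move, dynamic, explore=0.9, rng=random):
    """Static oracle: always follow the gold transition during training.
    Dynamic oracle: with probability `explore`, follow the model's own
    (possibly wrong) prediction instead, so training also visits states
    the parser will actually reach at test time.  In both regimes the
    loss still maximizes log p(gold move); only the followed move differs."""
    if dynamic and rng.random() < explore:
        return model_move
    return gold_move

print(pick_training_move("SHIFT", "LEFT-ARC", dynamic=False))  # SHIFT
```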
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
Static vs Dynamic Oracle Training
Figure: Results are very close for languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure: Results are very close for languages with between 20k and 50k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure: Results are very close for languages with more than 50k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning:
1 Using very limited data to train the LM for word and context vectors, and using them to train a parser from scratch
2 Using Facebook's word vectors to train a parser [Bojanowski et al 2017]
3 Using my own word and context vectors trained on a different language from the same language family
4 Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial
Training an LM from scratch on limited data does not yield useful word and context vectors
Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
A transition based parser can only build projective trees 6
6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
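Projectivity can be checked directly: a tree is projective iff no two dependency arcs cross when drawn above the sentence. A small self-contained sketch (heads given as a 1-based head array, 0 denoting the root):

```python
def is_projective(heads):
    """heads[i] is the head of token i+1 (tokens are 1-based, 0 = root).
    Non-projective iff some pair of arcs crosses: l1 < l2 < r1 < r2."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    return not any(l1 < l2 < r1 < r2
                   for l1, r1 in arcs for l2, r2 in arcs)

print(is_projective([2, 0, 2]))     # True: a simple projective tree
print(is_projective([3, 4, 0, 3]))  # False: arcs 1<-3 and 2<-4 cross
```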
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
Projective vs Non-projective
We compared our model with the best model for different projectivity ratios:

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases
7 From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusion, we introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition based dependency parsing
Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering
Tree-stack LSTM performed better on low-resource languages
When the training dataset size increases, tree-stack LSTM loses its advantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings
Attention Mechanism
Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring performance improvements
Morphological Features
Finding a different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.
S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies
CoNLL17
• Koc-University team with MLP Parser using Context Embeddings
CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 34 123
a Language Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123
Language Model (LM)
The LM is used to obtain context and word embeddings with two components:
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123
Language Model - Word vectors
Character based LSTM generates word vectors
Figure: Character LSTM, from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123
Language Model - Context Vectors
Word based BiLSTM generates context vectors
Figure: Word BiLSTM, from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123
b MLP Parser (CoNLL17)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123
MLP Parser
MLP Parser consists of 4 components
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Feature extractor describes current state
Decision module (MLP) decides the next transition
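The decision module can be pictured as a one-hidden-layer classifier over the extracted state features; a numpy sketch with illustrative sizes and transition labels (the actual feature set and layer sizes are the thesis's, not shown here):

```python
import numpy as np

TRANSITIONS = ["SHIFT", "LEFT-ARC", "RIGHT-ARC"]    # illustrative label set

def mlp_decide(features, W1, b1, W2, b2):
    """One hidden layer with tanh, then an argmax over scores for the
    parser transitions available in the current state."""
    h = np.tanh(W1 @ features + b1)
    scores = W2 @ h + b2
    return TRANSITIONS[int(np.argmax(scores))]

rng = np.random.default_rng(0)
f = rng.normal(size=32)                             # extracted state features
W1, b1 = rng.normal(0, 0.1, (64, 32)), np.zeros(64)
W2, b2 = rng.normal(0, 0.1, (len(TRANSITIONS), 64)), np.zeros(len(TRANSITIONS))
move = mlp_decide(f, W1, b1, W2, b2)
print(move in TRANSITIONS)  # True
```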
Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123
MLP Parser - Feature Extraction
Feature extractor describes current state
Figure: Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123
MLP Parser - Decision Module
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123
Experiments & Dataset (MLP) CoNLL17
CoNLL17 Dataset
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123
Ablation Analysis
Overall results of ablation analysis
Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNNrsquos performance contribution increases when the training sizedecreases
σ-LSTM provides more useful information independent from datasetsize
Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k less than 50k tokens
Languages having more than 50k less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having less than 20k training tokens
Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583
ru taiga 5832 6055 10479
sme giella 5278 5339 16385
la perseus 4993 516 18184
ug udt 5278 5339 19262
sl sst 4672 4877 19473
hu szeged 6623 6818 20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having tokens in between 50k and100k
Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325
fr sequoia 8436 8217 50543
en gum 7644 7534 53686
ko gsd 7374 7254 56687
eu bdt 7455 7332 72974
nl lassymal 767 758 75134
gl ctg 7902 79018 79327
lv lvtb 7233 7224 80666
id gsd 7576 7397 97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having more than 100k trainingtokens
Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064
bg btb 8453 8455 124336
en ewt 7577 75682 204585
ar padt 6802 6814 223881
de gsd 7159 7132 263804
ca ancora 8589 85874 417587
es ancora 8499 8478 444617
cs cac 8357 8363 472608
cs pdt 8143 8212 1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle transitions using gold movesDynamic oracle transitions using predicted moves
In both cases logp of gold moves maximized
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and
use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al
2017]3 Using my own word and context vectors trained with different
language but from the same language family4 Applying transfer learning with a pre-trained parser
Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898
kmr mg 2012 2257 2278 2339
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the mostbeneficial
From scratch LM training does not bring useful word and contextvectors
Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition Based Parser can only build projective trees 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
Projective vs Non-projective
We compared our model with the best model for different projectivityratios
Language Projectiviy Best (LAS) Our (LAS)
grc perseus 907 7939 5503 (20)
eu bdt 9513 8422 7413 (17)
hu szeged 978 8266 6818 (14)
da ddt 9826 8628 7640 (17)
en gum 996 8505 7644 (15)
gl treegal 100 7425 7045 (10)
gl ctg 100 8212 7945 (14)
Table Our models performance gap decreases as the projectivity ratio increases
7
7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing
Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering
Tree-stack LSTM performed better with low resource languages
When the training dataset size increases tree-stack LSTM losses itsadvantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings
Attention Mechanism
Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement
Morphological Features
Finding different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies
Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682
S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US
Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
a Language Model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 35 123
Language Model (LM)
LM is used to obtain Context and Word embeddings with two components
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123
Language Model - Word vectors
Character based LSTM generates word Vectors
Figure Character LSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123
Language Model - Context Vectors
Word based BiLSTM generates Context Vectors
Figure Word BiLSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123
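As a concrete sketch of the LM's first component, the character LSTM can be written as a plain NumPy recurrence. The gate layout is the standard LSTM formulation, and the dimensions and parameter names below are illustrative assumptions, not the thesis' actual implementation.

```python
import numpy as np

def lstm_cell(params, x, h, c):
    # One standard LSTM step; params = (W, U, b) stacking the 4 gates.
    W, U, b = params
    z = W @ x + U @ h + b
    H = h.size
    sig = lambda v: 1.0 / (1.0 + np.exp(-v))
    i, f, o = sig(z[:H]), sig(z[H:2*H]), sig(z[2*H:3*H])  # input/forget/output gates
    g = np.tanh(z[3*H:])                                  # candidate cell state
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

def char_word_vector(params, char_vecs):
    # Word vector = final hidden state after reading the word's characters,
    # mirroring the LM's character-based component.
    H = params[2].size // 4
    h, c = np.zeros(H), np.zeros(H)
    for x in char_vecs:
        h, c = lstm_cell(params, x, h, c)
    return h
```

The word-level BiLSTM that produces context vectors would run two such recurrences (left-to-right and right-to-left) over these word vectors and concatenate the hidden states.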
b MLP Parser (CoNLL17)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123
MLP Parser
MLP Parser consists of 4 components
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Feature extractor describes current state
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123
MLP Parser - Feature Extraction
Feature extractor describes current state
Figure: Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123
MLP Parser - Decision Module
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123
Experiments & Dataset (MLP) CoNLL17
CoNLL17 Dataset
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123
Experiments - Evaluation Metric
Labeled Attachment Score (LAS): the percentage of words correctly assigned both the correct syntactic head and the correct dependency label

Example: "Economic news had"
Gold tree (SBJ, ATT): LAS = 1
Prediction 1 (PRED, OBJ): LAS = 0
Prediction 2 (OBJ, ATT): LAS = (1/2) × 100
Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
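The LAS computation can be sketched directly; the arc lists below are hypothetical, chosen only to mirror the gold/predicted trees in the slide's example.

```python
def las(gold, pred):
    """Labeled Attachment Score: percentage of words whose predicted head
    AND dependency label both match the gold tree."""
    assert len(gold) == len(pred)
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return 100.0 * correct / len(gold)

# (head_index, label) per word; 0 = root. Values are illustrative.
gold = [(2, "SBJ"), (3, "ATT"), (0, "ROOT")]
pred = [(2, "SBJ"), (1, "OBJ"), (0, "ROOT")]
print(las(gold, pred))  # 2 of 3 words fully correct
```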
Experiments (MLP)
CoNLL 2017 Results (all treebanks LAS)
Ranked 1st among transition based parsers 5
Source: CoNLL17 official results page
Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123
Contributions in CoNLL17
Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123
Context and Word Embeddings
Relative contributions of part-of-speech (p), word vector (v), context vector (c):

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63
c       72.2       76         63.5
v-c     76         79         67.6
p-c     78         82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123
Context and Word Embeddings
Context vectors provide an independent contribution on top of POS tags.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123
Context and Word embeddings
Our BiLSTM language model word vectors perform better than FB vectors.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123
Context and Word embeddings
Both POS tags and context vectors have significant contributions on top of word vectors.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123
Issues with MLP
However
Choosing the correct state representation of the parser remains critical
We are unable to represent the whole parsing history with feature extraction
Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123
Solution
Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack
Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies

CoNLL17:
• Koc-University team with MLP Parser using Context Embeddings

CoNLL18:
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123
c Tree-stack LSTM Parser (CoNLL18)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123
Related Work - Stack LSTM
Figure Stack LSTM [Dyer et al 2015]
Represent each component (σ, β, A) with an LSTM. Modify the head word's embedding with the dependent's embedding.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123
Problems with Stack LSTM
They only modify the stack's word embeddings
Hidden states of LSTMs are not updated unless a reduce occurs
Actions are not explicitly represented
They only used word2vec embeddings [Mikolov et al 2013]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123
Tree-stack LSTM Overview
Figure: Tree-stack LSTM: the σ-LSTM, β-LSTM, and Action-LSTM states and the t-RNN output (head word, dependent word, dependency relation) are concatenated and fed to an MLP

We propose the Tree-stack LSTM model with 4 components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN (t-RNN)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123
Tree-stack LSTM
Input Representation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with a continuous vector
Every dependency relation is represented with a continuous vector
Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123
Input Representation
We do not include an explicit feature extractor. We initialize each word's representation by concatenating:
Character Based LSTMrsquos word vectors
Word Based BiLSTMrsquos context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
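The initialization above is plain vector concatenation. A minimal sketch, with hypothetical embedding sizes since the slide does not specify dimensions:

```python
import numpy as np

# Hypothetical embedding sizes; the slide does not state the actual dimensions.
D_WORD, D_CTX, D_POS, D_FEAT = 100, 128, 32, 16

def word_representation(word_vec, context_vec, pos_vec, morph_vec):
    """Parser input for one word: char-LSTM word vector ++ BiLSTM context
    vector ++ POS embedding ++ morph-feat embedding."""
    return np.concatenate([word_vec, context_vec, pos_vec, morph_vec])
```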
Input Representation
Morph-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
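One simple way to turn a UD FEATS string like the one on the slide into a single vector is to embed each feature=value pair and sum. The summing and the on-demand vocabulary below are assumptions for illustration, not the thesis' exact scheme.

```python
import numpy as np

rng = np.random.default_rng(1)
D_FEAT = 16          # hypothetical morph-feat embedding size
feat_table = {}      # one vector per feature=value pair, created on demand

def morph_feat_vector(feats):
    """Embed e.g. 'Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs'
    by summing one vector per feature=value pair ('_' means no features)."""
    if feats == "_":
        return np.zeros(D_FEAT)
    vecs = []
    for pair in feats.split("|"):
        if pair not in feat_table:
            feat_table[pair] = 0.1 * rng.normal(size=D_FEAT)
        vecs.append(feat_table[pair])
    return np.sum(vecs, axis=0)
```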
Tree-stack LSTM
Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123
β-LSTM
Figure: Tree-stack LSTM overview with the β-LSTM component highlighted
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
Figure: Buffer's β-LSTM running over the upcoming words w_i, w_i+1, w_i+2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
Figure: Tree-stack LSTM overview with the σ-LSTM component highlighted
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
Figure: Stack's σ-LSTM running over the stack items s_i, s_i+1, s_i+2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
Figure: Tree-stack LSTM overview with the Action-LSTM component highlighted
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
Figure: Action-LSTM running over the sequence of past transitions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How are the components of the tree-stack LSTM connected?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
Figure: t-RNN combines the head word, dependent word, and dependency relation embeddings

w_head_new = tanh(W_rnn · [w_head_old ; d_l ; w_dep] + b_rnn)   (1)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
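Equation (1) is a single tanh layer over the concatenated head, relation, and dependent embeddings. A minimal sketch; the size and initialization are illustrative assumptions:

```python
import numpy as np

D = 8                                    # hypothetical embedding size
rng = np.random.default_rng(0)
W_rnn = 0.1 * rng.normal(size=(D, 3 * D))
b_rnn = np.zeros(D)

def t_rnn(w_head_old, d_l, w_dep):
    """Eq. (1): new head embedding from [head ; relation ; dependent]."""
    x = np.concatenate([w_head_old, d_l, w_dep])
    return np.tanh(W_rnn @ x + b_rnn)
```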
Tree-RNN with
1. Left Transition
2. Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Left transition; each embedding is initialized by concatenating POS, language, and morph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready for the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Right transition; each embedding is initialized by concatenating POS, language, and morph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready for the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
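The two transition definitions above can be sketched on plain Python lists (stack top at the end, buffer front at index 0). The shift operation is the usual third move; it is included as an assumption, since the slides only define left and right.

```python
def left(sigma, beta, A, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    pop stack top s and attach it as a d-dependent of buffer front b."""
    s = sigma.pop()
    A.add((beta[0], d, s))
    return sigma, beta, A

def right(sigma, beta, A, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    pop stack top t and attach it as a d-dependent of the next stack item s."""
    t = sigma.pop()
    A.add((sigma[-1], d, t))
    return sigma, beta, A

def shift(sigma, beta, A):
    # Assumed companion move: push the buffer front onto the stack.
    sigma.append(beta.pop(0))
    return sigma, beta, A
```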
Final overview of Tree-stack LSTM
Figure: Final Tree-stack LSTM: the σ-LSTM, β-LSTM, and Action-LSTM states and the t-RNN output (head word, dependent word, dependency relation) are concatenated and fed to an MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2. Related Work: Linear Models and their Drawbacks; Neural Network Models
3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5. Conclusion
6. Future Work & Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4. Results & Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results & Comparisons

Dataset

CoNLL17:
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
Dependency parsing of 82 treebanks in 57 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Differences from CoNLL17 to CoNLL18: 1. Train/test split change 2. Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
Two possible problems with the official comparison:
1. If the annotation of the treebank is improved, the older parser is handicapped
2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
Figure: Initial model (MLP only)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
Figure: Only the Action-LSTM added
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
Figure: Only the β-LSTM added
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
Figure: Only the σ-LSTM added
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code  MLP    Only Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
Figure: Tree-stack LSTM overview with the t-RNN component highlighted
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.16
t-RNN provides a comparative advantage for low-resource languages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNN's performance contribution increases as the training size decreases
σ-LSTM provides more useful information, independent of dataset size
Interconnecting the model's components with the t-RNN makes the tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental Settings

We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens
Languages having more than 20k and less than 50k tokens
Languages having more than 50k and less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.6            18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code    Morph-Feats  no Morph-Feats  # of tokens
sv lines     72.18        74.81           48325
fr sequoia   84.36        82.17           50543
en gum       76.44        75.34           53686
ko gsd       73.74        72.54           56687
eu bdt       74.55        73.32           72974
nl lassymal  76.7         75.8            75134
gl ctg       79.02        79.018          79327
lv lvtb      72.33        72.24           80666
id gsd       75.76        73.97           97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens:

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121064
bg btb     84.53        84.55           124336
en ewt     75.77        75.682          204585
ar padt    68.02        68.14           223881
de gsd     71.59        71.32           263804
ca ancora  85.89        85.874          417587
es ancora  84.99        84.78           444617
cs cac     83.57        83.63           472608
cs pdt     81.43        82.12           1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle: transitions follow gold moves. Dynamic oracle: transitions follow predicted moves.
In both cases, the log-probability of the gold moves is maximized.
Figure: Tree-stack LSTM overview
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
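The difference between the two training regimes can be sketched as follows. Here `predict` is a stand-in for the full model, and the dynamic case is simplified: a real dynamic oracle also recomputes the best reachable gold move for the current, possibly off-gold, parser state.

```python
import numpy as np

def train_step(state, gold_move, predict, follow_dynamic):
    """Both oracles maximize log p(gold move); they differ only in which
    move is used to advance the parser state during training."""
    probs = predict(state)                     # distribution over transitions
    loss = -np.log(probs[gold_move])           # NLL of the gold move
    move = int(np.argmax(probs)) if follow_dynamic else gold_move
    return loss, move                          # caller applies `move` to state
```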
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial
From-scratch LM training does not produce useful word and context vectors
Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
A transition based parser can only build projective trees
Figure from http://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
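Projectivity can be checked directly from the head array: for every arc, each word strictly between head and dependent must be a descendant of the head. A small sketch with CoNLL-style indexing (heads[i] is the head of word i+1, 0 = root):

```python
def is_projective(heads):
    """Return True iff the dependency tree given by `heads` is projective."""
    for d, h in enumerate(heads, start=1):
        lo, hi = min(h, d), max(h, d)
        for k in range(lo + 1, hi):
            # walk upward from k; it must reach h before hitting the root
            a = k
            while a != 0 and a != h:
                a = heads[a - 1]
            if a != h:
                return False
    return True
```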
Projective vs Non-projective
We compared our model with the best model for different projectivity ratios

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)
Table: Our model's performance gap decreases as the projectivity ratio increases
From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusion: We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition based dependency parsing
Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering
Tree-stack LSTM performed better on low-resource languages
As the training dataset size increases, the tree-stack LSTM loses its advantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.
Attention Mechanism
Applying attention across σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement

Morphological Features

Finding a different way to represent morphological features

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann, Carlos Gómez-Rodríguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.
S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
Language Model (LM)
The LM is used to obtain context and word embeddings with two components:
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 36 123
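The character-level component above can be sketched in pure Python. This is a toy illustration only, not the thesis implementation: the real model uses an LSTM, and the embedding table, step function, and dimensions below are made-up placeholders.

```python
import math

def rnn_step(x, h, W_xh, W_hh):
    # One vanilla-RNN step: h' = tanh(W_xh * x + W_hh * h), computed row by row.
    return [math.tanh(sum(w * xi for w, xi in zip(W_xh[i], x)) +
                      sum(u * hi for u, hi in zip(W_hh[i], h)))
            for i in range(len(h))]

def word_vector(word, char_emb, W_xh, W_hh, dim=2):
    """Toy character RNN: read the word character by character;
    the final hidden state serves as the word vector."""
    h = [0.0] * dim
    for ch in word:
        h = rnn_step(char_emb[ch], h, W_xh, W_hh)
    return h
```

Because the hidden state depends on character order, "ab" and "ba" get different vectors, which is the point of using a recurrent encoder over characters.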
Language Model - Word vectors
Character based LSTM generates word Vectors
Figure Character LSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123
Language Model - Context Vectors
Word based BiLSTM generates Context Vectors
Figure Word BiLSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123
b MLP Parser (CoNLL17)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123
MLP Parser
MLP Parser consists of 4 components
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Feature extractor describes current state
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123
MLP Parser - Feature Extraction
Feature extractor describes current state
Figure: Kırnap et al. 2017.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123
MLP Parser - Decision Module
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123
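The decision module can be sketched as a small feed-forward network over the extracted state features. A minimal illustration with made-up weights and a made-up action set; the real model's layer sizes and feature set come from the thesis, not from this snippet.

```python
import math

def mlp_decide(features, W1, b1, W2, b2, actions):
    """One hidden layer with tanh, then a linear output layer;
    the highest-scoring action is chosen as the next transition."""
    hidden = [math.tanh(sum(w * f for w, f in zip(row, features)) + b)
              for row, b in zip(W1, b1)]
    scores = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(W2, b2)]
    return actions[scores.index(max(scores))]
```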
Experiments amp Dataset (MLP) CoNLL17
CoNLL17 Dataset
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123
Experiments - Evaluation Metric
Labeled Attachment Score (LAS): the percentage of words assigned both the correct syntactic head and the correct dependency label.
Figure: LAS on "Economic news had". Gold tree (arcs ATT, SBJ): LAS = 1. Prediction 1 (arcs OBJ, PRED): LAS = 0. Prediction 2 (arcs ATT, OBJ): LAS = (1/2) · 100 = 50.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
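The metric above is straightforward to compute. A minimal sketch (token indexing conventions and the shared task's exact treatment of punctuation are simplified away):

```python
def las(gold, pred):
    """Labeled Attachment Score: percentage of words whose predicted
    head AND dependency label both match the gold tree.
    gold, pred: lists of (head_index, label) pairs, one per word."""
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return 100.0 * correct / len(gold)
```

With two scored arcs, getting one of them right yields 50, as in the slide's second prediction.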
Experiments (MLP)
CoNLL 2017 Results (all treebanks LAS)
Ranked 1st among transition based parsers 5
5. Source: CoNLL17 official results page.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123
Contributions in CoNLL17
Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123
Context and Word Embeddings
Relative contributions of part-of-speech (p), word vector (v), context vector (c):

Feats | Hungarian | En-ParTUT | Latvian
p | 63.6 | 76.6 | 55.9
v | 73.5 | 75.9 | 63
c | 72.2 | 76 | 63.5
v-c | 76 | 79 | 67.6
p-c | 78 | 82.5 | 70.6
p-v | 76.6 | 80.8 | 67.7
p-fb | 74.7 | 79.7 | 66.3
p-v-c | 79.3 | 83.2 | 74.2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123
Context and Word Embeddings
Context vectors provide an independent contribution on top of POS tags.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123
Context and Word embeddings
Our BiLSTM language model word vectors perform better than the FB vectors.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123
Context and Word embeddings
Both POS tags and context vectors have significant contributions on top of word vectors.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123
Issues with MLP
However
Choosing the correct features of the parser state still remains critical.
We are unable to represent the whole parsing history with feature extraction.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123
Solution
Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123
Model Overview
2 shared tasks on Multilingual Parsing from Raw Text to Universal Dependencies:
CoNLL17
• Koc-University team with the MLP Parser using Context Embeddings
CoNLL18
• KParse team with the Tree-stack LSTM Parser using Context and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123
c Tree-stack LSTM Parser (CoNLL18)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123
Related Work - Stack LSTM
Figure Stack LSTM [Dyer et al 2015]
Represent each component (σ, β, A) with an LSTM. Modify the head word's embedding with the dependent's embedding.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123
Problems with Stack LSTM
They only modify the stack's word embeddings
Hidden states of the LSTMs are not updated unless a reduce occurs
Actions are not explicitly represented
They only used word2vec embeddings [Mikolov et al 2013]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123
Tree-stack LSTM Overview
Figure: Tree-stack LSTM overview (β-, σ-, and Action-LSTMs, plus the t-RNN over head word, dependent word, and dependency relation; outputs are concatenated and fed to an MLP).
We propose the Tree-stack LSTM model with 4 components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123
Tree-stack LSTM
Input Representation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with a continuous vector
Every dependency relation is represented with a continuous vector
Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123
Input Representation
We do not include an explicit feature extractor. We initialize the word representation by concatenating:
Character Based LSTM's word vectors
Word Based BiLSTM's context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
Input Representation
Morph-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
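The morph-feat string above follows the CoNLL-U FEATS convention: pipe-separated Key=Value pairs, with `_` when no features are present. A small helper to split such a string into key-value pairs before any embedding lookup (the embedding step itself is omitted here):

```python
def parse_feats(feats):
    """Split a CoNLL-U FEATS string into a {feature: value} dict.
    An underscore (or empty string) means no morphological features."""
    if feats in ("_", ""):
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))
```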
Tree-stack LSTM
Model components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123
β-LSTM
Figure: Tree-stack LSTM overview (β-LSTM component).
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
Figure: Buffer's β-LSTM (an LSTM over the buffer words w_i, w_i+1, w_i+2).
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
Figure: Tree-stack LSTM overview (σ-LSTM component).
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
Figure: Stack's σ-LSTM (an LSTM over the stack items s_i, s_i+1, s_i+2).
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
Figure: Tree-stack LSTM overview (Action-LSTM component).
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
Figure: Action-LSTM (an LSTM over the transition history).
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How are the components of the tree-stack LSTM connected?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
Figure: t-RNN combines the head word, dependency relation, and dependent word embeddings.
w_head_new = tanh(W_rnn · [w_head_old; d_l; w_dep] + b_rnn)    (1)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
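Equation (1) can be written out directly. A pure-Python sketch with toy dimensions; the real W_rnn, b_rnn, and vector sizes are learned model parameters, not the made-up values used in the example below.

```python
import math

def t_rnn(w_head, d_label, w_dep, W_rnn, b_rnn):
    """New head embedding: tanh(W_rnn * [w_head; d_l; w_dep] + b_rnn).
    The three input vectors are concatenated before the affine map."""
    x = w_head + d_label + w_dep  # list concatenation = vector concatenation here
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W_rnn, b_rnn)]
```

The output has the same dimensionality as a word embedding, so the composed head can be fed back into the stack in place of the old head.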
Tree-RNN with
1. Left Transition
2. Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: Stack's top LSTM is reduced.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure β-LSTM recalculates its hidden based on new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: Tree-stack LSTM is ready for the next transition.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: Stack's top LSTM is reduced.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure σ-LSTM recalculates its hidden from new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: Tree-stack LSTM is ready for the next transition.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
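Putting the left_d and right_d rules together with shift, the evolving parser configuration can be sketched as below. This is a simplified illustration of the transitions shown on the slides, not the thesis code: token indices stand in for the embedded words, and all LSTM/scoring logic is omitted.

```python
class ParserState:
    """Stack σ, buffer β, and arc set A; arcs are (head, label, dependent)."""

    def __init__(self, n_words):
        self.stack = []                             # σ
        self.buffer = list(range(1, n_words + 1))   # β (1-based word indices)
        self.arcs = set()                           # A

    def shift(self):
        # Move the buffer front onto the stack.
        self.stack.append(self.buffer.pop(0))

    def left_arc(self, d):
        # left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
        # the buffer front b becomes the head of the stack top s.
        s = self.stack.pop()
        self.arcs.add((self.buffer[0], d, s))

    def right_arc(self, d):
        # right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
        # the second-on-stack s becomes the head of the top t.
        t = self.stack.pop()
        self.arcs.add((self.stack[-1], d, t))
```

For "Economic news had" (labels as in the earlier LAS example), the sequence shift, left_arc(ATT), shift, left_arc(SBJ), shift produces the two gold arcs and leaves the root word on the stack.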
Final overview of Tree-stack LSTM
Figure: Final overview of the Tree-stack LSTM (β-, σ-, and Action-LSTMs with the t-RNN; outputs concatenated and fed to an MLP).
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2. Related Work: Linear Models and their Drawbacks; Neural Network Models
3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5. Conclusion
6. Future Work & Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4. Results & Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results amp Comparisons
Dataset

CoNLL17
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 7th out of 33 participants (1st among transition-based parsers)

CoNLL18
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 16th out of 30 participants (2nd among transition-based parsers)

Differences between CoNLL17 and CoNLL18: 1. Train/test split changes 2. Annotation changes
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
2 possible problems with the official comparison:
1. If the annotation of the treebank was improved, the older parser is handicapped.
2. If the training-test split has changed and old training data are now in the test data, the old parser is favored undeservedly.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models:

Lang Code | MLP | Tree-stack
ru taiga (10k) | 58.89 | 60.55
hu szeged (20k) | 66.21 | 68.18
tr imst (50k) | 56.78 | 58.75
ar padt (120k) | 67.83 | 68.14
en ewt (205k) | 74.87 | 75.77
cs cac (473k) | 83.39 | 83.57
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
Figure: Initial model (MLP only).
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
Figure: Only action LSTM.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
Figure: Only β-LSTM.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
Figure: Only σ-LSTM.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code | MLP | Only Action | Only-β | Only-σ
hu szeged | 66.21 | 66.87 | 66.94 | 67.03
sv lines | 71.12 | 72.05 | 72.17 | 72.45
tr imst | 57.12 | 56.87 | 57.02 | 57.12
ar padt | 67.83 | 66.67 | 66.89 | 66.92
cs cac | 83.89 | 82.23 | 83.13 | 83.17
en ewt | 75.54 | 75.43 | 75.56 | 75.67

Table: Comparison between MLP and "Only" models.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
Figure: Tree-stack LSTM overview (t-RNN component).
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN:

Lang Code | without t-RNN | with t-RNN
no nynorsklia (3k) | 51.78 | 53.33
ru taiga (11k) | 59.13 | 60.55
gl treegal (15k) | 69.76 | 70.45
hu szeged (20k) | 66.12 | 68.18
sv lines (49k) | 74.04 | 75.46
tr imst (50k) | 58.12 | 58.75
ar padt (120k) | 68.04 | 68.14
en ewt (204k) | 74.87 | 75.77
cs cac (473k) | 82.89 | 83.57
cs pdt (1M) | 81.17 | 81.16
t-RNN provides a comparative advantage for low-resource languages.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang | MLP | Only A | Only-β | Only-σ | w/o t-RNN | all
hu szeged | 66.21 | 66.87 | 66.94 | 67.03 | 66.12 | 68.18
sv lines | 71.12 | 72.05 | 72.17 | 74.04 | 72.17 | 75.46
tr imst | 57.12 | 56.87 | 57.02 | 57.12 | 58.12 | 58.75
ar padt | 67.83 | 66.67 | 66.89 | 66.92 | 68.04 | 68.14
cs cac | 83.89 | 82.23 | 83.13 | 83.17 | 82.89 | 83.57
en ewt | 75.54 | 75.43 | 75.56 | 75.67 | 74.87 | 75.77
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNN's performance contribution increases when the training size decreases.
σ-LSTM provides more useful information independent of dataset size.
Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers).
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental settings:
We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:
• Languages having less than 20k tokens
• Languages having more than 20k, less than 50k tokens
• Languages having more than 50k, less than 100k tokens
• Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
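The four-way split can be expressed as a small helper. The thresholds follow the slide; the exact open/closed handling of the 20k/50k/100k boundaries is my assumption.

```python
def size_bucket(n_train_tokens):
    """Assign a language to one of the 4 training-size groups used above."""
    if n_train_tokens < 20_000:
        return "<20k"
    if n_train_tokens < 50_000:
        return "20k-50k"
    if n_train_tokens < 100_000:
        return "50k-100k"
    return ">=100k"
```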
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens:

Lang Code | Morph-Feats | no Morph-Feats | # of tokens
no nynorsklia | 51.13 | 53.33 | 3583
ru taiga | 58.32 | 60.55 | 10479
sme giella | 52.78 | 53.39 | 16385
la perseus | 49.93 | 51.6 | 18184
ug udt | 52.78 | 53.39 | 19262
sl sst | 46.72 | 48.77 | 19473
hu szeged | 66.23 | 68.18 | 20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang Code | Morph-Feats | no Morph-Feats | # of tokens
sv lines | 72.18 | 74.81 | 48325
fr sequoia | 84.36 | 82.17 | 50543
en gum | 76.44 | 75.34 | 53686
ko gsd | 73.74 | 72.54 | 56687
eu bdt | 74.55 | 73.32 | 72974
nl lassysmall | 76.7 | 75.8 | 75134
gl ctg | 79.02 | 79.018 | 79327
lv lvtb | 72.33 | 72.24 | 80666
id gsd | 75.76 | 73.97 | 97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens:

Lang Code | Morph-Feats | no Morph-Feats | # of tokens
fa seraji | 81.18 | 81.12 | 121064
bg btb | 84.53 | 84.55 | 124336
en ewt | 75.77 | 75.682 | 204585
ar padt | 68.02 | 68.14 | 223881
de gsd | 71.59 | 71.32 | 263804
ca ancora | 85.89 | 85.874 | 417587
es ancora | 84.99 | 84.78 | 444617
cs cac | 83.57 | 83.63 | 472608
cs pdt | 81.43 | 82.12 | 1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle: transitions follow the gold moves.
Dynamic oracle: transitions follow the predicted moves.
In both cases, the log-probability of the gold moves is maximized.
Figure: Tree-stack LSTM overview.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
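The difference between the two regimes lies only in which move advances the parser state. A schematic sketch: the loss is illustrated as the negative log-probability of the gold move, and `model_probs` and the exploration policy are placeholders, not the thesis's actual training code.

```python
import math

def training_step(gold_move, model_probs, dynamic):
    """Return (loss, move_to_execute).
    Static oracle: always execute the gold move.
    Dynamic oracle: execute the model's predicted move,
    while still maximizing log p(gold move)."""
    loss = -math.log(model_probs[gold_move])   # same objective either way
    predicted = max(model_probs, key=model_probs.get)
    move = predicted if dynamic else gold_move
    return loss, move
```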
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language | (1) | (2) | (3) | (4)
af afribooms | not provided | 75.46 | 77.43 | 78.12
kk ktb | 20.19 | 22.31 | 21.96 | 23.86
bxr bdt | 7.64 | 9.76 | 9.93 | 8.98
kmr mg | 20.12 | 22.57 | 22.78 | 23.39

Table: LAS values for strategies (1), (2), (3) and (4).
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial.
Training an LM from scratch does not yield useful word and context vectors.
Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017].
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition Based Parser can only build projective trees 6
6. Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
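A tree is projective when no two dependency arcs cross. This can be checked directly from a head array; the head-array convention below (heads[i] is the head of word i+1, 0 for the root) is an illustrative choice, not tied to any particular parser.

```python
def is_projective(heads):
    """heads[i] = head of word i+1 (words are 1-based, 0 = root).
    Returns True iff no two dependency arcs cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for i, (l1, r1) in enumerate(arcs):
        for l2, r2 in arcs[i + 1:]:
            # Two arcs cross when exactly one endpoint of one arc
            # lies strictly inside the span of the other.
            if l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1:
                return False
    return True
```

A transition-based parser of the kind above can only produce head arrays for which this check returns True, which is why highly non-projective treebanks are harder for it.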
Language Model - Word vectors
Character based LSTM generates word Vectors
Figure Character LSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 37 123
Language Model - Context Vectors
Word based BiLSTM generates Context Vectors
Figure Word BiLSTM from Kırnap et al 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 38 123
b MLP Parser (CoNLL17)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123
MLP Parser
MLP Parser consists of 4 components
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Feature extractor describes current state
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123
MLP Parser - Feature Extraction
Feature extractor describes current state
Figure Kırnap et al 2017Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123
MLP Parser - Decision Module
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123
Experiments amp Dataset (MLP) CoNLL17
CoNLL17 Dataset
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123
Experiments - Evaluation Metric
Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label
Economic news hadGold Tree LAS 1
SBJATT
Economic news had
PREDOBJPred 1 LAS 0
Economic news had
OBJATTPred 2 LAS (frac12)100
Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
Experiments (MLP)
CoNLL 2017 Results (all treebanks LAS)
Ranked 1st among transition based parsers 5
5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123
Contributions in CoNLL17
Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123
Context and Word Embeddings
Relative contributions of part-of-speech (p) word vector (v)context vector (c)
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123
Context and Word Embeddings
Relative contributions of part-of-speech (p) word vector (v)context vector (c)
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Context vectors provide independent contribution on top ofPOS tags
Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123
Context and Word embeddings
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677p-fb 747 797 663
p-v-c 793 832 742
Our BiLSTM language model word vectors perform betterthan FB vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123
Context and Word embeddings
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Both POS tags and context vectors have significantcontributions on top of word vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123
Issues with MLP
However
Choosing correct state of parser still remains critical
We are unable to represent whole parsing history with featureextracting
Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123
Solution
Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack
Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies
CoNLL17
bull Koc-University team with MLP Parser using Context Embeddings
CoNLL18
bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123
c Tree-stack LSTM Parser (CoNLL18)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123
Related Work - Stack LSTM
Figure Stack LSTM [Dyer et al 2015]
Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123
Problems with Stack LSTM
They only modify stackrsquos word embeddings
Hidden states of LSTMS are not updated unless reduce
Actions are not explicitly represented
They only used word2vec embeddings [Mikolov et al 2013]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123
Tree-stack LSTM Overview
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
We propose Tree-stack LSTM model with 4 components
β-LSTMσ-LSTMAction-LSTMTree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123
Tree-stack LSTM
Input Representation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with continuous vector
Every dependency relation is represented with continuous vector
Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123
Input Representation
We do not include explicit feature extractor We initiated wordrepresentation by concatenating
Character Based LSTMrsquos word vectors
Word Based BiLSTMrsquos context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
Input Representation
Morp-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
Tree-stack LSTM
Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123
β-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
LSTM LSTM LSTM
wi+2wi+1wi
Figure Bufferrsquos β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
LSTM LSTM LSTM
si si+1 si+2
Figure Stackrsquos σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
LSTM LSTM LSTM
Figure Action-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How do components of tree-stack LSTM are connected
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
t-RNN
Dependent word
Dependency Relation
Head word
Figure t-RNN
whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
Tree-RNN with
1 Left Transition2 Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN computes the new head embedding
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recomputes its hidden state from the new input
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready to predict the next transition
Right Transition
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN computes the new head embedding
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recomputes its hidden state from the new input
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready to predict the next transition
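The two arc transitions above are set operations on the parser state (stack σ, buffer β, arc set A). A minimal state-update sketch, which omits the LSTM/t-RNN recomputations shown in the figures:

```python
# An arc (h, d, m) means word h heads word m with dependency relation d.
def left(sigma, beta, arcs, d):
    # left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
    s = sigma.pop()                 # stack top becomes a dependent
    arcs.add((beta[0], d, s))       # ... of the buffer front

def right(sigma, beta, arcs, d):
    # right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
    t = sigma.pop()                 # stack top becomes a dependent
    arcs.add((sigma[-1], d, t))     # ... of the next stack item

sigma, beta, arcs = ["ROOT", "had", "effect"], ["on"], set()
right(sigma, beta, arcs, "obj")     # "had" heads "effect"
```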
Final overview of Tree-stack LSTM
Figure: Full tree-stack LSTM: the β-LSTM, σ-LSTM, and Action-LSTM outputs are concatenated and fed to an MLP
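The Concat-then-MLP step in the figure can be sketched as below. For brevity a single linear layer stands in for the MLP, and all dimensions, weights, and transition names are toy assumptions:

```python
# Concatenate the component hidden states and score each candidate transition.
def decide(h_beta, h_sigma, h_action, W, b):
    features = h_beta + h_sigma + h_action          # the "Concat" box
    scores = [sum(w * f for w, f in zip(row, features)) + bi
              for row, bi in zip(W, b)]             # linear stand-in for the MLP
    return max(range(len(scores)), key=scores.__getitem__)

W = [[1.0, 0.0, 0.0, 0.0, 0.0, 0.0],    # toy score row for transition 0
     [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]]    # toy score row for transition 1
b = [0.0, 0.0]
best = decide([0.1, 0.2], [0.9, 0.0], [0.0, 0.3], W, b)
```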
4. Results & Comparisons
Results amp Comparisons
Dataset

CoNLL17:
- Dependency parsing of 81 treebanks in 49 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
- Dependency parsing of 82 treebanks in 57 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split change 2. Annotation
MLP vs Tree-stack LSTM
The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets.
MLP vs Tree-stack LSTM
2 possible problems with the official comparison:
1. If the annotation of the treebank is improved, the older parser is handicapped.
2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models:

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP.
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
MLP Parser
Figure: Initial model (MLP only)
Only Action LSTM
Figure: Only Action-LSTM
Only β-LSTM
Figure: Only β-LSTM
Only σ-LSTM
Figure: Only σ-LSTM
Ablation Analysis Results
Lang Code  MLP    Only Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models
Ablation of t-RNN
Figure: Tree-stack LSTM architecture (t-RNN highlighted)
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN:

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.164

t-RNN provides a comparative advantage for low-resource languages.
Ablation Analysis
Overall results of the ablation analysis:

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats the other model variations.
Ablation Analysis
Conclusions of Ablation Experiments
- t-RNN's performance contribution increases as the training size decreases.
- σ-LSTM provides more useful information independent of dataset size.
- Interconnecting the model's components with t-RNN makes the tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers).
What does Morphological Feature Embedding provide?
Contribution of Morph-feat Embeddings
Experimental Settings: We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:
- Languages having less than 20k tokens
- Languages having more than 20k and less than 50k tokens
- Languages having more than 50k and less than 100k tokens
- Languages having 100k tokens or more
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.6            18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166

Not useful for languages having less than 20k training tokens.
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48325
fr sequoia     84.36        82.17           50543
en gum         76.44        75.34           53686
ko gsd         73.74        72.54           56687
eu bdt         74.55        73.32           72974
nl lassysmall  76.7         75.8            75134
gl ctg         79.02        79.018          79327
lv lvtb        72.33        72.24           80666
id gsd         75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens.
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens:

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121064
bg btb     84.53        84.55           124336
en ewt     75.77        75.682          204585
ar padt    68.02        68.14           223881
de gsd     71.59        71.32           263804
ca ancora  85.89        85.874          417587
es ancora  84.99        84.78           444617
cs cac     83.57        83.63           472608
cs pdt     81.43        82.12           1173282

Neutral for languages having more than 100k training tokens.
Static vs Dynamic Oracle Training
Static oracle: transitions follow the gold moves.
Dynamic oracle: transitions follow the predicted moves.
In both cases, the log-probability of the gold moves is maximized.
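The contrast between the two regimes can be sketched in a toy training step. Both maximize the probability of the gold move (a 0/1 mismatch stands in for the negative log-probability here); they differ only in which move the parser actually follows:

```python
def next_state(state, move):               # placeholder transition function
    return state + (move,)

def training_step(state, gold_move, predicted_move, dynamic):
    loss = 0.0 if predicted_move == gold_move else 1.0   # stand-in for -log p(gold)
    follow = predicted_move if dynamic else gold_move    # the key difference
    return next_state(state, follow), loss

# Static oracle: follow the gold move even though the model predicted LEFT.
state, loss = training_step((), gold_move="SHIFT", predicted_move="LEFT",
                            dynamic=False)
```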
Static vs Dynamic Oracle Training
Figure: Results are very close for training sets with fewer than 20k tokens
Static vs Dynamic Oracle Training
Figure: Results are very close for training sets with between 20k and 50k tokens
Static vs Dynamic Oracle Training
Figure: Results are very close for training sets with more than 50k tokens
How about languages with less than 20k training tokens?
Transfer Learning
There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors, trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4)
Transfer Learning
Conclusions of the transfer learning experiments:
- Applying transfer learning with a pre-trained parser is the most beneficial.
- From-scratch LM training does not yield useful word and context vectors.
- Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017].
Projectivity
Transition based parsers can only build projective trees. 6

6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
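Projectivity itself is easy to check: a dependency tree is projective iff no two arcs cross when drawn above the sentence. A small sketch of that test:

```python
# Arcs are (head_position, dependent_position) pairs over word positions.
def crossing(a1, a2):
    (i1, j1), (i2, j2) = sorted(map(sorted, (a1, a2)))
    return i1 < i2 < j1 < j2        # the spans interleave, so the arcs cross

def is_projective(arcs):
    return not any(crossing(a, b) for a in arcs for b in arcs)

proj = is_projective([(2, 1), (0, 2), (2, 3)])      # nested arcs: projective
non_proj = is_projective([(1, 3), (2, 4)])          # crossing arcs
```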
Projective vs Non-projective
We compared our model with the best model for different projectivity ratios:

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases. 7

7 From the official results page and our projectivity table
Conclusions
Conclusion
In conclusion: We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition based dependency parsing.

- Our tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering.
- Tree-stack LSTM performed better on low-resource languages.
- As the training dataset size increases, the tree-stack LSTM loses its advantage.
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism
Applying attention between the σ-LSTM, β-LSTM, or Action-LSTM states may bring performance improvements.

Morphological Features
Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.
Publications
Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
References
Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Thank you for your attention
Questions?
Language Model - Context Vectors
Word based BiLSTM generates Context Vectors
Figure: Word BiLSTM, from Kırnap et al. 2017
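The wiring of a bidirectional encoder can be sketched without the full LSTM machinery. A toy tanh-RNN cell with scalar "embeddings" stands in for each LSTM direction below; the point is that word i's context vector pairs the forward state (its left context) with the backward state (its right context):

```python
import math

def rnn_states(vectors, w_in=0.5, w_rec=0.5):
    # One recurrent pass over the sequence with a toy tanh cell.
    h, states = 0.0, []
    for x in vectors:
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states

def context_vectors(word_vectors):
    fwd = rnn_states(word_vectors)                  # left-to-right pass
    bwd = rnn_states(word_vectors[::-1])[::-1]      # right-to-left pass
    return [(f, b) for f, b in zip(fwd, bwd)]       # concat per position

ctx = context_vectors([1.0, -1.0, 2.0])   # one scalar "embedding" per word
```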
b. MLP Parser (CoNLL17)
MLP Parser
MLP Parser consists of 4 components:
- Character-based LSTM extracts word vectors
- Word-based BiLSTM extracts context vectors
- Feature extractor describes the current state
- Decision module (MLP) decides the next transition
MLP Parser - Feature Extraction
Feature extractor describes current state
Figure: Kırnap et al. 2017
MLP Parser - Decision Module
Decision module (MLP) decides the next transition
Experiments amp Dataset (MLP) CoNLL17
CoNLL17 Dataset
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation:
- 17 universal part-of-speech tags
- 37 universal dependency relations
Experiments - Evaluation Metric
Labeled Attachment Score (LAS): the percentage of words assigned both the correct syntactic head and the correct dependency label.

Figure: LAS example on "Economic news had ...". The gold tree has arcs SBJ and ATT. Pred 1 (arcs PRED, OBJ) gets both arcs wrong: LAS = 0. Pred 2 (arcs OBJ, ATT) gets one of two arcs right: LAS = (1/2) * 100 = 50.
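The slide's definition can be written out directly; the tiny gold and predicted trees below mirror the "Economic news had" example:

```python
# LAS: fraction of words whose predicted (head, label) matches the gold tree.
def las(gold, pred):
    """gold/pred: dicts mapping word -> (head, dependency label)."""
    correct = sum(pred[w] == gold[w] for w in gold)
    return 100.0 * correct / len(gold)

gold = {"Economic": ("news", "ATT"), "news": ("had", "SBJ")}
pred = {"Economic": ("news", "ATT"), "news": ("had", "OBJ")}  # one arc wrong
score = las(gold, pred)
```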
Experiments (MLP)
CoNLL 2017 Results (all treebanks, LAS)

Ranked 1st among transition based parsers. 5

5 Source: CoNLL17 official results page
Contributions in CoNLL17
Context and Word Embeddings
Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63
c      72.2       76         63.5
v-c    76         79         67.6
p-c    78         82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2
Context and Word Embeddings
Relative contributions of part-of-speech (p), word vector (v), and context vector (c):

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63
c      72.2       76         63.5
v-c    76         79         67.6
p-c    78         82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Context vectors provide an independent contribution on top of POS tags.
Context and Word embeddings
Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63
c      72.2       76         63.5
v-c    76         79         67.6
p-c    78         82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Our BiLSTM language model word vectors perform better than the FB vectors.
Context and Word embeddings
Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63
c      72.2       76         63.5
v-c    76         79         67.6
p-c    78         82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2

Both POS tags and context vectors make significant contributions on top of word vectors.
Issues with MLP
However:
- Choosing the correct state representation of the parser remains critical.
- We are unable to represent the whole parsing history with feature extraction.
Solution
Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack.
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17: Koc-University team with the MLP Parser, using context embeddings
CoNLL18: KParse team with the Tree-stack LSTM Parser, using context and morph-feat embeddings
c. Tree-stack LSTM Parser (CoNLL18)
Related Work - Stack LSTM
Figure: Stack LSTM [Dyer et al. 2015]

Represent each component (σ, β, A) with an LSTM; modify the head word's embedding with the dependent's embedding.
Problems with Stack LSTM
- They only modify the stack's word embeddings.
- Hidden states of the LSTMs are not updated unless a reduce occurs.
- Actions are not explicitly represented.
- They only used word2vec embeddings [Mikolov et al. 2013].
Our solution
We propose that:
- Context embeddings should improve parsing accuracy.
- Dependency relations should be explicitly represented.
- Morphological features of a word may enhance parsing accuracy.
Tree-stack LSTM Overview
Figure: Tree-stack LSTM architecture

We propose the tree-stack LSTM model with 4 components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN
Tree-stack LSTM
Input Representation
Input Representation
Action and Dependency Relation Embeddings
- Every action is represented with a continuous vector.
- Every dependency relation is represented with a continuous vector.
Projective vs Non-projective
We compared our model with the best model for different projectivityratios
Language Projectiviy Best (LAS) Our (LAS)
grc perseus 907 7939 5503 (20)
eu bdt 9513 8422 7413 (17)
hu szeged 978 8266 6818 (14)
da ddt 9826 8628 7640 (17)
en gum 996 8505 7644 (15)
gl treegal 100 7425 7045 (10)
gl ctg 100 8212 7945 (14)
Table Our models performance gap decreases as the projectivity ratio increases
7
7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing
Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering
Tree-stack LSTM performed better with low resource languages
When the training dataset size increases tree-stack LSTM losses itsadvantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings
Attention Mechanism
Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement
Morphological Features
Finding different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies
Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682
S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US
Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
b MLP Parser (CoNLL17)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 39 123
MLP Parser
MLP Parser consists of 4 components
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Feature extractor describes current state
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123
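The four components above can be sketched as one pipeline. The following is a minimal illustration with randomly initialized stand-ins; the dimensions, hash-based character encoder, and mean-pooled context are assumptions for demonstration, not the trained components from the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)
D, N_TRANS = 8, 3  # illustrative feature size and number of transitions

def word_vector(word):
    """Stand-in for the character-based LSTM: hash characters into a vector."""
    v = np.zeros(D)
    for i, ch in enumerate(word):
        v[(i + ord(ch)) % D] += 1.0
    return v

def context_vectors(word_vecs):
    """Stand-in for the word-based BiLSTM: give every word the sentence mean."""
    mean = np.mean(word_vecs, axis=0)
    return [mean for _ in word_vecs]

def extract_features(wv, cv, stack, buffer):
    """Feature extractor: describe the state by the top of the stack
    and the front of the buffer."""
    s, b = stack[-1], buffer[0]
    return np.concatenate([wv[s], cv[s], wv[b], cv[b]])

W = rng.normal(scale=0.1, size=(N_TRANS, 4 * D))

def decide(features):
    """Decision module: softmax layer of an (untrained) MLP over transitions."""
    z = W @ features
    e = np.exp(z - z.max())
    return e / e.sum()

sent = ["Economic", "news", "had"]
wv = [word_vector(w) for w in sent]
cv = context_vectors(wv)
probs = decide(extract_features(wv, cv, stack=[0], buffer=[1, 2]))
```

The point is only the data flow: characters to word vectors, words to context vectors, state to features, features to a transition distribution.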
MLP Parser - Feature Extraction
Feature extractor describes current state
Figure: Kırnap et al. 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123
MLP Parser - Decision Module
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123
Experiments & Dataset (MLP) CoNLL17
CoNLL17 Dataset
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123
Experiments - Evaluation Metric
Labeled Attachment Score (LAS): the percentage of words correctly assigned both the correct syntactic head and the correct dependency label.
Example for "Economic news had": the gold tree has arcs Economic→news (ATT) and news→had (SBJ), so the gold tree itself scores LAS 1. Pred 1 (OBJ and PRED arcs, both wrong) scores LAS 0; Pred 2 (ATT and OBJ arcs, one of two correct) scores LAS (1/2)·100 = 50.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
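The metric can be computed directly from per-word (head, label) pairs. A small sketch reproducing the slide's example; the 0-based indices and the exact label sets for the two predictions are illustrative readings of the figure:

```python
def las(gold, pred):
    """Labeled Attachment Score: percentage of words whose predicted head
    AND dependency label both match the gold tree."""
    assert len(gold) == len(pred)
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return 100.0 * correct / len(gold)

# Arcs for "Economic news had" as (head_index, label) per word, 0-based.
gold  = [(1, "ATT"), (2, "SBJ")]   # Economic -> news, news -> had
pred1 = [(1, "OBJ"), (2, "PRED")]  # both labels wrong        -> LAS 0
pred2 = [(1, "ATT"), (2, "OBJ")]   # one of two words correct -> LAS 50
```

Note that a correct head with a wrong label still counts as an error, which is what separates LAS from the unlabeled score (UAS).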
Experiments (MLP)
CoNLL 2017 Results (all treebanks LAS)
Ranked 1st among transition based parsers 5
5 Source: CoNLL17 official results page.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123
Contributions in CoNLL17
Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123
Context and Word Embeddings
Relative contributions of part-of-speech (p), word vector (v), and context vector (c)
Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123
Context and Word Embeddings
Relative contributions of part-of-speech (p), word vector (v), and context vector (c)
Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2
Context vectors provide an independent contribution on top of POS tags
Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123
Context and Word embeddings
Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2
Our BiLSTM language model word vectors perform better than Facebook's (fastText) vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123
Context and Word embeddings
Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2
Both POS tags and context vectors have significant contributions on top of word vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123
Issues with MLP
However
Choosing the correct features to describe the parser state still remains critical
We are unable to represent the whole parsing history with feature extraction
Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123
Solution
Find a recurrent architecture such that it can summarize the parsing history as well as the word sequences in the buffer and stack
Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies
CoNLL17
• Koc-University team with MLP Parser using Context Embeddings
CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123
c Tree-stack LSTM Parser (CoNLL18)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123
Related Work - Stack LSTM
Figure Stack LSTM [Dyer et al 2015]
Represent each component (σ, β, A) with an LSTM
Modify the head word's embedding with the dependent's embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123
Problems with Stack LSTM
They only modify the stack's word embeddings
Hidden states of the LSTMs are not updated unless a reduce occurs
Actions are not explicitly represented
They only used word2vec embeddings [Mikolov et al. 2013]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123
Tree-stack LSTM Overview
Figure: Tree-stack LSTM architecture: β-LSTM, σ-LSTM, and Action-LSTM states, together with t-RNN head/dependent/relation composition, are concatenated and fed to an MLP.
We propose Tree-stack LSTM model with 4 components
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123
Tree-stack LSTM
Input Representation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with a continuous vector
Every dependency relation is represented with a continuous vector
Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123
Input Representation
We do not include an explicit feature extractor. We initialize the word representation by concatenating:
Character Based LSTM's word vectors
Word Based BiLSTM's context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
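As a sketch, the word representation is just the concatenation of the four vectors; the sizes below are illustrative placeholders, not the thesis' hyperparameters:

```python
import numpy as np

# Hypothetical component outputs for a single word.
char_word_vec = np.ones(4)       # character-based LSTM word vector
context_vec   = np.full(4, 2.0)  # word-based BiLSTM context vector
pos_vec       = np.full(2, 3.0)  # POS embedding
morph_vec     = np.full(3, 4.0)  # morph-feat embedding

word_repr = np.concatenate([char_word_vec, context_vec, pos_vec, morph_vec])
```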
Input Representation
Morph-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
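One simple way to realize this, shown here as an illustrative scheme rather than necessarily the thesis' exact method, is to embed each Key=Value pair of the CoNLL-U FEATS string and sum the embeddings:

```python
import numpy as np

DIM = 8
rng = np.random.default_rng(1)
feat_table = {}  # one embedding per distinct "Key=Value" pair, created lazily

def morph_feat_vector(feats):
    """Map a CoNLL-U FEATS string like "Case=Nom|Number=Sing" to a vector."""
    v = np.zeros(DIM)
    if feats == "_":  # CoNLL-U uses "_" for "no features"
        return v
    for pair in feats.split("|"):
        if pair not in feat_table:
            feat_table[pair] = rng.normal(size=DIM)
        v += feat_table[pair]
    return v

vec = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
```

Sharing one embedding per Key=Value pair lets rare feature combinations reuse vectors learned from frequent ones.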
Tree-stack LSTM
Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123
β-LSTM
Figure: Tree-stack LSTM architecture with the β-LSTM component highlighted.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
Figure: Buffer's β-LSTM running over the buffer words w_i, w_i+1, w_i+2.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
Figure: Tree-stack LSTM architecture with the σ-LSTM component highlighted.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
Figure: Stack's σ-LSTM running over the stack items s_i, s_i+1, s_i+2.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
Figure: Tree-stack LSTM architecture with the Action-LSTM component highlighted.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
Figure: Action-LSTM running over the transition history.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How are the components of the tree-stack LSTM connected?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
Figure: t-RNN composing the head word, dependency relation, and dependent word.
w_head_new = tanh(W_rnn · [w_head_old; d_l; w_dep] + b_rnn)    (1)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
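Equation (1) in code, with a randomly initialized weight matrix and equal-sized embeddings; the dimension and weights are assumptions for the sketch:

```python
import numpy as np

D = 4  # illustrative embedding size
rng = np.random.default_rng(0)
W_rnn = rng.normal(scale=0.1, size=(D, 3 * D))
b_rnn = np.zeros(D)

def t_rnn(w_head_old, d_l, w_dep):
    """Compose head word, dependency-relation, and dependent word embeddings
    into the new head embedding, as in equation (1)."""
    x = np.concatenate([w_head_old, d_l, w_dep])
    return np.tanh(W_rnn @ x + b_rnn)

w_head_new = t_rnn(np.ones(D), np.zeros(D), -np.ones(D))
```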
Tree-RNN with
1. Left Transition
2. Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: The stack's top LSTM is reduced.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready for the next transition.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: The stack's top LSTM is reduced.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready for the next transition.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
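The two transition formulas can be sketched as operations on a (σ, β, A) state, where σ and β are lists of word indices and A is a set of (head, label, dependent) arcs; the list encoding and the shift step are assumptions for illustration:

```python
def left(sigma, beta, A, d):
    """left_d(sigma|s, b|beta, A) = (sigma, b|beta, A + {(b, d, s)}):
    pop the stack top s and attach it to the buffer front b with label d."""
    s = sigma.pop()
    A.add((beta[0], d, s))
    return sigma, beta, A

def right(sigma, beta, A, d):
    """right_d(sigma|s|t, beta, A) = (sigma|s, beta, A + {(s, d, t)}):
    pop the stack top t and attach it to the new stack top s with label d."""
    t = sigma.pop()
    A.add((sigma[-1], d, t))
    return sigma, beta, A

sigma, beta, A = [0, 1], [2, 3], set()
left(sigma, beta, A, "amod")   # adds (2, "amod", 1); sigma becomes [0]
sigma.append(beta.pop(0))      # shift: move word 2 from buffer to stack
right(sigma, beta, A, "obj")   # adds (0, "obj", 2); sigma becomes [0]
```

Left attaches toward the buffer, right attaches within the stack, matching the arc sets in the formulas above.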
Final overview of Tree-stack LSTM
Figure: Final overview of the Tree-stack LSTM: β-LSTM, σ-LSTM, and Action-LSTM outputs and t-RNN head embeddings are concatenated and passed to an MLP.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2. Related Work: Linear Models and their Drawbacks; Neural Network Models
3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5. Conclusion
6. Future Work & Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results & Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results amp Comparisons
Dataset

CoNLL17:
• Dependency parsing of 81 treebanks in 49 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
• Dependency parsing of 82 treebanks in 57 languages
• All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
• Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split change 2. Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
Two possible problems with the official comparison:
1. If the annotation of the treebank is improved, the older parser is handicapped.
2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
Figure: Initial model (MLP only).
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
Figure: Only Action-LSTM.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
Figure: Only β-LSTM.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
Figure: Only σ-LSTM.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code   MLP    Only Action  Only-β  Only-σ
hu szeged   66.21  66.87        66.94   67.03
sv lines    71.12  72.05        72.17   72.45
tr imst     57.12  56.87        57.02   57.12
ar padt     67.83  66.67        66.89   66.92
cs cac      83.89  82.23        83.13   83.17
en ewt      75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
Figure: Tree-stack LSTM architecture with the t-RNN component highlighted.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code            without t-RNN  with t-RNN
no nynorsklia (3k)   51.78          53.33
ru taiga (11k)       59.13          60.55
gl treegal (15k)     69.76          70.45
hu szeged (20k)      66.12          68.18
sv lines (49k)       74.04          75.46
tr imst (50k)        58.12          58.75
ar padt (120k)       68.04          68.14
en ewt (204k)        74.87          75.77
cs cac (473k)        82.89          83.57
cs pdt (1M)          81.17          81.164
t-RNN provides a comparative advantage for low-resource languages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang        MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged   66.21  66.87   66.94   67.03   66.12      68.18
sv lines    71.12  72.05   72.17   74.04   72.17      75.46
tr imst     57.12  56.87   57.02   57.12   58.12      58.75
ar padt     67.83  66.67   66.89   66.92   68.04      68.14
cs cac      83.89  82.23   83.13   83.17   82.89      83.57
en ewt      75.54  75.43   75.56   75.67   74.87      75.77
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNN's performance contribution increases when the training size decreases
σ-LSTM provides more useful information independent of dataset size
Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th of all and 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental Settings
We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:
Languages having less than 20k tokens
Languages having more than 20k less than 50k tokens
Languages having more than 50k less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens
Lang code       Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia   51.13        53.33           3,583
ru taiga        58.32        60.55           10,479
sme giella      52.78        53.39           16,385
la perseus      49.93        51.6            18,184
ug udt          52.78        53.39           19,262
sl sst          46.72        48.77           19,473
hu szeged       66.23        68.18           20,166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having between 50k and 100k tokens
Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48,325
fr sequoia     84.36        82.17           50,543
en gum         76.44        75.34           53,686
ko gsd         73.74        72.54           56,687
eu bdt         74.55        73.32           72,974
nl lassysmall  76.7         75.8            75,134
gl ctg         79.02        79.018          79,327
lv lvtb        72.33        72.24           80,666
id gsd         75.76        73.97           97,531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens
Lang code   Morph-Feats  no Morph-Feats  # of tokens
fa seraji   81.18        81.12           121,064
bg btb      84.53        84.55           124,336
en ewt      75.77        75.682          204,585
ar padt     68.02        68.14           223,881
de gsd      71.59        71.32           263,804
ca ancora   85.89        85.874          417,587
es ancora   84.99        84.78           444,617
cs cac      83.57        83.63           472,608
cs pdt      81.43        82.12           1,173,282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle: transitions using gold moves
Dynamic oracle: transitions using predicted moves
In both cases the log-probability of gold moves is maximized
Figure: Tree-stack LSTM architecture (β-LSTM, σ-LSTM, Action-LSTM, t-RNN, concat, MLP).
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
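A schematic of the two regimes; the frozen toy model and the two-move inventory are placeholders, since the real parser scores transitions with the tree-stack LSTM. In both regimes the loss is the negative log-probability of the gold move, but the dynamic oracle continues parsing from the model's own prediction:

```python
import math

def train_pass(gold_moves, model_probs, dynamic):
    """Return (total -log p(gold), moves actually executed)."""
    loss, path = 0.0, []
    for gold in gold_moves:
        probs = model_probs()            # distribution over possible moves
        loss += -math.log(probs[gold])   # gold move's logp is maximized either way
        move = max(probs, key=probs.get) if dynamic else gold
        path.append(move)                # dynamic: follow the model's prediction
    return loss, path

probs = lambda: {"LEFT": 0.7, "RIGHT": 0.3}   # a frozen toy model
loss_s, path_s = train_pass(["RIGHT", "LEFT"], probs, dynamic=False)
loss_d, path_d = train_pass(["RIGHT", "LEFT"], probs, dynamic=True)
```

With a real parser the state, and hence the next gold move, depends on the path taken, which is exactly why dynamic-oracle training exposes the model to its own errors.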
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser
Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39
Table: LAS values for strategies (1), (2), (3), and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial
Training an LM from scratch does not produce useful word and context vectors
Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
A transition based parser can only build projective trees 6
6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
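Projectivity can be checked by testing whether any two dependency arcs cross; a small sketch using 0-based head indices (-1 marks the root word):

```python
def is_projective(heads):
    """heads[i] is the head index of word i (-1 for the root word).
    The tree is projective iff no two arcs cross."""
    arcs = [(min(i, h), max(i, h)) for i, h in enumerate(heads) if h >= 0]
    for l1, r1 in arcs:
        for l2, r2 in arcs:
            if l1 < l2 < r1 < r2:  # arc 2 starts inside arc 1 but ends outside
                return False
    return True

# "Economic news had": Economic->news, news->had, had is the root.
projective = is_projective([1, 2, -1])
# Arcs 2->0 and 3->1 cross, so this tree is non-projective.
crossing = is_projective([2, 3, -1, 2])
```

Sentences containing such crossing arcs are exactly the ones this parser family cannot reproduce, which motivates the projectivity comparison on the next slide.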
Projective vs Non-projective
We compared our model with the best model for different projectivity ratios
Language      Projectivity  Best (LAS)  Our (LAS)
grc perseus   90.7          79.39       55.03 (20)
eu bdt        95.13         84.22       74.13 (17)
hu szeged     97.8          82.66       68.18 (14)
da ddt        98.26         86.28       76.40 (17)
en gum        99.6          85.05       76.44 (15)
gl treegal    100           74.25       70.45 (10)
gl ctg        100           82.12       79.45 (14)
Table: Our model's performance gap decreases as the projectivity ratio increases
7 From the official results page and our projectivity table.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusion:
We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition based dependency parsing
Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering
Tree-stack LSTM performed better with low resource languages
When the training dataset size increases, tree-stack LSTM loses its advantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.
Attention Mechanism
Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement
Morphological Features
Finding a different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673–682.
S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
MLP Parser
MLP Parser consists of 4 components
Character Based LSTM extracts word vectors
Word Based BiLSTM extracts context vectors
Feature extractor describes current state
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 40 123
MLP Parser - Feature Extraction
Feature extractor describes current state
Figure Kırnap et al 2017Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123
MLP Parser - Decision Module
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123
Experiments amp Dataset (MLP) CoNLL17
CoNLL17 Dataset
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123
Experiments - Evaluation Metric
Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label
Economic news hadGold Tree LAS 1
SBJATT
Economic news had
PREDOBJPred 1 LAS 0
Economic news had
OBJATTPred 2 LAS (frac12)100
Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
Experiments (MLP)
CoNLL 2017 Results (all treebanks LAS)
Ranked 1st among transition based parsers 5
5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123
Contributions in CoNLL17
Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123
Context and Word Embeddings
Relative contributions of part-of-speech (p) word vector (v)context vector (c)
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123
Context and Word Embeddings
Relative contributions of part-of-speech (p) word vector (v)context vector (c)
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Context vectors provide independent contribution on top ofPOS tags
Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123
Context and Word embeddings
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677p-fb 747 797 663
p-v-c 793 832 742
Our BiLSTM language model word vectors perform betterthan FB vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123
Context and Word embeddings
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Both POS tags and context vectors have significantcontributions on top of word vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123
Issues with MLP
However
Choosing correct state of parser still remains critical
We are unable to represent whole parsing history with featureextracting
Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123
Solution
Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack
Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies
CoNLL17
bull Koc-University team with MLP Parser using Context Embeddings
CoNLL18
bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123
c Tree-stack LSTM Parser (CoNLL18)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123
Related Work - Stack LSTM
Figure Stack LSTM [Dyer et al 2015]
Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123
Problems with Stack LSTM
They only modify stackrsquos word embeddings
Hidden states of LSTMS are not updated unless reduce
Actions are not explicitly represented
They only used word2vec embeddings [Mikolov et al 2013]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123
Tree-stack LSTM Overview
Figure Tree-stack LSTM architecture: the σ-, β-, and Action-LSTM outputs are concatenated and fed to an MLP; the t-RNN combines the head word, dependent word, and dependency relation
We propose the Tree-stack LSTM model with 4 components:
β-LSTM, σ-LSTM, Action-LSTM, Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123
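The overview above can be sketched minimally: each component summarizes its sequence, the summaries are concatenated, and an MLP scores the next transition. The mean pooling and the `mlp` callable below are hypothetical stand-ins for the trained LSTMs and MLP, not the thesis implementation.

```python
def score_transitions(stack_vecs, buffer_vecs, action_vecs, mlp, dim):
    # Stand-ins for the sigma-, beta-, and Action-LSTMs: summarize each
    # sequence with mean pooling (the real model uses the LSTMs' final
    # hidden states), concatenate the three summaries, and let an MLP
    # score the candidate transitions.
    def pool(vecs):
        if not vecs:
            return [0.0] * dim
        return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

    features = pool(stack_vecs) + pool(buffer_vecs) + pool(action_vecs)
    return mlp(features)
```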
Tree-stack LSTM
Input Representation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with a continuous vector
Every dependency relation is represented with a continuous vector
Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123
Input Representation
We do not include an explicit feature extractor. We initialize the word representation by concatenating:
Character Based LSTMrsquos word vectors
Word Based BiLSTMrsquos context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
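The four sources listed above are simply concatenated into one input vector per word. A minimal sketch (the vectors here are placeholders for the learned embeddings):

```python
def word_representation(char_vec, context_vec, pos_vec, feat_vec):
    # The parser's input for one word: concatenation of the character
    # LSTM's word vector, the BiLSTM context vector, the POS embedding,
    # and the morph-feat embedding; no hand-crafted features.
    return list(char_vec) + list(context_vec) + list(pos_vec) + list(feat_vec)
```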
Input Representation
Morph-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
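A morph-feat string like the one above can be split into key=value pairs, each mapped to a learned vector. Summing the per-pair vectors is one plausible composition, shown here as a sketch; the thesis may combine them differently.

```python
def morph_feat_vector(feats, table, dim):
    # Split a UD feature string like
    # "Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs"
    # into key=value pairs and sum one vector per pair (hypothetical
    # composition; unseen pairs fall back to a zero vector).
    vec = [0.0] * dim
    if feats == "_":  # UD marks "no features" with "_"
        return vec
    for pair in feats.split("|"):
        for i, x in enumerate(table.get(pair, [0.0] * dim)):
            vec[i] += x
    return vec
```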
Tree-stack LSTM
Model Components:
1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123
β-LSTM
Figure Tree-stack LSTM architecture: the σ-, β-, and Action-LSTM outputs are concatenated and fed to an MLP; the t-RNN combines the head word, dependent word, and dependency relation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
LSTM LSTM LSTM
w_i+2  w_i+1  w_i
Figure Buffer's β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
Figure Tree-stack LSTM architecture: the σ-, β-, and Action-LSTM outputs are concatenated and fed to an MLP; the t-RNN combines the head word, dependent word, and dependency relation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
LSTM LSTM LSTM
s_i  s_i+1  s_i+2
Figure Stack's σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
Figure Tree-stack LSTM architecture: the σ-, β-, and Action-LSTM outputs are concatenated and fed to an MLP; the t-RNN combines the head word, dependent word, and dependency relation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
LSTM LSTM LSTM
Figure Action-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How are the components of the tree-stack LSTM connected?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
t-RNN
Dependent word
Dependency Relation
Head word
Figure t-RNN
w_head_new = tanh(W_rnn · [w_head_old; d_l; w_dep] + b_rnn)   (1)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
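Equation (1) can be written out directly: the old head embedding, the dependency-relation embedding, and the dependent's embedding are concatenated and passed through a single tanh layer. A minimal sketch with explicit loops:

```python
import math

def t_rnn(w_head_old, d_label, w_dep, W_rnn, b_rnn):
    # Eq. (1): w_head_new = tanh(W_rnn . [w_head_old; d_l; w_dep] + b_rnn)
    # The new head embedding mixes the old head, the dependency-relation
    # embedding, and the dependent's embedding.
    x = list(w_head_old) + list(d_label) + list(w_dep)  # concatenation
    return [math.tanh(sum(W_rnn[i][j] * x[j] for j in range(len(x))) + b_rnn[i])
            for i in range(len(b_rnn))]
```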
Tree-RNN with
1 Left Transition
2 Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
LSTM LSTM LSTM LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
Figure Each embedding is initiated by concatenating POS, language, and morph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
Figure Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure β-LSTM recalculates its hidden state based on the new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure Tree-stack LSTM is ready to predict the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
LSTM LSTM LSTM LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Each embedding is initiated by concatenating POS, language, and morph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure σ-LSTM recalculates its hidden state from the new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure Tree-stack LSTM is ready to predict the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
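The left and right transition definitions above operate directly on a parser configuration (stack σ, buffer β, arc set A). A sketch of the transition system alone, without the LSTM updates; a shift action is added for completeness:

```python
def shift(stack, buffer, arcs):
    # shift: move the buffer front onto the stack.
    stack.append(buffer.pop(0))

def left_arc(stack, buffer, arcs, d):
    # left_d(sigma|s, b|beta, A) = (sigma, b|beta, A + {(b, d, s)}):
    # the stack top s becomes a d-dependent of the buffer front b.
    s = stack.pop()
    arcs.add((buffer[0], d, s))

def right_arc(stack, buffer, arcs, d):
    # right_d(sigma|s|t, beta, A) = (sigma|s, beta, A + {(s, d, t)}):
    # the stack top t becomes a d-dependent of the next stack item s.
    t = stack.pop()
    arcs.add((stack[-1], d, t))
```

Words are represented by their indices; arcs are (head, label, dependent) triples, matching the (b, d, s) and (s, d, t) notation above.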
Final overview of Tree-stack LSTM
Figure Tree-stack LSTM architecture: the σ-, β-, and Action-LSTM outputs are concatenated and fed to an MLP; the t-RNN combines the head word, dependent word, and dependency relation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2 Related Work: Linear Models and their Drawbacks; Neural Network Models
3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5 Conclusion
6 Future Work & Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results & Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results & Comparisons
Dataset

CoNLL17:
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
Dependency parsing of 82 treebanks in 57 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Differences between CoNLL17 and CoNLL18: 1 Train/test split change 2 Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
Two possible problems with the official comparison:
1 If the annotation of the treebank is improved, the older parser is handicapped
2 If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
MLP
Figure Initial model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
LSTM LSTM
Figure Only action LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
LSTM LSTM LSTM
LSTM LSTM MLP
Figure Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
LSTM LSTM
LSTM LSTM MLP
Figure Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code  MLP    Only Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67
Table Comparison between MLP and "Only" models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
Figure Tree-stack LSTM architecture: the σ-, β-, and Action-LSTM outputs are concatenated and fed to an MLP; the t-RNN combines the head word, dependent word, and dependency relation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.16
t-RNN provides a comparative advantage for low-resource languages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNN's performance contribution increases when the training size decreases
σ-LSTM provides more useful information, independent of dataset size
Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental Settings: We divide the CoNLL18 UD dataset (v2.2) into 4 parts based on the number of training tokens for each language, to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k and less than 50k tokens
Languages having more than 50k and less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens
Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3,583
ru taiga       58.32        60.55           10,479
sme giella     52.78        53.39           16,385
la perseus     49.93        51.60           18,184
ug udt         52.78        53.39           19,262
sl sst         46.72        48.77           19,473
hu szeged      66.23        68.18           20,166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having tokens in between 50k and 100k
Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48,325
fr sequoia     84.36        82.17           50,543
en gum         76.44        75.34           53,686
ko gsd         73.74        72.54           56,687
eu bdt         74.55        73.32           72,974
nl lassysmall  76.70        75.80           75,134
gl ctg         79.02        79.018          79,327
lv lvtb        72.33        72.24           80,666
id gsd         75.76        73.97           97,531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens
Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121,064
bg btb     84.53        84.55           124,336
en ewt     75.77        75.682          204,585
ar padt    68.02        68.14           223,881
de gsd     71.59        71.32           263,804
ca ancora  85.89        85.874          417,587
es ancora  84.99        84.78           444,617
cs cac     83.57        83.63           472,608
cs pdt     81.43        82.12           1,173,282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle: transitions using gold moves
Dynamic oracle: transitions using predicted moves
In both cases, the log p of gold moves is maximized
Figure Tree-stack LSTM architecture: the σ-, β-, and Action-LSTM outputs are concatenated and fed to an MLP; the t-RNN combines the head word, dependent word, and dependency relation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
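The static/dynamic distinction above can be sketched as a training step over one sentence. The parser pieces (gold_move, log_prob, predict, apply_move) are hypothetical callables, not the thesis implementation; note that the loss term is the gold move's log-probability in both settings, and only the move actually followed differs.

```python
import random

def oracle_train_step(state, is_final, gold_move, log_prob, predict,
                      apply_move, dynamic=False, p_explore=0.5, seed=0):
    # Static oracle: always follow gold transitions.
    # Dynamic oracle: sometimes follow the parser's own prediction, so it
    # visits (and learns to recover from) states its errors produce.
    # In both cases the summed loss is -log p(gold move) at every state.
    rng = random.Random(seed)
    loss = 0.0
    while not is_final(state):
        gold = gold_move(state)
        loss -= log_prob(state, gold)
        move = predict(state) if dynamic and rng.random() < p_explore else gold
        state = apply_move(state, move)
    return loss, state
```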
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning:
1 Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2 Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3 Using my own word and context vectors trained with a different language, but from the same language family
4 Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table LAS values for strategies (1), (2), (3), and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial
Training an LM from scratch on limited data does not yield useful word and context vectors
Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition Based Parser can only build projective trees 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
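Projectivity, as used above, has a simple operational test: a tree is projective iff no two dependency arcs cross when drawn above the sentence. A sketch over head indices (words 1-indexed, 0 for the root):

```python
def is_projective(heads):
    # heads[i-1] is the head of word i (words are 1-indexed, 0 = root).
    # A tree is projective iff no two arcs cross when drawn above the
    # sentence; a transition-based parser of this kind builds only these.
    arcs = [(min(h, i), max(h, i)) for i, h in enumerate(heads, start=1)]
    for k, (l1, r1) in enumerate(arcs):
        for l2, r2 in arcs[k + 1:]:
            if l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1:  # crossing arcs
                return False
    return True
```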
Projective vs Non-projective
We compared our model with the best model for different projectivity ratios
Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)
Table Our model's performance gap decreases as the projectivity ratio increases
7
7 From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusion: We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition based dependency parsing
Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering
Tree-stack LSTM performed better with low resource languages
When the training dataset size increases, tree-stack LSTM loses its advantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings
Attention Mechanism
Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement
Morphological Features
Finding a different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, Association for Computational Linguistics, pages 673-682.
S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
MLP Parser - Feature Extraction
Feature extractor describes current state
Figure Kırnap et al. 2017
Omer Kırnap (Koc University) MSc Thesis September 27 2018 41 123
MLP Parser - Decision Module
Decision module (MLP) decides the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 42 123
Experiments & Dataset (MLP) CoNLL17
CoNLL17 Dataset
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation:
• 17 universal part-of-speech tags
• 37 universal dependency relations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123
Experiments - Evaluation Metric
Labeled Attachment Score (LAS): The percentage of words correctly assigned both the correct syntactic head and the correct dependency label
Gold tree: Economic ←ATT— news ←SBJ— had   (LAS = 1)
Pred 1:    Economic ←OBJ— news ←PRED— had  (LAS = 0)
Pred 2:    Economic ←ATT— news ←OBJ— had   (LAS = (1/2) · 100 = 50)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
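The LAS definition above reduces to a few lines: a word counts as correct only when its predicted head and its predicted label both match the gold tree. A minimal sketch over dicts mapping each word to its (head, label) arc:

```python
def las(gold_arcs, pred_arcs):
    # Labeled Attachment Score: percentage of words whose predicted head
    # AND dependency label both match the gold tree.
    # Each dict maps word -> (head, label).
    correct = sum(1 for w, arc in gold_arcs.items() if pred_arcs.get(w) == arc)
    return 100.0 * correct / len(gold_arcs)
```

On the slide's example, Pred 1 gets both labels wrong (LAS 0) while Pred 2 gets one of two arcs fully right (LAS 50).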
Experiments (MLP)
CoNLL 2017 Results (all treebanks LAS)
Ranked 1st among transition based parsers 5
5 Source: CoNLL17 official results page
Omer Kırnap (Koc University) MSc Thesis September 27 2018 45 123
Contributions in CoNLL17
Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123
Context and Word Embeddings
Relative contributions of part-of-speech (p), word vector (v), context vector (c)
Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123
Context and Word Embeddings
Relative contributions of part-of-speech (p), word vector (v), context vector (c)
Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2
Context vectors provide independent contribution on top of POS tags
Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123
Context and Word embeddings
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677p-fb 747 797 663
p-v-c 793 832 742
Our BiLSTM language model word vectors perform betterthan FB vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123
Context and Word embeddings
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Both POS tags and context vectors have significantcontributions on top of word vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123
Issues with MLP
However
Choosing correct state of parser still remains critical
We are unable to represent whole parsing history with featureextracting
Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123
Solution
Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack
Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies
CoNLL17
bull Koc-University team with MLP Parser using Context Embeddings
CoNLL18
bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123
c Tree-stack LSTM Parser (CoNLL18)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123
Related Work - Stack LSTM
Figure Stack LSTM [Dyer et al 2015]
Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123
Problems with Stack LSTM
They only modify stackrsquos word embeddings
Hidden states of LSTMS are not updated unless reduce
Actions are not explicitly represented
They only used word2vec embeddings [Mikolov et al 2013]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123
Tree-stack LSTM Overview
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
We propose Tree-stack LSTM model with 4 components
β-LSTMσ-LSTMAction-LSTMTree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123
Tree-stack LSTM
Input Representation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with continuous vector
Every dependency relation is represented with continuous vector
Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123
Input Representation
We do not include explicit feature extractor We initiated wordrepresentation by concatenating
Character Based LSTMrsquos word vectors
Word Based BiLSTMrsquos context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
Input Representation
Morp-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
Tree-stack LSTM
Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123
β-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
LSTM LSTM LSTM
wi+2wi+1wi
Figure Bufferrsquos β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
LSTM LSTM LSTM
si si+1 si+2
Figure Stackrsquos σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
LSTM LSTM LSTM
Figure Action-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How do components of tree-stack LSTM are connected
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
t-RNN
Dependent word
Dependency Relation
Head word
Figure t-RNN
whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
Tree-RNN with
1 Left Transition2 Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM LSTM
Left transition
t-RNN
Dependency Relation
HeadDependent
Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure β-LSTM recalculates its hidden based on new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Stackrsquos top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure σ-LSTM recalculates its hidden from new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
Final overview of Tree-stack LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing
2 Related WorkLinear Models and their DrawbacksNeural Network Models
3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser
4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning
5 Conclusion6 Future Work amp Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results amp Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results amp Comparisons
Dataset
Dependency parsing of 81
treebanks in 49 languages
All treebanks use standardized
annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 7th out
of 33 participants (1st among
transition based parsers)
Dependency parsing of 82
treebanks in 57 languages
All treebanks use
standardized annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 16th
out of 30 participants (2nd
among transition based
parsers)
CoNLL17 CoNLL181 Traintest split change 2 Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
2 possible problems of official comparison
1 If the annotation of the tree bank is improved the older parser ishandicapped
2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
MLP
Figure Initial model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
LSTM LSTM
Figure Only action LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
LSTM LSTM LSTM
LSTM LSTM MLP
Figure Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
LSTM LSTM
LSTM LSTM MLP
Figure Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692
cs cac 8389 8223 8313 8317
en ewt 7554 7543 7556 7567
Table Comparison between MLP and rdquoOnlyrdquo models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875
ar padt (120k) 6804 6814
en ewt (204k) 7487 7577
cs cac (473k) 8289 8357
cs pdt (1M) 8117 81164
t-RNN provides comparative advantage for low-resourcelanguages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNNrsquos performance contribution increases when the training sizedecreases
σ-LSTM provides more useful information independent from datasetsize
Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
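The FEATS column of a UD treebank lists pairs such as Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs (see the Input Representation slides). One simple way to turn such a string into a fixed-size morph-feat vector is to sum a learned vector per Feature=Value pair; this is an illustrative scheme, not necessarily the thesis's exact composition function:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 16            # morph-feat embedding size (illustrative)
table = {}          # one vector per "Feature=Value" pair, created lazily

def morph_feat_vector(feats):
    """Embed a UD FEATS string, e.g. 'Case=Nom|Number=Sing' ('_' = no features)."""
    if feats == "_":
        return np.zeros(DIM)
    vecs = []
    for pair in feats.split("|"):
        if pair not in table:
            table[pair] = rng.normal(scale=0.1, size=DIM)
        vecs.append(table[pair])
    return np.sum(vecs, axis=0)

v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
```

Because each Feature=Value pair has its own vector, words that share Case or Number values get related representations even if the full feature bundle was never seen in training.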
Contribution of Morph-feat Embeddings
Experimental Settings
We divide the CoNLL18 UD dataset into 4 parts, based on the number of training tokens per language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k and less than 50k tokens

Languages having more than 50k and less than 100k tokens

Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens

Lang code     | Morph-Feats | no Morph-Feats | # of tokens
no nynorsklia | 51.13       | 53.33          | 3,583
ru taiga      | 58.32       | 60.55          | 10,479
sme giella    | 52.78       | 53.39          | 16,385
la perseus    | 49.93       | 51.60          | 18,184
ug udt        | 52.78       | 53.39          | 19,262
sl sst        | 46.72       | 48.77          | 19,473
hu szeged     | 66.23       | 68.18          | 20,166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code     | Morph-Feats | no Morph-Feats | # of tokens
sv lines      | 72.18       | 74.81          | 48,325
fr sequoia    | 84.36       | 82.17          | 50,543
en gum        | 76.44       | 75.34          | 53,686
ko gsd        | 73.74       | 72.54          | 56,687
eu bdt        | 74.55       | 73.32          | 72,974
nl lassysmall | 76.70       | 75.80          | 75,134
gl ctg        | 79.02       | 79.018         | 79,327
lv lvtb       | 72.33       | 72.24          | 80,666
id gsd        | 75.76       | 73.97          | 97,531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens

Lang code | Morph-Feats | no Morph-Feats | # of tokens
fa seraji | 81.18       | 81.12          | 121,064
bg btb    | 84.53       | 84.55          | 124,336
en ewt    | 75.77       | 75.682         | 204,585
ar padt   | 68.02       | 68.14          | 223,881
de gsd    | 71.59       | 71.32          | 263,804
ca ancora | 85.89       | 85.874         | 417,587
es ancora | 84.99       | 84.78          | 444,617
cs cac    | 83.57       | 83.63          | 472,608
cs pdt    | 81.43       | 82.12          | 1,173,282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle: transitions follow the gold moves.
Dynamic oracle: transitions follow the predicted moves.

In both cases, the log-probability of the gold moves is maximized.
Figure: Tree-stack LSTM architecture (β-LSTM, σ-LSTM, Action-LSTM and t-RNN outputs concatenated and fed to the MLP)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
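The static/dynamic distinction above can be sketched as a toy training loop over a miniature shift/reduce state. The state, the "model" probabilities and the oracle rule are all hypothetical stand-ins; the point is only that the loss is the NLL of the oracle move in both regimes, while the move that advances the state differs:

```python
import math

def train_sentence(state, dynamic=False):
    """Toy oracle training on a (buffer_size, stack_size) state.

    Static:  the gold move drives the state.
    Dynamic: the model's predicted move drives the state; the oracle is
             recomputed from the current (possibly off-gold) state.
    """
    loss, steps = 0.0, 0
    while state != (0, 1):                        # done: empty buffer, one item
        buf, stk = state
        gold = "shift" if buf > 0 else "reduce"   # oracle w.r.t. CURRENT state
        # toy "model": prefers reduce whenever both moves are valid
        if buf > 0 and stk > 1:
            probs = {"shift": 0.3, "reduce": 0.7}
        elif buf > 0:
            probs = {"shift": 1.0}
        else:
            probs = {"reduce": 1.0}
        loss -= math.log(probs[gold])             # maximize log p(gold move)
        move = max(probs, key=probs.get) if dynamic else gold
        state = (buf - 1, stk + 1) if move == "shift" else (buf, stk - 1)
        steps += 1
    return loss, steps

static_loss, _ = train_sentence((3, 1), dynamic=False)
dynamic_loss, _ = train_sentence((3, 1), dynamic=True)
```

In dynamic mode the parser visits states its own mistakes produce, which is exactly the exploration that static training never provides.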
Static vs Dynamic Oracle Training
Figure: Results are very close for training sets with less than 20k tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure: Results are very close for training sets between 20k and 50k tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure: Results are very close for training sets with more than 50k tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors, trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser
Language     | (1)          | (2)   | (3)   | (4)
af afribooms | not provided | 75.46 | 77.43 | 78.12
kk ktb       | 20.19        | 22.31 | 21.96 | 23.86
bxr bdt      | 7.64         | 9.76  | 9.93  | 8.98
kmr mg       | 20.12        | 22.57 | 22.78 | 23.39
Table: LAS values for strategies (1), (2), (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial

From-scratch LM training does not produce useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
A transition-based parser can only build projective trees. 6

6 Figure from http://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
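The projectivity constraint can be checked directly from the head indices: a tree is projective iff no two dependency arcs cross. A small sketch (words are 1-indexed, 0 denotes the root attachment):

```python
def is_projective(heads):
    """heads[i-1] is the head of word i; 0 means the word attaches to the root."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for a1, b1 in arcs:
        for a2, b2 in arcs:
            if a1 < a2 < b1 < b2:   # strictly interleaved endpoints = crossing
                return False
    return True

# "Economic news had little effect" with had as root is projective:
print(is_projective([2, 3, 0, 5, 3]))   # True
# A tree with a crossing arc (word 1 -> 3 over word 2 -> 4) is not:
print(is_projective([3, 4, 0, 3]))      # False
```

Trees that fail this check cannot be produced by the plain transition system, which bounds the attainable LAS on treebanks with many non-projective sentences.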
Projective vs Non-projective
We compared our model with the best model at different projectivity ratios

Language    | Projectivity % | Best (LAS) | Our (LAS)
grc perseus | 90.7           | 79.39      | 55.03 (20)
eu bdt      | 95.13          | 84.22      | 74.13 (17)
hu szeged   | 97.8           | 82.66      | 68.18 (14)
da ddt      | 98.26          | 86.28      | 76.40 (17)
en gum      | 99.6           | 85.05      | 76.44 (15)
gl treegal  | 100            | 74.25      | 70.45 (10)
gl ctg      | 100            | 82.12      | 79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases
7 From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusion:
We introduced "Context, Word and Morph-feat" embeddings and showed their contribution to transition-based dependency parsing

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering

Tree-stack LSTM performed better on low-resource languages

When the training dataset size increases, the tree-stack LSTM loses its advantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.
Attention Mechanism
Applying attention across the σ-LSTM, β-LSTM or Action-LSTM states may bring performance improvements.
Morphological Features
Finding different ways to represent morphological features.
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Önder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics 5:135-146.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
Experiments amp Dataset (MLP) CoNLL17
CoNLL17 Dataset
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotationbull 17 universal part-of-speech tagsbull 37 universal dependency relations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 43 123
Experiments - Evaluation Metric
Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label
Economic news hadGold Tree LAS 1
SBJATT
Economic news had
PREDOBJPred 1 LAS 0
Economic news had
OBJATTPred 2 LAS (frac12)100
Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
Experiments (MLP)
CoNLL 2017 Results (all treebanks LAS)
Ranked 1st among transition based parsers 5
5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123
Contributions in CoNLL17
Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123
Context and Word Embeddings
Relative contributions of part-of-speech (p) word vector (v)context vector (c)
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123
Context and Word Embeddings
Relative contributions of part-of-speech (p) word vector (v)context vector (c)
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Context vectors provide independent contribution on top ofPOS tags
Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123
Context and Word embeddings
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677p-fb 747 797 663
p-v-c 793 832 742
Our BiLSTM language model word vectors perform betterthan FB vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123
Context and Word embeddings
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Both POS tags and context vectors have significantcontributions on top of word vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123
Issues with MLP
However
Choosing correct state of parser still remains critical
We are unable to represent whole parsing history with featureextracting
Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123
Solution
Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack
Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:
CoNLL17
• Koc-University team with MLP Parser using Context Embeddings
CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123
Tree-stack LSTM Parser (CoNLL18)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123
Related Work - Stack LSTM
Figure Stack LSTM [Dyer et al 2015]
Represent each component (σ, β, A) with an LSTM; modify the head word's embedding with the dependent's embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123
Problems with Stack LSTM
They only modify the stack's word embeddings
Hidden states of the LSTMs are not updated unless a reduce occurs
Actions are not explicitly represented
They only used word2vec embeddings [Mikolov et al. 2013]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123
Tree-stack LSTM Overview
Figure: Tree-stack LSTM architecture (β-LSTM, σ-LSTM and Action-LSTM outputs concatenated into an MLP; t-RNN combines head word, dependent word and dependency relation)
We propose the Tree-stack LSTM model with 4 components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123
Tree-stack LSTM
Input Representation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with a continuous vector
Every dependency relation is represented with a continuous vector
Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123
Input Representation
We do not include an explicit feature extractor. We initialize word representations by concatenating:
Character-based LSTM's word vectors
Word-based BiLSTM's context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
Input Representation
Morph-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
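One way to turn a UD FEATS string like the one above into a single morph-feat vector is to embed each Feature=Value pair and sum the embeddings. This is a minimal sketch with made-up dimensions, not necessarily the thesis's exact scheme.

```python
import random

def parse_feats(feats):
    """Split a UD FEATS string, e.g. 'Case=Nom|Number=Sing', into a dict."""
    if feats in ("", "_"):
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))

class MorphFeatEmbedder:
    """Each Feature=Value pair gets its own vector (randomly initialized
    here, learned in practice); a word's morph-feat vector is their sum."""
    def __init__(self, dim=8, seed=0):
        self.dim = dim
        self.table = {}           # (feature, value) -> embedding
        self.rng = random.Random(seed)

    def __call__(self, feats):
        vec = [0.0] * self.dim
        for pair in parse_feats(feats).items():
            emb = self.table.setdefault(
                pair, [self.rng.uniform(-0.1, 0.1) for _ in range(self.dim)])
            vec = [a + b for a, b in zip(vec, emb)]
        return vec
```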
Tree-stack LSTM
Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123
β-LSTM
Figure: Tree-stack LSTM architecture (β-LSTM, σ-LSTM and Action-LSTM outputs concatenated into an MLP; t-RNN combines head word, dependent word and dependency relation)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
Figure: Buffer's β-LSTM running over buffer words w_i, w_i+1, w_i+2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
Figure: Tree-stack LSTM architecture (β-LSTM, σ-LSTM and Action-LSTM outputs concatenated into an MLP; t-RNN combines head word, dependent word and dependency relation)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
Figure: Stack's σ-LSTM running over stack words s_i, s_i+1, s_i+2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
Figure: Tree-stack LSTM architecture (β-LSTM, σ-LSTM and Action-LSTM outputs concatenated into an MLP; t-RNN combines head word, dependent word and dependency relation)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
Figure: Action-LSTM running over the sequence of past transitions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How are the components of the tree-stack LSTM connected?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
Figure: t-RNN combining dependent word, dependency relation and head word

w_head^new = tanh(W_rnn [w_head^old; d_l; w_dep] + b_rnn)   (1)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
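Equation (1) in plain Python (pure lists, no framework): W_rnn maps the concatenated [head; relation; dependent] vector back to the embedding size. The dimensions and random weights are illustrative.

```python
import math
import random

def t_rnn(w_head, d_rel, w_dep, W, b):
    """Eq. (1): w_head_new = tanh(W_rnn [w_head_old; d_l; w_dep] + b_rnn)."""
    x = w_head + d_rel + w_dep  # concatenation of the three vectors
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

rng = random.Random(0)
d = 4  # toy embedding size; W maps 3d -> d
W = [[rng.uniform(-0.5, 0.5) for _ in range(3 * d)] for _ in range(d)]
b = [0.0] * d
new_head = t_rnn([0.1] * d, [0.2] * d, [0.3] * d, W, b)
```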
Tree-RNN with
1. Left Transition
2. Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initialized by concatenating POS, language and morph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN computes the new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recomputes its hidden state from the new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready for the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language and morph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN computes the new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recomputes its hidden state from the new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready for the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
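The left and right transitions above, sketched on plain Python lists (stack grows to the right, buffer front at index 0); the sentence, labels and indices are illustrative.

```python
def shift(stack, buffer, arcs):
    """Move the buffer front onto the stack."""
    stack.append(buffer.pop(0))

def left_arc(stack, buffer, arcs, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    pop the stack top s; the buffer front b becomes its head."""
    s = stack.pop()
    arcs.add((buffer[0], d, s))

def right_arc(stack, buffer, arcs, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    pop the stack top t; the new stack top s becomes its head."""
    t = stack.pop()
    arcs.add((stack[-1], d, t))

# "Economic(1) news(2) had(3)": 1 <-ATT- 2 <-SBJ- 3
stack, buffer, arcs = [], [1, 2, 3], set()
shift(stack, buffer, arcs)            # stack [1], buffer [2, 3]
left_arc(stack, buffer, arcs, "ATT")  # news heads Economic
shift(stack, buffer, arcs)            # stack [2], buffer [3]
left_arc(stack, buffer, arcs, "SBJ")  # had heads news
```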
Final overview of Tree-stack LSTM
Figure: Tree-stack LSTM architecture (β-LSTM, σ-LSTM and Action-LSTM outputs concatenated into an MLP; t-RNN combines head word, dependent word and dependency relation)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2. Related Work: Linear Models and their Drawbacks; Neural Network Models
3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5. Conclusion
6. Future Work & Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results & Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results & Comparisons

Dataset

CoNLL17:
- Dependency parsing of 81 treebanks in 49 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
- Dependency parsing of 82 treebanks in 57 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split change, 2. Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
2 possible problems with the official comparison:
1. If the annotation of the treebank has improved, the older parser is handicapped
2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models:

Lang Code        MLP    Tree-stack
ru_taiga (10k)   58.89  60.55
hu_szeged (20k)  66.21  68.18
tr_imst (50k)    56.78  58.75
ar_padt (120k)   67.83  68.14
en_ewt (205k)    74.87  75.77
cs_cac (473k)    83.39  83.57
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
Figure: Initial model (MLP only)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
Figure: Only Action-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
Figure: Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
Figure: Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code   MLP    Only Action  Only-β  Only-σ
hu_szeged   66.21  66.87        66.94   67.03
sv_lines    71.12  72.05        72.17   72.45
tr_imst     57.12  56.87        57.02   57.12
ar_padt     67.83  66.67        66.89   66.92
cs_cac      83.89  82.23        83.13   83.17
en_ewt      75.54  75.43        75.56   75.67

Table: Comparison between the MLP and the "Only" models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
Figure: Tree-stack LSTM architecture (β-LSTM, σ-LSTM and Action-LSTM outputs concatenated into an MLP; t-RNN combines head word, dependent word and dependency relation)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code            without t-RNN  with t-RNN
no_nynorsklia (3k)   51.78          53.33
ru_taiga (11k)       59.13          60.55
gl_treegal (15k)     69.76          70.45
hu_szeged (20k)      66.12          68.18
sv_lines (49k)       74.04          75.46
tr_imst (50k)        58.12          58.75
ar_padt (120k)       68.04          68.14
en_ewt (204k)        74.87          75.77
cs_cac (473k)        82.89          83.57
cs_pdt (1M)          81.17          81.16
t-RNN provides a comparative advantage for low-resource languages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang        MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu_szeged   66.21  66.87   66.94   67.03   66.12      68.18
sv_lines    71.12  72.05   72.17   74.04   72.17      75.46
tr_imst     57.12  56.87   57.02   57.12   58.12      58.75
ar_padt     67.83  66.67   66.89   66.92   68.04      68.14
cs_cac      83.89  82.23   83.13   83.17   82.89      83.57
en_ewt      75.54  75.43   75.56   75.67   74.87      75.77
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNN's performance contribution increases as the training size decreases
σ-LSTM provides more useful information independent of dataset size
Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental Settings: We divide the CoNLL18 UD dataset (v2.2) into 4 parts based on the number of training tokens per language, to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k but less than 50k tokens
Languages having more than 50k but less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no_nynorsklia  51.13        53.33            3,583
ru_taiga       58.32        60.55           10,479
sme_giella     52.78        53.39           16,385
la_perseus     49.93        51.60           18,184
ug_udt         52.78        53.39           19,262
sl_sst         46.72        48.77           19,473
hu_szeged      66.23        68.18           20,166

Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv_lines       72.18        74.81           48,325
fr_sequoia     84.36        82.17           50,543
en_gum         76.44        75.34           53,686
ko_gsd         73.74        72.54           56,687
eu_bdt         74.55        73.32           72,974
nl_lassysmall  76.70        75.80           75,134
gl_ctg         79.02        79.018          79,327
lv_lvtb        72.33        72.24           80,666
id_gsd         75.76        73.97           97,531

Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens:

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa_seraji  81.18        81.12             121,064
bg_btb     84.53        84.55             124,336
en_ewt     75.77        75.682            204,585
ar_padt    68.02        68.14             223,881
de_gsd     71.59        71.32             263,804
ca_ancora  85.89        85.874            417,587
es_ancora  84.99        84.78             444,617
cs_cac     83.57        83.63             472,608
cs_pdt     81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle: transitions follow gold moves. Dynamic oracle: transitions follow predicted moves.
In both cases the log-probability of gold moves is maximized.
Figure: Tree-stack LSTM architecture (β-LSTM, σ-LSTM and Action-LSTM outputs concatenated into an MLP; t-RNN combines head word, dependent word and dependency relation)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
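The difference between the two regimes as a schematic loop, with hypothetical `predict`/`gold_oracle` interfaces and a toy parser state: both maximize log p(gold move), but the dynamic oracle lets the parser follow its own predictions, so it learns to act from imperfect states.

```python
import math

class ToyConfig:
    """Stand-in parser state: terminal after n transitions."""
    def __init__(self, n):
        self.left, self.history = n, []
    def is_terminal(self):
        return self.left == 0
    def apply(self, move):
        self.history.append(move)
        self.left -= 1

def train_sentence(predict, gold_oracle, config, dynamic=False):
    """One sentence of oracle training (schematic)."""
    loss = 0.0
    while not config.is_terminal():
        gold = gold_oracle(config)
        probs = predict(config)            # transition -> probability
        loss -= math.log(probs[gold])      # always score the gold move
        move = max(probs, key=probs.get) if dynamic else gold
        config.apply(move)                 # follow predicted or gold move
    return loss
```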
Static vs Dynamic Oracle Training
Figure: Results are very close for languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure: Results are very close for languages with between 20k and 50k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure: Results are very close for languages with more than 50k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using our own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser
Language      (1)           (2)    (3)    (4)
af_afribooms  not provided  75.46  77.43  78.12
kk_ktb        20.19         22.31  21.96  23.86
bxr_bdt        7.64          9.76   9.93   8.98
kmr_mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial
Training an LM from scratch on very limited data does not yield useful word and context vectors
Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition based parsers can only build projective trees 6
6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
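A quick way to measure the projectivity ratios reported on the next slide: a tree is projective iff no two arcs cross. This is a hypothetical helper (1-based word indices, head 0 denotes the root), not the thesis's tooling.

```python
def is_projective(heads):
    """heads[i-1] is the head of word i (0 denotes the root).
    Returns True iff no two dependency arcs cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for i, (a1, b1) in enumerate(arcs):
        for a2, b2 in arcs[i + 1:]:
            if a1 < a2 < b1 < b2 or a2 < a1 < b2 < b1:
                return False  # the two arcs cross
    return True

# "Economic news had": Economic->news, news->had, had->root: projective
projective_ok = is_projective([2, 3, 0])
```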
Projective vs Non-projective
We compared our model with the best model at different projectivity ratios:
Language     Projectivity  Best (LAS)  Ours (LAS)
grc_perseus  90.7          79.39       55.03 (20)
eu_bdt       95.13         84.22       74.13 (17)
hu_szeged    97.8          82.66       68.18 (14)
da_ddt       98.26         86.28       76.40 (17)
en_gum       99.6          85.05       76.44 (15)
gl_treegal   100           74.25       70.45 (10)
gl_ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases
7
7 From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusion: we introduced "Context, Word and Morph-feat" embeddings and showed their contribution to transition based dependency parsing
Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering
Tree-stack LSTM performed better on low-resource languages
As the training dataset size increases, tree-stack LSTM loses its advantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.
Attention Mechanism
Applying attention between σ-LSTM, β-LSTM or Action-LSTM states may bring a performance improvement
Morphological Features
Finding a different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap, Berkay Furkan Onder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann, Carlos Gómez-Rodríguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.
S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
Experiments - Evaluation Metric
Labeled Attachment Score (LAS)The percentage of words correctly assigned both the correct syntactic headand the correct dependency label
Economic news hadGold Tree LAS 1
SBJATT
Economic news had
PREDOBJPred 1 LAS 0
Economic news had
OBJATTPred 2 LAS (frac12)100
Omer Kırnap (Koc University) MSc Thesis September 27 2018 44 123
Experiments (MLP)
CoNLL 2017 Results (all treebanks LAS)
Ranked 1st among transition based parsers 5
5Source CoNLL17 official results pageOmer Kırnap (Koc University) MSc Thesis September 27 2018 45 123
Contributions in CoNLL17
Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123
Context and Word Embeddings
Relative contributions of part-of-speech (p) word vector (v)context vector (c)
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123
Context and Word Embeddings
Relative contributions of part-of-speech (p) word vector (v)context vector (c)
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Context vectors provide independent contribution on top ofPOS tags
Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123
Context and Word embeddings
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677p-fb 747 797 663
p-v-c 793 832 742
Our BiLSTM language model word vectors perform betterthan FB vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123
Context and Word embeddings
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Both POS tags and context vectors have significantcontributions on top of word vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123
Issues with MLP
However
Choosing correct state of parser still remains critical
We are unable to represent whole parsing history with featureextracting
Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123
Solution
Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack
Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies
CoNLL17
bull Koc-University team with MLP Parser using Context Embeddings
CoNLL18
bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123
c Tree-stack LSTM Parser (CoNLL18)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123
Related Work - Stack LSTM
Figure Stack LSTM [Dyer et al 2015]
Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123
Problems with Stack LSTM
They only modify stackrsquos word embeddings
Hidden states of LSTMS are not updated unless reduce
Actions are not explicitly represented
They only used word2vec embeddings [Mikolov et al 2013]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123
Tree-stack LSTM Overview
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
We propose Tree-stack LSTM model with 4 components
β-LSTMσ-LSTMAction-LSTMTree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123
Tree-stack LSTM
Input Representation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with continuous vector
Every dependency relation is represented with continuous vector
Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123
Input Representation
We do not include explicit feature extractor We initiated wordrepresentation by concatenating
Character Based LSTMrsquos word vectors
Word Based BiLSTMrsquos context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
Input Representation
Morp-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
Tree-stack LSTM
Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123
β-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
LSTM LSTM LSTM
wi+2wi+1wi
Figure Bufferrsquos β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
LSTM LSTM LSTM
si si+1 si+2
Figure Stackrsquos σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
LSTM LSTM LSTM
Figure Action-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How do components of tree-stack LSTM are connected
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
t-RNN
Dependent word
Dependency Relation
Head word
Figure t-RNN
whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
Tree-RNN with
1 Left Transition2 Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM LSTM
Left transition
t-RNN
Dependency Relation
HeadDependent
Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure β-LSTM recalculates its hidden based on new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Stackrsquos top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure σ-LSTM recalculates its hidden from new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
Final overview of Tree-stack LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing
2 Related WorkLinear Models and their DrawbacksNeural Network Models
3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser
4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning
5 Conclusion6 Future Work amp Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results amp Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results amp Comparisons
Dataset
Dependency parsing of 81
treebanks in 49 languages
All treebanks use standardized
annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 7th out
of 33 participants (1st among
transition based parsers)
Dependency parsing of 82
treebanks in 57 languages
All treebanks use
standardized annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 16th
out of 30 participants (2nd
among transition based
parsers)
CoNLL17 CoNLL181 Traintest split change 2 Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
2 possible problems of official comparison
1 If the annotation of the tree bank is improved the older parser ishandicapped
2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
MLP
Figure Initial model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
LSTM LSTM
Figure Only action LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
LSTM LSTM LSTM
LSTM LSTM MLP
Figure Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
LSTM LSTM
LSTM LSTM MLP
Figure Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692
cs cac 8389 8223 8313 8317
en ewt 7554 7543 7556 7567
Table Comparison between MLP and rdquoOnlyrdquo models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875
ar padt (120k) 6804 6814
en ewt (204k) 7487 7577
cs cac (473k) 8289 8357
cs pdt (1M) 8117 81164
t-RNN provides comparative advantage for low-resourcelanguages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of the ablation analysis:

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats the other model variations.
Ablation Analysis
Conclusions of the ablation experiments:
t-RNN's performance contribution increases as the training set size decreases.
σ-LSTM provides more useful information, independent of dataset size.
Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers).
What does Morphological Feature Embedding provide?
Contribution of Morph-feat Embeddings
Experimental Settings: We divide the CoNLL18 UD v2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:
Languages having less than 20k tokens
Languages having more than 20k, less than 50k tokens
Languages having more than 50k, less than 100k tokens
Languages having 100k tokens or more
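The four-way split above can be expressed as a small helper; a minimal sketch using the listed thresholds (the function name is ours):

```python
def size_bucket(n_tokens):
    """Bucket a treebank by its number of training tokens."""
    if n_tokens < 20_000:
        return "<20k"
    if n_tokens < 50_000:
        return "20k-50k"
    if n_tokens < 100_000:
        return "50k-100k"
    return ">=100k"
```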
Contribution of Morph-feat Embeddings
Morph-feat experiments for languages having less than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33             3,583
ru taiga       58.32        60.55            10,479
sme giella     52.78        53.39            16,385
la perseus     49.93        51.60            18,184
ug udt         52.78        53.39            19,262
sl sst         46.72        48.77            19,473
hu szeged      66.23        68.18            20,166

Not useful for languages having less than 20k training tokens.
Contribution of Morph-feat Embeddings
Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code     Morph-Feats  no Morph-Feats  # of tokens
sv lines      72.18        74.81            48,325
fr sequoia    84.36        82.17            50,543
en gum        76.44        75.34            53,686
ko gsd        73.74        72.54            56,687
eu bdt        74.55        73.32            72,974
nl lassysmall 76.70        75.80            75,134
gl ctg        79.02        79.018           79,327
lv lvtb       72.33        72.24            80,666
id gsd        75.76        73.97            97,531

Beneficial for languages with 50k-100k training tokens.
Contribution of Morph-feat Embeddings
Morph-feat experiments for languages having more than 100k training tokens:

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12             121,064
bg btb     84.53        84.55             124,336
en ewt     75.77        75.682            204,585
ar padt    68.02        68.14             223,881
de gsd     71.59        71.32             263,804
ca ancora  85.89        85.874            417,587
es ancora  84.99        84.78             444,617
cs cac     83.57        83.63             472,608
cs pdt     81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens.
Static vs Dynamic Oracle Training
Static oracle: transitions follow gold moves.
Dynamic oracle: transitions follow the model's predicted moves.
In both cases, the log-probability of the gold moves is maximized.
Figure: Tree-stack LSTM architecture (repeated for reference)
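The static vs dynamic distinction above can be sketched as a toy training loop; `score_fn` stands in for the parser's scoring MLP and is a hypothetical placeholder:

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def oracle_train_step(score_fn, gold_moves, dynamic=False):
    """Accumulate -log p(gold move) at every state. The state is then
    advanced with the gold move (static oracle) or with the model's own
    argmax prediction (dynamic oracle)."""
    path, loss = [], 0.0
    for gold in gold_moves:
        probs = softmax(score_fn(path))
        loss += -math.log(probs[gold])        # gold log-prob is maximized in both cases
        predicted = max(range(len(probs)), key=probs.__getitem__)
        path.append(predicted if dynamic else gold)
    return path, loss
```

With a model that always prefers move 0, static training visits the gold state sequence while dynamic training visits the states the model would actually reach at test time.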
Static vs Dynamic Oracle Training
Figure: Results are very close for training tokens less than 20k
Static vs Dynamic Oracle Training
Figure: Results are very close for training tokens between 20k and 50k
Static vs Dynamic Oracle Training
Figure: Results are very close for training tokens more than 50k
How about languages with less than 20k training tokens?
Transfer Learning
There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al., 2017]
3. Using my own word and context vectors, trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt        7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3) and (4)
Transfer Learning
Conclusions of the transfer learning experiments:
Applying transfer learning with a pre-trained parser is the most beneficial.
From-scratch LM training does not produce useful word and context vectors.
Our word and context vectors are still more useful than Facebook's [Bojanowski et al., 2017].
Projectivity
A transition-based parser can only build projective trees. 6
6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
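Projectivity can be checked by testing whether any two dependency arcs cross; a minimal sketch (our own helper, not from the thesis), with 1-based word indices and head 0 denoting the root:

```python
def is_projective(heads):
    """heads[i] is the head of word i+1; 0 denotes the root.
    A tree is projective iff no two arcs cross."""
    arcs = [(min(i + 1, h), max(i + 1, h)) for i, h in enumerate(heads)]
    for a, b in arcs:
        for c, d in arcs:
            # arcs (a,b) and (c,d) cross iff exactly one endpoint of one
            # arc lies strictly inside the span of the other; iterating
            # over all ordered pairs covers both orientations
            if a < c < b < d:
                return False
    return True
```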
Projective vs Non-projective
We compared our model with the best model at different projectivity ratios:

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus   90.7          79.39       55.03 (20)
eu bdt        95.13         84.22       74.13 (17)
hu szeged     97.8          82.66       68.18 (14)
da ddt        98.26         86.28       76.40 (17)
en gum        99.6          85.05       76.44 (15)
gl treegal   100            74.25       70.45 (10)
gl ctg       100            82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases 7
7 From the official results page and our projectivity table
Conclusions
Conclusion
In conclusion:
We introduced "Context, Word and Morph-feat" embeddings and showed their contribution to transition-based dependency parsing.
Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering.
Tree-stack LSTM performed better on low-resource languages.
As the training dataset size increases, tree-stack LSTM loses its advantage.
Future Research Directions
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.
Attention Mechanism
Applying attention between σ-LSTM, β-LSTM or Action-LSTM states may bring performance improvements.
Morphological Features
Finding different ways to represent morphological features.
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.
Publications
Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap, Berkay Furkan Onder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
References
Marco Kuhlmann, Carlos Gomez-Rodriguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.
S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Thank you for your attention
Questions?
Experiments (MLP)
CoNLL 2017 Results (all treebanks, LAS)
Ranked 1st among transition-based parsers 5
5 Source: CoNLL17 official results page
Contributions in CoNLL17
Context and Word Embeddings
Relative contributions of part-of-speech (p), word vector (v) and context vector (c):

Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63.0
c      72.2       76.0       63.5
v-c    76.0       79.0       67.6
p-c    78.0       82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2
Context and Word Embeddings
Context vectors provide an independent contribution on top of POS tags.
Context and Word Embeddings
Our BiLSTM language model word vectors perform better than FB vectors.
Context and Word Embeddings
Both POS tags and context vectors have significant contributions on top of word vectors.
Issues with MLP
However:
Choosing the correct state features of the parser still remains critical.
We are unable to represent the whole parsing history with feature extraction.
Solution
Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack.
Model Overview
2 shared tasks for Multilingual Parsing from Raw Text to Universal Dependencies:
CoNLL17
• Koc-University team with the MLP Parser, using Context Embeddings
CoNLL18
• KParse team with the Tree-stack LSTM Parser, using Context and Morph-feat Embeddings
(c) Tree-stack LSTM Parser (CoNLL18)
Related Work - Stack LSTM
Figure: Stack LSTM [Dyer et al., 2015]
Represent each component (σ, β, A) with an LSTM.
Modify the head word's embedding with the dependent's embedding.
Problems with Stack LSTM
They only modify the stack's word embeddings.
Hidden states of the LSTMs are not updated unless a reduce occurs.
Actions are not explicitly represented.
They only used word2vec embeddings [Mikolov et al., 2013].
Our Solution
We propose:
Context embeddings should improve parsing accuracy.
Dependency relations should be explicitly represented.
Morphological features of a word may enhance parsing accuracy.
Tree-stack LSTM Overview
Figure: Tree-stack LSTM (the t-RNN combines head word, dependent word and dependency relation; its output is concatenated with the β-, σ- and Action-LSTM outputs and fed to an MLP)
We propose the Tree-stack LSTM model with 4 components:
β-LSTM, σ-LSTM, Action-LSTM, Tree-RNN
Tree-stack LSTM
Input Representation
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with a continuous vector.
Every dependency relation is represented with a continuous vector.
Input Representation
We do not include an explicit feature extractor. We initialize each word representation by concatenating:
Character-based LSTM's word vectors
Word-based BiLSTM's context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
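The concatenation above can be sketched in a couple of lines; the dimensions below are illustrative assumptions, not the thesis's actual sizes:

```python
def word_representation(char_vec, context_vec, pos_vec, morph_vec):
    """Concatenate the four input vectors into one word representation."""
    return char_vec + context_vec + pos_vec + morph_vec

# hypothetical dimensions: 100-d char, 200-d context, 32-d POS, 32-d morph
w = word_representation([0.0] * 100, [0.0] * 200, [0.0] * 32, [0.0] * 32)
```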
Input Representation
Morph-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure: Morph-feat Embeddings
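One way to embed a UD feature string like the one above is to give each "Feature=Value" pair its own vector and pool them; a minimal sketch (the sum-pooling and dimension are our assumptions, not necessarily the thesis's choice):

```python
import random

random.seed(0)
DIM = 8
_feat_table = {}  # "Feature=Value" string -> embedding, grown on demand

def morph_feat_embedding(feat_string):
    """Embed a morphological feature string such as
    'Case=Nom|Gender=Neut|Number=Sing' by summing one vector per
    Feature=Value pair (summation is an assumption for illustration)."""
    vec = [0.0] * DIM
    for fv in feat_string.split("|"):
        if fv not in _feat_table:
            _feat_table[fv] = [random.uniform(-0.1, 0.1) for _ in range(DIM)]
        vec = [a + b for a, b in zip(vec, _feat_table[fv])]
    return vec
```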
Tree-stack LSTM
Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN
β-LSTM
Figure: Tree-stack LSTM with the β-LSTM highlighted
β-LSTM
Figure: Buffer's β-LSTM, running over buffer words w_i, w_i+1, w_i+2
σ-LSTM
Figure: Tree-stack LSTM with the σ-LSTM highlighted
σ-LSTM
Figure: Stack's σ-LSTM, running over stack words s_i, s_i+1, s_i+2
Action-LSTM
Figure: Tree-stack LSTM with the Action-LSTM highlighted
Action-LSTM
Figure: Action-LSTM, running over past transitions
How are the components of tree-stack LSTM connected?
Tree-RNN
Tree-RNN (t-RNN)
Figure: t-RNN, combining the dependent word, dependency relation and head word
w_head_new = tanh(W_rnn · [w_head_old; d_l; w_dep] + b_rnn)   (1)
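Equation (1) as straight-line code, a minimal sketch with illustrative dimensions (the helper name is ours):

```python
import math

def t_rnn(w_head, d_rel, w_dep, W, b):
    """w_head_new = tanh(W_rnn . [w_head; d_rel; w_dep] + b_rnn) -- Eq. (1)."""
    x = w_head + d_rel + w_dep  # vector concatenation [w_head; d_rel; w_dep]
    return [math.tanh(sum(W[i][j] * x[j] for j in range(len(x))) + b[i])
            for i in range(len(b))]
```

The new head embedding has the same dimension as `b`, so it can replace the old head word vector in the stack.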
Tree-RNN with:
1. Left Transition
2. Right Transition
Left Transition
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: Each embedding is initiated by concatenating POS, language and morph-feat embeddings
Transitions - Left
Figure: Stack's top LSTM is reduced
Transitions - Left
Figure: t-RNN calculates the new head embedding
Transitions - Left
Figure: β-LSTM recalculates its hidden state from the new input
Transitions - Left
Figure: Tree-stack LSTM is ready for the next transition
Right Transition
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: Each embedding is initiated by concatenating POS, language and morph-feat embeddings
Transitions - Right
Figure: Stack's top LSTM is reduced
Transitions - Right
Figure: t-RNN calculates the new head embedding
Transitions - Right
Figure: σ-LSTM recalculates its hidden state from the new input
Transitions - Right
Figure: Tree-stack LSTM is ready for the next transition
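The two transitions can be sketched directly on a (stack, buffer, arcs) state, with arcs stored as (head, label, dependent) triples; a minimal sketch of the formal definitions above:

```python
def left_arc(stack, buffer, arcs, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    the stack top s becomes a dependent of the buffer front b."""
    s = stack.pop()
    arcs.add((buffer[0], d, s))

def right_arc(stack, buffer, arcs, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    the stack top t becomes a dependent of the element s below it."""
    t = stack.pop()
    arcs.add((stack[-1], d, t))
```

In the full model, each transition would also trigger the t-RNN update of the head word's embedding shown in the preceding slides.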
Final Overview of Tree-stack LSTM
Figure: Tree-stack LSTM (β-LSTM, σ-LSTM, Action-LSTM and t-RNN outputs concatenated and fed to an MLP)
4. Results & Comparisons
Results & Comparisons
Dataset

CoNLL17:
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc-University ranked 7th out of 33 participants (1st among transition-based parsers)

CoNLL18:
Dependency parsing of 82 treebanks in 57 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc-University ranked 16th out of 30 participants (2nd among transition-based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split change 2. Annotation
MLP vs Tree-stack LSTM
The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets.
MLP vs Tree-stack LSTM
2 possible problems with the official comparison:
1. If the annotation of the treebank has improved, the older parser is handicapped.
2. If the training-test split has changed and old training data are now in the test data, the old parser is favored undeservedly.
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare the models:

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP.
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
MLP
Figure Initial model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
LSTM LSTM
Figure Only action LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
LSTM LSTM LSTM
LSTM LSTM MLP
Figure Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
LSTM LSTM
LSTM LSTM MLP
Figure Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692
cs cac 8389 8223 8313 8317
en ewt 7554 7543 7556 7567
Table Comparison between MLP and rdquoOnlyrdquo models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875
ar padt (120k) 6804 6814
en ewt (204k) 7487 7577
cs cac (473k) 8289 8357
cs pdt (1M) 8117 81164
t-RNN provides comparative advantage for low-resourcelanguages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNNrsquos performance contribution increases when the training sizedecreases
σ-LSTM provides more useful information independent from datasetsize
Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k less than 50k tokens
Languages having more than 50k less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having less than 20k training tokens
Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583
ru taiga 5832 6055 10479
sme giella 5278 5339 16385
la perseus 4993 516 18184
ug udt 5278 5339 19262
sl sst 4672 4877 19473
hu szeged 6623 6818 20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having tokens in between 50k and100k
Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325
fr sequoia 8436 8217 50543
en gum 7644 7534 53686
ko gsd 7374 7254 56687
eu bdt 7455 7332 72974
nl lassymal 767 758 75134
gl ctg 7902 79018 79327
lv lvtb 7233 7224 80666
id gsd 7576 7397 97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having more than 100k trainingtokens
Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064
bg btb 8453 8455 124336
en ewt 7577 75682 204585
ar padt 6802 6814 223881
de gsd 7159 7132 263804
ca ancora 8589 85874 417587
es ancora 8499 8478 444617
cs cac 8357 8363 472608
cs pdt 8143 8212 1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle transitions using gold movesDynamic oracle transitions using predicted moves
In both cases logp of gold moves maximized
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and
use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al
2017]3 Using my own word and context vectors trained with different
language but from the same language family4 Applying transfer learning with a pre-trained parser
Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898
kmr mg 2012 2257 2278 2339
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the mostbeneficial
From scratch LM training does not bring useful word and contextvectors
Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition Based Parser can only build projective trees 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
Projective vs Non-projective
We compared our model with the best model for different projectivityratios
Language Projectiviy Best (LAS) Our (LAS)
grc perseus 907 7939 5503 (20)
eu bdt 9513 8422 7413 (17)
hu szeged 978 8266 6818 (14)
da ddt 9826 8628 7640 (17)
en gum 996 8505 7644 (15)
gl treegal 100 7425 7045 (10)
gl ctg 100 8212 7945 (14)
Table Our models performance gap decreases as the projectivity ratio increases
7
7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing
Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering
Tree-stack LSTM performed better with low resource languages
When the training dataset size increases tree-stack LSTM losses itsadvantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings
Attention Mechanism
Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement
Morphological Features
Finding different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies
Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682
S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US
Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
Contributions in CoNLL17
Omer Kırnap (Koc University) MSc Thesis September 27 2018 46 123
Context and Word Embeddings
Relative contributions of part-of-speech (p) word vector (v)context vector (c)
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123
Context and Word Embeddings
Relative contributions of part-of-speech (p) word vector (v)context vector (c)
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Context vectors provide independent contribution on top ofPOS tags
Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123
Context and Word embeddings
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677p-fb 747 797 663
p-v-c 793 832 742
Our BiLSTM language model word vectors perform betterthan FB vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123
Context and Word embeddings
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Both POS tags and context vectors have significantcontributions on top of word vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123
Issues with MLP
However
Choosing correct state of parser still remains critical
We are unable to represent whole parsing history with featureextracting
Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123
Solution
Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack
Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies
CoNLL17
bull Koc-University team with MLP Parser using Context Embeddings
CoNLL18
bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123
c Tree-stack LSTM Parser (CoNLL18)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123
Related Work - Stack LSTM
Figure Stack LSTM [Dyer et al 2015]
Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123
Problems with Stack LSTM
They only modify stackrsquos word embeddings
Hidden states of LSTMS are not updated unless reduce
Actions are not explicitly represented
They only used word2vec embeddings [Mikolov et al 2013]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Tree-stack LSTM Overview
[Figure: Tree-stack LSTM architecture: β-LSTM, σ-LSTM and Action-LSTM outputs are concatenated with the t-RNN head representations and fed to an MLP]

We propose the Tree-stack LSTM model with 4 components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN
Tree-stack LSTM
Input Representation
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with a continuous vector.
Every dependency relation is represented with a continuous vector.
Input Representation
We do not include an explicit feature extractor. We initialize the word representation by concatenating:
Character Based LSTMrsquos word vectors
Word Based BiLSTMrsquos context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
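As a concrete sketch of this input representation (the dimensions below are illustrative assumptions, not the thesis hyperparameters), the four vectors are simply concatenated into one token vector:

```python
import numpy as np

# Illustrative sketch only: dimensions are made up for the example.
char_vec  = np.zeros(350)  # character-based LSTM word vector
ctx_vec   = np.zeros(300)  # word-based BiLSTM context vector
pos_vec   = np.zeros(50)   # part-of-speech (POS) vector
morph_vec = np.zeros(50)   # morph-feat vector

# A token's input is the concatenation of the four vectors.
token_vec = np.concatenate([char_vec, ctx_vec, pos_vec, morph_vec])
```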
Input Representation
Morph-feat Vectors

Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It

Figure: Morph-feat Embeddings
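A minimal sketch of how such a UD FEATS string could be mapped to a single vector (the summing scheme, dimension, and random initialization are assumptions for illustration, not necessarily the thesis implementation): each "Attr=Val" pair gets its own embedding and the token's pairs are combined.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8
feat_table = {}  # one vector per "Attr=Val" feature (random stand-ins here)

def feat_vector(feat):
    # Lazily create an embedding for an unseen "Attr=Val" feature.
    if feat not in feat_table:
        feat_table[feat] = rng.normal(size=DIM)
    return feat_table[feat]

def morph_feat_embedding(feats):
    """feats: UD FEATS string such as 'Case=Nom|Number=Sing'; '_' if empty."""
    if feats == "_":
        return np.zeros(DIM)
    return np.sum([feat_vector(f) for f in feats.split("|")], axis=0)

vec = morph_feat_embedding(
    "Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
```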
Tree-stack LSTM
Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN
β-LSTM
[Figure: Tree-stack LSTM architecture, β-LSTM component highlighted]
β-LSTM
[Figure: Buffer's β-LSTM running over the buffer words w_i, w_i+1, w_i+2]
σ-LSTM
[Figure: Tree-stack LSTM architecture, σ-LSTM component highlighted]
σ-LSTM
[Figure: Stack's σ-LSTM running over the stack words s_i, s_i+1, s_i+2]
Action-LSTM
[Figure: Tree-stack LSTM architecture, Action-LSTM component highlighted]
Action-LSTM
[Figure: Action-LSTM running over the transition history]
How are the components of the tree-stack LSTM connected?
Tree-RNN
Tree-RNN (t-RNN)
[Figure: t-RNN combining the head word, dependency relation and dependent word]

w_head_new = tanh(W_rnn * [w_head_old; d_l; w_dep] + b_rnn)   (1)
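Equation (1) can be sketched in a few lines. W_rnn and b_rnn stand for the learned t-RNN parameters (random stand-ins below) and the embedding size is illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
D = 4                                # illustrative embedding size
W_rnn = rng.normal(size=(D, 3 * D))  # learned t-RNN weight (random here)
b_rnn = rng.normal(size=D)           # learned t-RNN bias (random here)

def t_rnn(w_head_old, d_l, w_dep):
    """New head embedding from the old head, the dependency-label
    embedding d_l, and the dependent embedding, as in Eq. (1)."""
    x = np.concatenate([w_head_old, d_l, w_dep])
    return np.tanh(W_rnn @ x + b_rnn)

w_head_new = t_rnn(np.ones(D), np.zeros(D), np.ones(D))
```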
Tree-RNN with
1. Left Transition
2. Right Transition
Left Transition
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initialized by concatenating POS, language and morph-feat embeddings.
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: The stack's top LSTM is reduced.
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: The t-RNN calculates the new head embedding.
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: The β-LSTM recalculates its hidden state based on the new input.
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: The tree-stack LSTM is ready to predict the next transition.
Right Transition
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language and morph-feat embeddings.
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: The stack's top LSTM is reduced.
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: The t-RNN calculates the new head embedding.
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: The σ-LSTM recalculates its hidden state from the new input.
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: The tree-stack LSTM is ready to predict the next transition.
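The two transitions above can be mirrored directly on Python lists. This is a sketch of the definitions shown on the slides, not the thesis code: left_d attaches the stack top to the buffer front, right_d attaches the stack top to the element below it on the stack.

```python
def left_arc(stack, buffer, arcs, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})"""
    s = stack.pop()               # dependent s: top of the stack
    arcs.add((buffer[0], d, s))   # its head b: front of the buffer

def right_arc(stack, buffer, arcs, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})"""
    t = stack.pop()               # dependent t: top of the stack
    arcs.add((stack[-1], d, t))   # its head s: new top of the stack

# Toy run on token indices; labels are arbitrary examples.
stack, buffer, arcs = [0, 1, 2], [3, 4], set()
left_arc(stack, buffer, arcs, "obj")    # adds (3, "obj", 2)
right_arc(stack, buffer, arcs, "nmod")  # adds (0, "nmod", 1)
```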
Final overview of Tree-stack LSTM
[Figure: Complete Tree-stack LSTM: the β-LSTM, σ-LSTM and Action-LSTM outputs are concatenated with the t-RNN head representations and fed to an MLP]
4. Results & Comparisons
Results & Comparisons

Dataset

CoNLL17:
Dependency parsing of 81 treebanks in 49 languages.
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations.
Koc-University ranked 7th out of 33 participants (1st among transition based parsers).

CoNLL18:
Dependency parsing of 82 treebanks in 57 languages.
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations.
Koc-University ranked 16th out of 30 participants (2nd among transition based parsers).

Changes from CoNLL17 to CoNLL18: 1. train/test split, 2. annotation.
MLP vs Tree-stack LSTM
The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems evaluated on the same test sets.
MLP vs Tree-stack LSTM
2 possible problems of the official comparison:
1. If the annotation of the treebank was improved, the older parser is handicapped.
2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models:

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP.
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
MLP Parser
Figure: Initial model (MLP only).
Only Action LSTM
Figure: Only Action-LSTM.
Only β-LSTM
Figure: Only β-LSTM.
Only σ-LSTM
Figure: Only σ-LSTM.
Ablation Analysis Results
Lang Code   MLP    Only Action  Only-β  Only-σ
hu szeged   66.21  66.87        66.94   67.03
sv lines    71.12  72.05        72.17   72.45
tr imst     57.12  56.87        57.02   57.12
ar padt     67.83  66.67        66.89   66.92
cs cac      83.89  82.23        83.13   83.17
en ewt      75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models.
Ablation of t-RNN
[Figure: Tree-stack LSTM architecture, t-RNN component highlighted]
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN:

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.164

t-RNN provides a comparative advantage for low-resource languages.
Ablation Analysis
Overall results of the ablation analysis:

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats the other model variations.
Ablation Analysis
Conclusions of the Ablation Experiments:
t-RNN's performance contribution increases as the training size decreases.
σ-LSTM provides more useful information independent of dataset size.
Interconnecting the model's components with the t-RNN makes the tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers).
What does Morphological Feature Embedding provide?
Contribution of Morph-feat Embeddings
Experimental Settings: We divide the CoNLL18 UD 2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:
Languages having less than 20k tokens
Languages having more than 20k, less than 50k tokens
Languages having more than 50k, less than 100k tokens
Languages having 100k tokens or more
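The four-way split above can be expressed as a small helper. This is just a sketch; the bucket labels are mine:

```python
def size_bucket(n_tokens):
    """Assign a language to one of the 4 training-size groups."""
    if n_tokens < 20_000:
        return "<20k"
    if n_tokens < 50_000:
        return "20k-50k"
    if n_tokens < 100_000:
        return "50k-100k"
    return ">=100k"
```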
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33             3583
ru taiga       58.32        60.55            10479
sme giella     52.78        53.39            16385
la perseus     49.93        51.6             18184
ug udt         52.78        53.39            19262
sl sst         46.72        48.77            19473
hu szeged      66.23        68.18            20166

Not useful for languages having less than 20k training tokens.
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81            48325
fr sequoia     84.36        82.17            50543
en gum         76.44        75.34            53686
ko gsd         73.74        72.54            56687
eu bdt         74.55        73.32            72974
nl lassysmall  76.7         75.8             75134
gl ctg         79.02        79.018           79327
lv lvtb        72.33        72.24            80666
id gsd         75.76        73.97            97531

Beneficial for languages with 50k-100k training tokens.
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens:

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12            121064
bg btb     84.53        84.55            124336
en ewt     75.77        75.682           204585
ar padt    68.02        68.14            223881
de gsd     71.59        71.32            263804
ca ancora  85.89        85.874           417587
es ancora  84.99        84.78            444617
cs cac     83.57        83.63            472608
cs pdt     81.43        82.12           1173282

Neutral for languages having more than 100k training tokens.
Static vs Dynamic Oracle Training
Static oracle: transitions follow the gold moves.
Dynamic oracle: transitions follow the predicted moves.
In both cases, the log-probability of the gold moves is maximized.

[Figure: Tree-stack LSTM architecture]
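The distinction can be sketched as follows (illustrative, not the thesis training loop): both variants maximize the gold move's log-probability, but they differ in which move advances the parser.

```python
import math

def train_step(history, gold_move, probs, dynamic):
    """probs: model distribution over moves (dict move -> probability)."""
    loss = -math.log(probs[gold_move])     # log p of the gold move is maximized
    if dynamic:
        move = max(probs, key=probs.get)   # dynamic oracle: follow prediction
    else:
        move = gold_move                   # static oracle: follow gold
    return history + [move], loss

probs = {"shift": 0.7, "left": 0.3}
static_hist, loss_s = train_step([], "left", probs, dynamic=False)
dyn_hist, loss_d = train_step([], "left", probs, dynamic=True)
```

Note that the loss is identical in both cases; only the explored states differ.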
Static vs Dynamic Oracle Training
Figure: Results are very close for training sets with less than 20k tokens.
Static vs Dynamic Oracle Training
Figure: Results are very close for training sets with between 20k and 50k tokens.
Static vs Dynamic Oracle Training
Figure: Results are very close for training sets with more than 50k tokens.
How about languages with less than 20k training tokens?
Transfer Learning
There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al., 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt        7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3) and (4).
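Strategy (4) amounts to reusing a pre-trained parser's weights before fine-tuning on the low-resource language. A minimal sketch, where parameter names and shapes are made up for illustration:

```python
import numpy as np

def transfer_init(target, source):
    """Copy every source parameter whose name and shape match the target."""
    for name, w in source.items():
        if name in target and target[name].shape == w.shape:
            target[name] = w.copy()
    return target

# Hypothetical parameter dicts: the t-RNN weight transfers, but the
# embedding table does not because the vocabularies differ in size.
source = {"W_rnn": np.ones((4, 12)), "embed": np.ones((100, 8))}
target = {"W_rnn": np.zeros((4, 12)), "embed": np.zeros((50, 8))}
target = transfer_init(target, source)
```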
Transfer Learning
Conclusions of the Transfer Learning Experiments:
Applying transfer learning with a pre-trained parser is the most beneficial.
From-scratch LM training does not yield useful word and context vectors.
Our word and context vectors are still more useful than Facebook's [Bojanowski et al., 2017].
Projectivity
Transition based parsers can only build projective trees.

Figure from httpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
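A dependency tree is projective when no two arcs cross. A small checker as a sketch (the representation is an assumption: heads[i-1] is the head of token i, with 0 for the artificial root):

```python
def is_projective(heads):
    """heads: head index per token; tokens numbered 1..n, 0 = root."""
    # Represent each arc as an (left, right) interval on the sentence.
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for l1, r1 in arcs:
        for l2, r2 in arcs:
            if l1 < l2 < r1 < r2:   # strictly interleaved ends: arcs cross
                return False
    return True
```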
Projective vs Non-projective
We compared our model with the best model for different projectivity ratios:

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus   90.7          79.39       55.03 (20)
eu bdt        95.13         84.22       74.13 (17)
hu szeged     97.8          82.66       68.18 (14)
da ddt        98.26         86.28       76.40 (17)
en gum        99.6          85.05       76.44 (15)
gl treegal   100            74.25       70.45 (10)
gl ctg       100            82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases.

From the official results page and our projectivity table.
Conclusions
Conclusion
In conclusion, we introduced "Context, Word and Morph-feat" embeddings and showed their contribution to transition based dependency parsing.
Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering.
Tree-stack LSTM performed better on low-resource languages.
When the training dataset size increases, the tree-stack LSTM loses its advantage.
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism
Applying attention between σ-LSTM, β-LSTM or Action-LSTM states may bring a performance improvement.

Morphological Features
Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.
Publications
Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
References
Marco Kuhlmann, Carlos Gomez-Rodriguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Thank you for your attention.

Questions?
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
Context and Word Embeddings
Relative contributions of part-of-speech (p) word vector (v)context vector (c)
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Omer Kırnap (Koc University) MSc Thesis September 27 2018 47 123
Context and Word Embeddings
Relative contributions of part-of-speech (p) word vector (v)context vector (c)
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Context vectors provide independent contribution on top ofPOS tags
Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123
Context and Word embeddings
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677p-fb 747 797 663
p-v-c 793 832 742
Our BiLSTM language model word vectors perform betterthan FB vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123
Context and Word embeddings
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Both POS tags and context vectors have significantcontributions on top of word vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123
Issues with MLP
However
Choosing correct state of parser still remains critical
We are unable to represent whole parsing history with featureextracting
Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123
Solution
Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack
Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies
CoNLL17
bull Koc-University team with MLP Parser using Context Embeddings
CoNLL18
bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123
c Tree-stack LSTM Parser (CoNLL18)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123
Related Work - Stack LSTM
Figure Stack LSTM [Dyer et al 2015]
Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123
Problems with Stack LSTM
They only modify stackrsquos word embeddings
Hidden states of LSTMS are not updated unless reduce
Actions are not explicitly represented
They only used word2vec embeddings [Mikolov et al 2013]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123
Tree-stack LSTM Overview
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
We propose Tree-stack LSTM model with 4 components
β-LSTMσ-LSTMAction-LSTMTree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123
Tree-stack LSTM
Input Representation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with continuous vector
Every dependency relation is represented with continuous vector
Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123
Input Representation
We do not include explicit feature extractor We initiated wordrepresentation by concatenating
Character Based LSTMrsquos word vectors
Word Based BiLSTMrsquos context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
Input Representation
Morp-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
Tree-stack LSTM
Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123
β-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
LSTM LSTM LSTM
wi+2wi+1wi
Figure Bufferrsquos β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
LSTM LSTM LSTM
si si+1 si+2
Figure Stackrsquos σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
LSTM LSTM LSTM
Figure Action-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How do components of tree-stack LSTM are connected
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
t-RNN
Dependent word
Dependency Relation
Head word
Figure t-RNN
whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
Tree-RNN with
1 Left Transition2 Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM LSTM
Left transition
t-RNN
Dependency Relation
HeadDependent
Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure β-LSTM recalculates its hidden based on new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Stackrsquos top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure σ-LSTM recalculates its hidden from new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
Final overview of Tree-stack LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing
2 Related WorkLinear Models and their DrawbacksNeural Network Models
3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser
4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning
5 Conclusion6 Future Work amp Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results amp Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results amp Comparisons
Dataset
Dependency parsing of 81
treebanks in 49 languages
All treebanks use standardized
annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 7th out
of 33 participants (1st among
transition based parsers)
Dependency parsing of 82
treebanks in 57 languages
All treebanks use
standardized annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 16th
out of 30 participants (2nd
among transition based
parsers)
CoNLL17 CoNLL181 Traintest split change 2 Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
2 possible problems of official comparison
1 If the annotation of the tree bank is improved the older parser ishandicapped
2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
MLP
Figure Initial model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
LSTM LSTM
Figure Only action LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
LSTM LSTM LSTM
LSTM LSTM MLP
Figure Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
LSTM LSTM
LSTM LSTM MLP
Figure Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692
cs cac 8389 8223 8313 8317
en ewt 7554 7543 7556 7567
Table Comparison between MLP and rdquoOnlyrdquo models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875
ar padt (120k) 6804 6814
en ewt (204k) 7487 7577
cs cac (473k) 8289 8357
cs pdt (1M) 8117 81164
t-RNN provides comparative advantage for low-resourcelanguages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNNrsquos performance contribution increases when the training sizedecreases
σ-LSTM provides more useful information independent from datasetsize
Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k less than 50k tokens
Languages having more than 50k less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having less than 20k training tokens
Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583
ru taiga 5832 6055 10479
sme giella 5278 5339 16385
la perseus 4993 516 18184
ug udt 5278 5339 19262
sl sst 4672 4877 19473
hu szeged 6623 6818 20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48325
fr sequoia     84.36        82.17           50543
en gum         76.44        75.34           53686
ko gsd         73.74        72.54           56687
eu bdt         74.55        73.32           72974
nl lassysmall  76.7         75.8            75134
gl ctg         79.02        79.018          79327
lv lvtb        72.33        72.24           80666
id gsd         75.76        73.97           97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens

Lang code   Morph-Feats  no Morph-Feats  # of tokens
fa seraji   81.18        81.12           121064
bg btb      84.53        84.55           124336
en ewt      75.77        75.682          204585
ar padt     68.02        68.14           223881
de gsd      71.59        71.32           263804
ca ancora   85.89        85.874          417587
es ancora   84.99        84.78           444617
cs cac      83.57        83.63           472608
cs pdt      81.43        82.12           1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle: transitions follow the gold moves
Dynamic oracle: transitions follow the predicted moves
In both cases, the log-probability of the gold moves is maximized
[Figure: Tree-stack LSTM architecture diagram: β-LSTM, σ-LSTM, and Action-LSTM outputs and the t-RNN (head word, dependent word, dependency relation) feed a Concat layer and an MLP]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
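The distinction above can be sketched as a tiny training step; the move scores, the exploration probability, and the function names are illustrative assumptions, not the thesis implementation. Both regimes compute the same loss (negative log-probability of the gold move) and differ only in which move is executed to advance the parser state.

```python
import math, random

def nll_of_gold(scores, gold):
    """Negative log-probability of the gold move under a softmax over move scores."""
    z = sum(math.exp(s) for s in scores.values())
    return -math.log(math.exp(scores[gold]) / z)

def training_step(scores, gold, dynamic, rng, p_explore=0.9):
    """Return (loss, move_to_execute) for one parser state.

    Static oracle: always execute the gold move.
    Dynamic oracle: usually execute the model's predicted move instead, so
    training also visits states the parser will actually reach at test time.
    Either way, the loss maximizes log p(gold move).
    """
    loss = nll_of_gold(scores, gold)
    if dynamic and rng.random() < p_explore:  # p_explore is an assumed knob
        move = max(scores, key=scores.get)    # follow the model's prediction
    else:
        move = gold                           # follow the gold transition
    return loss, move

scores = {"shift": 2.0, "left": 0.5, "right": -1.0}
loss, move = training_step(scores, "left", dynamic=False, rng=random.Random(0))
```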
Static vs Dynamic Oracle Training
Figure: Results are very close for languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure: Results are very close for languages with between 20k and 50k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure: Results are very close for languages with more than 50k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors, trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser
Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39
Table: LAS values for strategies (1), (2), (3), and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial
Training an LM from scratch on very limited data does not yield useful word and context vectors
Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
A transition based parser can only build projective trees 6
6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
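Projectivity (no crossing arcs) can be checked directly from head indices; a minimal sketch, with 1-based token positions and head 0 for the root:

```python
def is_projective(heads):
    """heads[i] is the head position of token i+1 (tokens are 1-based, 0 is root).

    A dependency tree is projective iff no two arcs cross, i.e. no two arcs
    have strictly interleaved endpoints.
    """
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for i, (a, b) in enumerate(arcs):
        for c, d in arcs[i + 1:]:
            if a < c < b < d or c < a < d < b:  # endpoints interleave: crossing
                return False
    return True

# Arcs (1,3) and (2,4) cross, so this 4-token tree is non-projective.
non_projective_example = is_projective([0, 1, 1, 2])
```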
Projective vs Non-projective
We compared our model with the best model for different projectivity ratios

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)
Table: Our model's performance gap decreases as the projectivity ratio increases
7 From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusion: We introduced "Context Word" and "Morph-feat" embeddings and showed their contribution in transition based dependency parsing
Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering
Tree-stack LSTM performed better on low-resource languages
As the training dataset size increases, tree-stack LSTM loses its advantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings
Attention Mechanism
Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring performance improvements
Morphological Features
Finding a different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann, Carlos Gomez-Rodriguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, Association for Computational Linguistics, pages 673-682.
S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
Context and Word Embeddings
Relative contributions of part-of-speech (p), word vector (v), and context vector (c)

Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Context vectors provide an independent contribution on top of POS tags
Omer Kırnap (Koc University) MSc Thesis September 27 2018 48 123
Context and Word embeddings
Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Our BiLSTM language model word vectors perform better than FB vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123
Context and Word embeddings
Feats   Hungarian  En-ParTUT  Latvian
p       63.6       76.6       55.9
v       73.5       75.9       63.0
c       72.2       76.0       63.5
v-c     76.0       79.0       67.6
p-c     78.0       82.5       70.6
p-v     76.6       80.8       67.7
p-fb    74.7       79.7       66.3
p-v-c   79.3       83.2       74.2

Both POS tags and context vectors have significant contributions on top of word vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123
Issues with MLP
However
Choosing the correct parser state features still remains critical
We are unable to represent the whole parsing history with feature extraction
Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123
Solution
Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack
Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies
CoNLL17
Koc-University team with MLP Parser using Context Embeddings

CoNLL18

KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123
(c) Tree-stack LSTM Parser (CoNLL18)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123
Related Work - Stack LSTM
Figure Stack LSTM [Dyer et al 2015]
Represent each component (σ, β, A) with an LSTM
Modify the head word's embedding with the dependent's embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123
Problems with Stack LSTM
They only modify the stack's word embeddings
Hidden states of the LSTMs are not updated unless a reduce occurs
Actions are not explicitly represented
They only used word2vec embeddings [Mikolov et al. 2013]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123
Tree-stack LSTM Overview
[Figure: Tree-stack LSTM architecture diagram: β-LSTM, σ-LSTM, and Action-LSTM outputs and the t-RNN (head word, dependent word, dependency relation) feed a Concat layer and an MLP]
We propose the Tree-stack LSTM model with 4 components:
β-LSTM, σ-LSTM, Action-LSTM, Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123
Tree-stack LSTM
Input Representation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with a continuous vector
Every dependency relation is represented with a continuous vector
Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123
Input Representation
We do not include an explicit feature extractor. We initialize the word representation by concatenating:
Character-based LSTM's word vectors
Word-based BiLSTM's context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
Input Representation
Morph-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
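One way to realize the morph-feat embedding above, sketched with randomly initialized vectors: each `Key=Value` pair gets its own embedding, and a word's pairs are summed into a single vector. The dimension, initialization, and sum-combination are illustrative assumptions, not the thesis configuration.

```python
import random

random.seed(0)
DIM = 8        # embedding size (illustrative)
_tables = {}   # one embedding per "Key=Value" pair, created on first use

def feat_vector(pair):
    """Look up (or lazily initialize) the embedding of one Key=Value pair."""
    if pair not in _tables:
        _tables[pair] = [random.uniform(-0.1, 0.1) for _ in range(DIM)]
    return _tables[pair]

def morph_feat_embedding(feats):
    """Embed e.g. 'Case=Nom|Gender=Neut|Number=Sing' as a sum of pair embeddings."""
    if feats == "_":               # CoNLL-U uses "_" when a word has no features
        return [0.0] * DIM
    vecs = [feat_vector(p) for p in feats.split("|")]
    return [sum(vals) for vals in zip(*vecs)]

v = morph_feat_embedding("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
```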
Tree-stack LSTM
Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123
β-LSTM
[Figure: Tree-stack LSTM architecture diagram (β-LSTM component)]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
Figure: Buffer's β-LSTM running over the upcoming words w_i, w_i+1, w_i+2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
[Figure: Tree-stack LSTM architecture diagram (σ-LSTM component)]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
Figure: Stack's σ-LSTM running over the stack items s_i, s_i+1, s_i+2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
[Figure: Tree-stack LSTM architecture diagram (Action-LSTM component)]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
Figure: Action-LSTM running over the history of actions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How are the components of tree-stack LSTM connected?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
Figure: t-RNN combining the head word, dependent word, and dependency relation embeddings

w_head_new = tanh(W_rnn · [w_head_old ; d_l ; w_dep] + b_rnn)    (1)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
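Equation (1) is a single tanh layer over the concatenation of the old head vector, the dependency-label vector, and the dependent vector. A minimal pure-Python sketch; the dimension and the random initialization are illustrative assumptions:

```python
import math, random

random.seed(0)
D = 4  # embedding size (illustrative)
W_rnn = [[random.uniform(-0.1, 0.1) for _ in range(3 * D)] for _ in range(D)]
b_rnn = [0.0] * D

def t_rnn(w_head_old, d_label, w_dep):
    """Eq. (1): tanh of an affine map over the concatenated input vectors."""
    x = w_head_old + d_label + w_dep  # list concatenation == vector concat here
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W_rnn, b_rnn)]

# The result replaces the head word's embedding after a left/right transition.
new_head = t_rnn([0.1] * D, [0.2] * D, [0.3] * D)
```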
Tree-RNN with
1. Left Transition
2. Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state from the new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready for the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready for the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
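The left_d and right_d transitions above act on a configuration (σ, β, A). A minimal sketch with the stack and buffer as Python lists and arcs as (head, label, dependent) triples; the t-RNN head update and scoring are omitted, and the names are illustrative:

```python
def left(config, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}): s becomes a d-dependent of b."""
    stack, buffer, arcs = config
    s, b = stack[-1], buffer[0]
    return stack[:-1], buffer, arcs | {(b, d, s)}

def right(config, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}): t becomes a d-dependent of s."""
    stack, buffer, arcs = config
    s, t = stack[-2], stack[-1]
    return stack[:-1], buffer, arcs | {(s, d, t)}

def shift(config):
    """Move the front of the buffer onto the stack."""
    stack, buffer, arcs = config
    return stack + buffer[:1], buffer[1:], arcs

# "news" (stack top) attaches to "had" (buffer front) as its subject.
config = (["ROOT", "news"], ["had", "effect"], frozenset())
config = left(config, "nsubj")
```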
Final overview of Tree-stack LSTM
[Figure: Tree-stack LSTM architecture diagram: β-LSTM, σ-LSTM, and Action-LSTM outputs and the t-RNN (head word, dependent word, dependency relation) feed a Concat layer and an MLP]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
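The final architecture concatenates the summaries of the buffer, stack, and action history and scores transitions with a classifier on top. In this toy sketch each LSTM is replaced by a simple mean summarizer and the MLP by a single linear layer, so only the wiring of the components is shown; all names and sizes are illustrative assumptions.

```python
def summarize(vectors, dim):
    """Stand-in for an LSTM's final hidden state: mean of inputs (real model: LSTM)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def score_transitions(buffer_vecs, stack_vecs, action_vecs, W, dim):
    """Concatenate the three component summaries and apply a linear scorer."""
    h = (summarize(buffer_vecs, dim) + summarize(stack_vecs, dim)
         + summarize(action_vecs, dim))  # list concat == the Concat layer
    return {move: sum(w * x for w, x in zip(row, h)) for move, row in W.items()}

DIM = 2
W = {"shift": [0.5] * (3 * DIM), "left": [-0.2] * (3 * DIM), "right": [0.1] * (3 * DIM)}
scores = score_transitions([[1.0, 0.0]], [[0.0, 1.0]], [], W, DIM)
best = max(scores, key=scores.get)  # the transition the parser would execute
```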
Overview
1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2 Related Work: Linear Models and their Drawbacks; Neural Network Models
3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5 Conclusion
6 Future Work & Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results & Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results amp Comparisons
Dataset
CoNLL17:
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
Dependency parsing of 82 treebanks in 57 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split change 2. Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
2 possible problems with the official comparison:
1. If the annotation of the treebank has improved, the older parser is handicapped
2. If the training-test split has changed and old training data are now in the test data, the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
Figure: Initial model (MLP only)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
Figure: Only Action-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
Figure: Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
Figure: Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code   MLP    Only-Action  Only-β  Only-σ
hu szeged   66.21  66.87        66.94   67.03
sv lines    71.12  72.05        72.17   72.45
tr imst     57.12  56.87        57.02   57.12
ar padt     67.83  66.67        66.89   66.92
cs cac      83.89  82.23        83.13   83.17
en ewt      75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
[Figure: Tree-stack LSTM architecture diagram (t-RNN component)]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code            without t-RNN  with t-RNN
no nynorsklia (3k)   51.78          53.33
ru taiga (11k)       59.13          60.55
gl treegal (15k)     69.76          70.45
hu szeged (20k)      66.12          68.18
sv lines (49k)       74.04          75.46
tr imst (50k)        58.12          58.75
ar padt (120k)       68.04          68.14
en ewt (204k)        74.87          75.77
cs cac (473k)        82.89          83.57
cs pdt (1M)          81.17          81.164
t-RNN provides comparative advantage for low-resourcelanguages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNNrsquos performance contribution increases when the training sizedecreases
σ-LSTM provides more useful information independent from datasetsize
Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k less than 50k tokens
Languages having more than 50k less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having less than 20k training tokens
Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583
ru taiga 5832 6055 10479
sme giella 5278 5339 16385
la perseus 4993 516 18184
ug udt 5278 5339 19262
sl sst 4672 4877 19473
hu szeged 6623 6818 20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having tokens in between 50k and100k
Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325
fr sequoia 8436 8217 50543
en gum 7644 7534 53686
ko gsd 7374 7254 56687
eu bdt 7455 7332 72974
nl lassymal 767 758 75134
gl ctg 7902 79018 79327
lv lvtb 7233 7224 80666
id gsd 7576 7397 97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having more than 100k trainingtokens
Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064
bg btb 8453 8455 124336
en ewt 7577 75682 204585
ar padt 6802 6814 223881
de gsd 7159 7132 263804
ca ancora 8589 85874 417587
es ancora 8499 8478 444617
cs cac 8357 8363 472608
cs pdt 8143 8212 1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle transitions using gold movesDynamic oracle transitions using predicted moves
In both cases logp of gold moves maximized
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and
use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al
2017]3 Using my own word and context vectors trained with different
language but from the same language family4 Applying transfer learning with a pre-trained parser
Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898
kmr mg 2012 2257 2278 2339
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the mostbeneficial
From scratch LM training does not bring useful word and contextvectors
Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition Based Parser can only build projective trees 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
Projective vs Non-projective
We compared our model with the best model for different projectivityratios
Language Projectiviy Best (LAS) Our (LAS)
grc perseus 907 7939 5503 (20)
eu bdt 9513 8422 7413 (17)
hu szeged 978 8266 6818 (14)
da ddt 9826 8628 7640 (17)
en gum 996 8505 7644 (15)
gl treegal 100 7425 7045 (10)
gl ctg 100 8212 7945 (14)
Table Our models performance gap decreases as the projectivity ratio increases
7
7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing
Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering
Tree-stack LSTM performed better with low resource languages
When the training dataset size increases tree-stack LSTM losses itsadvantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings
Attention Mechanism
Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement
Morphological Features
Finding different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies
Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682
S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US
Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
Context and Word embeddings
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677p-fb 747 797 663
p-v-c 793 832 742
Our BiLSTM language model word vectors perform betterthan FB vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 49 123
Context and Word embeddings
Feats Hungarian En-ParTUT Latvianp 636 766 559
v 735 759 63
c 722 76 635
v-c 76 79 676
p-c 78 825 706
p-v 766 808 677
p-fb 747 797 663
p-v-c 793 832 742
Both POS tags and context vectors have significantcontributions on top of word vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123
Issues with MLP
However
Choosing correct state of parser still remains critical
We are unable to represent whole parsing history with featureextracting
Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123
Solution
Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack
Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies
CoNLL17
bull Koc-University team with MLP Parser using Context Embeddings
CoNLL18
bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123
c Tree-stack LSTM Parser (CoNLL18)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123
Related Work - Stack LSTM
Figure Stack LSTM [Dyer et al 2015]
Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123
Problems with Stack LSTM
They only modify stackrsquos word embeddings
Hidden states of LSTMS are not updated unless reduce
Actions are not explicitly represented
They only used word2vec embeddings [Mikolov et al 2013]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123
Tree-stack LSTM Overview
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
We propose Tree-stack LSTM model with 4 components
β-LSTMσ-LSTMAction-LSTMTree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123
Tree-stack LSTM
Input Representation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with continuous vector
Every dependency relation is represented with continuous vector
Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123
Input Representation
We do not include explicit feature extractor We initiated wordrepresentation by concatenating
Character Based LSTMrsquos word vectors
Word Based BiLSTMrsquos context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
Input Representation
Morp-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
Tree-stack LSTM
Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123
β-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
LSTM LSTM LSTM
wi+2wi+1wi
Figure Bufferrsquos β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
LSTM LSTM LSTM
si si+1 si+2
Figure Stackrsquos σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
LSTM LSTM LSTM
Figure Action-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How do components of tree-stack LSTM are connected
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
t-RNN
Dependent word
Dependency Relation
Head word
Figure t-RNN
whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
Tree-RNN with
1 Left Transition2 Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: The stack's top LSTM is reduced.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: The t-RNN computes the new head embedding.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: The β-LSTM recomputes its hidden state from the new input.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: The tree-stack LSTM is ready to predict the next transition.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: The stack's top LSTM is reduced.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: The t-RNN computes the new head embedding.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: The σ-LSTM recomputes its hidden state from the new input.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: The tree-stack LSTM is ready to predict the next transition.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
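The two transition rules above can be stated as pure functions over a (stack, buffer, arcs) configuration; the token indices and labels below are illustrative, and arcs are stored as (head, label, dependent) triples.

```python
def left_arc(stack, buffer, arcs, d):
    # left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    # pop stack top s and attach it as dependent of buffer front b with label d
    *sigma, s = stack
    b = buffer[0]
    return sigma, buffer, arcs | {(b, d, s)}

def right_arc(stack, buffer, arcs, d):
    # right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    # pop stack top t and attach it as dependent of the next stack item s with label d
    *sigma, s, t = stack
    return sigma + [s], buffer, arcs | {(s, d, t)}

# toy configuration with word indices on the stack and buffer
stack, buffer, arcs = [1, 2], [3, 4], set()
stack, buffer, arcs = left_arc(stack, buffer, arcs, "nsubj")
print(stack, arcs)  # [1] {(3, 'nsubj', 2)}
```

Note that both operations shrink the stack by one and leave the buffer untouched; only a shift (not shown on these slides) consumes the buffer.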
Final overview of Tree-stack LSTM
[Figure: final tree-stack LSTM architecture: the t-RNN-composed head/dependent/relation embeddings and the β-, σ-, and Action-LSTM hidden states are concatenated and fed to an MLP]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
4 Results & Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results & Comparisons

Dataset

CoNLL17:
- Dependency parsing of 81 treebanks in 49 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koc-University ranked 7th out of 33 participants (1st among transition-based parsers)

CoNLL18:
- Dependency parsing of 82 treebanks in 57 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- KParse (Koc-University) ranked 16th out of 30 participants (2nd among transition-based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split change 2. Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
Two possible problems with the official comparison:
1. If the annotation of the treebank is improved, the older parser is handicapped.
2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
Figure: Initial model (MLP only).
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
Figure: Only the Action-LSTM.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
Figure: Only the β-LSTM.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
Figure: Only the σ-LSTM.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code  MLP    Only Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67

Table: Comparison between the MLP and "Only" models.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
[Figure: full tree-stack LSTM architecture (t-RNN, β-LSTM, σ-LSTM, Action-LSTM → Concat → MLP), highlighting the t-RNN]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without the t-RNN:

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.164
The t-RNN provides a comparative advantage for low-resource languages.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
The t-RNN's performance contribution increases as the training size decreases.
The σ-LSTM provides more useful information, independent of dataset size.
Interconnecting the model's components with the t-RNN makes the tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers).
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does the Morphological Feature Embedding provide?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental Settings
We divide the CoNLL18 UD dataset into 4 parts based on the number of training tokens per language, to better understand our contributions:
Languages having less than 20k tokens
Languages having more than 20k, less than 50k tokens
Languages having more than 50k, less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
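The grouping above can be expressed as a small helper; the boundaries come from the slide, while the handling of a language with exactly 20k tokens is an assumption.

```python
def size_bucket(n_train_tokens):
    """Assign a language to one of the four training-size groups from the slide."""
    if n_train_tokens < 20_000:
        return "<20k"
    if n_train_tokens < 50_000:
        return "20k-50k"
    if n_train_tokens < 100_000:
        return "50k-100k"
    return ">=100k"

print(size_bucket(10_479), size_bucket(204_585))  # <20k >=100k
```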
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.6            18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48325
fr sequoia     84.36        82.17           50543
en gum         76.44        75.34           53686
ko gsd         73.74        72.54           56687
eu bdt         74.55        73.32           72974
nl lassysmall  76.7         75.8            75134
gl ctg         79.02        79.018          79327
lv lvtb        72.33        72.24           80666
id gsd         75.76        73.97           97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens:

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121064
bg btb     84.53        84.55           124336
en ewt     75.77        75.682          204585
ar padt    68.02        68.14           223881
de gsd     71.59        71.32           263804
ca ancora  85.89        85.874          417587
es ancora  84.99        84.78           444617
cs cac     83.57        83.63           472608
cs pdt     81.43        82.12           1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle: transitions follow the gold moves.
Dynamic oracle: transitions follow the predicted moves.
In both cases, the log-probability of the gold moves is maximized.
[Figure: full tree-stack LSTM architecture (t-RNN, β-LSTM, σ-LSTM, Action-LSTM → Concat → MLP)]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
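The difference between the two regimes can be sketched with a toy training loop: the loss term is always the negative log-probability of the gold move, but the dynamic oracle advances the configuration with the model's own best-scoring prediction. Everything below (the score function, state, step, and termination test) is an illustrative stand-in, not the thesis implementation.

```python
import math

def train_episode(score_fn, oracle, step_fn, is_terminal, state, dynamic=False):
    """One training episode. score_fn returns {action: log-prob}. The loss always
    maximizes log p of the oracle (gold) move, but under the dynamic regime the
    parser follows its own prediction instead of the gold move."""
    nll = 0.0
    while not is_terminal(state):
        scores = score_fn(state)
        gold = oracle(state)
        nll -= scores[gold]                                    # -log p(gold move)
        action = max(scores, key=scores.get) if dynamic else gold
        state = step_fn(state, action)
    return nll

# toy setup: two actions, and the model happens to prefer the gold action 'A'
score_fn = lambda s: {"A": math.log(0.9), "B": math.log(0.1)}
nll = train_episode(score_fn, lambda s: "A", lambda s, a: s + 1, lambda s: s >= 2, 0)
print(round(nll, 4))  # 0.2107
```

When the model's argmax disagrees with the gold move, the two regimes visit different configurations, which is exactly the exploration a dynamic oracle is meant to provide.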
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4).
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial.
Training an LM from scratch does not yield useful word and context vectors.
Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017].
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
A transition-based parser can only build projective trees. 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
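Projectivity itself is easy to test: a dependency tree is projective iff no two arcs cross when drawn above the sentence. A small sketch over a head array (1-based tokens, 0 for the root):

```python
def is_projective(heads):
    """heads[i-1] is the head of token i (tokens 1..n, head 0 = root).
    A tree is projective iff no two arcs cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for i, j in arcs:
        for k, l in arcs:
            if i < k < j < l:        # arc (i,j) crosses arc (k,l)
                return False
    return True

print(is_projective([2, 0, 2]))      # True
print(is_projective([3, 4, 0, 2]))   # False: arcs 3->1 and 2->4 cross
```

The O(n²) pairwise check is fine for sentence-length inputs; treebank tools typically use the same crossing-arc criterion.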
Projective vs Non-projective
We compared our model with the best model for different projectivity ratios.

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases. 7
7 From the official results page and our projectivity table.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusion:
We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition-based dependency parsing.
Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering.
The Tree-stack LSTM performed better on low-resource languages.
As the training dataset size increases, the tree-stack LSTM loses its advantage.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.
Attention Mechanism
Applying attention between σ-LSTM states, or within the β-LSTM or Action-LSTM, may bring performance improvements.
Morphological Features
Finding a different way to represent morphological features.
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function; other losses (e.g., CRF) may solve this problem.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.
S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
Context and Word embeddings
Feats  Hungarian  En-ParTUT  Latvian
p      63.6       76.6       55.9
v      73.5       75.9       63
c      72.2       76         63.5
v-c    76         79         67.6
p-c    78         82.5       70.6
p-v    76.6       80.8       67.7
p-fb   74.7       79.7       66.3
p-v-c  79.3       83.2       74.2
Both POS tags and context vectors have significant contributions on top of word vectors.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 50 123
Issues with MLP
However
Choosing the correct parser state features still remains critical
We are unable to represent the whole parsing history with feature extraction
Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123
Solution
Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123
Model Overview
Two shared tasks on Multilingual Parsing from Raw Text to Universal Dependencies:

CoNLL17
- Koc-University team with the MLP Parser, using Context Embeddings

CoNLL18
- KParse team with the Tree-stack LSTM Parser, using Context and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123
c. Tree-stack LSTM Parser (CoNLL18)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123
Related Work - Stack LSTM
Figure: Stack LSTM [Dyer et al. 2015]
Represent each component (σ, β, A) with an LSTM; modify the head word's embedding with the dependent's embedding.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123
Problems with Stack LSTM
They only modify the stack's word embeddings
Hidden states of the LSTMs are not updated unless a reduce occurs
Actions are not explicitly represented
They only use word2vec embeddings [Mikolov et al. 2013]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123
Tree-stack LSTM Overview
[Figure: full tree-stack LSTM architecture (t-RNN, β-LSTM, σ-LSTM, Action-LSTM → Concat → MLP)]

We propose the Tree-stack LSTM model with 4 components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123
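How the concatenated summaries feed the final MLP can be sketched as follows; all dimensions and weights are illustrative, and the real model also concatenates the t-RNN-composed embeddings alongside the three LSTM states.

```python
import numpy as np

def transition_scores(h_beta, h_sigma, h_action, W1, b1, W2, b2):
    """Concatenate the component summaries and score transitions with an MLP."""
    h = np.concatenate([h_beta, h_sigma, h_action])
    hidden = np.tanh(W1 @ h + b1)
    return W2 @ hidden + b2          # one unnormalized score per transition

rng = np.random.default_rng(0)
d, hdim, n_actions = 4, 8, 3                     # illustrative sizes
W1 = rng.standard_normal((hdim, 3 * d)) * 0.1
W2 = rng.standard_normal((n_actions, hdim)) * 0.1
scores = transition_scores(rng.standard_normal(d), rng.standard_normal(d),
                           rng.standard_normal(d), W1, np.zeros(hdim),
                           W2, np.zeros(n_actions))
print(scores.shape)  # (3,)
```

A softmax over these scores gives the transition distribution used at parsing time.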
Tree-stack LSTM
Input Representation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with a continuous vector
Every dependency relation is represented with a continuous vector
Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123
Input Representation
We do not include an explicit feature extractor. We initialize each word representation by concatenating:
Character-based LSTM's word vectors
Word-based BiLSTM's context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
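The concatenation of the four vectors listed above is straightforward; the sizes below are illustrative.

```python
import numpy as np

def word_representation(char_vec, context_vec, pos_vec, feat_vec):
    """Initial word representation: concatenation of the four listed vectors."""
    return np.concatenate([char_vec, context_vec, pos_vec, feat_vec])

parts = [np.ones(3), np.ones(4), np.ones(2), np.ones(2)]  # illustrative sizes
print(word_representation(*parts).shape)  # (11,)
```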
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
Issues with MLP
However:
Choosing the correct parser-state features still remains critical
We are unable to represent the whole parsing history with feature extraction
Omer Kırnap (Koc University) MSc Thesis September 27 2018 51 123
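The limitation can be illustrated with a sketch of template-based feature extraction (the template names and window size here are hypothetical, not the exact features used in the thesis): however many templates are added, anything outside the fixed window, including the full transition history, is invisible to the MLP.

```python
def extract_features(stack, buffer):
    """Hypothetical MLP feature templates over a fixed window around the
    stack top and buffer front. Transitions that happened earlier leave
    no trace outside this window, so the parsing history is lost."""
    return {
        "s0.word": stack[-1] if stack else "<none>",
        "s1.word": stack[-2] if len(stack) > 1 else "<none>",
        "b0.word": buffer[0] if buffer else "<none>",
        "b1.word": buffer[1] if len(buffer) > 1 else "<none>",
    }

feats = extract_features(["ROOT", "news"], ["had", "little", "effect"])
assert feats["s0.word"] == "news" and feats["b0.word"] == "had"
```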
Solution
Find a recurrent architecture that can summarize the parsing history as well as the word sequences in the buffer and stack
Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to Universal Dependencies:
CoNLL17
• Koc-University team with MLP Parser using Context Embeddings
CoNLL18
• KParse team with Tree-stack LSTM Parser using Context and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123
c Tree-stack LSTM Parser (CoNLL18)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123
Related Work - Stack LSTM
Figure Stack LSTM [Dyer et al 2015]
Represent each component (σ, β, A) with an LSTM. Modify the head word's embedding with the dependent's embedding.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123
Problems with Stack LSTM
They only modify the stack's word embeddings
Hidden states of the LSTMs are not updated unless a reduce occurs
Actions are not explicitly represented
They only used word2vec embeddings [Mikolov et al 2013]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123
Tree-stack LSTM Overview
[Figure: Tree-stack LSTM overview. The β-LSTM, σ-LSTM, and Action-LSTM outputs are concatenated and fed to an MLP; a t-RNN composes the head word, dependent word, and dependency relation.]
We propose the Tree-stack LSTM model with 4 components:
β-LSTM, σ-LSTM, Action-LSTM, Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123
Tree-stack LSTM
Input Representation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with a continuous vector
Every dependency relation is represented with a continuous vector
Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123
Input Representation
We do not include an explicit feature extractor. We initialize each word representation by concatenating:
Character-based LSTM's word vectors
Word-based BiLSTM's context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
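As a sketch, the concatenation above can be written as follows; the vector dimensions are illustrative assumptions, not the values used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) dimensions for the four input vectors.
char_vec    = rng.standard_normal(100)  # character-based LSTM word vector
context_vec = rng.standard_normal(200)  # word-based BiLSTM context vector
pos_vec     = rng.standard_normal(20)   # POS embedding
morph_vec   = rng.standard_normal(30)   # morph-feat embedding

# The word representation is simply the concatenation of the four parts.
word_repr = np.concatenate([char_vec, context_vec, pos_vec, morph_vec])
assert word_repr.shape == (350,)
```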
Input Representation
Morph-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs (the word "It")
Figure: Morph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
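One plausible way to turn a UD FEATS string like the one above into a vector is to embed each key=value pair and combine them; the summing rule and dimension here are assumptions for illustration, the slide does not fix them.

```python
import numpy as np

def morph_feat_vector(feats_str, table, dim=30):
    """Embed each key=value morphological feature and sum the embeddings.
    `table` caches one random vector per distinct feature (a stand-in
    for learned embeddings)."""
    vec = np.zeros(dim)
    if feats_str == "_":  # UD uses '_' for 'no features'
        return vec
    for pair in feats_str.split("|"):
        if pair not in table:
            table[pair] = np.random.default_rng(len(table)).standard_normal(dim)
        vec += table[pair]
    return vec

table = {}
v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs", table)
assert v.shape == (30,) and len(table) == 5
```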
Tree-stack LSTM
Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123
β-LSTM
[Figure: Tree-stack LSTM overview diagram]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
Figure: The buffer's β-LSTM, an LSTM running over buffer words w_i, w_i+1, w_i+2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
[Figure: Tree-stack LSTM overview diagram]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
Figure: The stack's σ-LSTM, an LSTM running over stack words s_i, s_i+1, s_i+2
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
[Figure: Tree-stack LSTM overview diagram]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
Figure: The Action-LSTM, an LSTM running over the transition history
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How are the components of the tree-stack LSTM connected?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
Figure: t-RNN composing the dependent word, dependency relation, and head word
w_head_new = tanh(W_rnn * [w_head_old ; d_l ; w_dep] + b_rnn)   (1)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
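Equation (1) can be checked with a few lines of numpy; W_rnn, b_rnn, and the dimension are random stand-ins for the learned parameters.

```python
import numpy as np

def t_rnn(w_head, d_label, w_dep, W_rnn, b_rnn):
    """Eq. (1): compute the new head embedding from the old head vector,
    the dependency-label vector, and the dependent embedding."""
    return np.tanh(W_rnn @ np.concatenate([w_head, d_label, w_dep]) + b_rnn)

dim = 4
rng = np.random.default_rng(1)
W_rnn = rng.standard_normal((dim, 3 * dim))  # maps 3*dim -> dim
b_rnn = np.zeros(dim)
new_head = t_rnn(rng.standard_normal(dim), rng.standard_normal(dim),
                 rng.standard_normal(dim), W_rnn, b_rnn)
assert new_head.shape == (dim,)
assert np.all(np.abs(new_head) < 1.0)  # tanh keeps values in (-1, 1)
```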
Tree-RNN with
1. Left Transition
2. Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: The stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: t-RNN calculates the new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: The β-LSTM recalculates its hidden state based on the new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: The tree-stack LSTM is ready to predict the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: The stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: t-RNN calculates the new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: The σ-LSTM recalculates its hidden state from the new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: The tree-stack LSTM is ready to predict the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
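The bookkeeping behind the two transitions can be sketched on plain Python lists, with arcs stored as (head, label, dependent) triples; this shows only the stack/buffer/arc-set updates from the formulas, not the accompanying t-RNN and LSTM updates.

```python
def left_arc(stack, buffer, arcs, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    the buffer front b becomes the head of the stack top s; s is popped."""
    s = stack.pop()
    arcs.append((buffer[0], d, s))

def right_arc(stack, buffer, arcs, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    s (just below the top) becomes the head of the top t; t is popped."""
    t = stack.pop()
    arcs.append((stack[-1], d, t))

stack, buffer, arcs = ["ROOT", "news"], ["had"], []
left_arc(stack, buffer, arcs, "nsubj")          # 'had' heads 'news'
assert stack == ["ROOT"] and arcs == [("had", "nsubj", "news")]

stack = ["ROOT", "had"]
right_arc(stack, buffer, arcs, "root")          # ROOT heads 'had'
assert stack == ["ROOT"] and arcs[-1] == ("ROOT", "root", "had")
```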
Final overview of Tree-stack LSTM
[Figure: Tree-stack LSTM overview diagram]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2. Related Work: Linear Models and their Drawbacks; Neural Network Models
3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5. Conclusion
6. Future Work & Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results & Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results & Comparisons
Dataset
CoNLL17:
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc-University ranked 7th out of 33 participants (1st among transition based parsers)
CoNLL18:
Dependency parsing of 82 treebanks in 57 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)
Differences between the two: 1. Train/test split change 2. Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
2 possible problems of the official comparison:
1. If the annotation of the treebank is improved, the older parser is handicapped
2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models:
Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
Figure: Initial model (MLP only)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
Figure: Only the Action-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
Figure: Only the β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
Figure: Only the σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code  MLP    Only Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67
Table: Comparison between MLP and "Only" models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
[Figure: Tree-stack LSTM overview diagram]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN:
Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.16
t-RNN provides a comparative advantage for low-resource languages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of the ablation analysis:
Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77
Tree-stack LSTM beats the other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNN's performance contribution increases as the training size decreases
σ-LSTM provides more useful information, independent of dataset size
Interconnecting the model's components with the t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental Settings: We divide the CoNLL18 UD 2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:
Languages having less than 20k tokens
Languages having more than 20k and less than 50k tokens
Languages having more than 50k and less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens:
Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3,583
ru taiga       58.32        60.55           10,479
sme giella     52.78        53.39           16,385
la perseus     49.93        51.60           18,184
ug udt         52.78        53.39           19,262
sl sst         46.72        48.77           19,473
hu szeged      66.23        68.18           20,166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having between 50k and 100k training tokens:
Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48,325
fr sequoia     84.36        82.17           50,543
en gum         76.44        75.34           53,686
ko gsd         73.74        72.54           56,687
eu bdt         74.55        73.32           72,974
nl lassysmall  76.70        75.80           75,134
gl ctg         79.02        79.018          79,327
lv lvtb        72.33        72.24           80,666
id gsd         75.76        73.97           97,531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens:
Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121,064
bg btb     84.53        84.55           124,336
en ewt     75.77        75.682          204,585
ar padt    68.02        68.14           223,881
de gsd     71.59        71.32           263,804
ca ancora  85.89        85.874          417,587
es ancora  84.99        84.78           444,617
cs cac     83.57        83.63           472,608
cs pdt     81.43        82.12           1,173,282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle: transitions follow gold moves. Dynamic oracle: transitions follow predicted moves.
In both cases, log p of the gold moves is maximized
[Figure: Tree-stack LSTM overview diagram]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
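The difference between the two training regimes can be sketched as below; `parser` and `oracle` are hypothetical interfaces, and the exploration probability is an assumed hyperparameter. In both regimes the loss is -log p of the oracle-optimal move, as stated above.

```python
import math, random

def train_sentence(parser, oracle, dynamic=False, explore_p=0.9):
    """Accumulate -log p(gold move) over one sentence. Static training
    always applies the gold move; dynamic training sometimes applies the
    parser's own prediction so training states resemble test time."""
    loss = 0.0
    while not parser.done():
        probs = parser.move_probs()          # dict: move -> probability
        gold = oracle.best_move(parser)      # optimal move in this state
        loss -= math.log(max(probs[gold], 1e-12))
        if dynamic and random.random() < explore_p:
            parser.apply(max(probs, key=probs.get))  # follow the model
        else:
            parser.apply(gold)                        # follow the oracle
    return loss

# Tiny stub to exercise the loop (not a real parser).
class StubParser:
    def __init__(self): self.steps = 0
    def done(self): return self.steps >= 3
    def move_probs(self): return {"shift": 0.5, "left": 0.5}
    def apply(self, move): self.steps += 1

class StubOracle:
    def best_move(self, parser): return "shift"

loss = train_sentence(StubParser(), StubOracle())
assert abs(loss - 3 * math.log(2)) < 1e-9   # three moves at p = 0.5 each
```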
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure: Results are very close for training tokens between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
What about languages with less than 20k training tokens?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser
Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39
Table: LAS values for strategies (1), (2), (3), and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial
LM training from scratch does not yield useful word and context vectors
Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition Based Parsers can only build projective trees 6
6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
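A tree is projective iff no two arcs cross when drawn above the sentence; a minimal check (assuming `heads[i]` gives the head of token i+1, with 0 for the artificial root) could look like this.

```python
def is_projective(heads):
    """heads[i] is the head index of token i+1 (tokens are 1..n, 0 is
    the artificial root). Projective iff no two arcs cross, i.e. no
    pair of arcs (l1, r1), (l2, r2) satisfies l1 < l2 < r1 < r2."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    return not any(
        l1 < l2 < r1 < r2
        for (l1, r1) in arcs
        for (l2, r2) in arcs
    )

assert is_projective([2, 0, 2])         # 2 is root, heads 1 and 3: no crossings
assert not is_projective([3, 4, 0, 3])  # arcs (1,3) and (2,4) cross
```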
Projective vs Non-projective
We compared our model with the best model for different projectivity ratios:
Language     Projectivity (%)  Best (LAS)  Our (LAS)
grc perseus  90.7              79.39       55.03 (20)
eu bdt       95.13             84.22       74.13 (17)
hu szeged    97.8              82.66       68.18 (14)
da ddt       98.26             86.28       76.40 (17)
en gum       99.6              85.05       76.44 (15)
gl treegal   100               74.25       70.45 (10)
gl ctg       100               82.12       79.45 (14)
Table: Our model's performance gap decreases as the projectivity ratio increases
7
7 From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusion:
We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition based dependency parsing
Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering
Tree-stack LSTM performed better with low-resource languages
When the training dataset size increases, tree-stack LSTM loses its advantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings
Attention Mechanism
Applying attention between the σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement
Morphological Features
Finding different ways to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann, Carlos Gomez-Rodriguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, Association for Computational Linguistics, pages 673-682.
S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
Solution
Find a recurrent architecture such that it can summarize the parsinghistory as well as word sequences in a buffer and stack
Omer Kırnap (Koc University) MSc Thesis September 27 2018 52 123
Model Overview
2 Shared Tasks for Multilingual Parsing from Raw Text to UniversalDependencies
CoNLL17
bull Koc-University team with MLP Parser using Context Embeddings
CoNLL18
bull KParse team with Tree-stack LSTM Parser usingContext and Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 53 123
c Tree-stack LSTM Parser (CoNLL18)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123
Related Work - Stack LSTM
Figure Stack LSTM [Dyer et al 2015]
Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123
Problems with Stack LSTM
They only modify stackrsquos word embeddings
Hidden states of LSTMS are not updated unless reduce
Actions are not explicitly represented
They only used word2vec embeddings [Mikolov et al 2013]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123
Tree-stack LSTM Overview
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
We propose Tree-stack LSTM model with 4 components
β-LSTMσ-LSTMAction-LSTMTree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123
Tree-stack LSTM
Input Representation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with continuous vector
Every dependency relation is represented with continuous vector
Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123
Input Representation
We do not include explicit feature extractor We initiated wordrepresentation by concatenating
Character Based LSTMrsquos word vectors
Word Based BiLSTMrsquos context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
Input Representation
Morp-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
Tree-stack LSTM
Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123
β-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
LSTM LSTM LSTM
wi+2wi+1wi
Figure Bufferrsquos β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
LSTM LSTM LSTM
si si+1 si+2
Figure Stackrsquos σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
LSTM LSTM LSTM
Figure Action-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How do components of tree-stack LSTM are connected
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
t-RNN
Dependent word
Dependency Relation
Head word
Figure t-RNN
whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
Tree-RNN with
1 Left Transition2 Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM LSTM
Left transition
t-RNN
Dependency Relation
HeadDependent
Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure β-LSTM recalculates its hidden based on new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Stackrsquos top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure σ-LSTM recalculates its hidden from new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
Final overview of Tree-stack LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing
2 Related WorkLinear Models and their DrawbacksNeural Network Models
3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser
4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning
5 Conclusion6 Future Work amp Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results amp Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results amp Comparisons
Dataset
Dependency parsing of 81
treebanks in 49 languages
All treebanks use standardized
annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 7th out
of 33 participants (1st among
transition based parsers)
Dependency parsing of 82
treebanks in 57 languages
All treebanks use
standardized annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 16th
out of 30 participants (2nd
among transition based
parsers)
CoNLL17 CoNLL181 Traintest split change 2 Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
2 possible problems of official comparison
1 If the annotation of the tree bank is improved the older parser ishandicapped
2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
MLP
Figure Initial model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
LSTM LSTM
Figure Only action LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
LSTM LSTM LSTM
LSTM LSTM MLP
Figure Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
LSTM LSTM
LSTM LSTM MLP
Figure Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692
cs cac 8389 8223 8313 8317
en ewt 7554 7543 7556 7567
Table Comparison between MLP and rdquoOnlyrdquo models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code           without t-RNN  with t-RNN
no_nynorsklia (3k)  51.78          53.33
ru_taiga (11k)      59.13          60.55
gl_treegal (15k)    69.76          70.45
hu_szeged (20k)     66.12          68.18
sv_lines (49k)      74.04          75.46
tr_imst (50k)       58.12          58.75
ar_padt (120k)      68.04          68.14
en_ewt (204k)       74.87          75.77
cs_cac (473k)       82.89          83.57
cs_pdt (1M)         81.17          81.164

t-RNN provides a comparative advantage for low-resource languages.
Ablation Analysis
Overall results of ablation analysis
Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu_szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv_lines   71.12  72.05   72.17   74.04   72.17      75.46
tr_imst    57.12  56.87   57.02   57.12   58.12      58.75
ar_padt    67.83  66.67   66.89   66.92   68.04      68.14
cs_cac     83.89  82.23   83.13   83.17   82.89      83.57
en_ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats the other model variations.
Ablation Analysis
Conclusions of Ablation Experiments
t-RNN's performance contribution increases as the training size decreases.
σ-LSTM provides more useful information, independent of dataset size.
Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers).
What does the Morphological Feature Embedding provide?
Contribution of Morph-feat Embeddings
Experimental Settings: We divide the CoNLL18 UD v2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:
Languages having less than 20k tokens
Languages having more than 20k and less than 50k tokens
Languages having more than 50k and less than 100k tokens
Languages having 100k tokens or more
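The four-way split above can be sketched as a small helper; the thresholds come from the slides, and the example token counts are taken from the result tables later in this section.

```python
# Minimal sketch of the four-way bucketing by training-token count.
def bucket(n_tokens):
    """Return the experiment bucket for a treebank by training-token count."""
    if n_tokens < 20_000:
        return "<20k"
    if n_tokens < 50_000:
        return "20k-50k"
    if n_tokens < 100_000:
        return "50k-100k"
    return ">=100k"

# Example counts from the tables in this section
counts = {"no_nynorsklia": 3_583, "sv_lines": 48_325,
          "eu_bdt": 72_974, "cs_pdt": 1_173_282}
buckets = {lang: bucket(n) for lang, n in counts.items()}
```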
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no_nynorsklia  51.13        53.33             3,583
ru_taiga       58.32        60.55            10,479
sme_giella     52.78        53.39            16,385
la_perseus     49.93        51.60            18,184
ug_udt         52.78        53.39            19,262
sl_sst         46.72        48.77            19,473
hu_szeged      66.23        68.18            20,166

Not useful for languages having less than 20k training tokens.
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv_lines       72.18        74.81            48,325
fr_sequoia     84.36        82.17            50,543
en_gum         76.44        75.34            53,686
ko_gsd         73.74        72.54            56,687
eu_bdt         74.55        73.32            72,974
nl_lassysmall  76.7         75.8             75,134
gl_ctg         79.02        79.018           79,327
lv_lvtb        72.33        72.24            80,666
id_gsd         75.76        73.97            97,531

Beneficial for languages with 50k-100k training tokens.
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens:

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa_seraji  81.18        81.12             121,064
bg_btb     84.53        84.55             124,336
en_ewt     75.77        75.682            204,585
ar_padt    68.02        68.14             223,881
de_gsd     71.59        71.32             263,804
ca_ancora  85.89        85.874            417,587
es_ancora  84.99        84.78             444,617
cs_cac     83.57        83.63             472,608
cs_pdt     81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens.
Static vs Dynamic Oracle Training
Static oracle: transitions follow the gold moves. Dynamic oracle: transitions follow the predicted moves.
In both cases, the log-probability of the gold moves is maximized.
Figure: Tree-stack LSTM architecture (β-LSTM, σ-LSTM, Action-LSTM, and the t-RNN over head word, dependent word, and dependency relation, concatenated into the MLP).
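The distinction above can be sketched as a single training step. `scores(state)` and `gold_move(state)` are hypothetical stand-ins for the parser's transition scores and the oracle; in both regimes the loss is the negative log-probability of the gold move, as stated above, and only the move used to advance the state differs.

```python
import math

def train_step(state, scores, gold_move, dynamic=False):
    logits = scores(state)                              # one score per transition
    log_z = math.log(sum(math.exp(v) for v in logits.values()))
    gold = gold_move(state)
    loss = -(logits[gold] - log_z)                      # -log p(gold | state)
    if dynamic:
        # dynamic oracle: advance with the model's own prediction, so training
        # also visits states reached after mistakes
        next_move = max(logits, key=logits.get)
    else:
        # static oracle: always advance with the gold transition
        next_move = gold
    return loss, next_move
```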
Static vs Dynamic Oracle Training
Figure: Results are very close for training-token counts below 20k.
Static vs Dynamic Oracle Training
Figure: Results are very close for training-token counts between 20k and 50k.
Static vs Dynamic Oracle Training
Figure: Results are very close for training-token counts above 50k.
How about languages with less than 20k training tokens?
Transfer Learning
There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors, trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af_afribooms  not provided  75.46  77.43  78.12
kk_ktb        20.19         22.31  21.96  23.86
bxr_bdt        7.64          9.76   9.93   8.98
kmr_mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4).
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial.
From-scratch LM training does not produce useful word and context vectors.
Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017].
Projectivity
Transition-based parsers can only build projective trees.
6. Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
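Projectivity itself has a short operational test: a tree is projective iff no two dependency arcs cross. A minimal sketch, using the usual CoNLL convention that `heads[i]` gives the head of token i+1 (tokens 1-indexed, 0 is the root):

```python
def is_projective(heads):
    """True iff the dependency tree given by the head list has no crossing arcs."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for l1, r1 in arcs:
        for l2, r2 in arcs:
            # two arcs cross when exactly one endpoint lies strictly inside the other
            if l1 < l2 < r1 < r2:
                return False
    return True
```

For example, the tree for "Economic news had little effect" with heads [2, 3, 0, 5, 3] is projective.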
Projective vs Non-projective
We compared our model with the best model for different projectivity ratios:

Language     Projectivity (%)  Best (LAS)  Our (LAS)
grc_perseus   90.7             79.39       55.03 (20)
eu_bdt        95.13            84.22       74.13 (17)
hu_szeged     97.8             82.66       68.18 (14)
da_ddt        98.26            86.28       76.40 (17)
en_gum        99.6             85.05       76.44 (15)
gl_treegal   100               74.25       70.45 (10)
gl_ctg       100               82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases.
7. From the official results page and our projectivity table.
Conclusions
Conclusion
In conclusion: We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition-based dependency parsing.
Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering.
Tree-stack LSTM performed better on low-resource languages.
When the training dataset size increases, tree-stack LSTM loses its advantage.
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.
Attention Mechanism
Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring performance improvements.
Morphological Features
Finding different ways to represent morphological features.
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.
Publications
Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
References
Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, pages 673-682. Association for Computational Linguistics.
S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Thank you for your attention
Questions?
Model Overview
Two shared tasks on Multilingual Parsing from Raw Text to Universal Dependencies:
CoNLL17: Koc University team with the MLP parser, using context embeddings.
CoNLL18: KParse team with the Tree-stack LSTM parser, using context and morph-feat embeddings.
c Tree-stack LSTM Parser (CoNLL18)
Related Work - Stack LSTM
Figure: Stack LSTM [Dyer et al. 2015]
Represent each component (σ, β, A) with an LSTM. Modify the head word's embedding with the dependent's embedding.
Problems with Stack LSTM
They only modify the stack's word embeddings.
Hidden states of the LSTMs are not updated unless a reduce occurs.
Actions are not explicitly represented.
They only used word2vec embeddings [Mikolov et al. 2013].
Our solution
We propose:
Context embeddings should improve parsing accuracy.
Dependency relations should be explicitly represented.
Morphological features of a word may enhance parsing accuracy.
Tree-stack LSTM Overview
Figure: Tree-stack LSTM architecture (β-LSTM, σ-LSTM, Action-LSTM, and the t-RNN over head word, dependent word, and dependency relation, concatenated into the MLP).

We propose the Tree-stack LSTM model with 4 components:
β-LSTM, σ-LSTM, Action-LSTM, Tree-RNN
Tree-stack LSTM
Input Representation
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with a continuous vector.
Every dependency relation is represented with a continuous vector.
Input Representation
We do not include an explicit feature extractor. We initialize the word representation by concatenating:
Character-based LSTM's word vectors
Word-based BiLSTM's context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
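The concatenation above can be sketched directly; the dimensions here are illustrative placeholders, not the thesis' actual hyperparameters, and the vectors are dummies.

```python
# Hedged sketch: build a word representation by concatenating the four sources.
CHAR_DIM, CTX_DIM, POS_DIM, FEAT_DIM = 350, 300, 128, 128

def word_representation(char_vec, context_vec, pos_vec, feat_vec):
    """Concatenate char-LSTM, BiLSTM-context, POS, and morph-feat vectors."""
    assert (len(char_vec), len(context_vec)) == (CHAR_DIM, CTX_DIM)
    assert (len(pos_vec), len(feat_vec)) == (POS_DIM, FEAT_DIM)
    return char_vec + context_vec + pos_vec + feat_vec   # list concatenation

w = word_representation([0.0] * CHAR_DIM, [0.0] * CTX_DIM,
                        [0.0] * POS_DIM, [0.0] * FEAT_DIM)
```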
Input Representation
Morph-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure: Morph-feat Embeddings
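One way to read the figure: a UD FEATS string like the one above is split into Key=Value pairs, each pair gets an embedding, and the pairs are pooled into a single morph-feat vector. The embedding table, the dimension, and sum-pooling are assumptions for illustration, not the thesis' exact design.

```python
import random

DIM = 64
rng = random.Random(0)
feat_embed = {}        # illustrative embedding table, created lazily per feature

def morph_feat_vector(feats):
    """Sum per-feature embeddings for a UD FEATS string such as 'Case=Nom|...'."""
    vec = [0.0] * DIM
    if feats in ("_", ""):                  # UD writes "_" for no features
        return vec
    for kv in feats.split("|"):             # one "Key=Value" per feature
        if kv not in feat_embed:
            feat_embed[kv] = [rng.uniform(-0.1, 0.1) for _ in range(DIM)]
        vec = [v + e for v, e in zip(vec, feat_embed[kv])]
    return vec

v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
```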
Tree-stack LSTM
Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN
β-LSTM
Figure: the β-LSTM component highlighted in the Tree-stack LSTM architecture.
β-LSTM
Figure: Buffer's β-LSTM over upcoming words w_i, w_i+1, w_i+2.
σ-LSTM
Figure: the σ-LSTM component highlighted in the Tree-stack LSTM architecture.
σ-LSTM
Figure: Stack's σ-LSTM over stack items s_i, s_i+1, s_i+2.
Action-LSTM
Figure: the Action-LSTM component highlighted in the Tree-stack LSTM architecture.
Action-LSTM
Figure: Action-LSTM over the transition history.
How are the components of the tree-stack LSTM connected?
Tree-RNN
Tree-RNN (t-RNN)
Figure: t-RNN combining the dependent word, dependency relation, and head word.

w_head_new = tanh(W_rnn * [w_head_old; d_l; w_dep] + b_rnn)   (1)
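Eq. (1) can be written out directly: the new head embedding is tanh of an affine map over the concatenated [old head; dependency-relation; dependent] vectors. The dimension and the random parameters below are illustrative.

```python
import math, random

D = 8
rng = random.Random(0)
W_rnn = [[rng.uniform(-0.1, 0.1) for _ in range(3 * D)] for _ in range(D)]
b_rnn = [0.0] * D

def t_rnn(w_head_old, d_l, w_dep):
    """Eq. (1): w_head_new = tanh(W_rnn * [w_head_old; d_l; w_dep] + b_rnn)."""
    x = w_head_old + d_l + w_dep            # concatenation, length 3*D
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W_rnn, b_rnn)]

w_new = t_rnn([0.5] * D, [0.1] * D, [-0.3] * D)
```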
Tree-RNN with
1. Left Transition
2. Right Transition
Left Transition
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings.
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced.
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding.
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input.
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready for the next transition.
Right Transition
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings.
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced.
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding.
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input.
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready for the next transition.
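The two transitions above can be sketched on a plain (stack, buffer, arcs) state, with arcs as (head, relation, dependent) triples, matching left_d and right_d as written: left pops the stack top and attaches it under the buffer front, right pops the stack top and attaches it under the next stack item. Token strings here are illustrative.

```python
def left(state, d):
    """left_d: (σ|s, b|β, A) -> (σ, b|β, A ∪ {(b, d, s)})."""
    stack, buffer, arcs = state
    s, b = stack[-1], buffer[0]
    return stack[:-1], buffer, arcs | {(b, d, s)}

def right(state, d):
    """right_d: (σ|s|t, β, A) -> (σ|s, β, A ∪ {(s, d, t)})."""
    stack, buffer, arcs = state
    t, s = stack[-1], stack[-2]
    return stack[:-1], buffer, arcs | {(s, d, t)}

# Example: attach "news" as nsubj of the buffer front "had"
stack, buf, arcs = left((["ROOT", "news"], ["had"], set()), "nsubj")
```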
Final overview of Tree-stack LSTM
Figure: Tree-stack LSTM architecture (β-LSTM, σ-LSTM, Action-LSTM, and the t-RNN over head word, dependent word, and dependency relation, concatenated into the MLP).
Overview
1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2. Related Work: Linear Models and their Drawbacks; Neural Network Models
3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5. Conclusion
6. Future Work & Discussions
4. Results & Comparisons
Results & Comparisons

CoNLL17 Dataset:
Dependency parsing of 81 treebanks in 49 languages.
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations.
Koc University ranked 7th out of 33 participants (1st among transition-based parsers).

CoNLL18 Dataset:
Dependency parsing of 82 treebanks in 57 languages.
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations.
Koc University ranked 16th out of 30 participants (2nd among transition-based parsers).

Changes from CoNLL17 to CoNLL18: 1. Train/test split change  2. Annotation
MLP vs Tree-stack LSTM
CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
2 possible problems of official comparison
1 If the annotation of the tree bank is improved the older parser ishandicapped
2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
MLP
Figure Initial model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
LSTM LSTM
Figure Only action LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
LSTM LSTM LSTM
LSTM LSTM MLP
Figure Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
LSTM LSTM
LSTM LSTM MLP
Figure Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692
cs cac 8389 8223 8313 8317
en ewt 7554 7543 7556 7567
Table Comparison between MLP and rdquoOnlyrdquo models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875
ar padt (120k) 6804 6814
en ewt (204k) 7487 7577
cs cac (473k) 8289 8357
cs pdt (1M) 8117 81164
t-RNN provides comparative advantage for low-resourcelanguages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNNrsquos performance contribution increases when the training sizedecreases
σ-LSTM provides more useful information independent from datasetsize
Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k less than 50k tokens
Languages having more than 50k less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having less than 20k training tokens
Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583
ru taiga 5832 6055 10479
sme giella 5278 5339 16385
la perseus 4993 516 18184
ug udt 5278 5339 19262
sl sst 4672 4877 19473
hu szeged 6623 6818 20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having tokens in between 50k and100k
Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325
fr sequoia 8436 8217 50543
en gum 7644 7534 53686
ko gsd 7374 7254 56687
eu bdt 7455 7332 72974
nl lassymal 767 758 75134
gl ctg 7902 79018 79327
lv lvtb 7233 7224 80666
id gsd 7576 7397 97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having more than 100k trainingtokens
Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064
bg btb 8453 8455 124336
en ewt 7577 75682 204585
ar padt 6802 6814 223881
de gsd 7159 7132 263804
ca ancora 8589 85874 417587
es ancora 8499 8478 444617
cs cac 8357 8363 472608
cs pdt 8143 8212 1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle transitions using gold movesDynamic oracle transitions using predicted moves
In both cases logp of gold moves maximized
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and
use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al
2017]3 Using my own word and context vectors trained with different
language but from the same language family4 Applying transfer learning with a pre-trained parser
Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898
kmr mg 2012 2257 2278 2339
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the mostbeneficial
From scratch LM training does not bring useful word and contextvectors
Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition Based Parser can only build projective trees 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
Projective vs Non-projective
We compared our model with the best model for different projectivityratios
Language Projectiviy Best (LAS) Our (LAS)
grc perseus 907 7939 5503 (20)
eu bdt 9513 8422 7413 (17)
hu szeged 978 8266 6818 (14)
da ddt 9826 8628 7640 (17)
en gum 996 8505 7644 (15)
gl treegal 100 7425 7045 (10)
gl ctg 100 8212 7945 (14)
Table Our models performance gap decreases as the projectivity ratio increases
7
7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing
Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering
Tree-stack LSTM performed better with low resource languages
When the training dataset size increases tree-stack LSTM losses itsadvantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings
Attention Mechanism
Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement
Morphological Features
Finding different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies
Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682
S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US
Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
c Tree-stack LSTM Parser (CoNLL18)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 54 123
Related Work - Stack LSTM
Figure Stack LSTM [Dyer et al 2015]
Represent each component (σ β A) with an LSTMModifying head wordrsquos embedding with dependent embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123
Problems with Stack LSTM
They only modify stackrsquos word embeddings
Hidden states of LSTMS are not updated unless reduce
Actions are not explicitly represented
They only used word2vec embeddings [Mikolov et al 2013]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123
Tree-stack LSTM Overview
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
We propose Tree-stack LSTM model with 4 components
β-LSTMσ-LSTMAction-LSTMTree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123
Tree-stack LSTM
Input Representation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with continuous vector
Every dependency relation is represented with continuous vector
Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123
Input Representation
We do not include explicit feature extractor We initiated wordrepresentation by concatenating
Character Based LSTMrsquos word vectors
Word Based BiLSTMrsquos context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
Input Representation
Morp-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
Tree-stack LSTM
Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123
β-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
LSTM LSTM LSTM
wi+2wi+1wi
Figure Bufferrsquos β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
LSTM LSTM LSTM
si si+1 si+2
Figure Stackrsquos σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
LSTM LSTM LSTM
Figure Action-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How do components of tree-stack LSTM are connected
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
t-RNN
Dependent word
Dependency Relation
Head word
Figure t-RNN
whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
Tree-RNN with
1 Left Transition2 Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM LSTM
Left transition
t-RNN
Dependency Relation
HeadDependent
Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure β-LSTM recalculates its hidden state based on the new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure Tree-stack LSTM is ready to predict the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure Each embedding is initialized by concatenating POS, language and morph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure σ-LSTM recalculates its hidden state from the new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure Tree-stack LSTM is ready to predict the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
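The two transitions above can be sketched as plain list-and-set operations (stack top at the end of a Python list); this is an illustrative reading of the formulas, not the thesis code.

```python
def left_arc(stack, buffer, arcs, d):
    # left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    # the buffer front b becomes the head of the popped stack top s.
    s = stack.pop()
    arcs.add((buffer[0], d, s))

def right_arc(stack, buffer, arcs, d):
    # right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    # the remaining stack top s becomes the head of the popped item t.
    t = stack.pop()
    arcs.add((stack[-1], d, t))

stack, buffer, arcs = [0, 2], [3], set()
left_arc(stack, buffer, arcs, "nsubj")   # token 3 heads token 2
right_arc([0, 1], buffer, arcs, "obj")   # token 0 heads token 1
```

Note the asymmetry: the left transition attaches the stack top to the buffer front, while the right transition attaches the stack top to the item below it on the stack.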
Final overview of Tree-stack LSTM
Figure Tree-stack LSTM: σ-, β- and Action-LSTM outputs are concatenated and fed to an MLP; the t-RNN composes head word, dependent word and dependency relation embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2 Related Work: Linear Models and their Drawbacks; Neural Network Models
3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5 Conclusion
6 Future Work & Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results & Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results & Comparisons
Dataset
CoNLL17
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc-University ranked 7th out of 33 participants (1st among transition based parsers)
CoNLL18
Dependency parsing of 82 treebanks in 57 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)
Changes from CoNLL17 to CoNLL18: 1 Train/test split change 2 Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
Two possible problems with the official comparison:
1 If the annotation of the treebank is improved, the older parser is handicapped
2 If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code          MLP    Tree-stack
ru taiga (10k)     58.89  60.55
hu szeged (20k)    66.21  68.18
tr imst (50k)      56.78  58.75
ar padt (120k)     67.83  68.14
en ewt (205k)      74.87  75.77
cs cac (473k)      83.39  83.57
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
Figure Initial model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
Figure Only action LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
Figure Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
Figure Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code   MLP    Only Action  Only-β  Only-σ
hu szeged   66.21  66.87        66.94   67.03
sv lines    71.12  72.05        72.17   72.45
tr imst     57.12  56.87        57.02   57.12
ar padt     67.83  66.67        66.89   66.92
cs cac      83.89  82.23        83.13   83.17
en ewt      75.54  75.43        75.56   75.67
Table Comparison between MLP and "Only" models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
Figure Tree-stack LSTM overview with the t-RNN component highlighted
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code            without t-RNN  with t-RNN
no nynorsklia (3k)   51.78          53.33
ru taiga (11k)       59.13          60.55
gl treegal (15k)     69.76          70.45
hu szeged (20k)      66.12          68.18
sv lines (49k)       74.04          75.46
tr imst (50k)        58.12          58.75
ar padt (120k)       68.04          68.14
en ewt (204k)        74.87          75.77
cs cac (473k)        82.89          83.57
cs pdt (1M)          81.17          81.164
t-RNN provides a comparative advantage for low-resource languages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang        MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged   66.21  66.87   66.94   67.03   66.12      68.18
sv lines    71.12  72.05   72.17   74.04   72.17      75.46
tr imst     57.12  56.87   57.02   57.12   58.12      58.75
ar padt     67.83  66.67   66.89   66.92   68.04      68.14
cs cac      83.89  82.23   83.13   83.17   82.89      83.57
en ewt      75.54  75.43   75.56   75.67   74.87      75.77
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNN's performance contribution increases as the training size decreases
σ-LSTM provides more useful information independent of dataset size
Interconnecting the model's components with the t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental Settings
We divide the CoNLL18 UD v2.2 dataset into 4 parts based on the number of training tokens per language, to better understand our contributions:
Languages having less than 20k tokens
Languages having more than 20k and less than 50k tokens
Languages having more than 50k and less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
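The bucketing rule above can be written down directly; the thresholds are the ones listed on the slide, and the helper name is mine.

```python
def size_bucket(n_tokens):
    """Assign a treebank to one of the four training-size buckets."""
    if n_tokens < 20_000:
        return "<20k"
    if n_tokens < 50_000:
        return "20k-50k"
    if n_tokens < 100_000:
        return "50k-100k"
    return ">=100k"

# token counts taken from the per-bucket tables later in the deck
assert size_bucket(10479) == "<20k"       # ru taiga
assert size_bucket(72974) == "50k-100k"   # eu bdt
```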
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens
Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.6            18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having between 50k and 100k training tokens
Lang code     Morph-Feats  no Morph-Feats  # of tokens
sv lines      72.18        74.81           48325
fr sequoia    84.36        82.17           50543
en gum        76.44        75.34           53686
ko gsd        73.74        72.54           56687
eu bdt        74.55        73.32           72974
nl lassysmall 76.7         75.8            75134
gl ctg        79.02        79.018          79327
lv lvtb       72.33        72.24           80666
id gsd        75.76        73.97           97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens
Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121064
bg btb     84.53        84.55           124336
en ewt     75.77        75.682          204585
ar padt    68.02        68.14           223881
de gsd     71.59        71.32           263804
ca ancora  85.89        85.874          417587
es ancora  84.99        84.78           444617
cs cac     83.57        83.63           472608
cs pdt     81.43        82.12           1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle: transitions follow gold moves
Dynamic oracle: transitions follow predicted moves
In both cases the log-probability of gold moves is maximized
Figure Tree-stack LSTM architecture
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
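The distinction above can be sketched in a few lines: the loss is identical in both regimes, only the move used to advance the state changes. `move_probs`, `oracle` and `state.apply` are hypothetical interfaces, not the thesis API.

```python
import math

def train_step(parser, state, oracle, dynamic=False):
    """One transition of oracle training. Both regimes maximize
    log p(gold move); a static oracle advances with the gold move,
    a dynamic oracle advances with the predicted move."""
    probs = parser.move_probs(state)        # dict: move -> probability
    gold = oracle(state)
    loss = -math.log(probs[gold])           # same loss in both regimes
    move = max(probs, key=probs.get) if dynamic else gold
    return loss, state.apply(move)
```

Under the dynamic oracle the parser visits configurations its own mistakes produce, which is exactly where training signal is needed at test time.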
Static vs Dynamic Oracle Training
Figure Results are very close for languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for languages with between 20k and 50k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for languages with more than 50k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning:
1 Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2 Using Facebook's word vectors to train a parser [Bojanowski et al 2017]
3 Using my own word and context vectors trained on a different language from the same language family
4 Applying transfer learning with a pre-trained parser
Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39
Table LAS values for strategies (1), (2), (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial
Training an LM from scratch does not yield useful word and context vectors
Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
A transition based parser can only build projective trees 6
6Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
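Projectivity can be checked by testing every pair of arcs for crossing endpoints; a small illustrative sketch (1-based tokens, head 0 for the root):

```python
def is_projective(heads):
    """heads[i-1] is the head of token i; 0 denotes the root.
    A dependency tree is projective iff no two arcs cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for a, b in arcs:
        for c, e in arcs:
            if a < c < b < e:   # strictly interleaved endpoints = crossing
                return False
    return True

assert is_projective([3, 3, 0, 3]) is True
assert is_projective([3, 4, 0, 3]) is False   # arcs (1,3) and (2,4) cross
```

The quadratic pair check is fine for sentence-length inputs; linear-time checks exist but are not needed for a per-treebank statistic like the projectivity ratio below.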
Projective vs Non-projective
We compared our model with the best model for different projectivity ratios
Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)
Table Our model's performance gap decreases as the projectivity ratio increases 7
7From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusion:
We introduced "Context, Word and Morph-feat" embeddings and showed their contribution to transition based dependency parsing
Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering
Tree-stack LSTM performed better on low-resource languages
When the training dataset size increases, tree-stack LSTM loses its advantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.
Attention Mechanism
Applying attention between the states of the σ-LSTM, β-LSTM or Action-LSTM may bring a performance improvement
Morphological Features
Finding a different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap, Berkay Furkan Onder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann, Carlos Gómez-Rodríguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, Association for Computational Linguistics, pages 673-682.
S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
Related Work - Stack LSTM
Figure Stack LSTM [Dyer et al 2015]
Represent each component (σ, β, A) with an LSTM
Modify the head word's embedding with the dependent's embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 55 123
Problems with Stack LSTM
They only modify the stack's word embeddings
Hidden states of the LSTMs are not updated unless a reduce occurs
Actions are not explicitly represented
They only used word2vec embeddings [Mikolov et al 2013]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123
Tree-stack LSTM Overview
Figure Tree-stack LSTM overview: σ-, β- and Action-LSTM outputs are concatenated and fed to an MLP; the t-RNN composes head word, dependent word and dependency relation embeddings
We propose the Tree-stack LSTM model with 4 components:
β-LSTM, σ-LSTM, Action-LSTM, Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123
Tree-stack LSTM
Input Representation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with a continuous vector
Every dependency relation is represented with a continuous vector
Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123
Input Representation
We do not include an explicit feature extractor. We initialize each word representation by concatenating:
Character based LSTM's word vectors
Word based BiLSTM's context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
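The concatenation itself is straightforward; the dimensions below are placeholder assumptions, not the thesis hyperparameters.

```python
import numpy as np

def word_representation(char_vec, context_vec, pos_vec, morph_vec):
    # Concatenate the four pieces listed above; no hand-crafted features.
    return np.concatenate([char_vec, context_vec, pos_vec, morph_vec])

# illustrative sizes: char-LSTM 100, BiLSTM context 200, POS 16, morph-feat 32
v = word_representation(np.zeros(100), np.zeros(200),
                        np.zeros(16), np.zeros(32))
```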
Input Representation
Morph-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
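One plausible way to embed a FEATS string like the one above is to sum one embedding per Feature=Value pair; this is an illustrative scheme with a made-up table initialization, and the thesis may compose the features differently.

```python
import numpy as np

DIM = 8                       # embedding size, an arbitrary choice here
rng = np.random.default_rng(1)
feat_table = {}               # one vector per "Feature=Value" pair

def morph_feat_vector(feats):
    """Embed a UD FEATS string such as
    'Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs'
    by summing the embeddings of its Feature=Value pairs."""
    total = np.zeros(DIM)
    for fv in feats.split("|"):
        if fv not in feat_table:                  # lazily grow the table
            feat_table[fv] = rng.normal(size=DIM)
        total += feat_table[fv]
    return total

v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
```

Summing keeps the output dimension fixed regardless of how many features a word carries, which matters across morphologically rich and poor languages alike.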
Tree-stack LSTM
Model Components
1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123
β-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
LSTM LSTM LSTM
wi+2wi+1wi
Figure Bufferrsquos β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
LSTM LSTM LSTM
si si+1 si+2
Figure Stackrsquos σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
LSTM LSTM LSTM
Figure Action-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How do components of tree-stack LSTM are connected
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
t-RNN
Dependent word
Dependency Relation
Head word
Figure t-RNN
whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
Tree-RNN with
1 Left Transition2 Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM LSTM
Left transition
t-RNN
Dependency Relation
HeadDependent
Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure β-LSTM recalculates its hidden based on new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Stackrsquos top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure σ-LSTM recalculates its hidden from new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
Final overview of Tree-stack LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing
2 Related WorkLinear Models and their DrawbacksNeural Network Models
3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser
4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning
5 Conclusion6 Future Work amp Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results amp Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results amp Comparisons
Dataset
Dependency parsing of 81
treebanks in 49 languages
All treebanks use standardized
annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 7th out
of 33 participants (1st among
transition based parsers)
Dependency parsing of 82
treebanks in 57 languages
All treebanks use
standardized annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 16th
out of 30 participants (2nd
among transition based
parsers)
CoNLL17 CoNLL181 Traintest split change 2 Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
2 possible problems of official comparison
1 If the annotation of the tree bank is improved the older parser ishandicapped
2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
MLP
Figure Initial model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
LSTM LSTM
Figure Only action LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
LSTM LSTM LSTM
LSTM LSTM MLP
Figure Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
LSTM LSTM
LSTM LSTM MLP
Figure Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692
cs cac 8389 8223 8313 8317
en ewt 7554 7543 7556 7567
Table Comparison between MLP and rdquoOnlyrdquo models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875
ar padt (120k) 6804 6814
en ewt (204k) 7487 7577
cs cac (473k) 8289 8357
cs pdt (1M) 8117 81164
t-RNN provides comparative advantage for low-resourcelanguages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNNrsquos performance contribution increases when the training sizedecreases
σ-LSTM provides more useful information independent from datasetsize
Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k less than 50k tokens
Languages having more than 50k less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having less than 20k training tokens
Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583
ru taiga 5832 6055 10479
sme giella 5278 5339 16385
la perseus 4993 516 18184
ug udt 5278 5339 19262
sl sst 4672 4877 19473
hu szeged 6623 6818 20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having tokens in between 50k and100k
Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325
fr sequoia 8436 8217 50543
en gum 7644 7534 53686
ko gsd 7374 7254 56687
eu bdt 7455 7332 72974
nl lassymal 767 758 75134
gl ctg 7902 79018 79327
lv lvtb 7233 7224 80666
id gsd 7576 7397 97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having more than 100k trainingtokens
Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064
bg btb 8453 8455 124336
en ewt 7577 75682 204585
ar padt 6802 6814 223881
de gsd 7159 7132 263804
ca ancora 8589 85874 417587
es ancora 8499 8478 444617
cs cac 8357 8363 472608
cs pdt 8143 8212 1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle transitions using gold movesDynamic oracle transitions using predicted moves
In both cases logp of gold moves maximized
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and
use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al
2017]3 Using my own word and context vectors trained with different
language but from the same language family4 Applying transfer learning with a pre-trained parser
Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898
kmr mg 2012 2257 2278 2339
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the mostbeneficial
From scratch LM training does not bring useful word and contextvectors
Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition Based Parser can only build projective trees 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
Projective vs Non-projective
We compared our model with the best model for different projectivityratios
Language Projectiviy Best (LAS) Our (LAS)
grc perseus 907 7939 5503 (20)
eu bdt 9513 8422 7413 (17)
hu szeged 978 8266 6818 (14)
da ddt 9826 8628 7640 (17)
en gum 996 8505 7644 (15)
gl treegal 100 7425 7045 (10)
gl ctg 100 8212 7945 (14)
Table Our models performance gap decreases as the projectivity ratio increases
7
7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Problems with Stack LSTM
They only modify the stack's word embeddings
Hidden states of the LSTMs are not updated unless a reduce occurs
Actions are not explicitly represented
They only use word2vec embeddings [Mikolov et al. 2013]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 56 123
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123
Tree-stack LSTM Overview
[Figure: Tree-stack LSTM overview — β-LSTM, σ-LSTM, and Action-LSTM states are concatenated and fed to an MLP; a t-RNN combines head word, dependent word, and dependency relation]
We propose the Tree-stack LSTM model with 4 components:
1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123
Tree-stack LSTM
Input Representation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with a continuous vector
Every dependency relation is represented with a continuous vector
Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123
Input Representation
We do not include an explicit feature extractor. We initialize word representations by concatenating:
Character-based LSTM's word vectors
Word-based BiLSTM's context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
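The concatenation above can be sketched as follows; all vector sizes here are illustrative assumptions, not the thesis's actual hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def word_representation(char_vec, context_vec, pos_vec, feat_vec):
    """Build a word's input representation by concatenating the four vectors."""
    return np.concatenate([char_vec, context_vec, pos_vec, feat_vec])

w = word_representation(
    rng.normal(size=350),  # character-based LSTM word vector (size assumed)
    rng.normal(size=600),  # word-based BiLSTM context vector (size assumed)
    rng.normal(size=50),   # POS embedding (size assumed)
    rng.normal(size=50),   # morph-feat embedding (size assumed)
)
print(w.shape)  # (1050,)
```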
Input Representation
Morph-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure: Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
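One plausible way to embed a morph-feat string like the one above is to sum a learned vector per feature=value pair; the slide fixes neither the dimension nor the composition operator, so both are assumptions here.

```python
import numpy as np

DIM = 50                       # assumed embedding size
rng = np.random.default_rng(0)
feat_table = {}                # one learned vector per feature=value pair

def morph_feat_vector(feats):
    """Embed a UD morphological feature string by summing per-feature vectors."""
    vec = np.zeros(DIM)
    for fv in feats.split("|"):
        if fv not in feat_table:
            feat_table[fv] = rng.normal(scale=0.1, size=DIM)
        vec += feat_table[fv]
    return vec

v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
print(v.shape)  # (50,)
```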
Tree-stack LSTM
Model Components
1 β-LSTM
2 σ-LSTM
3 Action-LSTM
4 Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123
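How the components meet can be sketched as a concat-then-MLP scorer; the hidden sizes, the number of transitions, and the single hidden layer are assumptions for illustration.

```python
import numpy as np

H, HID, N_MOVES = 128, 64, 3   # assumed sizes: component states, MLP hidden, moves
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(HID, 3 * H))
b1 = np.zeros(HID)
W2 = rng.normal(scale=0.1, size=(N_MOVES, HID))
b2 = np.zeros(N_MOVES)

def transition_scores(h_beta, h_sigma, h_action):
    """Concatenate β-LSTM, σ-LSTM, and Action-LSTM states; score moves with an MLP.
    (The t-RNN feeds updated head embeddings into the β/σ-LSTMs, not shown here.)"""
    x = np.concatenate([h_beta, h_sigma, h_action])
    h = np.tanh(W1 @ x + b1)
    return W2 @ h + b2

scores = transition_scores(rng.normal(size=H), rng.normal(size=H), rng.normal(size=H))
print(scores.shape)  # (3,)
```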
β-LSTM
[Figure: Tree-stack LSTM architecture (β-LSTM component)]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
[LSTM over buffer words w_i, w_i+1, w_i+2]
Figure: Buffer's β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
[Figure: Tree-stack LSTM architecture (σ-LSTM component)]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
[LSTM over stack items s_i, s_i+1, s_i+2]
Figure: Stack's σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
[Figure: Tree-stack LSTM architecture (Action-LSTM component)]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
[LSTM over past transition actions]
Figure: Action-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How are the components of the tree-stack LSTM connected?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
[Figure: t-RNN, combining the head word, dependency relation, and dependent word]
w_head_new = tanh(W_rnn * [w_head_old ; d_l ; w_dep] + b_rnn)   (1)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
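Equation (1) in code; the embedding size and the random initialization are illustrative assumptions.

```python
import numpy as np

D = 64                         # assumed embedding size
rng = np.random.default_rng(0)
W_rnn = rng.normal(scale=0.1, size=(D, 3 * D))
b_rnn = np.zeros(D)

def t_rnn(w_head_old, d_l, w_dep):
    """Eq. (1): w_head_new = tanh(W_rnn * [w_head_old; d_l; w_dep] + b_rnn)."""
    x = np.concatenate([w_head_old, d_l, w_dep])
    return np.tanh(W_rnn @ x + b_rnn)

w_head_new = t_rnn(rng.normal(size=D), rng.normal(size=D), rng.normal(size=D))
print(w_head_new.shape)  # (64,)
```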
Tree-RNN with
1 Left Transition
2 Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
[Figure: Left transition over the stack and buffer LSTMs]
Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
[Figure: Left transition — head, dependent, and dependency relation enter the t-RNN]
Figure: Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
[Figure: Left transition — the t-RNN outputs the new head embedding]
Figure: t-RNN calculates the new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
[Figure: Left transition — the new head embedding re-enters the buffer]
Figure: β-LSTM recalculates its hidden state based on the new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
[Figure: Left transition complete]
Figure: Tree-stack LSTM is ready to predict the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
[Figure: Right transition over the stack and buffer LSTMs]
Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
[Figure: Right transition — head, dependent, and dependency relation enter the t-RNN]
Figure: Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
[Figure: Right transition — the t-RNN outputs the new head embedding]
Figure: t-RNN calculates the new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
[Figure: Right transition — the new head embedding re-enters the stack]
Figure: σ-LSTM recalculates its hidden state from the new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
[Figure: Right transition complete]
Figure: Tree-stack LSTM is ready to predict the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
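The two transition rules can be written as pure functions over the parser configuration (stack σ, buffer β, arc set A); the tokens and the nsubj label below are illustrative.

```python
def left_arc(stack, buffer, arcs, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    the buffer front b becomes the head of the stack top s, which is popped."""
    *sigma, s = stack
    return sigma, buffer, arcs | {(buffer[0], d, s)}

def right_arc(stack, buffer, arcs, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    s (below the top) becomes the head of the stack top t, which is popped."""
    *sigma_s, t = stack
    return sigma_s, buffer, arcs | {(sigma_s[-1], d, t)}

stack, buffer, arcs = ["ROOT", "news"], ["had", "little"], set()
stack, buffer, arcs = left_arc(stack, buffer, arcs, "nsubj")
print(stack, arcs)  # ['ROOT'] {('had', 'nsubj', 'news')}
```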
Final overview of Tree-stack LSTM
[Figure: Tree-stack LSTM — β-LSTM, σ-LSTM, and Action-LSTM states concatenated and fed to an MLP; the t-RNN composes head, dependent, and dependency relation]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2 Related Work: Linear Models and their Drawbacks; Neural Network Models
3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5 Conclusion
6 Future Work & Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results & Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results amp Comparisons
Dataset
CoNLL17:
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc-University ranked 7th out of 33 participants (1st among transition based parsers)
CoNLL18:
Dependency parsing of 82 treebanks in 57 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)
Changes from CoNLL17 to CoNLL18: 1 Train/test split 2 Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
Two possible problems with the official comparison:
1 If the annotation of the treebank has been improved, the older parser is handicapped
2 If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code | MLP | Tree-stack
ru taiga (10k) | 58.89 | 60.55
hu szeged (20k) | 66.21 | 68.18
tr imst (50k) | 56.78 | 58.75
ar padt (120k) | 67.83 | 68.14
en ewt (205k) | 74.87 | 75.77
cs cac (473k) | 83.39 | 83.57
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
Figure: Initial model (MLP only)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
Figure: Only Action-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
Figure: Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
Figure: Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code | MLP | Only Action | Only-β | Only-σ
hu szeged | 66.21 | 66.87 | 66.94 | 67.03
sv lines | 71.12 | 72.05 | 72.17 | 72.45
tr imst | 57.12 | 56.87 | 57.02 | 57.12
ar padt | 67.83 | 66.67 | 66.89 | 66.92
cs cac | 83.89 | 82.23 | 83.13 | 83.17
en ewt | 75.54 | 75.43 | 75.56 | 75.67
Table: Comparison between MLP and "Only" models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
[Figure: Tree-stack LSTM architecture (t-RNN component)]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code | without t-RNN | with t-RNN
no nynorsklia (3k) | 51.78 | 53.33
ru taiga (11k) | 59.13 | 60.55
gl treegal (15k) | 69.76 | 70.45
hu szeged (20k) | 66.12 | 68.18
sv lines (49k) | 74.04 | 75.46
tr imst (50k) | 58.12 | 58.75
ar padt (120k) | 68.04 | 68.14
en ewt (204k) | 74.87 | 75.77
cs cac (473k) | 82.89 | 83.57
cs pdt (1M) | 81.17 | 81.164
t-RNN provides a comparative advantage for low-resource languages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang | MLP | Only A | Only-β | Only-σ | w/o t-RNN | all
hu szeged | 66.21 | 66.87 | 66.94 | 67.03 | 66.12 | 68.18
sv lines | 71.12 | 72.05 | 72.17 | 74.04 | 72.17 | 75.46
tr imst | 57.12 | 56.87 | 57.02 | 57.12 | 58.12 | 58.75
ar padt | 67.83 | 66.67 | 66.89 | 66.92 | 68.04 | 68.14
cs cac | 83.89 | 82.23 | 83.13 | 83.17 | 82.89 | 83.57
en ewt | 75.54 | 75.43 | 75.56 | 75.67 | 74.87 | 75.77
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNN's performance contribution increases when the training size decreases
σ-LSTM provides more useful information independent of dataset size
Interconnecting the model's components with the t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental Settings: We divide the CoNLL18 UD 2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:
Languages having less than 20k tokens
Languages having more than 20k and less than 50k tokens
Languages having more than 50k and less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens
Lang code | Morph-Feats | no Morph-Feats | # of tokens
no nynorsklia | 51.13 | 53.33 | 3583
ru taiga | 58.32 | 60.55 | 10479
sme giella | 52.78 | 53.39 | 16385
la perseus | 49.93 | 51.6 | 18184
ug udt | 52.78 | 53.39 | 19262
sl sst | 46.72 | 48.77 | 19473
hu szeged | 66.23 | 68.18 | 20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having tokens in between 50k and 100k
Lang code | Morph-Feats | no Morph-Feats | # of tokens
sv lines | 72.18 | 74.81 | 48325
fr sequoia | 84.36 | 82.17 | 50543
en gum | 76.44 | 75.34 | 53686
ko gsd | 73.74 | 72.54 | 56687
eu bdt | 74.55 | 73.32 | 72974
nl lassysmall | 76.7 | 75.8 | 75134
gl ctg | 79.02 | 79.018 | 79327
lv lvtb | 72.33 | 72.24 | 80666
id gsd | 75.76 | 73.97 | 97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens
Lang code | Morph-Feats | no Morph-Feats | # of tokens
fa seraji | 81.18 | 81.12 | 121064
bg btb | 84.53 | 84.55 | 124336
en ewt | 75.77 | 75.682 | 204585
ar padt | 68.02 | 68.14 | 223881
de gsd | 71.59 | 71.32 | 263804
ca ancora | 85.89 | 85.874 | 417587
es ancora | 84.99 | 84.78 | 444617
cs cac | 83.57 | 83.63 | 472608
cs pdt | 81.43 | 82.12 | 1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle: transitions using gold moves. Dynamic oracle: transitions using predicted moves.
In both cases, the log-probability of the gold moves is maximized
[Figure: Tree-stack LSTM architecture]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
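The difference between the two regimes can be sketched as the choice of which move advances the parser during training; the exploration probability below is an assumption, while the loss (negative log-probability of the gold move) is identical in both.

```python
import math
import random

def next_training_move(gold_move, model_move, dynamic, p_explore=0.9, rng=None):
    """Static oracle: always advance the parser with the gold move.
    Dynamic oracle: with probability p_explore, advance with the model's
    own (possibly wrong) prediction instead."""
    rng = rng or random.Random(0)
    if dynamic and rng.random() < p_explore:
        return model_move
    return gold_move

def transition_loss(p_gold):
    """Both regimes maximize log p(gold move), i.e. minimize -log p."""
    return -math.log(p_gold)

print(next_training_move("LEFT", "SHIFT", dynamic=False))  # LEFT
```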
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning:
1 Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2 Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3 Using my own word and context vectors trained on a different language from the same language family
4 Applying transfer learning with a pre-trained parser
Language | (1) | (2) | (3) | (4)
af afribooms | not provided | 75.46 | 77.43 | 78.12
kk ktb | 20.19 | 22.31 | 21.96 | 23.86
bxr bdt | 7.64 | 9.76 | 9.93 | 8.98
kmr mg | 20.12 | 22.57 | 22.78 | 23.39
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial
From-scratch LM training does not yield useful word and context vectors
Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition-based parsers can only build projective trees 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
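Projectivity amounts to having no crossing arcs; a simple quadratic check, with the artificial root at position 0:

```python
def is_projective(heads):
    """heads[i-1] is the head index of token i (0 denotes the artificial root).
    The tree is projective iff no two dependency arcs cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    return not any(a < c < b < e           # arc (c, e) crosses arc (a, b)
                   for (a, b) in arcs for (c, e) in arcs)

print(is_projective([2, 0, 2]))      # True: token 2 heads tokens 1 and 3
print(is_projective([0, 1, 1, 2]))   # False: arcs (1,3) and (2,4) cross
```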
Projective vs Non-projective
We compared our model with the best model for different projectivity ratios
Language | Projectivity | Best (LAS) | Our (LAS)
grc perseus | 90.7 | 79.39 | 55.03 (20)
eu bdt | 95.13 | 84.22 | 74.13 (17)
hu szeged | 97.8 | 82.66 | 68.18 (14)
da ddt | 98.26 | 86.28 | 76.40 (17)
en gum | 99.6 | 85.05 | 76.44 (15)
gl treegal | 100 | 74.25 | 70.45 (10)
gl ctg | 100 | 82.12 | 79.45 (14)
Table: Our model's performance gap decreases as the projectivity ratio increases
7 From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusion: We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition based dependency parsing
Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering
Tree-stack LSTM performed better on low-resource languages
When the training dataset size increases, tree-stack LSTM loses its advantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings
Attention Mechanism
Applying attention in between σ-LSTM states, or over the β-LSTM or Action-LSTM, may bring performance improvements
Morphological Features
Finding different ways to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (CRF) may solve this problem
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.
S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
Our solution
We propose
Context embeddings should improve parsing accuracy
Dependency relations should be explicitly represented
Morphological Features of a word may enhance parsing accuracy
Omer Kırnap (Koc University) MSc Thesis September 27 2018 57 123
Tree-stack LSTM Overview
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
We propose Tree-stack LSTM model with 4 components
β-LSTMσ-LSTMAction-LSTMTree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 58 123
Tree-stack LSTM
Input Representation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 59 123
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with continuous vector
Every dependency relation is represented with continuous vector
Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123
Input Representation
We do not include explicit feature extractor We initiated wordrepresentation by concatenating
Character Based LSTMrsquos word vectors
Word Based BiLSTMrsquos context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
Input Representation
Morp-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
Tree-stack LSTM
Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123
β-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
LSTM LSTM LSTM
wi+2wi+1wi
Figure Bufferrsquos β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
LSTM LSTM LSTM
si si+1 si+2
Figure Stackrsquos σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
LSTM LSTM LSTM
Figure Action-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How do components of tree-stack LSTM are connected
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
t-RNN
Dependent word
Dependency Relation
Head word
Figure t-RNN
whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
Tree-RNN with
1 Left Transition2 Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM LSTM
Left transition
t-RNN
Dependency Relation
HeadDependent
Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure β-LSTM recalculates its hidden based on new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Stackrsquos top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure σ-LSTM recalculates its hidden from new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
Final overview of Tree-stack LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing
2 Related WorkLinear Models and their DrawbacksNeural Network Models
3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser
4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning
5 Conclusion6 Future Work amp Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results amp Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results amp Comparisons
Dataset
Dependency parsing of 81
treebanks in 49 languages
All treebanks use standardized
annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 7th out
of 33 participants (1st among
transition based parsers)
Dependency parsing of 82
treebanks in 57 languages
All treebanks use
standardized annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 16th
out of 30 participants (2nd
among transition based
parsers)
CoNLL17 CoNLL181 Traintest split change 2 Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
2 possible problems of official comparison
1 If the annotation of the tree bank is improved the older parser ishandicapped
2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
MLP
Figure Initial model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
LSTM LSTM
Figure Only action LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
LSTM LSTM LSTM
LSTM LSTM MLP
Figure Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
LSTM LSTM
LSTM LSTM MLP
Figure Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692
cs cac 8389 8223 8313 8317
en ewt 7554 7543 7556 7567
Table Comparison between MLP and rdquoOnlyrdquo models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875
ar padt (120k) 6804 6814
en ewt (204k) 7487 7577
cs cac (473k) 8289 8357
cs pdt (1M) 8117 81164
t-RNN provides comparative advantage for low-resourcelanguages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNNrsquos performance contribution increases when the training sizedecreases
σ-LSTM provides more useful information independent from datasetsize
Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k less than 50k tokens
Languages having more than 50k less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having less than 20k training tokens
Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583
ru taiga 5832 6055 10479
sme giella 5278 5339 16385
la perseus 4993 516 18184
ug udt 5278 5339 19262
sl sst 4672 4877 19473
hu szeged 6623 6818 20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having tokens in between 50k and100k
Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325
fr sequoia 8436 8217 50543
en gum 7644 7534 53686
ko gsd 7374 7254 56687
eu bdt 7455 7332 72974
nl lassymal 767 758 75134
gl ctg 7902 79018 79327
lv lvtb 7233 7224 80666
id gsd 7576 7397 97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having more than 100k trainingtokens
Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064
bg btb 8453 8455 124336
en ewt 7577 75682 204585
ar padt 6802 6814 223881
de gsd 7159 7132 263804
ca ancora 8589 85874 417587
es ancora 8499 8478 444617
cs cac 8357 8363 472608
cs pdt 8143 8212 1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle transitions using gold movesDynamic oracle transitions using predicted moves
In both cases logp of gold moves maximized
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure: Results are very close for between 20k and 50k training tokens.
Static vs Dynamic Oracle Training
Figure: Results are very close for more than 50k training tokens.
How about languages with less than 20k training tokens?
Transfer Learning
There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch.
2. Using Facebook's word vectors to train a parser [Bojanowski et al., 2017].
3. Using my own word and context vectors, trained on a different language but from the same language family.
4. Applying transfer learning with a pre-trained parser.

Language        (1)            (2)     (3)     (4)
af afribooms    not provided   75.46   77.43   78.12
kk ktb          20.19          22.31   21.96   23.86
bxr bdt          7.64           9.76    9.93    8.98
kmr mg          20.12          22.57   22.78   23.39

Table: LAS values for strategies (1), (2), (3) and (4).
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial.
From-scratch LM training does not yield useful word and context vectors.
Our word and context vectors are still more useful than Facebook's [Bojanowski et al., 2017].
Projectivity
Transition-based parsers can only build projective trees. [6]

[6] Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
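Projectivity can be checked directly: a tree is projective iff no two dependency arcs cross when drawn above the sentence. A minimal sketch, treating the artificial root as position 0:

```python
def is_projective(heads):
    """heads[i-1] is the head of word i (0 denotes the artificial root).
    A tree is projective iff no two arcs cross when drawn above the words."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for (i, j) in arcs:
        for (k, l) in arcs:
            # arcs (i, j) and (k, l) cross iff exactly one endpoint of one
            # arc lies strictly inside the span of the other
            if i < k < j < l:
                return False
    return True
```

For example, `[2, 0, 2]` (words 1 and 3 attached to word 2, word 2 to the root) is projective, while `[3, 4, 0, 3]` contains the crossing arcs (1, 3) and (2, 4) and is not.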
Projective vs Non-projective
We compared our model with the best model for different projectivity ratios.

Language       Projectivity   Best (LAS)   Our (LAS)
grc perseus    90.7           79.39        55.03 (20)
eu bdt         95.13          84.22        74.13 (17)
hu szeged      97.8           82.66        68.18 (14)
da ddt         98.26          86.28        76.40 (17)
en gum         99.6           85.05        76.44 (15)
gl treegal     100            74.25        70.45 (10)
gl ctg         100            82.12        79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases. [7]

[7] From the official results page and our projectivity table.
Conclusions
Conclusion
In conclusion:
We introduced "Context, Word and Morph-feat" embeddings and showed their contribution in transition-based dependency parsing.
Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering.
Tree-stack LSTM performed better on low-resource languages.
When the training dataset size increases, Tree-stack LSTM loses its advantage.
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism
Applying attention between σ-LSTM, β-LSTM or Action-LSTM states may bring a performance improvement.

Morphological Features
Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function; other losses (e.g. CRF) may solve this.
Publications
Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
References
Marco Kuhlmann, Carlos Gómez-Rodríguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Piotr Bojanowski, Edouard Grave, Armand Joulin and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics 5:135-146.
Thank you for your attention.

Questions?
Tree-stack LSTM Overview
Figure: Tree-stack LSTM architecture (σ-LSTM, β-LSTM, Action-LSTM and t-RNN outputs are concatenated and fed to an MLP).
We propose the Tree-stack LSTM model with 4 components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN
Tree-stack LSTM
Input Representation
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with a continuous vector.
Every dependency relation is represented with a continuous vector.
Input Representation
We do not include an explicit feature extractor. We initialize the word representation by concatenating:

Character-based LSTM's word vectors
Word-based BiLSTM's context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
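As a sketch, the concatenation looks like the following (the dimensions and random vectors are illustrative assumptions, not the thesis hyperparameters):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the four component vectors of one word (sizes are assumptions)
char_vec = rng.normal(size=350)     # character-based LSTM word vector
context_vec = rng.normal(size=300)  # word-based BiLSTM context vector
pos_vec = rng.normal(size=128)      # POS embedding
feat_vec = rng.normal(size=128)     # morph-feat embedding

# The word representation is simply the concatenation of the four parts
word_repr = np.concatenate([char_vec, context_vec, pos_vec, feat_vec])
```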
Input Representation
Morph-feat Vectors

Example: the word "It" with features Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs

Figure: Morph-feat Embeddings
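One way to turn such a feature string into a single vector is to embed each feature=value pair and combine the results. The sketch below sums the per-feature vectors; the summation, the dimension and the `feat_embedding` helper are assumptions for illustration, not the thesis implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 32  # illustrative embedding dimension (an assumption)

# one vector per feature=value pair, created on demand here
table = {}

def feat_embedding(morph):
    """Map e.g. 'Case=Nom|Number=Sing' to one DIM-d vector by summing
    the embeddings of its feature=value pairs (hypothetical helper)."""
    vecs = [table.setdefault(fv, rng.normal(size=DIM)) for fv in morph.split("|")]
    return np.sum(vecs, axis=0)

v = feat_embedding("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
```

Because the vectors are summed, the result is independent of the order in which the features are listed.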
Tree-stack LSTM
Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN
β-LSTM
Figure: Tree-stack LSTM architecture, with the β-LSTM highlighted.
β-LSTM
Figure: Buffer's β-LSTM, an LSTM running over the buffer words w_i, w_i+1, w_i+2.
σ-LSTM
Figure: Tree-stack LSTM architecture, with the σ-LSTM highlighted.
σ-LSTM
Figure: Stack's σ-LSTM, an LSTM running over the stack elements s_i, s_i+1, s_i+2.
Action-LSTM
Figure: Tree-stack LSTM architecture, with the Action-LSTM highlighted.
Action-LSTM
Figure: Action-LSTM, an LSTM running over the history of past transitions.
How are the components of Tree-stack LSTM connected?
Tree-RNN
Tree-RNN (t-RNN)
Figure: t-RNN composing the head word, dependency relation and dependent word embeddings.
w_head_new = tanh(W_rnn · [w_head_old; d_l; w_dep] + b_rnn)    (1)
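Eq. (1) can be checked numerically in a few lines (dimensions and random parameters below are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # illustrative embedding dimension

w_head_old = rng.normal(size=D)      # current head embedding
w_dep = rng.normal(size=D)           # dependent embedding
d_l = rng.normal(size=D)             # dependency-relation embedding
W_rnn = rng.normal(size=(D, 3 * D))  # composition weights
b_rnn = np.zeros(D)

# Eq. (1): compose head, relation and dependent into the new head embedding
w_head_new = np.tanh(W_rnn @ np.concatenate([w_head_old, d_l, w_dep]) + b_rnn)
```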
Tree-RNN with:
1. Left Transition
2. Right Transition
Left Transition
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initialized by concatenating POS, language and morph-feat embeddings.
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: The stack's top LSTM state is reduced.
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: The t-RNN calculates the new head embedding.
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: The β-LSTM recalculates its hidden state based on the new input.
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: The Tree-stack LSTM is ready to make a new transition.
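The left transition can be written down directly from its definition. The sketch below represents σ and β as Python lists (top of the stack at the end, front of the buffer first) and A as a set of (head, label, dependent) triples; these representation choices are assumptions for illustration, not the thesis code:

```python
def left_arc(sigma, beta, arcs, d):
    # left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    # pop the stack top s and attach it to the buffer front b with label d
    *sigma_rest, s = sigma
    b = beta[0]
    return sigma_rest, beta, arcs | {(b, d, s)}

# Stack [0, 2], buffer [3, 4]: word 2 becomes a dependent of word 3
sigma, beta, arcs = left_arc([0, 2], [3, 4], set(), "nsubj")
```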
Right Transition
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language and morph-feat embeddings.
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: The stack's top LSTM state is reduced.
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: The t-RNN calculates the new head embedding.
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: The σ-LSTM recalculates its hidden state from the new input.
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: The Tree-stack LSTM is ready to make a new transition.
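Analogously, the right transition pops the stack top t and attaches it to the element below it, s, which stays on the stack (same list/set representation as sketched for the left transition; an illustration, not the thesis code):

```python
def right_arc(sigma, beta, arcs, d):
    # right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    # pop the stack top t; the element below it, s, becomes its head
    *sigma_rest, s, t = sigma
    return sigma_rest + [s], beta, arcs | {(s, d, t)}

# Stack [0, 2, 5]: word 5 becomes a dependent of word 2, which stays on the stack
sigma, beta, arcs = right_arc([0, 2, 5], [7], set(), "obj")
```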
Final overview of Tree-stack LSTM
Figure: Final Tree-stack LSTM architecture (σ-LSTM, β-LSTM, Action-LSTM and t-RNN outputs are concatenated and fed to an MLP).
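The final "Concat → MLP" step can be sketched as follows; the hidden sizes, the number of transitions, and the random parameters are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
H, N_ACT = 16, 73  # component hidden size and number of transitions (assumptions)

# stand-ins for the final hidden states of the four components
h_sigma, h_beta, h_action, h_trnn = (rng.normal(size=H) for _ in range(4))

W1, b1 = rng.normal(size=(64, 4 * H)) * 0.1, np.zeros(64)
W2, b2 = rng.normal(size=(N_ACT, 64)) * 0.1, np.zeros(N_ACT)

x = np.concatenate([h_sigma, h_beta, h_action, h_trnn])  # "Concat" layer
logits = W2 @ np.tanh(W1 @ x + b1) + b2                  # MLP over transitions
probs = np.exp(logits - logits.max())
probs /= probs.sum()                                     # softmax over actions
```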
4 Results & Comparisons
Results & Comparisons

Dataset

CoNLL17:
- Dependency parsing of 81 treebanks in 49 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koc-University ranked 7th out of 33 participants (1st among transition-based parsers)

CoNLL18:
- Dependency parsing of 82 treebanks in 57 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koc-University ranked 16th out of 30 participants (2nd among transition-based parsers)

Differences between CoNLL17 and CoNLL18: 1. train/test split changes, 2. annotation changes.
MLP vs Tree-stack LSTM
The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets.
MLP vs Tree-stack LSTM
Two possible problems of the official comparison:
1. If the annotation of the treebank has improved, the older parser is handicapped.
2. If the training-test split has changed and old training data are now in the test data, the old parser is favored undeservedly.
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models:

Lang Code          MLP     Tree-stack
ru taiga (10k)     58.89   60.55
hu szeged (20k)    66.21   68.18
tr imst (50k)      56.78   58.75
ar padt (120k)     67.83   68.14
en ewt (205k)      74.87   75.77
cs cac (473k)      83.39   83.57

Tree-stack LSTM outperforms MLP.
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
MLP Parser
Figure: Initial model (MLP).
Only Action LSTM
Figure: Only the Action-LSTM.
Only β-LSTM
Figure: Only the β-LSTM.
Only σ-LSTM
Figure: Only the σ-LSTM.
Ablation Analysis Results
Lang Code    MLP     Only Action   Only-β   Only-σ
hu szeged    66.21   66.87         66.94    67.03
sv lines     71.12   72.05         72.17    72.45
tr imst      57.12   56.87         57.02    57.12
ar padt      67.83   66.67         66.89    66.92
cs cac       83.89   82.23         83.13    83.17
en ewt       75.54   75.43         75.56    75.67

Table: Comparison between the MLP and the "Only" models.
Ablation of t-RNN
Figure: Tree-stack LSTM architecture, with the t-RNN highlighted.
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN:

Lang Code            without t-RNN   with t-RNN
no nynorsklia (3k)   51.78           53.33
ru taiga (11k)       59.13           60.55
gl treegal (15k)     69.76           70.45
hu szeged (20k)      66.12           68.18
sv lines (49k)       74.04           75.46
tr imst (50k)        58.12           58.75
ar padt (120k)       68.04           68.14
en ewt (204k)        74.87           75.77
cs cac (473k)        82.89           83.57
cs pdt (1M)          81.17           81.164

The t-RNN provides a comparative advantage for low-resource languages.
Ablation Analysis
Overall results of the ablation analysis:

Lang        MLP     Only A   Only-β   Only-σ   w/o t-RNN   all
hu szeged   66.21   66.87    66.94    67.03    66.12       68.18
sv lines    71.12   72.05    72.17    74.04    72.17       75.46
tr imst     57.12   56.87    57.02    57.12    58.12       58.75
ar padt     67.83   66.67    66.89    66.92    68.04       68.14
cs cac      83.89   82.23    83.13    83.17    82.89       83.57
en ewt      75.54   75.43    75.56    75.67    74.87       75.77

Tree-stack LSTM beats the other model variations.
Ablation Analysis
Conclusions of Ablation Experiments
The t-RNN's performance contribution increases as the training size decreases.
The σ-LSTM provides more useful information, independent of dataset size.
Interconnecting the model's components with the t-RNN makes Tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers).
What does Morphological Feature Embedding provide?
Contribution of Morph-feat Embeddings
Experimental Settings
We divide the CoNLL18 UD v2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

1. Languages having less than 20k tokens
2. Languages having more than 20k, less than 50k tokens
3. Languages having more than 50k, less than 100k tokens
4. Languages having 100k tokens or more
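The partition can be expressed as a small helper, with the thresholds taken straight from the list above:

```python
def bucket(n_tokens):
    """Assign a treebank to its experiment group by training-set size."""
    if n_tokens < 20_000:
        return "<20k"
    if n_tokens < 50_000:
        return "20k-50k"
    if n_tokens < 100_000:
        return "50k-100k"
    return ">=100k"
```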
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having more than 100k trainingtokens
Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064
bg btb 8453 8455 124336
en ewt 7577 75682 204585
ar padt 6802 6814 223881
de gsd 7159 7132 263804
ca ancora 8589 85874 417587
es ancora 8499 8478 444617
cs cac 8357 8363 472608
cs pdt 8143 8212 1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle transitions using gold movesDynamic oracle transitions using predicted moves
In both cases logp of gold moves maximized
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and
use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al
2017]3 Using my own word and context vectors trained with different
language but from the same language family4 Applying transfer learning with a pre-trained parser
Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898
kmr mg 2012 2257 2278 2339
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the mostbeneficial
From scratch LM training does not bring useful word and contextvectors
Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition Based Parser can only build projective trees 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
Projective vs Non-projective
We compared our model with the best model for different projectivityratios
Language Projectiviy Best (LAS) Our (LAS)
grc perseus 907 7939 5503 (20)
eu bdt 9513 8422 7413 (17)
hu szeged 978 8266 6818 (14)
da ddt 9826 8628 7640 (17)
en gum 996 8505 7644 (15)
gl treegal 100 7425 7045 (10)
gl ctg 100 8212 7945 (14)
Table Our models performance gap decreases as the projectivity ratio increases
7
7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing
Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering
Tree-stack LSTM performed better with low resource languages
When the training dataset size increases tree-stack LSTM losses itsadvantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings
Attention Mechanism
Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement
Morphological Features
Finding different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies
Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682
S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US
Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
Input Representation
Action and Dependency Relation Embeddings
Every action is represented with a continuous vector
Every dependency relation is represented with a continuous vector
Omer Kırnap (Koc University) MSc Thesis September 27 2018 60 123
Input Representation
We do not include an explicit feature extractor. We initialize the word representation by concatenating:
Character-based LSTM's word vectors
Word-based BiLSTM's context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
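The concatenation above can be sketched directly; the dimensions and the `word_representation` helper below are illustrative stand-ins, not the thesis code (real vectors would come from the trained char-LSTM, BiLSTM, POS, and morph-feat lookup tables):

```python
# Minimal sketch of building one word representation by concatenation.
# All dimensions are made up for illustration.
def word_representation(char_vec, context_vec, pos_vec, feat_vec):
    """Concatenate the four sub-vectors into one word representation."""
    return char_vec + context_vec + pos_vec + feat_vec  # list concatenation

char_vec = [0.1] * 8    # char-LSTM word vector
ctx_vec  = [0.2] * 8    # BiLSTM context vector
pos_vec  = [0.3] * 4    # POS embedding
feat_vec = [0.4] * 4    # morph-feat embedding

w = word_representation(char_vec, ctx_vec, pos_vec, feat_vec)
print(len(w))  # 24
```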
Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
Input Representation
Morph-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure Morph-feat Embeddings
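One way such a FEATS string could be turned into a single vector is to split it on `|` and combine per-feature embeddings; the `morph_feat_vector` helper and the choice to sum (rather than concatenate) the embeddings are assumptions for illustration only:

```python
import random

random.seed(0)
DIM = 6
feat_emb = {}   # hypothetical embedding table keyed by "Feature=Value" pairs

def morph_feat_vector(feats):
    """Split a UD FEATS string on '|' and sum the per-feature embeddings.
    Summing (vs. concatenating) is an assumption, not shown on this slide."""
    total = [0.0] * DIM
    if feats == "_":                      # UD writes "_" for "no features"
        return total
    for pair in feats.split("|"):
        if pair not in feat_emb:
            feat_emb[pair] = [random.gauss(0, 1) for _ in range(DIM)]
        total = [t + e for t, e in zip(total, feat_emb[pair])]
    return total

v = morph_feat_vector("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs")
print(len(v))  # 6
```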
Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
Tree-stack LSTM
Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123
β-LSTM
Figure: Tree-stack LSTM architecture diagram (t-RNN over head word, dependent word, and dependency relation; σ-, β-, and Action-LSTM outputs are concatenated and fed to an MLP); the β-LSTM is highlighted.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
Figure: Buffer's β-LSTM, running over the buffer words w_i, w_{i+1}, w_{i+2}.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
Figure: Tree-stack LSTM architecture diagram; the σ-LSTM is highlighted.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
Figure: Stack's σ-LSTM, running over the stack items s_i, s_{i+1}, s_{i+2}.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
Figure: Tree-stack LSTM architecture diagram; the Action-LSTM is highlighted.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
Figure: Action-LSTM, running over the sequence of previous parser actions.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How are the components of the tree-stack LSTM connected?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
Figure: t-RNN combining the dependent word, dependency relation, and head word embeddings.

w_head_new = tanh(W_rnn * [w_head_old; d_l; w_dep] + b_rnn)    (1)
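Eq. (1) can be written out directly; `W_rnn`, `b_rnn`, and the dimensions below are illustrative stand-ins for the learned parameters:

```python
import math, random

random.seed(0)
D_WORD, D_REL = 8, 4   # made-up dimensions

# Parameters of Eq. (1): W_rnn maps [head; relation; dependent] to D_WORD.
W_rnn = [[random.gauss(0, 0.1) for _ in range(D_WORD + D_REL + D_WORD)]
         for _ in range(D_WORD)]
b_rnn = [0.0] * D_WORD

def t_rnn(w_head_old, d_l, w_dep):
    """Eq. (1): w_head_new = tanh(W_rnn * [w_head_old; d_l; w_dep] + b_rnn)."""
    x = w_head_old + d_l + w_dep                   # concatenation
    return [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + b)
            for row, b in zip(W_rnn, b_rnn)]

w_new = t_rnn([0.5] * D_WORD, [0.5] * D_REL, [0.5] * D_WORD)
print(len(w_new))  # 8
```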
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
Tree-RNN with
1. Left Transition
2. Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Parser state before a left transition; each embedding is initialized by concatenating POS, language, and morph-feat embeddings.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: The stack's top LSTM is reduced.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: The t-RNN computes the new head embedding.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: The β-LSTM recomputes its hidden state from the new head input.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: The tree-stack LSTM is ready to predict the next transition.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Parser state before a right transition; each embedding is initialized by concatenating POS, language, and morph-feat embeddings.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: The stack's top LSTM is reduced.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: The t-RNN computes the new head embedding.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: The σ-LSTM recomputes its hidden state from the new head input.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: The tree-stack LSTM is ready to predict the next transition.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
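The two transitions defined above can be sketched on plain lists; `left_arc` and `right_arc` are illustrative helpers, and a real parser would also update the σ-, β-, and t-RNN states after each step:

```python
# Plain-list sketch of the two arc transitions defined on the slides.
def left_arc(stack, buffer, arcs, d):
    # left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
    s = stack.pop()
    arcs.append((buffer[0], d, s))   # the buffer front b becomes the head of s

def right_arc(stack, buffer, arcs, d):
    # right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
    t = stack.pop()
    arcs.append((stack[-1], d, t))   # the new stack top s becomes the head of t

stack, buffer, arcs = [0, 1, 2], [3, 4], []   # word indices; 0 is the root
right_arc(stack, buffer, arcs, "obj")         # attaches 2 under 1
left_arc(stack, buffer, arcs, "nsubj")        # attaches 1 under 3
print(arcs)  # [(1, 'obj', 2), (3, 'nsubj', 1)]
```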
Final overview of Tree-stack LSTM
Figure: Final overview of the tree-stack LSTM: t-RNN (over head word, dependent word, and dependency relation), σ-LSTM, β-LSTM, and Action-LSTM outputs are concatenated and passed to an MLP.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2. Related Work: Linear Models and their Drawbacks; Neural Network Models
3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5. Conclusion
6. Future Work & Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results & Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results & Comparisons
Dataset

CoNLL17:
- Dependency parsing of 81 treebanks in 49 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koc University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
- Dependency parsing of 82 treebanks in 57 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koc University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split 2. Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
Two possible problems with the official comparison:
1. If the annotation of the treebank is improved, the older parser is handicapped.
2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
Figure: Initial model (MLP only).
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
Figure: Only the Action-LSTM model.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
Figure: Only the β-LSTM model.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
Figure: Only the σ-LSTM model.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code   MLP    Only Action  Only-β  Only-σ
hu szeged   66.21  66.87        66.94   67.03
sv lines    71.12  72.05        72.17   72.45
tr imst     57.12  56.87        57.02   57.12
ar padt     67.83  66.67        66.89   66.92
cs cac      83.89  82.23        83.13   83.17
en ewt      75.54  75.43        75.56   75.67

Table: Comparison between the MLP and the "Only" models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
Figure: Tree-stack LSTM architecture diagram; the t-RNN is highlighted.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code            without t-RNN  with t-RNN
no nynorsklia (3k)   51.78          53.33
ru taiga (11k)       59.13          60.55
gl treegal (15k)     69.76          70.45
hu szeged (20k)      66.12          68.18
sv lines (49k)       74.04          75.46
tr imst (50k)        58.12          58.75
ar padt (120k)       68.04          68.14
en ewt (204k)        74.87          75.77
cs cac (473k)        82.89          83.57
cs pdt (1M)          81.17          81.16
The t-RNN provides a comparative advantage for low-resource languages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
The t-RNN's performance contribution increases as the training size decreases
The σ-LSTM provides more useful information independent of dataset size
Interconnecting the model's components with the t-RNN makes the tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental Settings: We divide the CoNLL18 UD v2.2 dataset into 4 parts, based on the number of training tokens for each language, to better understand our contributions:
Languages having less than 20k tokens
Languages having more than 20k and less than 50k tokens
Languages having more than 50k and less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.60           18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48325
fr sequoia     84.36        82.17           50543
en gum         76.44        75.34           53686
ko gsd         73.74        72.54           56687
eu bdt         74.55        73.32           72974
nl lassysmall  76.70        75.80           75134
gl ctg         79.02        79.018          79327
lv lvtb        72.33        72.24           80666
id gsd         75.76        73.97           97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens

Lang code   Morph-Feats  no Morph-Feats  # of tokens
fa seraji   81.18        81.12           121064
bg btb      84.53        84.55           124336
en ewt      75.77        75.682          204585
ar padt     68.02        68.14           223881
de gsd      71.59        71.32           263804
ca ancora   85.89        85.874          417587
es ancora   84.99        84.78           444617
cs cac      83.57        83.63           472608
cs pdt      81.43        82.12           1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle: transitions use gold moves. Dynamic oracle: transitions use predicted moves.
In both cases the log-probability of gold moves is maximized
Figure: Tree-stack LSTM architecture diagram (t-RNN, σ-LSTM, β-LSTM, Action-LSTM, Concat, MLP).
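The distinction above can be pictured with a stand-in `scores` table in place of the real tree-stack LSTM scorer; the `oracle_step` helper and the exploration probability `p_explore` are illustrative assumptions:

```python
import math, random

random.seed(0)

def log_softmax(scores):
    m = max(scores.values())
    z = m + math.log(sum(math.exp(v - m) for v in scores.values()))
    return {t: v - z for t, v in scores.items()}

def oracle_step(scores, gold, dynamic, p_explore=0.1):
    """One training step: the loss is always -log p(gold move); only the
    transition that is actually executed differs between the two regimes."""
    logp = log_softmax(scores)
    loss = -logp[gold]                      # maximize log p of the gold move
    if dynamic and random.random() < p_explore:
        executed = max(logp, key=logp.get)  # dynamic: may follow the model
    else:
        executed = gold                     # static: always follow gold
    return loss, executed

scores = {"shift": 1.2, "left": 0.3, "right": -0.5}
loss, executed = oracle_step(scores, gold="shift", dynamic=False)
print(executed)  # shift
```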
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors, trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial
From-scratch LM training does not produce useful word and context vectors
Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
A transition based parser can only build projective trees 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
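The projectivity constraint above can be checked mechanically by testing for crossing arcs; the `is_projective` helper below is an illustrative sketch, not from the thesis:

```python
# A dependency tree is projective iff no two of its arcs cross
# when drawn above the sentence.
def is_projective(heads):
    """heads[i-1] is the head of word i (words are 1-indexed, 0 = root)."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for l1, r1 in arcs:
        for l2, r2 in arcs:
            if l1 < l2 < r1 < r2:   # arcs (l1,r1) and (l2,r2) cross
                return False
    return True

print(is_projective([2, 0, 2]))     # True: all arcs are nested
print(is_projective([3, 4, 0, 1]))  # False: arcs (1,3) and (2,4) cross
```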
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
Projective vs Non-projective
We compared our model with the best model at different projectivity ratios

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases
7
7 From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusion: We introduced "Context Word and Morph-feat" embeddings and showed their contribution to transition based dependency parsing
Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering
The Tree-stack LSTM performed better on low-resource languages
As the training dataset size increases, the tree-stack LSTM loses its advantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.
Attention Mechanism
Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement
Morphological Features
Finding different ways to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, Association for Computational Linguistics, pages 673-682.
S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics 5:135-146.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
Input Representation
We do not include explicit feature extractor We initiated wordrepresentation by concatenating
Character Based LSTMrsquos word vectors
Word Based BiLSTMrsquos context vectors
Part-of-speech (POS) vectors
Morph-feat vectors
Omer Kırnap (Koc University) MSc Thesis September 27 2018 61 123
Input Representation
Morp-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
Tree-stack LSTM
Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123
β-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
LSTM LSTM LSTM
wi+2wi+1wi
Figure Bufferrsquos β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
LSTM LSTM LSTM
si si+1 si+2
Figure Stackrsquos σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
LSTM LSTM LSTM
Figure Action-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How do components of tree-stack LSTM are connected
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
t-RNN
Dependent word
Dependency Relation
Head word
Figure t-RNN
whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
Tree-RNN with
1 Left Transition2 Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM LSTM
Left transition
t-RNN
Dependency Relation
HeadDependent
Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure β-LSTM recalculates its hidden based on new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Stackrsquos top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure σ-LSTM recalculates its hidden from new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
Final overview of Tree-stack LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing
2 Related WorkLinear Models and their DrawbacksNeural Network Models
3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser
4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning
5 Conclusion6 Future Work amp Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results amp Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results amp Comparisons
Dataset
Dependency parsing of 81
treebanks in 49 languages
All treebanks use standardized
annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 7th out
of 33 participants (1st among
transition based parsers)
Dependency parsing of 82
treebanks in 57 languages
All treebanks use
standardized annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 16th
out of 30 participants (2nd
among transition based
parsers)
CoNLL17 CoNLL181 Traintest split change 2 Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
2 possible problems of official comparison
1 If the annotation of the tree bank is improved the older parser ishandicapped
2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
MLP
Figure Initial model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
LSTM LSTM
Figure Only action LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
LSTM LSTM LSTM
LSTM LSTM MLP
Figure Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
LSTM LSTM
LSTM LSTM MLP
Figure Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code   MLP    Only Action  Only-β  Only-σ
hu szeged   66.21  66.87        66.94   67.03
sv lines    71.12  72.05        72.17   72.45
tr imst     57.12  56.87        57.02   57.12
ar padt     67.83  66.67        66.89   66.92
cs cac      83.89  82.23        83.13   83.17
en ewt      75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
Figure: the Tree-stack LSTM architecture, highlighting the t-RNN component.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code            without t-RNN  with t-RNN
no nynorsklia (3k)   51.78          53.33
ru taiga (11k)       59.13          60.55
gl treegal (15k)     69.76          70.45
hu szeged (20k)      66.12          68.18
sv lines (49k)       74.04          75.46
tr imst (50k)        58.12          58.75
ar padt (120k)       68.04          68.14
en ewt (204k)        74.87          75.77
cs cac (473k)        82.89          83.57
cs pdt (1M)          81.17          81.164
t-RNN provides a comparative advantage for low-resource languages.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNN's performance contribution increases as the training size decreases.
σ-LSTM provides useful information independent of dataset size.
Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers).
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental Settings: We divide the CoNLL18 UD 2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:
- Languages having fewer than 20k tokens
- Languages having more than 20k and fewer than 50k tokens
- Languages having more than 50k and fewer than 100k tokens
- Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having fewer than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3,583
ru taiga       58.32        60.55           10,479
sme giella     52.78        53.39           16,385
la perseus     49.93        51.6            18,184
ug udt         52.78        53.39           19,262
sl sst         46.72        48.77           19,473
hu szeged      66.23        68.18           20,166

Not useful for languages having fewer than 20k training tokens.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48,325
fr sequoia     84.36        82.17           50,543
en gum         76.44        75.34           53,686
ko gsd         73.74        72.54           56,687
eu bdt         74.55        73.32           72,974
nl lassysmall  76.7         75.8            75,134
gl ctg         79.02        79.018          79,327
lv lvtb        72.33        72.24           80,666
id gsd         75.76        73.97           97,531

Beneficial for languages with 50k-100k training tokens.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens:

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121,064
bg btb     84.53        84.55           124,336
en ewt     75.77        75.682          204,585
ar padt    68.02        68.14           223,881
de gsd     71.59        71.32           263,804
ca ancora  85.89        85.874          417,587
es ancora  84.99        84.78           444,617
cs cac     83.57        83.63           472,608
cs pdt     81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle: transitions follow gold moves. Dynamic oracle: transitions follow predicted moves.
In both cases, the log-probability of gold moves is maximized.
Figure: the Tree-stack LSTM architecture (σ-, β-, and Action-LSTM outputs and t-RNN embeddings concatenated into an MLP).
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
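The distinction above fits in a few lines. In this illustrative sketch (not the thesis code), the loss is the negative log-probability of the gold move in both regimes; what differs is which move the parser actually follows to reach the next state.

```python
import numpy as np

def oracle_step(probs, gold, dynamic):
    """One training decision: loss is -log p(gold move) in both regimes;
    only the move actually followed differs."""
    loss = -np.log(probs[gold])
    followed = int(np.argmax(probs)) if dynamic else gold
    return loss, followed

probs = np.array([0.2, 0.5, 0.3])  # illustrative distribution over 3 transitions
loss_s, next_s = oracle_step(probs, gold=2, dynamic=False)  # static: follow gold (2)
loss_d, next_d = oracle_step(probs, gold=2, dynamic=True)   # dynamic: follow argmax (1)
```

Following predicted moves exposes the model to its own mistakes during training, which is why the dynamic oracle can help at test time.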
Static vs Dynamic Oracle Training
Figure: Results are very close for fewer than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure: Results are very close for training tokens between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure: Results are very close for more than 50k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
What about languages with fewer than 20k training tokens?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al., 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial.
From-scratch LM training does not yield useful word and context vectors.
Our word and context vectors are still more useful than Facebook's [Bojanowski et al., 2017].
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition based parsers can only build projective trees. 6
6 Figure from http://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
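Projectivity can be checked mechanically: a tree is projective iff no two dependency arcs cross when drawn above the sentence. A small sketch follows; the head-array encoding (heads[i-1] is the head of word i, 0 is the artificial root) is an assumption for illustration.

```python
def is_projective(heads):
    """heads[i-1] is the head index of word i (0 denotes the artificial root).
    A tree is projective iff no two dependency arcs strictly interleave."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for i, (l1, r1) in enumerate(arcs):
        for l2, r2 in arcs[i + 1:]:
            if l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1:  # crossing spans
                return False
    return True
```

For example, is_projective([2, 0, 2]) is True (a simple chain rooted at word 2), while is_projective([3, 0, 2]) is False because the arc from word 3 to word 1 crosses the root arc to word 2.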
Projective vs Non-projective
We compared our model with the best model at different projectivity ratios.

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases. 7
7 From the official results page and our projectivity table.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusion, we introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition based dependency parsing.
Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering.
Tree-stack LSTM performed better on low-resource languages.
As the training dataset size increases, tree-stack LSTM loses its advantage.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.
Attention Mechanism
Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring performance improvements.
Morphological Features
Finding different ways to represent morphological features.
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, pages 673-682. Association for Computational Linguistics.
S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
Input Representation
Morph-feat Vectors
Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs IT It
Figure Morph-feat Embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 62 123
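One simple way to realize the figure is to give each key=value morphological feature its own vector and sum them into a single morph-feat embedding. This composition is a hedged sketch, an assumption for illustration; the thesis may compose the features differently.

```python
import numpy as np

DIM = 4  # illustrative embedding size

def embed_morph_feats(feat_string, table, rng):
    """Sum one vector per key=value feature of a UD morphology string."""
    vec = np.zeros(DIM)
    for feat in feat_string.split("|"):
        if feat not in table:                 # lazily create an embedding
            table[feat] = rng.standard_normal(DIM)
        vec += table[feat]
    return vec

rng = np.random.default_rng(0)
table = {}
v = embed_morph_feats("Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs",
                      table, rng)
```

Summing keeps the embedding size fixed no matter how many features a word carries, which makes it easy to concatenate with POS and language embeddings.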
Tree-stack LSTM
Model Components:
1. β-LSTM
2. σ-LSTM
3. Action-LSTM
4. Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123
β-LSTM
Figure: the Tree-stack LSTM architecture, highlighting the β-LSTM component.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
Figure: Buffer's β-LSTM running over the upcoming words w_i, w_i+1, w_i+2.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
Figure: the Tree-stack LSTM architecture, highlighting the σ-LSTM component.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
Figure: Stack's σ-LSTM running over the stack items s_i, s_i+1, s_i+2.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
Figure: the Tree-stack LSTM architecture, highlighting the Action-LSTM component.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
Figure: Action-LSTM running over the sequence of previous parser actions.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How are the components of the tree-stack LSTM connected?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
Figure: t-RNN combining the head word, dependent word, and dependency relation embeddings.

w_head_new = tanh(W_rnn · [w_head_old; d_l; w_dep] + b_rnn)    (1)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
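Equation (1) is a single tanh layer over the concatenation of the old head embedding, the relation-label embedding, and the dependent embedding. A minimal numpy sketch (the sizes D and L are illustrative assumptions):

```python
import numpy as np

def trnn_update(w_head, d_label, w_dep, W_rnn, b_rnn):
    """Equation (1): w_head_new = tanh(W_rnn · [w_head_old; d_l; w_dep] + b_rnn)."""
    x = np.concatenate([w_head, d_label, w_dep])
    return np.tanh(W_rnn @ x + b_rnn)

rng = np.random.default_rng(0)
D, L = 6, 2  # word- and label-embedding sizes (illustrative)
W_rnn = rng.standard_normal((D, 2 * D + L))
b_rnn = np.zeros(D)
new_head = trnn_update(rng.standard_normal(D), rng.standard_normal(L),
                       rng.standard_normal(D), W_rnn, b_rnn)
```

The new head embedding has the same size as a word embedding, so it can replace the head wherever that word appears in the stack or buffer representations.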
Tree-RNN with
1. Left Transition
2. Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: The stack's top LSTM is reduced.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready to predict the next transition.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: The stack's top LSTM is reduced.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
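The left/right rules walked through above (attach the stack top under the buffer front, or under the next item on the stack) match the arc-hybrid transition system of Kuhlmann et al. (2011). A minimal sketch with configurations as (stack, buffer, arcs) triples over word indices; the labels used in the run are made up for illustration.

```python
def left(config, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    pop stack top s; attach it to buffer front b with label d."""
    stack, buffer, arcs = config
    s, b = stack[-1], buffer[0]
    return stack[:-1], buffer, arcs | {(b, d, s)}

def right(config, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    pop stack top t; attach it to the next stack item s with label d."""
    stack, buffer, arcs = config
    s, t = stack[-2], stack[-1]
    return stack[:-1], buffer, arcs | {(s, d, t)}

# Words 0..3; stack holds 0, 1, 2 (2 on top), buffer holds 3.
config = ((0, 1, 2), (3,), frozenset())
config = left(config, "amod")   # adds arc 3 -amod-> 2, stack becomes (0, 1)
config = right(config, "obj")   # adds arc 0 -obj-> 1, stack becomes (0,)
```

Because both rules only pop and never reorder, every arc nests inside the spans still on the stack, which is why the system produces only projective trees.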
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
Final overview of Tree-stack LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing
2 Related WorkLinear Models and their DrawbacksNeural Network Models
3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser
4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning
5 Conclusion6 Future Work amp Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results amp Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results amp Comparisons
Dataset
Dependency parsing of 81
treebanks in 49 languages
All treebanks use standardized
annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 7th out
of 33 participants (1st among
transition based parsers)
Dependency parsing of 82
treebanks in 57 languages
All treebanks use
standardized annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 16th
out of 30 participants (2nd
among transition based
parsers)
CoNLL17 CoNLL181 Traintest split change 2 Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
2 possible problems of official comparison
1 If the annotation of the tree bank is improved the older parser ishandicapped
2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
MLP
Figure Initial model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
LSTM LSTM
Figure Only action LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
LSTM LSTM LSTM
LSTM LSTM MLP
Figure Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
LSTM LSTM
LSTM LSTM MLP
Figure Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692
cs cac 8389 8223 8313 8317
en ewt 7554 7543 7556 7567
Table Comparison between MLP and rdquoOnlyrdquo models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875
ar padt (120k) 6804 6814
en ewt (204k) 7487 7577
cs cac (473k) 8289 8357
cs pdt (1M) 8117 81164
t-RNN provides comparative advantage for low-resourcelanguages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNNrsquos performance contribution increases when the training sizedecreases
σ-LSTM provides more useful information independent from datasetsize
Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k less than 50k tokens
Languages having more than 50k less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having less than 20k training tokens
Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583
ru taiga 5832 6055 10479
sme giella 5278 5339 16385
la perseus 4993 516 18184
ug udt 5278 5339 19262
sl sst 4672 4877 19473
hu szeged 6623 6818 20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having tokens in between 50k and100k
Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325
fr sequoia 8436 8217 50543
en gum 7644 7534 53686
ko gsd 7374 7254 56687
eu bdt 7455 7332 72974
nl lassymal 767 758 75134
gl ctg 7902 79018 79327
lv lvtb 7233 7224 80666
id gsd 7576 7397 97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having more than 100k trainingtokens
Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064
bg btb 8453 8455 124336
en ewt 7577 75682 204585
ar padt 6802 6814 223881
de gsd 7159 7132 263804
ca ancora 8589 85874 417587
es ancora 8499 8478 444617
cs cac 8357 8363 472608
cs pdt 8143 8212 1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle transitions using gold movesDynamic oracle transitions using predicted moves
In both cases logp of gold moves maximized
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and
use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al
2017]3 Using my own word and context vectors trained with different
language but from the same language family4 Applying transfer learning with a pre-trained parser
Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898
kmr mg 2012 2257 2278 2339
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the mostbeneficial
From scratch LM training does not bring useful word and contextvectors
Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition Based Parser can only build projective trees 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
Projective vs Non-projective
We compared our model with the best model for different projectivityratios
Language Projectiviy Best (LAS) Our (LAS)
grc perseus 907 7939 5503 (20)
eu bdt 9513 8422 7413 (17)
hu szeged 978 8266 6818 (14)
da ddt 9826 8628 7640 (17)
en gum 996 8505 7644 (15)
gl treegal 100 7425 7045 (10)
gl ctg 100 8212 7945 (14)
Table Our models performance gap decreases as the projectivity ratio increases
7
7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing
Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering
Tree-stack LSTM performed better with low resource languages
When the training dataset size increases tree-stack LSTM losses itsadvantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings
Attention Mechanism
Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement
Morphological Features
Finding different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies
Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682
S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US
Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
Tree-stack LSTM
Model Components1 β-LSTM2 σ-LSTM3 Action-LSTM4 Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 63 123
β-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
LSTM LSTM LSTM
wi+2wi+1wi
Figure Bufferrsquos β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
LSTM LSTM LSTM
si si+1 si+2
Figure Stackrsquos σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
LSTM LSTM LSTM
Figure Action-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How do components of tree-stack LSTM are connected
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
t-RNN
Dependent word
Dependency Relation
Head word
Figure t-RNN
whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
Tree-RNN with
1 Left Transition2 Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM LSTM
Left transition
t-RNN
Dependency Relation
HeadDependent
Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure β-LSTM recalculates its hidden based on new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Stackrsquos top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure σ-LSTM recalculates its hidden from new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
Final overview of Tree-stack LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview

1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2 Related Work: Linear Models and their Drawbacks; Neural Network Models
3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5 Conclusion
6 Future Work & Discussions
4 Results & Comparisons
Results & Comparisons

Dataset

CoNLL17:
- Dependency parsing of 81 treebanks in 49 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koç University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
- Dependency parsing of 82 treebanks in 57 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koç University ranked 16th out of 30 participants (2nd among transition based parsers)

Differences between the two editions: (1) train/test split change, (2) annotation.
MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results for CoNLL17 and CoNLL18 systems tested on the same test sets.
MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1. If the annotation of the treebank was improved, the older parser is handicapped.
2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.
MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models:

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP.
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
MLP Parser

Figure: Initial model (MLP only).
Only Action LSTM

Figure: Only the action LSTM.
Only β-LSTM

Figure: Only the β-LSTM.
Only σ-LSTM

Figure: Only the σ-LSTM.
Ablation Analysis Results

Lang Code  MLP    Only Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models.
Ablation of t-RNN

Figure: The t-RNN component within the full Tree-stack LSTM architecture.
Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN:

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.164

t-RNN provides a comparative advantage for low-resource languages.
Ablation Analysis

Overall results of ablation analysis:

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats the other model variations.
Ablation Analysis

Conclusions of Ablation Experiments

- t-RNN's performance contribution increases as the training size decreases.
- σ-LSTM provides more useful information independent of dataset size.
- Interconnecting the model's components with t-RNN makes Tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers).
What does the Morphological Feature Embedding provide?
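As the earlier figure captions note, each token embedding is initialized by concatenating POS, language, and morph-feat embeddings. A minimal sketch of that concatenation; the vocabularies, sizes, and the summing of multiple morph-feat vectors are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
POS   = {"NOUN": 0, "VERB": 1}               # UD defines 17 universal POS tags; 2 shown
LANG  = {"en": 0, "hu": 1}
FEATS = {"Number=Sing": 0, "Tense=Past": 1}  # universal morphological features

E_pos, E_lang, E_feat = (rng.normal(size=(len(v), d))
                         for v, d in [(POS, 4), (LANG, 2), (FEATS, 4)])

def token_embedding(pos, lang, feats):
    """Concatenate POS, language, and (here: summed) morph-feat embeddings."""
    feat_vec = sum((E_feat[FEATS[f]] for f in feats), np.zeros(4))
    return np.concatenate([E_pos[POS[pos]], E_lang[LANG[lang]], feat_vec])

v = token_embedding("VERB", "en", ["Tense=Past"])
assert v.shape == (4 + 2 + 4,)
```

The experiments below then ask when the morph-feat part of this concatenation actually helps.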
Contribution of Morph-feat Embeddings

Experimental Settings: We divide the CoNLL18 UD dataset (version 2.2) into 4 parts based on the number of training tokens for each language, to better understand our contributions:

- Languages having less than 20k tokens
- Languages having more than 20k and less than 50k tokens
- Languages having more than 50k and less than 100k tokens
- Languages having 100k tokens or more
Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.60           18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166

Not useful for languages having less than 20k training tokens.
Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48325
fr sequoia     84.36        82.17           50543
en gum         76.44        75.34           53686
ko gsd         73.74        72.54           56687
eu bdt         74.55        73.32           72974
nl lassysmall  76.70        75.80           75134
gl ctg         79.02        79.018          79327
lv lvtb        72.33        72.24           80666
id gsd         75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens.
Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121064
bg btb     84.53        84.55           124336
en ewt     75.77        75.682          204585
ar padt    68.02        68.14           223881
de gsd     71.59        71.32           263804
ca ancora  85.89        85.874          417587
es ancora  84.99        84.78           444617
cs cac     83.57        83.63           472608
cs pdt     81.43        82.12           1173282

Neutral for languages having more than 100k training tokens.
Static vs Dynamic Oracle Training

- Static oracle: transitions follow gold moves.
- Dynamic oracle: transitions follow predicted moves.

In both cases, the log-probability of the gold moves is maximized.
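The distinction can be sketched as a single training step. TinyModel and the exploration rate below are stand-ins for the real parser (assumptions for the sketch); the shared point is that the loss maximizes log p(gold move) under both regimes, and only the move used to advance the parser state differs:

```python
import math, random

class TinyModel:
    """Stub scorer: a uniform distribution over three moves (assumption)."""
    MOVES = ("shift", "left", "right")
    def log_prob(self, state, move): return math.log(1 / len(self.MOVES))
    def predict(self, state):        return random.choice(self.MOVES)

def train_step(state, gold_move, model, dynamic=False, explore=0.1):
    loss = -model.log_prob(state, gold_move)  # log p of the gold move is maximized
    move = gold_move                          # static oracle: always follow gold
    if dynamic and random.random() < explore:
        move = model.predict(state)           # dynamic oracle: follow the prediction
    return move, loss

move, loss = train_step(None, "shift", TinyModel(), dynamic=True)
assert move in TinyModel.MOVES
```

In a real dynamic-oracle setup the gold move is recomputed from the (possibly erroneous) current state; that bookkeeping is omitted here.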
Static vs Dynamic Oracle Training

Figure: Results are very close for fewer than 20k training tokens.

Figure: Results are very close for training tokens between 20k and 50k.

Figure: Results are very close for more than 50k training tokens.
How about languages with less than 20k training tokens?
Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors, trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4).
Transfer Learning

Conclusions of Transfer Learning Experiments

- Applying transfer learning with a pre-trained parser is the most beneficial.
- From-scratch LM training does not produce useful word and context vectors.
- Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017].
Projectivity

Transition based parsers can only build projective trees.

Figure from http://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
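Projectivity can be checked directly: a tree is projective iff no two arcs cross when drawn above the sentence. A small self-contained check (a sketch; heads[i] is the head of token i, with token 0 the artificial root):

```python
def is_projective(heads):
    """Return True iff the dependency tree given by heads has no crossing arcs."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads) if d != 0]
    for a, b in arcs:
        for c, e in arcs:
            if a < c < b < e:      # arcs (a,b) and (c,e) cross
                return False
    return True

# 2 -> 1, root -> 2, 2 -> 3: all arcs nest, so the tree is projective.
assert is_projective([0, 2, 0, 2]) is True
# Arcs 1-3 and 2-4 cross, so this tree is non-projective.
assert is_projective([0, 3, 4, 0, 3]) is False
```

This is exactly the property the projectivity ratios on the next slide measure per treebank.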
Projective vs Non-projective

We compared our model with the best model for different projectivity ratios:

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases.

From the official results page and our projectivity table.
Conclusions
Conclusion

In conclusion: we introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition based dependency parsing.

- Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.
- Tree-stack LSTM performed better on low-resource languages.
- As the training dataset size increases, Tree-stack LSTM loses its advantage.
Future Research Direction

End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism
Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring performance improvements.

Morphological Features
Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.
Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Thank you for your attention.

Questions?
β-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 64 123
β-LSTM
LSTM LSTM LSTM
wi+2wi+1wi
Figure Bufferrsquos β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
LSTM LSTM LSTM
si si+1 si+2
Figure Stackrsquos σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
LSTM LSTM LSTM
Figure Action-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How do components of tree-stack LSTM are connected
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
t-RNN
Dependent word
Dependency Relation
Head word
Figure t-RNN
whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
Tree-RNN with
1 Left Transition2 Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM LSTM
Left transition
t-RNN
Dependency Relation
HeadDependent
Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure β-LSTM recalculates its hidden based on new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Stackrsquos top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure σ-LSTM recalculates its hidden from new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
Final overview of Tree-stack LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing
2 Related WorkLinear Models and their DrawbacksNeural Network Models
3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser
4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning
5 Conclusion6 Future Work amp Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results amp Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results amp Comparisons
Dataset
Dependency parsing of 81
treebanks in 49 languages
All treebanks use standardized
annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 7th out
of 33 participants (1st among
transition based parsers)
Dependency parsing of 82
treebanks in 57 languages
All treebanks use
standardized annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 16th
out of 30 participants (2nd
among transition based
parsers)
CoNLL17 CoNLL181 Traintest split change 2 Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
2 possible problems of official comparison
1 If the annotation of the tree bank is improved the older parser ishandicapped
2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
MLP
Figure Initial model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
LSTM LSTM
Figure Only action LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
LSTM LSTM LSTM
LSTM LSTM MLP
Figure Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
LSTM LSTM
LSTM LSTM MLP
Figure Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692
cs cac 8389 8223 8313 8317
en ewt 7554 7543 7556 7567
Table Comparison between MLP and rdquoOnlyrdquo models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875
ar padt (120k) 6804 6814
en ewt (204k) 7487 7577
cs cac (473k) 8289 8357
cs pdt (1M) 8117 81164
t-RNN provides comparative advantage for low-resourcelanguages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNNrsquos performance contribution increases when the training sizedecreases
σ-LSTM provides more useful information independent from datasetsize
Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k less than 50k tokens
Languages having more than 50k less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having less than 20k training tokens
Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583
ru taiga 5832 6055 10479
sme giella 5278 5339 16385
la perseus 4993 516 18184
ug udt 5278 5339 19262
sl sst 4672 4877 19473
hu szeged 6623 6818 20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having tokens in between 50k and100k
Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325
fr sequoia 8436 8217 50543
en gum 7644 7534 53686
ko gsd 7374 7254 56687
eu bdt 7455 7332 72974
nl lassymal 767 758 75134
gl ctg 7902 79018 79327
lv lvtb 7233 7224 80666
id gsd 7576 7397 97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having more than 100k trainingtokens
Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064
bg btb 8453 8455 124336
en ewt 7577 75682 204585
ar padt 6802 6814 223881
de gsd 7159 7132 263804
ca ancora 8589 85874 417587
es ancora 8499 8478 444617
cs cac 8357 8363 472608
cs pdt 8143 8212 1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle transitions using gold movesDynamic oracle transitions using predicted moves
In both cases logp of gold moves maximized
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and
use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al
2017]3 Using my own word and context vectors trained with different
language but from the same language family4 Applying transfer learning with a pre-trained parser
Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898
kmr mg 2012 2257 2278 2339
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the mostbeneficial
From scratch LM training does not bring useful word and contextvectors
Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition Based Parser can only build projective trees 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
Projective vs Non-projective
We compared our model with the best model for different projectivityratios
Language Projectiviy Best (LAS) Our (LAS)
grc perseus 907 7939 5503 (20)
eu bdt 9513 8422 7413 (17)
hu szeged 978 8266 6818 (14)
da ddt 9826 8628 7640 (17)
en gum 996 8505 7644 (15)
gl treegal 100 7425 7045 (10)
gl ctg 100 8212 7945 (14)
Table Our models performance gap decreases as the projectivity ratio increases
7
7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing
Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering
Tree-stack LSTM performed better with low resource languages
When the training dataset size increases tree-stack LSTM losses itsadvantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings
Attention Mechanism
Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement
Morphological Features
Finding different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies
Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682
S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US
Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
β-LSTM
LSTM LSTM LSTM
wi+2wi+1wi
Figure Bufferrsquos β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 65 123
σ-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 66 123
σ-LSTM
LSTM LSTM LSTM
si si+1 si+2
Figure Stackrsquos σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
LSTM LSTM LSTM
Figure Action-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How do components of tree-stack LSTM are connected
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
t-RNN
Dependent word
Dependency Relation
Head word
Figure t-RNN
whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
Tree-RNN with
1 Left Transition2 Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM LSTM
Left transition
t-RNN
Dependency Relation
HeadDependent
Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure β-LSTM recalculates its hidden based on new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Stackrsquos top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure σ-LSTM recalculates its hidden from new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready to predict the next transition.
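Symmetrically, the right_d frames pop the stack top and attach it to the element below it; a minimal sketch under the same illustrative state representation:

```python
def right_arc(state, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}).

    Pops the stack top t and attaches it as a d-labeled
    dependent of the new stack top s."""
    stack, buffer, arcs = state
    t = stack.pop()          # σ|s|t -> σ|s
    s = stack[-1]            # head is the element below t
    arcs.add((s, d, t))      # new arc: head s, relation d, dependent t
    return stack, buffer, arcs

# Example: attach "effect" (stack top) to "had" (below it).
stack, buffer, arcs = ["had", "effect"], [], set()
stack, buffer, arcs = right_arc((stack, buffer, arcs), "obj")
print(stack, arcs)   # ['had'] {('had', 'obj', 'effect')}
```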
Final overview of Tree-stack LSTM

Figure: Full model. The σ-LSTM, β-LSTM, and Action-LSTM outputs are concatenated and fed to an MLP; t-RNN composes the head word, dependent word, and dependency relation into a new head embedding.
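The wiring in the figure (three LSTM summaries concatenated, then an MLP scoring the transitions) can be sketched as follows; the dimensions, labels, and helper names are illustrative, not the thesis configuration:

```python
import math
import random

random.seed(0)
H = 8                                   # illustrative hidden size
LABELS = ["shift", "left_amod", "right_obj"]

def rand_mat(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def affine(W, b, x):
    # W @ x + b, row by row
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

# Hidden summaries from the sigma-, beta- and Action-LSTMs (stand-ins).
h_sigma  = [random.uniform(-1, 1) for _ in range(H)]
h_beta   = [random.uniform(-1, 1) for _ in range(H)]
h_action = [random.uniform(-1, 1) for _ in range(H)]

# Concat -> one-hidden-layer MLP -> one score per transition.
x = h_sigma + h_beta + h_action
hidden = [math.tanh(v) for v in affine(rand_mat(H, 3 * H), [0.0] * H, x)]
scores = affine(rand_mat(len(LABELS), H), [0.0] * len(LABELS), hidden)
best = LABELS[max(range(len(scores)), key=scores.__getitem__)]
print(len(scores), best)
```

The parser would then execute the highest-scoring legal transition and repeat until the buffer is empty.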
4 Results & Comparisons
Results & Comparisons

Dataset

CoNLL17:
- Dependency parsing of 81 treebanks in 49 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koc-University ranked 7th out of 33 participants (1st among transition-based parsers)

CoNLL18:
- Dependency parsing of 82 treebanks in 57 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koc-University ranked 16th out of 30 participants (2nd among transition-based parsers)

Differences: 1. Train/test split change 2. Annotation
MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets.
MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1. If the annotation of the treebank is improved, the older parser is handicapped.
2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.
MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models:

Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57

Tree-stack LSTM outperforms MLP.
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
MLP Parser

Figure: Initial model (MLP).
Only Action LSTM

Figure: Only Action-LSTM.
Only β-LSTM

Figure: Only β-LSTM.
Only σ-LSTM

Figure: Only σ-LSTM.
Ablation Analysis Results

Lang Code  MLP    Only Action  Only-β  Only-σ
hu szeged  66.21  66.87        66.94   67.03
sv lines   71.12  72.05        72.17   72.45
tr imst    57.12  56.87        57.02   57.12
ar padt    67.83  66.67        66.89   66.92
cs cac     83.89  82.23        83.13   83.17
en ewt     75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models
Ablation of t-RNN

Figure: Full tree-stack LSTM with the t-RNN component highlighted (t-RNN composes the head word, dependent word, and dependency relation).
Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN:

Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.16

t-RNN provides a comparative advantage for low-resource languages.
Ablation Analysis

Overall results of the ablation analysis:

Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv lines   71.12  72.05   72.17   74.04   72.17      75.46
tr imst    57.12  56.87   57.02   57.12   58.12      58.75
ar padt    67.83  66.67   66.89   66.92   68.04      68.14
cs cac     83.89  82.23   83.13   83.17   82.89      83.57
en ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats the other model variations.
Ablation Analysis

Conclusions of the ablation experiments:

t-RNN's performance contribution increases as the training size decreases.

σ-LSTM provides more useful information, independent of dataset size.

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers).
What does Morphological Feature Embedding provide?
Contribution of Morph-feat Embeddings

Experimental Settings

We divide the CoNLL18 UD dataset (v2.2) into 4 parts based on the number of training tokens for each language, to better understand our contributions:

- Languages having less than 20k tokens
- Languages having more than 20k, less than 50k tokens
- Languages having more than 50k, less than 100k tokens
- Languages having 100k tokens or more
Contribution of Morph-feat Embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.6            18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166

Not useful for languages having less than 20k training tokens.
Contribution of Morph-feat Embeddings

Morph-feat experiments for languages having between 50k and 100k tokens:

Lang code    Morph-Feats  no Morph-Feats  # of tokens
sv lines     72.18        74.81           48325
fr sequoia   84.36        82.17           50543
en gum       76.44        75.34           53686
ko gsd       73.74        72.54           56687
eu bdt       74.55        73.32           72974
nl lassymal  76.7         75.8            75134
gl ctg       79.02        79.018          79327
lv lvtb      72.33        72.24           80666
id gsd       75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens.
Contribution of Morph-feat Embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121064
bg btb     84.53        84.55           124336
en ewt     75.77        75.682          204585
ar padt    68.02        68.14           223881
de gsd     71.59        71.32           263804
ca ancora  85.89        85.874          417587
es ancora  84.99        84.78           444617
cs cac     83.57        83.63           472608
cs pdt     81.43        82.12           1173282

Neutral for languages having more than 100k training tokens.
Static vs Dynamic Oracle Training

Static oracle: transitions using gold moves.
Dynamic oracle: transitions using predicted moves.

In both cases the log-probability of gold moves is maximized.

Figure: Full tree-stack LSTM model, used in both training regimes.
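The two regimes differ only in which transition advances the parser state during training, while the loss stays the same; a schematic sketch with hypothetical names (not the thesis training code):

```python
def train_step(gold_move, scores, mode):
    """One training decision. scores maps each transition to its
    log-probability under the model. In both regimes the loss is
    -log p(gold_move); they differ only in which move is executed."""
    loss = -scores[gold_move]                   # maximize log p(gold)
    if mode == "static":
        executed = gold_move                    # always follow the oracle
    else:                                       # "dynamic"
        executed = max(scores, key=scores.get)  # follow the model's prediction
    return loss, executed

scores = {"shift": -0.1, "left": -2.3, "right": -1.2}
loss_s, move_s = train_step("left", scores, "static")
loss_d, move_d = train_step("left", scores, "dynamic")
print(move_s, move_d)   # left shift
```

Dynamic-oracle training thus exposes the model to states reached by its own (possibly wrong) decisions, while the supervision signal still comes from the gold move.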
Static vs Dynamic Oracle Training
Figure: Results are very close for training tokens less than 20k.
Static vs Dynamic Oracle Training
Figure: Results are very close for training tokens between 20k and 50k.
Static vs Dynamic Oracle Training
Figure: Results are very close for training tokens more than 50k.
How about languages with less than 20k training tokens?
Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors, trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3) and (4)
Transfer Learning

Conclusions of the transfer learning experiments:

Applying transfer learning with a pre-trained parser is the most beneficial.

From-scratch LM training does not produce useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017].
Projectivity

Transition-based parsers can only build projective trees.

Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
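A tree is projective iff no two of its arcs cross; a minimal check under the usual head-index convention (illustrative, not from the thesis):

```python
def is_projective(heads):
    """heads[i] is the head index of token i (the root has head -1).
    The tree is projective iff no two arcs cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads) if h >= 0]
    for (l1, r1) in arcs:
        for (l2, r2) in arcs:
            # crossing: exactly one endpoint of the second arc
            # lies strictly inside the first arc's span
            if l1 < l2 < r1 < r2:
                return False
    return True

print(is_projective([1, -1, 1]))      # True: both arcs nest around the root
print(is_projective([2, 3, -1, 2]))   # False: arcs (0,2) and (1,3) cross
```

Sentences containing such crossing (non-projective) arcs cannot be fully recovered by the plain transition system, which is why the performance gap in the next table widens as projectivity drops.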
Projective vs Non-projective

We compared our model with the best model for different projectivity ratios:

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases.

From the official results page and our projectivity table.
Conclusions
Conclusion

In conclusion:

We introduced "Context, Word and Morph-feat" embeddings and showed their contribution to transition-based dependency parsing.

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

As the training dataset size increases, tree-stack LSTM loses its advantage.
Future Research Direction

End-to-End Training

Systems that are jointly trained for tokenization, morphological tagging and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism

Applying attention between σ-LSTM states, or within the β-LSTM or Action-LSTM, may bring performance improvements.

Morphological Features

Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training

Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.
Publications

Omer Kırnap, Erenay Dayanık and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
References

Marco Kuhlmann, Carlos Gómez-Rodríguez and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Thank you for your attention
Questions
σ-LSTM
LSTM LSTM LSTM
si si+1 si+2
Figure Stackrsquos σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 67 123
Action-LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 68 123
Action-LSTM
LSTM LSTM LSTM
Figure Action-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 69 123
How do components of tree-stack LSTM are connected
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
t-RNN
Dependent word
Dependency Relation
Head word
Figure t-RNN
whead new = tanh(Wrnn lowast [whead old dl wdep] + brnn) (1)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
Tree-RNN with
1 Left Transition2 Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM LSTM
Left transition
t-RNN
Dependency Relation
HeadDependent
Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure β-LSTM recalculates its hidden based on new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right

right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings
Figure: Stack's top LSTM is reduced
Figure: t-RNN calculates the new head embedding
Figure: σ-LSTM recalculates its hidden state from the new input
Figure: Tree-stack LSTM is ready to give the next transition
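The right transition can be sketched the same way (illustrative only): pop the stack top t and attach it as a dependent of the next stack item s.

```python
# Illustrative sketch of the right_d transition. Not the thesis code.

def right_arc(stack, buffer, arcs, label):
    """right_d(sigma|s|t, beta, A) = (sigma|s, beta, A ∪ {(s, label, t)}):
    pop t from the stack and attach it as a dependent of the next stack item s."""
    assert len(stack) >= 2, "right_d needs at least two items on the stack"
    t = stack.pop()          # dependent: top of the stack
    s = stack[-1]            # head: new top of the stack
    arcs.add((s, label, t))  # record the new dependency arc
    return stack, buffer, arcs
```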
Final overview of Tree-stack LSTM

[Diagram: the σ-LSTM, β-LSTM, and Action-LSTM hidden states are concatenated ("Concat") and fed to an MLP; the t-RNN combines head word, dependent word, and dependency relation embeddings]
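The overview above can be sketched as a forward pass (a minimal NumPy sketch with made-up dimensions; the real model's sizes and parameters are set in the thesis code):

```python
import numpy as np

# Hypothetical dimensions; the actual sizes are model hyperparameters.
H, N_ACTIONS = 4, 5
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 3 * H)), np.zeros(8)
W2, b2 = rng.normal(size=(N_ACTIONS, 8)), np.zeros(N_ACTIONS)

def predict_transition(h_sigma, h_beta, h_action):
    """Concatenate the sigma-, beta-, and action-LSTM hidden states
    and score transitions with a one-hidden-layer MLP (sketch)."""
    x = np.concatenate([h_sigma, h_beta, h_action])  # the "Concat" box
    hidden = np.tanh(W1 @ x + b1)                    # MLP hidden layer
    scores = W2 @ hidden + b2                        # one score per transition
    probs = np.exp(scores - scores.max())
    return probs / probs.sum()                       # softmax over transitions
```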
Overview

1 Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2 Related Work: Linear Models and their Drawbacks; Neural Network Models
3 Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4 Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5 Conclusion
6 Future Work & Discussions
4 Results & Comparisons
Results & Comparisons

Dataset

CoNLL17:
- Dependency parsing of 81 treebanks in 49 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koç University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
- Dependency parsing of 82 treebanks in 57 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koç University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: (1) train/test split, (2) annotation
MLP vs Tree-stack LSTM

The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets
MLP vs Tree-stack LSTM

Two possible problems with the official comparison:

1 If the annotation of the treebank has improved, the older parser is handicapped
2 If the training-test split has changed and old training data are now in the test data, the old parser is favored undeservedly
MLP vs Tree-stack LSTM

Experiments with the same train-test datasets to compare models (LAS):

Lang Code (tokens)   MLP     Tree-stack
ru taiga (10k)       58.89   60.55
hu szeged (20k)      66.21   68.18
tr imst (50k)        56.78   58.75
ar padt (120k)       67.83   68.14
en ewt (205k)        74.87   75.77
cs cac (473k)        83.39   83.57

Tree-stack LSTM outperforms MLP
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
MLP Parser
Figure: Initial model

Only Action LSTM
Figure: Only action LSTM

Only β-LSTM
Figure: Only β-LSTM

Only σ-LSTM
Figure: Only σ-LSTM
Ablation Analysis Results

Lang Code   MLP     Only Action   Only-β   Only-σ
hu szeged   66.21   66.87         66.94    67.03
sv lines    71.12   72.05         72.17    72.45
tr imst     57.12   56.87         57.02    57.12
ar padt     67.83   66.67         66.89    66.92
cs cac      83.89   82.23         83.13    83.17
en ewt      75.54   75.43         75.56    75.67

Table: Comparison between MLP and "Only" models
Ablation of t-RNN

[Diagram: the full Tree-stack LSTM: σ-, β-, and Action-LSTM states concatenated and fed to an MLP, with the t-RNN combining head word, dependent word, and dependency relation embeddings]
Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN (LAS):

Lang Code            without t-RNN   with t-RNN
no nynorsklia (3k)   51.78           53.33
ru taiga (11k)       59.13           60.55
gl treegal (15k)     69.76           70.45
hu szeged (20k)      66.12           68.18
sv lines (49k)       74.04           75.46
tr imst (50k)        58.12           58.75
ar padt (120k)       68.04           68.14
en ewt (204k)        74.87           75.77
cs cac (473k)        82.89           83.57
cs pdt (1M)          81.17           81.164

t-RNN provides a comparative advantage for low-resource languages
Ablation Analysis

Overall results of ablation analysis (LAS):

Lang        MLP     Only A   Only-β   Only-σ   w/o t-RNN   all
hu szeged   66.21   66.87    66.94    67.03    66.12       68.18
sv lines    71.12   72.05    72.17    74.04    72.17       75.46
tr imst     57.12   56.87    57.02    57.12    58.12       58.75
ar padt     67.83   66.67    66.89    66.92    68.04       68.14
cs cac      83.89   82.23    83.13    83.17    82.89       83.57
en ewt      75.54   75.43    75.56    75.67    74.87       75.77

Tree-stack LSTM beats the other model variations
Ablation Analysis

Conclusions of the ablation experiments:

t-RNN's performance contribution increases as the training size decreases

σ-LSTM provides more useful information, independent of dataset size

Interconnecting the model's components with t-RNN makes Tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)
What does Morphological Feature Embedding provide?
Contribution of Morph-feat Embeddings

Experimental settings: we divide the CoNLL18 UD dataset (v2.2) into 4 parts based on the number of training tokens per language, to better understand our contributions:

Languages having less than 20k tokens
Languages having more than 20k, less than 50k tokens
Languages having more than 50k, less than 100k tokens
Languages having 100k tokens or more
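The four-way split above can be expressed as a small helper (illustrative; the bucket names are mine, the boundaries are from the slides):

```python
def size_bucket(n_tokens):
    """Assign a treebank to one of the four training-size buckets
    used in the morph-feat experiments."""
    if n_tokens < 20_000:
        return "<20k"
    if n_tokens < 50_000:
        return "20k-50k"
    if n_tokens < 100_000:
        return "50k-100k"
    return ">=100k"
```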
Contribution of Morph-feat embeddings

Morph-feat experiments for languages having less than 20k training tokens (LAS):

Lang code       Morph-Feats   no Morph-Feats   # of tokens
no nynorsklia   51.13         53.33            3,583
ru taiga        58.32         60.55            10,479
sme giella      52.78         53.39            16,385
la perseus      49.93         51.60            18,184
ug udt          52.78         53.39            19,262
sl sst          46.72         48.77            19,473
hu szeged       66.23         68.18            20,166

Not useful for languages having less than 20k training tokens
Contribution of Morph-feat embeddings

Morph-feat experiments for languages having between 50k and 100k tokens (LAS):

Lang code       Morph-Feats   no Morph-Feats   # of tokens
sv lines        72.18         74.81            48,325
fr sequoia      84.36         82.17            50,543
en gum          76.44         75.34            53,686
ko gsd          73.74         72.54            56,687
eu bdt          74.55         73.32            72,974
nl lassysmall   76.70         75.80            75,134
gl ctg          79.02         79.018           79,327
lv lvtb         72.33         72.24            80,666
id gsd          75.76         73.97            97,531

Beneficial for languages with 50k-100k training tokens
Contribution of Morph-feat embeddings

Morph-feat experiments for languages having more than 100k training tokens (LAS):

Lang code   Morph-Feats   no Morph-Feats   # of tokens
fa seraji   81.18         81.12            121,064
bg btb      84.53         84.55            124,336
en ewt      75.77         75.682           204,585
ar padt     68.02         68.14            223,881
de gsd      71.59         71.32            263,804
ca ancora   85.89         85.874           417,587
es ancora   84.99         84.78            444,617
cs cac      83.57         83.63            472,608
cs pdt      81.43         82.12            1,173,282

Neutral for languages having more than 100k training tokens
Static vs Dynamic Oracle Training

Static oracle: transitions follow gold moves
Dynamic oracle: transitions follow predicted moves

In both cases, the log-probability of gold moves is maximized

[Diagram: the full Tree-stack LSTM model]
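The difference between the two training regimes can be sketched as follows. This is a schematic, not the thesis training loop: `parser` and `oracle` are hypothetical interfaces, and the exploration rate is a made-up hyperparameter.

```python
import random

def train_step(parser, oracle, sentence, dynamic=False, explore=0.1):
    """One pass over a sentence. Static: always follow gold moves.
    Dynamic: sometimes follow the model's own (possibly wrong) prediction,
    so training states resemble test time. Both regimes accumulate
    -log p(gold move). `parser`/`oracle` are hypothetical interfaces."""
    state, loss = parser.initial_state(sentence), 0.0
    while not state.is_final():
        gold = oracle.best_moves(state)            # oracle-optimal moves here
        scores = parser.score_moves(state)         # model log-probabilities
        loss -= max(scores[m] for m in gold)       # -log p of best gold move
        if dynamic and random.random() < explore:
            move = max(scores, key=scores.get)     # follow the model's move
        else:
            move = max(gold, key=lambda m: scores[m])  # follow a gold move
        state = state.apply(move)
    return loss
```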
Static vs Dynamic Oracle Training
Figure: Results are very close for training sets with fewer than 20k tokens

Static vs Dynamic Oracle Training
Figure: Results are very close for training sets with between 20k and 50k tokens

Static vs Dynamic Oracle Training
Figure: Results are very close for training sets with more than 50k tokens
How about languages with less than 20k training tokens?
Transfer Learning

There are 4 possible types of transfer learning:

1 Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2 Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3 Using my own word and context vectors, trained on a different language from the same language family
4 Applying transfer learning with a pre-trained parser

Language       (1)            (2)     (3)     (4)
af afribooms   not provided   75.46   77.43   78.12
kk ktb         20.19          22.31   21.96   23.86
bxr bdt        7.64           9.76    9.93    8.98
kmr mg         20.12          22.57   22.78   23.39

Table: LAS values for strategies (1), (2), (3) and (4)
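Strategy (4), warm-starting a low-resource parser from a pre-trained one, can be sketched with plain parameter dictionaries (the parameter names and prefixes here are hypothetical, not the thesis naming scheme):

```python
def warm_start(target_params, source_params, shared_prefixes=("lstm.", "mlp.")):
    """Initialize a parser from a pre-trained one: copy parameters whose
    names share an architecture prefix, and keep the rest (e.g.
    language-specific embeddings) at their fresh initialization.
    Naming is illustrative only."""
    for name, value in source_params.items():
        if name in target_params and name.startswith(shared_prefixes):
            target_params[name] = value   # transfer the shared component
    return target_params
```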
Transfer Learning

Conclusions of the transfer learning experiments:

Applying transfer learning with a pre-trained parser is the most beneficial

From-scratch LM training does not produce useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]
Projectivity

A transition based parser can only build projective trees

Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
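Projectivity can be checked directly: a tree is projective iff no two dependency arcs cross when drawn above the sentence. A small sketch (the head-array encoding here is a common convention, not something specified on the slide):

```python
from itertools import combinations

def is_projective(heads):
    """heads[i] = head index of word i (index 0 is the artificial root;
    heads[0] is unused). Returns True iff no two arcs cross."""
    arcs = [(min(d, h), max(d, h)) for d, h in enumerate(heads) if d > 0]
    for (a1, b1), (a2, b2) in combinations(arcs, 2):
        # Two arcs cross iff their spans partially overlap.
        if a1 < a2 < b1 < b2 or a2 < a1 < b2 < b1:
            return False
    return True
```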
Projective vs Non-projective

We compared our model with the best model for different projectivity ratios:

Language      Projectivity (%)   Best (LAS)   Ours (LAS)
grc perseus   90.7               79.39        55.03 (20)
eu bdt        95.13              84.22        74.13 (17)
hu szeged     97.8               82.66        68.18 (14)
da ddt        98.26              86.28        76.40 (17)
en gum        99.6               85.05        76.44 (15)
gl treegal    100                74.25        70.45 (10)
gl ctg        100                82.12        79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases

From the official results page and our projectivity table
Conclusions

In conclusion: we introduced "Context, Word and Morph-feat" embeddings and showed their contribution to transition based dependency parsing

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering

Tree-stack LSTM performed better on low-resource languages

As the training dataset size increases, Tree-stack LSTM loses its advantage
Future Research Directions

End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism
Applying attention between σ-LSTM states, or within the β-LSTM or Action-LSTM, may bring performance improvements.

Morphological Features
Finding different ways to represent morphological features.

Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.
Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
References

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Thank you for your attention
Questions
Action-LSTM

[Diagram: the full Tree-stack LSTM overview, highlighting the Action-LSTM among the components whose outputs are concatenated and fed to the MLP]
Action-LSTM
Figure: Action-LSTM
How are the components of Tree-stack LSTM connected?
Tree-RNN
Tree-RNN (t-RNN)

Figure: t-RNN, combining the dependent word, dependency relation, and head word

w_head_new = tanh(W_rnn · [w_head_old ; d_l ; w_dep] + b_rnn)   (1)
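Eq. (1) is a single composition step: fold a dependent and its relation label into the head's embedding. A minimal NumPy sketch with made-up dimensions (the real embedding sizes and trained W_rnn, b_rnn come from the thesis model):

```python
import numpy as np

D = 3                                  # hypothetical embedding size
rng = np.random.default_rng(1)
W_rnn = rng.normal(size=(D, 3 * D))    # maps [head; relation; dependent] -> D
b_rnn = np.zeros(D)

def trnn_compose(w_head_old, d_l, w_dep):
    """w_head_new = tanh(W_rnn [w_head_old; d_l; w_dep] + b_rnn)."""
    x = np.concatenate([w_head_old, d_l, w_dep])
    return np.tanh(W_rnn @ x + b_rnn)
```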
Tree-RNN with:
1 Left Transition
2 Right Transition

Left Transition
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
MLP
Figure Initial model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
LSTM LSTM
Figure Only action LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
LSTM LSTM LSTM
LSTM LSTM MLP
Figure Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
LSTM LSTM
LSTM LSTM MLP
Figure Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692
cs cac 8389 8223 8313 8317
en ewt 7554 7543 7556 7567
Table Comparison between MLP and rdquoOnlyrdquo models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875
ar padt (120k) 6804 6814
en ewt (204k) 7487 7577
cs cac (473k) 8289 8357
cs pdt (1M) 8117 81164
t-RNN provides comparative advantage for low-resourcelanguages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNNrsquos performance contribution increases when the training sizedecreases
σ-LSTM provides more useful information independent from datasetsize
Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k less than 50k tokens
Languages having more than 50k less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having less than 20k training tokens
Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583
ru taiga 5832 6055 10479
sme giella 5278 5339 16385
la perseus 4993 516 18184
ug udt 5278 5339 19262
sl sst 4672 4877 19473
hu szeged 6623 6818 20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having tokens in between 50k and100k
Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325
fr sequoia 8436 8217 50543
en gum 7644 7534 53686
ko gsd 7374 7254 56687
eu bdt 7455 7332 72974
nl lassymal 767 758 75134
gl ctg 7902 79018 79327
lv lvtb 7233 7224 80666
id gsd 7576 7397 97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having more than 100k trainingtokens
Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064
bg btb 8453 8455 124336
en ewt 7577 75682 204585
ar padt 6802 6814 223881
de gsd 7159 7132 263804
ca ancora 8589 85874 417587
es ancora 8499 8478 444617
cs cac 8357 8363 472608
cs pdt 8143 8212 1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle transitions using gold movesDynamic oracle transitions using predicted moves
In both cases logp of gold moves maximized
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and
use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al
2017]3 Using my own word and context vectors trained with different
language but from the same language family4 Applying transfer learning with a pre-trained parser
Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898
kmr mg 2012 2257 2278 2339
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the mostbeneficial
From scratch LM training does not bring useful word and contextvectors
Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition Based Parser can only build projective trees 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
Projective vs Non-projective
We compared our model with the best model for different projectivityratios
Language Projectiviy Best (LAS) Our (LAS)
grc perseus 907 7939 5503 (20)
eu bdt 9513 8422 7413 (17)
hu szeged 978 8266 6818 (14)
da ddt 9826 8628 7640 (17)
en gum 996 8505 7644 (15)
gl treegal 100 7425 7045 (10)
gl ctg 100 8212 7945 (14)
Table Our models performance gap decreases as the projectivity ratio increases
7
7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing
Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering
Tree-stack LSTM performed better with low resource languages
When the training dataset size increases tree-stack LSTM losses itsadvantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings
Attention Mechanism
Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement
Morphological Features
Finding different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies
Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682
S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US
Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
How are the components of the tree-stack LSTM connected?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 70 123
Tree-RNN
Omer Kırnap (Koc University) MSc Thesis September 27 2018 71 123
Tree-RNN (t-RNN)
(Diagram: the t-RNN cell combines the head word, dependent word, and dependency-relation embeddings)
Figure t-RNN
w_head_new = tanh(W_rnn ∗ [w_head_old; d_l; w_dep] + b_rnn) (1)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
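The head update in Eq. (1) can be sketched in plain Python. This is a minimal sketch with toy dimensions; the weight shapes and 0.1-valued toy weights are illustrative assumptions, not the thesis implementation.

```python
from math import tanh

def trnn_update(w_head, d_label, w_dep, W, b):
    # Eq. (1): w_head_new = tanh(W_rnn * [w_head_old; d_l; w_dep] + b_rnn)
    x = w_head + d_label + w_dep              # concatenate the three embeddings
    return [tanh(sum(wij * xj for wij, xj in zip(row, x)) + bi)
            for row, bi in zip(W, b)]         # one tanh unit per output dimension

# Toy sizes (hypothetical): 3-dim word embeddings, 2-dim dependency-label embedding.
W = [[0.1] * (3 + 2 + 3) for _ in range(3)]   # output keeps the word-embedding size
b = [0.0, 0.0, 0.0]
new_head = trnn_update([1.0] * 3, [0.5] * 2, [1.0] * 3, W, b)
```

The new head embedding has the same size as a word embedding, so it can be pushed back into the stack or buffer LSTMs in place of the old head.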
Tree-RNN with:
1. Left Transition
2. Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
(Diagram: σ-, β-, and action-LSTM states feed the t-RNN, with head, dependent, and dependency-relation labels)
Figure Each embedding is initialized by concatenating POS, language, and morph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
(Diagram: the stack's top LSTM is popped; head and dependent feed the t-RNN with the dependency relation)
Figure Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
(Diagram: the t-RNN combines head, dependent, and dependency relation into a new head embedding)
Figure t-RNN calculates the new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
(Diagram: the t-RNN's new head embedding is fed back as input to the β-LSTM)
Figure β-LSTM recalculates its hidden state based on the new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
(Diagram: updated state of the tree-stack LSTM)
Figure Tree-stack LSTM is ready to predict the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
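Operationally, the left transition pops the stack top s and attaches it as a dependent of the buffer front b, as in left_d above. A minimal sketch; the list-based stack/buffer and the (head, label, dependent) arc triples are assumptions for illustration.

```python
def left_arc(stack, buffer, arcs, d):
    # left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    # the buffer front b becomes the head of the stack top s,
    # and s is popped from the stack. Words are position indices, 0 = root.
    s = stack.pop()
    b = buffer[0]
    arcs.add((b, d, s))          # arc: head b --d--> dependent s
    return stack, buffer, arcs

stack, buffer, arcs = [0, 2], [3, 4], set()
left_arc(stack, buffer, arcs, "nsubj")
```

In the full model this is the point where the t-RNN would recompute the embedding of b from s's embedding and the relation d.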
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
(Diagram: σ-, β-, and action-LSTM states feed the t-RNN, with head, dependent, and dependency-relation labels)
Figure Each embedding is initialized by concatenating POS, language, and morph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
(Diagram: the stack's top LSTM is popped; head and dependent feed the t-RNN with the dependency relation)
Figure Stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
(Diagram: the t-RNN combines head, dependent, and dependency relation into a new head embedding)
Figure t-RNN calculates the new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
(Diagram: the t-RNN's new head embedding is fed back as input to the σ-LSTM)
Figure σ-LSTM recalculates its hidden state from the new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
(Diagram: updated state of the tree-stack LSTM)
Figure Tree-stack LSTM is ready to predict the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
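The right transition mirrors the left one: it pops the stack top t and attaches it as a dependent of the next stack item s, as in right_d above. Again a sketch under the same illustrative assumptions (list-based stack, (head, label, dependent) arc triples).

```python
def right_arc(stack, buffer, arcs, d):
    # right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    # the second stack item s becomes the head of the top t,
    # and t is popped from the stack. Words are position indices, 0 = root.
    t = stack.pop()
    s = stack[-1]
    arcs.add((s, d, t))          # arc: head s --d--> dependent t
    return stack, buffer, arcs

stack, buffer, arcs = [0, 2, 5], [7], set()
right_arc(stack, buffer, arcs, "obj")
```

Here the new head embedding computed by the t-RNN would replace s's entry, which is why it is the σ-LSTM that recalculates its hidden state.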
Final overview of Tree-stack LSTM
(Diagram: σ-, β-, and action-LSTM hidden states are concatenated and fed to an MLP; the t-RNN links head word, dependent word, and dependency relation to the stack representation)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
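The overview above concatenates the σ-, β-, and action-LSTM hidden states and scores transitions with an MLP. A toy sketch of that final step; the ReLU hidden layer, the softmax output, and all dimensions are illustrative assumptions.

```python
from math import exp

def mlp_transition_scores(h_sigma, h_beta, h_action, W1, b1, W2, b2):
    # Concatenate the three LSTM hidden states (the "Concat" box),
    # apply one hidden layer, then a softmax over candidate transitions.
    x = h_sigma + h_beta + h_action
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + bi)   # ReLU hidden layer
         for row, bi in zip(W1, b1)]
    z = [sum(w * hi for w, hi in zip(row, h)) + bi             # transition logits
         for row, bi in zip(W2, b2)]
    e = [exp(v) for v in z]
    s = sum(e)
    return [v / s for v in e]                                  # probabilities

# Toy 1-dim hidden states and 3 candidate transitions (hypothetical sizes).
probs = mlp_transition_scores([1.0], [1.0], [1.0],
                              [[0.5, 0.5, 0.5]], [0.0],
                              [[1.0], [-1.0], [0.0]], [0.0, 0.0, 0.0])
```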
Overview
1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2. Related Work: Linear Models and their Drawbacks; Neural Network Models
3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5. Conclusion
6. Future Work & Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4. Results & Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results & Comparisons
Dataset
CoNLL17:
- Dependency parsing of 81 treebanks in 49 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koc-University ranked 7th out of 33 participants (1st among transition based parsers)
CoNLL18:
- Dependency parsing of 82 treebanks in 57 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)
Changes from CoNLL17 to CoNLL18: 1. Train/test split change 2. Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested under the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
Two possible problems with the official comparison:
1. If the annotation of the treebank has improved, the older parser is handicapped
2. If the training-test split has changed and old training data are now in the test data, the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code        MLP    Tree-stack
ru_taiga (10k)   58.89  60.55
hu_szeged (20k)  66.21  68.18
tr_imst (50k)    56.78  58.75
ar_padt (120k)   67.83  68.14
en_ewt (205k)    74.87  75.77
cs_cac (473k)    83.39  83.57
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
Figure Initial model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
Figure Only action LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
Figure Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
Figure Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code   MLP    Only Action  Only-β  Only-σ
hu_szeged   66.21  66.87        66.94   67.03
sv_lines    71.12  72.05        72.17   72.45
tr_imst     57.12  56.87        57.02   57.12
ar_padt     67.83  66.67        66.89   66.92
cs_cac      83.89  82.23        83.13   83.17
en_ewt      75.54  75.43        75.56   75.67
Table Comparison between MLP and "Only" models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
(Diagram: full tree-stack LSTM architecture, with the t-RNN component highlighted for ablation)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code            without t-RNN  with t-RNN
no_nynorsklia (3k)   51.78          53.33
ru_taiga (11k)       59.13          60.55
gl_treegal (15k)     69.76          70.45
hu_szeged (20k)      66.12          68.18
sv_lines (49k)       74.04          75.46
tr_imst (50k)        58.12          58.75
ar_padt (120k)       68.04          68.14
en_ewt (204k)        74.87          75.77
cs_cac (473k)        82.89          83.57
cs_pdt (1M)          81.17          81.164
t-RNN provides a comparative advantage for low-resource languages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu_szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv_lines   71.12  72.05   72.17   74.04   72.17      75.46
tr_imst    57.12  56.87   57.02   57.12   58.12      58.75
ar_padt    67.83  66.67   66.89   66.92   68.04      68.14
cs_cac     83.89  82.23   83.13   83.17   82.89      83.57
en_ewt     75.54  75.43   75.56   75.67   74.87      75.77
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNN's performance contribution increases as the training size decreases
σ-LSTM provides more useful information, independent of dataset size
Interconnecting the model's components with the t-RNN makes the tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental Settings
We divide the CoNLL18 UD 2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k but less than 50k tokens
Languages having more than 50k but less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
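The four-way split above is a simple bucketing by training-token count; a minimal sketch (function name and bucket labels are illustrative assumptions):

```python
def size_bucket(n_tokens):
    # Bucket a treebank by training-token count, mirroring the four
    # groups used in the morph-feat experiments.
    if n_tokens < 20_000:
        return "<20k"
    elif n_tokens < 50_000:
        return "20k-50k"
    elif n_tokens < 100_000:
        return "50k-100k"
    return ">=100k"
```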
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens
Lang code      Morph-Feats  no Morph-Feats  # of tokens
no_nynorsklia  51.13        53.33           3,583
ru_taiga       58.32        60.55           10,479
sme_giella     52.78        53.39           16,385
la_perseus     49.93        51.6            18,184
ug_udt         52.78        53.39           19,262
sl_sst         46.72        48.77           19,473
hu_szeged      66.23        68.18           20,166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having tokens between 50k and 100k
Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv_lines       72.18        74.81           48,325
fr_sequoia     84.36        82.17           50,543
en_gum         76.44        75.34           53,686
ko_gsd         73.74        72.54           56,687
eu_bdt         74.55        73.32           72,974
nl_lassysmall  76.7         75.8            75,134
gl_ctg         79.02        79.018          79,327
lv_lvtb        72.33        72.24           80,666
id_gsd         75.76        73.97           97,531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens
Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa_seraji  81.18        81.12           121,064
bg_btb     84.53        84.55           124,336
en_ewt     75.77        75.682          204,585
ar_padt    68.02        68.14           223,881
de_gsd     71.59        71.32           263,804
ca_ancora  85.89        85.874          417,587
es_ancora  84.99        84.78           444,617
cs_cac     83.57        83.63           472,608
cs_pdt     81.43        82.12           1,173,282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle: transitions follow gold moves. Dynamic oracle: transitions follow predicted moves.
In both cases, the log probability of gold moves is maximized
(Diagram: full tree-stack LSTM architecture, used in both training regimes)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
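The distinction can be sketched as a single training step. Here model_probs is a hypothetical distribution over transitions; both regimes incur the same loss on the gold move and differ only in which move is executed to reach the next parser state.

```python
from math import log

def oracle_step(model_probs, gold_transition, mode):
    # Both regimes maximize log p(gold transition), i.e. minimize -log p.
    loss = -log(model_probs[gold_transition])
    if mode == "static":
        executed = gold_transition                        # follow the gold move
    else:                                                 # dynamic oracle
        executed = max(model_probs, key=model_probs.get)  # follow the predicted move
    return loss, executed

probs = {"shift": 0.6, "left": 0.3, "right": 0.1}
```

Under a dynamic oracle the parser thus visits states its own predictions produce, which exposes it to configurations never seen in gold derivations.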
Static vs Dynamic Oracle Training
Figure Results are very close for languages with fewer than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for languages with 20k to 50k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for languages with more than 50k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors, trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser
Language      (1)           (2)    (3)    (4)
af_afribooms  not provided  75.46  77.43  78.12
kk_ktb        20.19         22.31  21.96  23.86
bxr_bdt       7.64          9.76   9.93   8.98
kmr_mg        20.12         22.57  22.78  23.39
Table LAS values for strategies (1), (2), (3), and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial
From-scratch LM training does not produce useful word and context vectors
Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition based parsers can only build projective trees 6
6 Figure from http://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
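Projectivity can be checked by testing for crossing arcs. A minimal sketch under stated assumptions: heads[i] gives the head of word i+1 (0 denotes the artificial root), and the input is a single-rooted dependency tree.

```python
def is_projective(heads):
    # A dependency tree is projective iff no two arcs cross when drawn
    # above the sentence. Represent each arc by its (left, right) endpoints.
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for a, b in arcs:
        for c, e in arcs:
            if a < c < b < e:      # arcs (a, b) and (c, e) cross
                return False
    return True
```

Sentences whose gold trees fail this check (the non-projective portion of e.g. grc_perseus) cannot be reproduced exactly by the transition system above.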
Projective vs Non-projective
We compared our model with the best model for different projectivity ratios
Language     Projectivity  Best (LAS)  Our (LAS)
grc_perseus  90.7          79.39       55.03 (20)
eu_bdt       95.13         84.22       74.13 (17)
hu_szeged    97.8          82.66       68.18 (14)
da_ddt       98.26         86.28       76.40 (17)
en_gum       99.6          85.05       76.44 (15)
gl_treegal   100           74.25       70.45 (10)
gl_ctg       100           82.12       79.45 (14)
Table Our model's performance gap decreases as the projectivity ratio increases
7 From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusion:
We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition based dependency parsing
Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering
Tree-stack LSTM performed better on low-resource languages
As the training dataset size increases, tree-stack LSTM loses its advantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings
Attention Mechanism
Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring performance improvements
Morphological Features
Finding a different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies
Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682
S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875
ar padt (120k) 6804 6814
en ewt (204k) 7487 7577
cs cac (473k) 8289 8357
cs pdt (1M) 8117 81164
t-RNN provides comparative advantage for low-resourcelanguages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNNrsquos performance contribution increases when the training sizedecreases
σ-LSTM provides more useful information independent from datasetsize
Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k less than 50k tokens
Languages having more than 50k less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having less than 20k training tokens
Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583
ru taiga 5832 6055 10479
sme giella 5278 5339 16385
la perseus 4993 516 18184
ug udt 5278 5339 19262
sl sst 4672 4877 19473
hu szeged 6623 6818 20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having tokens in between 50k and100k
Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325
fr sequoia 8436 8217 50543
en gum 7644 7534 53686
ko gsd 7374 7254 56687
eu bdt 7455 7332 72974
nl lassymal 767 758 75134
gl ctg 7902 79018 79327
lv lvtb 7233 7224 80666
id gsd 7576 7397 97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having more than 100k trainingtokens
Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064
bg btb 8453 8455 124336
en ewt 7577 75682 204585
ar padt 6802 6814 223881
de gsd 7159 7132 263804
ca ancora 8589 85874 417587
es ancora 8499 8478 444617
cs cac 8357 8363 472608
cs pdt 8143 8212 1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle transitions using gold movesDynamic oracle transitions using predicted moves
In both cases logp of gold moves maximized
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and
use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al
2017]3 Using my own word and context vectors trained with different
language but from the same language family4 Applying transfer learning with a pre-trained parser
Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898
kmr mg 2012 2257 2278 2339
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the mostbeneficial
From scratch LM training does not bring useful word and contextvectors
Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition Based Parser can only build projective trees 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
Projective vs Non-projective
We compared our model with the best model for different projectivityratios
Language Projectiviy Best (LAS) Our (LAS)
grc perseus 907 7939 5503 (20)
eu bdt 9513 8422 7413 (17)
hu szeged 978 8266 6818 (14)
da ddt 9826 8628 7640 (17)
en gum 996 8505 7644 (15)
gl treegal 100 7425 7045 (10)
gl ctg 100 8212 7945 (14)
Table Our models performance gap decreases as the projectivity ratio increases
7
7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing
Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering
Tree-stack LSTM performed better with low resource languages
When the training dataset size increases tree-stack LSTM losses itsadvantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings
Attention Mechanism
Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement
Morphological Features
Finding different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies
Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682
S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US
Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
Tree-RNN (t-RNN)
Figure: The t-RNN composes the head word, dependent word, and dependency relation embeddings.
w_head^new = tanh(W_rnn · [w_head^old ; d_l ; w_dep] + b_rnn)   (1)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 72 123
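The update in Equation (1) can be sketched in plain Python. This is a toy illustration, not the thesis implementation; the weight matrix, bias, and dimensions below are made up:

```python
import math

def t_rnn(w_head, d_rel, w_dep, W_rnn, b_rnn):
    """New head embedding: tanh(W_rnn . [w_head; d_rel; w_dep] + b_rnn)."""
    x = w_head + d_rel + w_dep  # list concatenation = vector [w_head; d_l; w_dep]
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W_rnn, b_rnn)]

# toy sizes: 2-dim word embeddings, 1-dim relation embedding -> 5-dim input
W_rnn = [[0.1, -0.2, 0.3, 0.0, 0.2],
         [0.0, 0.1, -0.1, 0.2, 0.1]]
b_rnn = [0.0, 0.1]
new_head = t_rnn([0.5, -0.5], [1.0], [0.2, 0.3], W_rnn, b_rnn)
```

The composed vector replaces the head's embedding on the stack, so later transitions see a summary of the subtree built so far.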
Tree-RNN with:
1. Left Transition
2. Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
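The initialization step above can be sketched as follows. This is a minimal illustration with hypothetical lookup tables and dimensions; morph-feat vectors are averaged here since a token may carry a variable number of features:

```python
def token_embedding(pos, lang, feats, pos_emb, lang_emb, feat_emb, feat_dim):
    """Concatenate POS, language, and (averaged) morph-feat embeddings."""
    if feats:
        mf = [sum(feat_emb[f][i] for f in feats) / len(feats)
              for i in range(feat_dim)]
    else:
        mf = [0.0] * feat_dim  # token carries no morphological features
    return pos_emb[pos] + lang_emb[lang] + mf

# hypothetical 2-dim lookup tables
pos_emb = {"NOUN": [0.1, 0.2], "VERB": [0.3, 0.4]}
lang_emb = {"tr": [0.5, 0.6]}
feat_emb = {"Case=Nom": [1.0, 0.0], "Number=Sing": [0.0, 1.0]}
vec = token_embedding("NOUN", "tr", ["Case=Nom", "Number=Sing"],
                      pos_emb, lang_emb, feat_emb, 2)
```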
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: Stack's top LSTM is reduced.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: The t-RNN calculates the new head embedding.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: The β-LSTM recalculates its hidden state from the new input.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: The tree-stack LSTM is ready to predict a new transition.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
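The left transition above can be sketched on a simple stack/buffer state. This is a sketch of the transition logic only, with the t-RNN composition step omitted; word indices stand in for embeddings:

```python
def left_arc(stack, buffer, arcs, d):
    """left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    the buffer front b becomes the head of the stack top s, and s is popped."""
    s = stack.pop()   # dependent: top of the stack
    b = buffer[0]     # head: front of the buffer
    arcs.add((b, d, s))
    return stack, buffer, arcs

stack, buffer, arcs = [0, 1], [2, 3], set()
left_arc(stack, buffer, arcs, "nsubj")
```

In the full model this is the point where the t-RNN of Equation (1) would fold the popped dependent into the new head's embedding.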
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: Stack's top LSTM is reduced.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: The t-RNN calculates the new head embedding.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: The σ-LSTM recalculates its hidden state from the new input.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: The tree-stack LSTM is ready to predict a new transition.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
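Analogously, the right transition pops the stack top t and attaches it to the element s beneath it. Again this is a sketch of the transition logic only; in the full model the t-RNN would also recompute the head's embedding:

```python
def right_arc(stack, arcs, d):
    """right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    the stack top t becomes a dependent of s, the element below it."""
    t = stack.pop()   # dependent: top of the stack
    s = stack[-1]     # head: the next element on the stack
    arcs.add((s, d, t))
    return stack, arcs

stack, arcs = [0, 1, 2], set()
right_arc(stack, arcs, "obj")
```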
Final overview of Tree-stack LSTM
Figure: The full model: σ-, β-, and action-LSTM outputs and the t-RNN head composition are concatenated and fed to an MLP.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
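The final prediction step in the figure — concatenate the component summaries and score transitions with an MLP — can be sketched as follows, with toy weights and dimensions that are purely illustrative:

```python
import math

def transition_scores(sigma_h, beta_h, action_h, W1, b1, W2, b2):
    """Concat the σ-, β-, and action-LSTM summaries, then a one-hidden-layer
    MLP produces one score per candidate transition."""
    x = sigma_h + beta_h + action_h                      # Concat step
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(W2, b2)]

# toy sizes: each summary is 1-dim, hidden layer of 2, 3 transitions
W1 = [[0.5, -0.5, 0.1], [0.2, 0.3, -0.1]]
b1 = [0.0, 0.0]
W2 = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
b2 = [0.0, 0.0, 0.0]
scores = transition_scores([0.1], [0.2], [0.3], W1, b1, W2, b2)
```

The highest-scoring (valid) transition is applied, the state is updated, and the loop repeats until the sentence is fully parsed.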
Overview
1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2. Related Work: Linear Models and their Drawbacks; Neural Network Models
3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5. Conclusion
6. Future Work & Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4. Results & Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results amp Comparisons
Dataset

CoNLL17:
- Dependency parsing of 81 treebanks in 49 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koc University ranked 7th out of 33 participants (1st among transition-based parsers)

CoNLL18:
- Dependency parsing of 82 treebanks in 57 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koc University ranked 16th out of 30 participants (2nd among transition-based parsers)

Differences between CoNLL17 and CoNLL18: 1. Train/test split change 2. Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
The CoNLL 2018 committee released comparison results of CoNLL17 and CoNLL18 systems tested on the same test sets.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
Two possible problems with the official comparison:
1. If the annotation of the treebank is improved, the older parser is handicapped.
2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
Figure: Initial model (MLP).
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
Figure: Only the action LSTM.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
Figure: Only the β-LSTM.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
Figure: Only the σ-LSTM.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code   MLP    Only Action  Only-β  Only-σ
hu szeged   66.21  66.87        66.94   67.03
sv lines    71.12  72.05        72.17   72.45
tr imst     57.12  56.87        57.02   57.12
ar padt     67.83  66.67        66.89   66.92
cs cac      83.89  82.23        83.13   83.17
en ewt      75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
Figure: The full tree-stack LSTM architecture (σ-, β-, and action-LSTMs, t-RNN, concatenation, MLP).
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.164
t-RNN provides a comparative advantage for low-resource languages.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang        MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged   66.21  66.87   66.94   67.03   66.12      68.18
sv lines    71.12  72.05   72.17   74.04   72.17      75.46
tr imst     57.12  56.87   57.02   57.12   58.12      58.75
ar padt     67.83  66.67   66.89   66.92   68.04      68.14
cs cac      83.89  82.23   83.13   83.17   82.89      83.57
en ewt      75.54  75.43   75.56   75.67   74.87      75.77
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNN's performance contribution increases as the training size decreases.
σ-LSTM provides more useful information independent of dataset size.
Interconnecting the model's components with the t-RNN makes the tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers).
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental Settings: We divide the CoNLL18 UD dataset (v2.2) into 4 parts based on the number of training tokens for each language, to better understand our contributions:
Languages having less than 20k tokens
Languages having more than 20k less than 50k tokens
Languages having more than 50k less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.60           18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48325
fr sequoia     84.36        82.17           50543
en gum         76.44        75.34           53686
ko gsd         73.74        72.54           56687
eu bdt         74.55        73.32           72974
nl lassysmall  76.7         75.8            75134
gl ctg         79.02        79.018          79327
lv lvtb        72.33        72.24           80666
id gsd         75.76        73.97           97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens:

Lang code   Morph-Feats  no Morph-Feats  # of tokens
fa seraji   81.18        81.12           121064
bg btb      84.53        84.55           124336
en ewt      75.77        75.682          204585
ar padt     68.02        68.14           223881
de gsd      71.59        71.32           263804
ca ancora   85.89        85.874          417587
es ancora   84.99        84.78           444617
cs cac      83.57        83.63           472608
cs pdt      81.43        82.12           1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle: transitions follow gold moves.
Dynamic oracle: transitions follow predicted moves.
In both cases, the log probability of gold moves is maximized.
Figure: The full tree-stack LSTM architecture (σ-, β-, and action-LSTMs, t-RNN, concatenation, MLP).
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
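The two regimes differ only in which move the parser follows after computing the loss. A sketch with a hypothetical score vector; in both regimes the loss is the negative log probability of the gold move:

```python
import math

def oracle_step(scores, gold, dynamic):
    """Return (-log p(gold), move to follow). A static oracle always follows
    the gold move; a dynamic oracle follows the parser's own prediction."""
    zs = [math.exp(s) for s in scores]
    loss = -math.log(zs[gold] / sum(zs))       # softmax cross-entropy on gold
    predicted = max(range(len(scores)), key=scores.__getitem__)
    follow = predicted if dynamic else gold
    return loss, follow

scores = [2.0, 0.5, 0.1]   # the parser prefers move 0
loss_s, move_s = oracle_step(scores, gold=1, dynamic=False)
loss_d, move_d = oracle_step(scores, gold=1, dynamic=True)
```

Following predictions exposes the model to states its own mistakes produce, which is the motivation for dynamic-oracle training.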
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch.
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017].
3. Using my own word and context vectors trained on a different language from the same language family.
4. Applying transfer learning with a pre-trained parser.
Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
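Strategy (4), initializing from a pre-trained parser, can be sketched as a shape-matched weight copy. This is a toy illustration with dict-of-lists "parameters"; a real implementation copies tensors and may freeze or fine-tune them:

```python
def transfer_init(child, parent):
    """Copy every parent parameter whose name and shape match into the child;
    parameters unique to the child keep their fresh initialization."""
    for name, w in parent.items():
        if name in child and len(child[name]) == len(w):
            child[name] = list(w)
    return child

child = {"lstm": [0.0, 0.0], "mlp": [0.0]}
parent = {"lstm": [0.7, -0.3], "embeddings": [0.9]}
child = transfer_init(child, parent)
```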
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial.
Training an LM from scratch does not yield useful word and context vectors.
Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017].
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition-based parsers can only build projective trees.
Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
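Projectivity can be checked directly: a tree is projective iff no two arcs cross. A small sketch over a head-index array (index 0 is the artificial root):

```python
def is_projective(heads):
    """heads[i] = head of token i (tokens are 1..n; heads[0] is unused)."""
    arcs = [(min(d, heads[d]), max(d, heads[d])) for d in range(1, len(heads))]
    for a1, b1 in arcs:
        for a2, b2 in arcs:
            if a1 < a2 < b1 < b2:   # strictly interleaved endpoints cross
                return False
    return True
```

The quadratic pairwise check is fine for sentence-length inputs; it is what makes the projectivity ratios in the next table computable per treebank.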
Projective vs Non-projective
We compared our model with the best model at different projectivity ratios.
Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)
Table: Our model's performance gap decreases as the projectivity ratio increases.
From the official results page and our projectivity table.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusion:
We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition-based dependency parsing.
Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.
Tree-stack LSTM performed better on low-resource languages.
As the training dataset size increases, the tree-stack LSTM loses its advantage.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.
Attention Mechanism
Applying attention between σ-LSTM, β-LSTM, or action-LSTM states may bring a performance improvement.
Morphological Features
Finding a different way to represent morphological features.
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
Tree-RNN with
1 Left Transition2 Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 73 123
Left Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 74 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM LSTM
Left transition
t-RNN
Dependency Relation
HeadDependent
Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure β-LSTM recalculates its hidden based on new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Stackrsquos top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure σ-LSTM recalculates its hidden from new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
Final overview of Tree-stack LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing
2 Related WorkLinear Models and their DrawbacksNeural Network Models
3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser
4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning
5 Conclusion6 Future Work amp Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results amp Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results amp Comparisons
Dataset
Dependency parsing of 81
treebanks in 49 languages
All treebanks use standardized
annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 7th out
of 33 participants (1st among
transition based parsers)
Dependency parsing of 82
treebanks in 57 languages
All treebanks use
standardized annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 16th
out of 30 participants (2nd
among transition based
parsers)
CoNLL17 CoNLL181 Traintest split change 2 Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
2 possible problems of official comparison
1 If the annotation of the tree bank is improved the older parser ishandicapped
2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
MLP
Figure Initial model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
LSTM LSTM
Figure Only action LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
Figure: Only β-LSTM.
Only σ-LSTM
Figure: Only σ-LSTM.
Ablation Analysis Results
Lang Code   MLP    Only Action  Only-β  Only-σ
hu_szeged   66.21  66.87        66.94   67.03
sv_lines    71.12  72.05        72.17   72.45
tr_imst     57.12  56.87        57.02   57.12
ar_padt     67.83  66.67        66.89   66.92
cs_cac      83.89  82.23        83.13   83.17
en_ewt      75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models.
Ablation of t-RNN
Figure: Tree-stack LSTM with the t-RNN component (head word, dependent word, and dependency relation embeddings feed the t-RNN; the LSTM outputs are concatenated and passed to an MLP).
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code            without t-RNN  with t-RNN
no_nynorsklia (3k)   51.78          53.33
ru_taiga (11k)       59.13          60.55
gl_treegal (15k)     69.76          70.45
hu_szeged (20k)      66.12          68.18
sv_lines (49k)       74.04          75.46
tr_imst (50k)        58.12          58.75
ar_padt (120k)       68.04          68.14
en_ewt (204k)        74.87          75.77
cs_cac (473k)        82.89          83.57
cs_pdt (1M)          81.17          81.16

t-RNN provides a comparative advantage for low-resource languages.
Ablation Analysis
Overall results of ablation analysis
Lang        MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu_szeged   66.21  66.87   66.94   67.03   66.12      68.18
sv_lines    71.12  72.05   72.17   74.04   72.17      75.46
tr_imst     57.12  56.87   57.02   57.12   58.12      58.75
ar_padt     67.83  66.67   66.89   66.92   68.04      68.14
cs_cac      83.89  82.23   83.13   83.17   82.89      83.57
en_ewt      75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats the other model variations.
Ablation Analysis
Conclusions of Ablation Experiments
t-RNN's performance contribution increases as the training size decreases.

σ-LSTM provides more useful information, independent of dataset size.

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers).
What does Morphological Feature Embedding provide?
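Before the experiments, it helps to make "morph-feat embedding" concrete. One common scheme (an illustrative assumption, not necessarily the thesis's exact construction) looks up a vector for each `Feature=Value` pair in a UD FEATS string and sums them; `embed_feats` and `table` below are hypothetical names:

```python
import random

random.seed(0)
DIM = 4
table = {}  # Feature=Value pair -> embedding vector (toy random init)

def embed_feats(feats, dim=DIM):
    """Sum the embeddings of the Feature=Value pairs in a UD FEATS string."""
    vec = [0.0] * dim
    for pair in feats.split("|"):
        if pair not in table:
            table[pair] = [random.uniform(-0.1, 0.1) for _ in range(dim)]
        vec = [v + t for v, t in zip(vec, table[pair])]
    return vec

print(len(embed_feats("Case=Nom|Number=Sing")))  # 4
```

Summing keeps the embedding size fixed no matter how many features a token carries, which is why pooling schemes like this are popular for morphologically rich languages.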
Contribution of Morph-feat Embeddings
Experimental Settings: We divide the CoNLL18 UD dataset (v2.2) into 4 parts based on the number of training tokens for each language, to better understand our contributions:

- Languages having less than 20k tokens
- Languages having more than 20k and less than 50k tokens
- Languages having more than 50k and less than 100k tokens
- Languages having 100k tokens or more
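The grouping above amounts to a simple bucketing function over training-token counts (a sketch using the thresholds from the list; `size_bucket` is an illustrative name, not thesis code):

```python
def size_bucket(n_tokens):
    """Assign a treebank to one of the four training-size groups above."""
    if n_tokens < 20_000:
        return "<20k"
    if n_tokens < 50_000:
        return "20k-50k"
    if n_tokens < 100_000:
        return "50k-100k"
    return ">=100k"

print(size_bucket(10_479), size_bucket(472_608))  # <20k >=100k
```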
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no_nynorsklia  51.13        53.33           3,583
ru_taiga       58.32        60.55           10,479
sme_giella     52.78        53.39           16,385
la_perseus     49.93        51.60           18,184
ug_udt         52.78        53.39           19,262
sl_sst         46.72        48.77           19,473
hu_szeged      66.23        68.18           20,166

Not useful for languages having less than 20k training tokens.
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv_lines       72.18        74.81           48,325
fr_sequoia     84.36        82.17           50,543
en_gum         76.44        75.34           53,686
ko_gsd         73.74        72.54           56,687
eu_bdt         74.55        73.32           72,974
nl_lassysmall  76.7         75.8            75,134
gl_ctg         79.02        79.01           79,327
lv_lvtb        72.33        72.24           80,666
id_gsd         75.76        73.97           97,531

Beneficial for languages with 50k-100k training tokens.
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens:

Lang code   Morph-Feats  no Morph-Feats  # of tokens
fa_seraji   81.18        81.12           121,064
bg_btb      84.53        84.55           124,336
en_ewt      75.77        75.68           204,585
ar_padt     68.02        68.14           223,881
de_gsd      71.59        71.32           263,804
ca_ancora   85.89        85.87           417,587
es_ancora   84.99        84.78           444,617
cs_cac      83.57        83.63           472,608
cs_pdt      81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens.
Static vs Dynamic Oracle Training
Static oracle: transitions are generated using gold moves.
Dynamic oracle: transitions are generated using predicted moves.
In both cases, the log-probability of the gold moves is maximized.

Figure: Tree-stack LSTM (head word, dependent word, and dependency relation embeddings feed the t-RNN; the LSTM outputs are concatenated and passed to an MLP).
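The two regimes can be sketched as one training loop that differs only in which move it executes at each state: the static oracle follows the gold move, while the dynamic oracle follows the model's argmax and may therefore visit states off the gold path. A toy sketch with stand-in `predict`/`gold_move` functions, not the thesis implementation:

```python
import math

def oracle_training_pass(start, step, gold_move, predict, dynamic=False):
    """Accumulate -log p(gold move) along a transition sequence.

    Static oracle: execute the gold move at every state.
    Dynamic oracle: execute the model's argmax move, so training also
    visits states that lie off the gold path.
    """
    state, nll = start, 0.0
    while gold_move(state) is not None:
        probs = predict(state)                    # stand-in for the MLP softmax
        nll -= math.log(probs[gold_move(state)])  # maximize log p of gold moves
        move = max(probs, key=probs.get) if dynamic else gold_move(state)
        state = step(state, move)
    return nll

# Toy 3-step "parse": states 0..3, the gold move is always "a".
step = lambda s, m: s + 1
gold = lambda s: None if s >= 3 else "a"
predict = lambda s: {"a": 0.6, "b": 0.4} if s % 2 == 0 else {"a": 0.4, "b": 0.6}
print(round(oracle_training_pass(0, step, gold, predict), 3))  # 1.938
```

In a real parser `state` would be the (σ, β, A) configuration and `step` would apply shift/left/right transitions; the loss term is the same in both regimes.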
Static vs Dynamic Oracle Training
Figure: Results are very close for training tokens less than 20k.
Static vs Dynamic Oracle Training
Figure: Results are very close for training tokens between 20k and 50k.
Static vs Dynamic Oracle Training
Figure: Results are very close for training tokens more than 50k.
How about languages with less than 20k training tokens?
Transfer Learning
There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af_afribooms  not provided  75.46  77.43  78.12
kk_ktb        20.19         22.31  21.96  23.86
bxr_bdt       7.64          9.76   9.93   8.98
kmr_mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4).
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial.

From-scratch LM training does not yield useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017].
Projectivity
Transition-based parsers can only build projective trees.
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
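The projectivity constraint can be checked directly: a dependency tree is projective iff no two arcs cross when drawn above the sentence. A minimal sketch (heads given as a 1-indexed array with 0 for the root; illustrative, not thesis code):

```python
def is_projective(heads):
    """heads[i] = head of word i+1 (words are 1-indexed; 0 denotes the root)."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    # Two arcs cross iff one starts strictly inside the other and ends outside it.
    for i, (a, b) in enumerate(arcs):
        for c, d in arcs[i + 1:]:
            if a < c < b < d or c < a < d < b:
                return False
    return True

# "Economic news had little effect on financial markets": projective
print(is_projective([2, 3, 0, 5, 3, 5, 8, 6]))  # True
```

Trees with a crossing arc pair, e.g. `[3, 4, 0, 3]`, fail this test and cannot be produced by the transition system above.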
Projective vs Non-projective
We compared our model with the best model for different projectivity ratios.

Language     Projectivity (%)  Best (LAS)  Our (LAS)
grc_perseus  90.7              79.39       55.03 (20)
eu_bdt       95.13             84.22       74.13 (17)
hu_szeged    97.8              82.66       68.18 (14)
da_ddt       98.26             86.28       76.40 (17)
en_gum       99.6              85.05       76.44 (15)
gl_treegal   100               74.25       70.45 (10)
gl_ctg       100               82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases. (From the official results page and our projectivity table.)
Conclusions
Conclusion
In conclusion, we introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition based dependency parsing.

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering.

Tree-stack LSTM performed better on low-resource languages.

As the training dataset size increases, tree-stack LSTM loses its advantage.
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.
Attention Mechanism
Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.
Morphological Features
Finding a different way to represent morphological features.
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.
Publications
Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
References
Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, pages 673-682. Association for Computational Linguistics.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Thank you for your attention
Questions?
Left Transition
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Each embedding is initiated by concatenating POS, language, and morph-feat embeddings.
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Stack's top LSTM is reduced.
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: t-RNN calculates the new head embedding.
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: β-LSTM recalculates its hidden state based on the new input.
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})

Figure: Tree-stack LSTM is ready to give the next transition.
Right Transition
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Each embedding is initiated by concatenating POS, language, and morph-feat embeddings.
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Stack's top LSTM is reduced.
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding.
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input.
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready to give the next transition.
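The left_d and right_d rules above can be written as pure functions on a configuration (σ, β, A), returning the new configuration. A sketch under the slides' notation (not the thesis implementation):

```python
def left_arc(stack, buffer, arcs, d):
    # left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}):
    # the buffer front b becomes the head of the stack top s with label d.
    *rest, s = stack
    b = buffer[0]
    return rest, buffer, arcs | {(b, d, s)}

def right_arc(stack, buffer, arcs, d):
    # right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}):
    # the second stack item s becomes the head of the stack top t with label d.
    *rest, s, t = stack
    return rest + [s], buffer, arcs | {(s, d, t)}

stack, buffer, arcs = left_arc([0, 1, 2], [3, 4], set(), "amod")
print(stack, arcs)  # [0, 1] {(3, 'amod', 2)}
```

Each arc is stored as a (head, label, dependent) triple, matching the (b, d, s) and (s, d, t) triples in the equations.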
Final overview of Tree-stack LSTM
Figure: Tree-stack LSTM overview: the t-RNN (over head word, dependent word, and dependency relation) and the σ-, β-, and action-LSTM outputs are concatenated and fed to an MLP that scores the next transition.
Overview
1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2. Related Work: Linear Models and their Drawbacks; Neural Network Models
3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5. Conclusion
6. Future Work & Discussions
4 Results amp Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results amp Comparisons
Dataset
Dependency parsing of 81
treebanks in 49 languages
All treebanks use standardized
annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 7th out
of 33 participants (1st among
transition based parsers)
Dependency parsing of 82
treebanks in 57 languages
All treebanks use
standardized annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 16th
out of 30 participants (2nd
among transition based
parsers)
CoNLL17 CoNLL181 Traintest split change 2 Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
2 possible problems of official comparison
1 If the annotation of the tree bank is improved the older parser ishandicapped
2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
MLP
Figure Initial model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
LSTM LSTM
Figure Only action LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
LSTM LSTM LSTM
LSTM LSTM MLP
Figure Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
LSTM LSTM
LSTM LSTM MLP
Figure Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692
cs cac 8389 8223 8313 8317
en ewt 7554 7543 7556 7567
Table Comparison between MLP and rdquoOnlyrdquo models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875
ar padt (120k) 6804 6814
en ewt (204k) 7487 7577
cs cac (473k) 8289 8357
cs pdt (1M) 8117 81164
t-RNN provides comparative advantage for low-resourcelanguages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNNrsquos performance contribution increases when the training sizedecreases
σ-LSTM provides more useful information independent from datasetsize
Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k less than 50k tokens
Languages having more than 50k less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having less than 20k training tokens
Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583
ru taiga 5832 6055 10479
sme giella 5278 5339 16385
la perseus 4993 516 18184
ug udt 5278 5339 19262
sl sst 4672 4877 19473
hu szeged 6623 6818 20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having tokens in between 50k and100k
Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325
fr sequoia 8436 8217 50543
en gum 7644 7534 53686
ko gsd 7374 7254 56687
eu bdt 7455 7332 72974
nl lassymal 767 758 75134
gl ctg 7902 79018 79327
lv lvtb 7233 7224 80666
id gsd 7576 7397 97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having more than 100k trainingtokens
Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064
bg btb 8453 8455 124336
en ewt 7577 75682 204585
ar padt 6802 6814 223881
de gsd 7159 7132 263804
ca ancora 8589 85874 417587
es ancora 8499 8478 444617
cs cac 8357 8363 472608
cs pdt 8143 8212 1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle transitions using gold movesDynamic oracle transitions using predicted moves
In both cases logp of gold moves maximized
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and
use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al
2017]3 Using my own word and context vectors trained with different
language but from the same language family4 Applying transfer learning with a pre-trained parser
Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898
kmr mg 2012 2257 2278 2339
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the mostbeneficial
From scratch LM training does not bring useful word and contextvectors
Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition Based Parser can only build projective trees 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
Projective vs Non-projective
We compared our model with the best model for different projectivityratios
Language Projectiviy Best (LAS) Our (LAS)
grc perseus 907 7939 5503 (20)
eu bdt 9513 8422 7413 (17)
hu szeged 978 8266 6818 (14)
da ddt 9826 8628 7640 (17)
en gum 996 8505 7644 (15)
gl treegal 100 7425 7045 (10)
gl ctg 100 8212 7945 (14)
Table Our models performance gap decreases as the projectivity ratio increases
7
7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing
Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering
Tree-stack LSTM performed better with low resource languages
When the training dataset size increases tree-stack LSTM losses itsadvantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings
Attention Mechanism
Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement
Morphological Features
Finding different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies
Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682
S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US
Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM LSTM
Left transition
t-RNN
Dependency Relation
HeadDependent
Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 75 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
Figure Stackrsquos top LSTM is reducedOmer Kırnap (Koc University) MSc Thesis September 27 2018 76 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM
LSTM
LSTM
Left transition
t-RNN
Dependency Relation
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 77 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure β-LSTM recalculates its hidden based on new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Stackrsquos top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure σ-LSTM recalculates its hidden from new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
Final overview of Tree-stack LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing
2 Related WorkLinear Models and their DrawbacksNeural Network Models
3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser
4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning
5 Conclusion6 Future Work amp Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results amp Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results amp Comparisons
Dataset
Dependency parsing of 81
treebanks in 49 languages
All treebanks use standardized
annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 7th out
of 33 participants (1st among
transition based parsers)
Dependency parsing of 82
treebanks in 57 languages
All treebanks use
standardized annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 16th
out of 30 participants (2nd
among transition based
parsers)
CoNLL17 CoNLL181 Traintest split change 2 Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
2 possible problems of official comparison
1 If the annotation of the tree bank is improved the older parser ishandicapped
2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
MLP
Figure Initial model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
LSTM LSTM
Figure Only action LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
LSTM LSTM LSTM
LSTM LSTM MLP
Figure Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
LSTM LSTM
LSTM LSTM MLP
Figure Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692
cs cac 8389 8223 8313 8317
en ewt 7554 7543 7556 7567
Table Comparison between MLP and rdquoOnlyrdquo models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875
ar padt (120k) 6804 6814
en ewt (204k) 7487 7577
cs cac (473k) 8289 8357
cs pdt (1M) 8117 81164
t-RNN provides comparative advantage for low-resourcelanguages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNNrsquos performance contribution increases when the training sizedecreases
σ-LSTM provides more useful information independent from datasetsize
Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental Settings
We divide the CoNLL18 UD 2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k and less than 50k tokens

Languages having more than 50k and less than 100k tokens

Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
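The split above can be written as a tiny helper. This is my own sketch (the function name and the code shape are mine); only the 20k/50k/100k cut-offs come from the slides:

```python
def token_bucket(n_tokens):
    """Assign a treebank to one of the four training-size buckets
    used in the morph-feat experiments (thresholds from the slides)."""
    if n_tokens < 20_000:
        return "<20k"
    if n_tokens < 50_000:
        return "20k-50k"
    if n_tokens < 100_000:
        return "50k-100k"
    return ">=100k"
```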
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.6            18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code    Morph-Feats  no Morph-Feats  # of tokens
sv lines     72.18        74.81           48325
fr sequoia   84.36        82.17           50543
en gum       76.44        75.34           53686
ko gsd       73.74        72.54           56687
eu bdt       74.55        73.32           72974
nl lassymal  76.7         75.8            75134
gl ctg       79.02        79.018          79327
lv lvtb      72.33        72.24           80666
id gsd       75.76        73.97           97531

Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12           121064
bg btb     84.53        84.55           124336
en ewt     75.77        75.682          204585
ar padt    68.02        68.14           223881
de gsd     71.59        71.32           263804
ca ancora  85.89        85.874          417587
es ancora  84.99        84.78           444617
cs cac     83.57        83.63           472608
cs pdt     81.43        82.12           1173282
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle: transitions using gold moves
Dynamic oracle: transitions using predicted moves

In both cases, the log-probability of gold moves is maximized

[Diagram: t-RNN combines head word, dependent word, and dependency relation; σ-, β-, and Action-LSTM outputs are concatenated and fed to an MLP]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
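The two regimes above differ only in which move advances the parser state; the loss always targets the gold move. A schematic sketch with stub model/oracle interfaces of my own (not the thesis implementation):

```python
import math

def train_sentence(state, model, oracle, dynamic, step, done):
    """Accumulate -log p(gold move) along one parse.
    Static training follows gold moves; dynamic training follows the
    model's own predictions, but the loss still targets gold moves."""
    loss = 0.0
    while not done(state):
        probs = model(state)              # move -> probability
        gold = oracle(state)              # best move from *this* state
        loss += -math.log(probs[gold])    # log p of gold moves is maximized
        move = max(probs, key=probs.get) if dynamic else gold
        state = step(state, move)
    return loss
```

With `dynamic=True` the parser visits its own (possibly erroneous) states, so the oracle must return a best move for non-gold configurations as well.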
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser
Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial

Training an LM from scratch does not yield useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition based parsers can only build projective trees 6
6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
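Projectivity can be checked by looking for crossing arcs: a dependency tree is projective iff no two arcs cross. A minimal sketch of my own, assuming `heads[i-1]` gives the head of 1-based token `i`, with 0 standing for the ROOT:

```python
def is_projective(heads):
    """Return True iff the dependency tree has no crossing arcs.
    heads[i-1] is the head index of token i; head 0 is the ROOT."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for i, (l1, r1) in enumerate(arcs):
        for l2, r2 in arcs[i + 1:]:
            # Two arcs cross when exactly one endpoint of one arc
            # lies strictly inside the span of the other.
            if l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1:
                return False
    return True
```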
Projective vs Non-projective
We compared our model with the best model for different projectivity ratios

Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases
7 From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusion:
We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition based dependency parsing

Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering

Tree-stack LSTM performed better on low-resource languages

When the training dataset size increases, tree-stack LSTM loses its advantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly trained a language model together with pre-trained embeddings.
Attention Mechanism
Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring performance improvements.
Morphological Features
Finding a different way to represent morphological features.
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
Conclusion
In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing
Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering
Tree-stack LSTM performed better with low resource languages
When the training dataset size increases tree-stack LSTM losses itsadvantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings
Attention Mechanism
Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement
Morphological Features
Finding different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies
Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682
S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US
Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: β-LSTM recalculates its hidden state based on the new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 78 123
Transitions - Left
left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)})
Figure: Tree-stack LSTM is ready for the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: The stack's top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: t-RNN calculates the new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
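The composition step above can be sketched with a single nonlinear layer over the concatenated head, dependent, and relation embeddings (cf. the recursive composition in Dyer et al. 2015). This is a hedged illustration only: the dimensions and the one-layer tanh parameterization are assumptions, not the thesis's actual t-RNN.

```python
import numpy as np

D = 8                                        # embedding size (assumption)
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(D, 3 * D))   # composition weights (random, for demo)
b = np.zeros(D)

def compose(head, dep, rel):
    """Return the new head embedding after attaching `dep` to `head` via `rel`."""
    return np.tanh(W @ np.concatenate([head, dep, rel]) + b)

new_head = compose(rng.normal(size=D), rng.normal(size=D), rng.normal(size=D))
```

The composed vector replaces the head's embedding on the stack, so later decisions see a summary of the subtree built so far.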
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: σ-LSTM recalculates its hidden state from the new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
Figure: Tree-stack LSTM is ready for the next transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
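Read together, the left and right rules above define how a parser state (stack σ, buffer β, arc set A) evolves. A minimal sketch, assuming plain Python lists and sets (illustrative, not the thesis implementation; `shift` is the system's usual third transition):

```python
# A parser state is (sigma, beta, arcs): stack, buffer, and the arc set A,
# where an arc (h, d, m) means "m is a d-labeled dependent of head h".

def left(state, d):
    # left_d(σ|s, b|β, A) = (σ, b|β, A ∪ {(b, d, s)}): pop stack top s,
    # attach it as a d-dependent of the buffer front b.
    sigma, beta, arcs = state
    s, b = sigma[-1], beta[0]
    return sigma[:-1], beta, arcs | {(b, d, s)}

def right(state, d):
    # right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)}): pop stack top t,
    # attach it as a d-dependent of the element s below it.
    sigma, beta, arcs = state
    s, t = sigma[-2], sigma[-1]
    return sigma[:-1], beta, arcs | {(s, d, t)}

def shift(state):
    # Move the buffer front onto the stack.
    sigma, beta, arcs = state
    return sigma + [beta[0]], beta[1:], arcs

# Tiny walk-through on "news had":
state = (["ROOT"], ["news", "had"], set())
state = shift(state)              # σ = [ROOT, news], β = [had]
state = left(state, "nsubj")      # "news" becomes an nsubj dependent of "had"
# state == (["ROOT"], ["had"], {("had", "nsubj", "news")})
```

Parsing proceeds by repeating such transitions until the buffer is empty and only the root remains on the stack.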
Final overview of Tree-stack LSTM
[Figure: full Tree-stack LSTM — the σ-, β-, and action-LSTM states are concatenated and fed to an MLP, while the t-RNN composes head word, dependent word, and dependency relation embeddings.]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
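The final step in the figure — concatenating the component states and scoring transitions with an MLP — can be sketched as follows. The sizes, the ReLU hidden layer, and the three-way transition set are assumptions for illustration, not the thesis's exact hyperparameters:

```python
import numpy as np

H, T = 16, 3                                   # state size, # transitions (assumed)
rng = np.random.default_rng(1)
W1, b1 = rng.normal(scale=0.1, size=(H, 3 * H)), np.zeros(H)
W2, b2 = rng.normal(scale=0.1, size=(T, H)), np.zeros(T)

def transition_probs(h_sigma, h_beta, h_action):
    """Concatenate the σ-, β-, and action-LSTM states, apply an MLP,
    and return a softmax distribution over the candidate transitions."""
    x = np.concatenate([h_sigma, h_beta, h_action])
    h = np.maximum(0.0, W1 @ x + b1)           # hidden layer (ReLU assumed)
    z = W2 @ h + b2                            # one score per transition
    e = np.exp(z - z.max())                    # numerically stable softmax
    return e / e.sum()

p = transition_probs(*(rng.normal(size=H) for _ in range(3)))
```

At parse time the highest-probability legal transition is applied; at training time the gold transition's log-probability is maximized.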
Overview
1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2. Related Work: Linear Models and their Drawbacks; Neural Network Models
3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5. Conclusion
6. Future Work & Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results & Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results & Comparisons

Dataset

CoNLL17:
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc-University ranked 7th out of 33 participants (1st among transition based parsers)

CoNLL18:
Dependency parsing of 82 treebanks in 57 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc-University ranked 16th out of 30 participants (2nd among transition based parsers)

Changes from CoNLL17 to CoNLL18: (1) train/test split, (2) annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
The CoNLL 2018 committee released comparison results for CoNLL17 and CoNLL18 systems tested on the same test sets.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
Two possible problems with the official comparison:
1. If the annotation of the treebank is improved, the older parser is handicapped.
2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code        MLP    Tree-stack
ru taiga (10k)   58.89  60.55
hu szeged (20k)  66.21  68.18
tr imst (50k)    56.78  58.75
ar padt (120k)   67.83  68.14
en ewt (205k)    74.87  75.77
cs cac (473k)    83.39  83.57
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
Figure: Initial model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
Figure: Only action LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
Figure: Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
Figure: Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code   MLP    Only Action  Only-β  Only-σ
hu szeged   66.21  66.87        66.94   67.03
sv lines    71.12  72.05        72.17   72.45
tr imst     57.12  56.87        57.02   57.12
ar padt     67.83  66.67        66.89   66.92
cs cac      83.89  82.23        83.13   83.17
en ewt      75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
[Figure: Tree-stack LSTM with the t-RNN composing head word, dependent word, and dependency relation embeddings before the concatenation and MLP.]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code            without t-RNN  with t-RNN
no nynorsklia (3k)   51.78          53.33
ru taiga (11k)       59.13          60.55
gl treegal (15k)     69.76          70.45
hu szeged (20k)      66.12          68.18
sv lines (49k)       74.04          75.46
tr imst (50k)        58.12          58.75
ar padt (120k)       68.04          68.14
en ewt (204k)        74.87          75.77
cs cac (473k)        82.89          83.57
cs pdt (1M)          81.17          81.164
t-RNN provides a comparative advantage for low-resource languages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang        MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged   66.21  66.87   66.94   67.03   66.12      68.18
sv lines    71.12  72.05   72.17   74.04   72.17      75.46
tr imst     57.12  56.87   57.02   57.12   58.12      58.75
ar padt     67.83  66.67   66.89   66.92   68.04      68.14
cs cac      83.89  82.23   83.13   83.17   82.89      83.57
en ewt      75.54  75.43   75.56   75.67   74.87      75.77
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNN's performance contribution increases as the training size decreases.
σ-LSTM provides more useful information, independent of dataset size.
Interconnecting the model's components with the t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition based parsers).
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental Settings
We divide the CoNLL18 UD dataset (v2.2) into 4 parts based on the number of training tokens for each language, to better understand our contributions:
Languages having less than 20k tokens
Languages having more than 20k, less than 50k tokens
Languages having more than 50k, less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3583
ru taiga       58.32        60.55           10479
sme giella     52.78        53.39           16385
la perseus     49.93        51.6            18184
ug udt         52.78        53.39           19262
sl sst         46.72        48.77           19473
hu szeged      66.23        68.18           20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48325
fr sequoia     84.36        82.17           50543
en gum         76.44        75.34           53686
ko gsd         73.74        72.54           56687
eu bdt         74.55        73.32           72974
nl lassysmall  76.7         75.8            75134
gl ctg         79.02        79.018          79327
lv lvtb        72.33        72.24           80666
id gsd         75.76        73.97           97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens:

Lang code   Morph-Feats  no Morph-Feats  # of tokens
fa seraji   81.18        81.12           121064
bg btb      84.53        84.55           124336
en ewt      75.77        75.682          204585
ar padt     68.02        68.14           223881
de gsd      71.59        71.32           263804
ca ancora   85.89        85.874          417587
es ancora   84.99        84.78           444617
cs cac      83.57        83.63           472608
cs pdt      81.43        82.12           1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle: transitions follow gold moves. Dynamic oracle: transitions follow predicted moves.
In both cases, the log-probability of the gold moves is maximized.
[Figure: Tree-stack LSTM — σ-, β-, and action-LSTM states concatenated into an MLP; the t-RNN composes head word, dependent word, and dependency relation embeddings.]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
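The two regimes can be contrasted with a deliberately tiny toy. Everything here (the move inventory, the fixed model distribution, the parity "oracle") is invented purely to show the control flow: both variants add −log p(gold) to the loss; they differ only in which move the parser follows, so the dynamic oracle visits states the static oracle never generates.

```python
import math

def model_probs(state):                 # stand-in for the tree-stack LSTM's MLP
    return {"shift": 0.5, "left": 0.3, "right": 0.2}

def gold_move(state):                   # stand-in for the (dynamic) oracle
    return "left" if len(state) % 2 else "shift"

def train(n_steps, dynamic):
    state, loss, followed = [], 0.0, []
    for _ in range(n_steps):
        probs = model_probs(state)
        gold = gold_move(state)
        loss -= math.log(probs[gold])                     # always score the gold move
        move = max(probs, key=probs.get) if dynamic else gold
        state.append(move)                                # "apply" the followed move
        followed.append(move)
    return loss, followed
```

With the static oracle the parser follows the gold sequence (shift, left, shift, left, ...); with the dynamic oracle it follows its own argmax (here always shift) while the loss still targets the gold moves.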
Static vs Dynamic Oracle Training
Figure: Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure: Results are very close for training tokens between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure: Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors, trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser
Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt       7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
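Strategy (4) amounts to initializing the target-language parser from a donor parser's weights before fine-tuning. A minimal sketch of that initialization step, where the dict-of-arrays weight representation and the parameter names are invented for illustration:

```python
def transfer_init(donor, target, shared):
    """Copy the `shared` parameter tensors from a donor parser's weights into
    the target parser's initialization; everything else keeps its fresh init."""
    out = dict(target)
    for name in shared:
        if name in donor:
            out[name] = donor[name]
    return out

# e.g. transfer the encoder, keep the target's own freshly initialized output layer
init = transfer_init(
    donor={"lstm": [1, 2], "mlp": [3]},
    target={"lstm": [0, 0], "mlp": [9]},
    shared=["lstm"],
)
```

Fine-tuning then continues on the low-resource treebank from this warm start instead of a random initialization.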
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial.
From-scratch LM training does not produce useful word and context vectors.
Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017].
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition based parsers can only build projective trees. 6
6Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
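A tree is projective exactly when no two dependency arcs cross. The restriction above can be made concrete with a small check over (head, dependent) position pairs (1-based word positions, 0 for the artificial root; an illustrative sketch, not the thesis code):

```python
def is_projective(heads):
    """heads[i-1] is the head position of word i; True iff no two arcs cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    return not any(
        l1 < l2 < r1 < r2            # arc (l1, r1) crosses arc (l2, r2)
        for (l1, r1) in arcs
        for (l2, r2) in arcs
    )
```

For example, `is_projective([2, 0, 4, 2])` holds, while a tree containing the crossing arcs (1, 3) and (2, 4) does not; sentences with such crossing arcs are exactly the ones a purely projective transition system cannot recover.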
Projective vs Non-projective
We compared our model with the best model for different projectivity ratios.
Language     Projectivity  Best (LAS)  Our (LAS)
grc perseus  90.7          79.39       55.03 (20)
eu bdt       95.13         84.22       74.13 (17)
hu szeged    97.8          82.66       68.18 (14)
da ddt       98.26         86.28       76.40 (17)
en gum       99.6          85.05       76.44 (15)
gl treegal   100           74.25       70.45 (10)
gl ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases
7From the official results page and our projectivity table
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusion:
We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution in transition based dependency parsing.
Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.
Tree-stack LSTM performed better on low-resource languages.
When the training dataset size increases, tree-stack LSTM loses its advantage.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.
Attention Mechanism
Applying attention between the σ-LSTM, β-LSTM, or action-LSTM states may bring a performance improvement.
Morphological Features
Finding different ways to represent morphological features.
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.
S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
Transitions - Left
leftd(σ|s b|βA) = (σ b|βA cup (b d s))
LSTM LSTM LSTM
Left transition
t-RNN New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 79 123
Right Transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 80 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Stackrsquos top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure σ-LSTM recalculates its hidden from new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
Final overview of Tree-stack LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing
2 Related WorkLinear Models and their DrawbacksNeural Network Models
3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser
4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning
5 Conclusion6 Future Work amp Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results amp Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results amp Comparisons
Dataset
Dependency parsing of 81
treebanks in 49 languages
All treebanks use standardized
annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 7th out
of 33 participants (1st among
transition based parsers)
Dependency parsing of 82
treebanks in 57 languages
All treebanks use
standardized annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 16th
out of 30 participants (2nd
among transition based
parsers)
CoNLL17 CoNLL181 Traintest split change 2 Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
2 possible problems of official comparison
1 If the annotation of the tree bank is improved the older parser ishandicapped
2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
MLP
Figure Initial model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
LSTM LSTM
Figure Only action LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
LSTM LSTM LSTM
LSTM LSTM MLP
Figure Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
LSTM LSTM
LSTM LSTM MLP
Figure Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692
cs cac 8389 8223 8313 8317
en ewt 7554 7543 7556 7567
Table Comparison between MLP and rdquoOnlyrdquo models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875
ar padt (120k) 6804 6814
en ewt (204k) 7487 7577
cs cac (473k) 8289 8357
cs pdt (1M) 8117 81164
t-RNN provides comparative advantage for low-resourcelanguages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNNrsquos performance contribution increases when the training sizedecreases
σ-LSTM provides more useful information independent from datasetsize
Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k less than 50k tokens
Languages having more than 50k less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having less than 20k training tokens
Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583
ru taiga 5832 6055 10479
sme giella 5278 5339 16385
la perseus 4993 516 18184
ug udt 5278 5339 19262
sl sst 4672 4877 19473
hu szeged 6623 6818 20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having tokens in between 50k and100k
Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325
fr sequoia 8436 8217 50543
en gum 7644 7534 53686
ko gsd 7374 7254 56687
eu bdt 7455 7332 72974
nl lassymal 767 758 75134
gl ctg 7902 79018 79327
lv lvtb 7233 7224 80666
id gsd 7576 7397 97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having more than 100k trainingtokens
Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064
bg btb 8453 8455 124336
en ewt 7577 75682 204585
ar padt 6802 6814 223881
de gsd 7159 7132 263804
ca ancora 8589 85874 417587
es ancora 8499 8478 444617
cs cac 8357 8363 472608
cs pdt 8143 8212 1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle transitions using gold movesDynamic oracle transitions using predicted moves
In both cases logp of gold moves maximized
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and
use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al
2017]3 Using my own word and context vectors trained with different
language but from the same language family4 Applying transfer learning with a pre-trained parser
Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898
kmr mg 2012 2257 2278 2339
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the mostbeneficial
From scratch LM training does not bring useful word and contextvectors
Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition Based Parser can only build projective trees 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
Projective vs Non-projective
We compared our model with the best model for different projectivityratios
Language Projectiviy Best (LAS) Our (LAS)
grc perseus 907 7939 5503 (20)
eu bdt 9513 8422 7413 (17)
hu szeged 978 8266 6818 (14)
da ddt 9826 8628 7640 (17)
en gum 996 8505 7644 (15)
gl treegal 100 7425 7045 (10)
gl ctg 100 8212 7945 (14)
Table Our models performance gap decreases as the projectivity ratio increases
7
7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing
Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering
Tree-stack LSTM performed better with low resource languages
When the training dataset size increases tree-stack LSTM losses itsadvantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings
Attention Mechanism
Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement
Morphological Features
Finding different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies
Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Thank you for your attention
Questions
Right Transition
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
[Diagram: stack LSTM cells feed a t-RNN that combines the Head, the Dependent, and the Dependency Relation during the Right transition]
Figure: Each embedding is initialized by concatenating POS, language, and morph-feat embeddings.
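As a concrete illustration, the right_d transition above can be sketched in a few lines of Python. This is a hypothetical toy implementation, not the thesis code; the helper name `right_arc` and the string tokens are illustrative.

```python
# Toy sketch of the right_d transition (hypothetical helper, not the thesis
# code). A configuration is (sigma, beta, arcs): stack, buffer, arc set.
def right_arc(sigma, beta, arcs, deprel):
    """right_d(σ|s|t, β, A) -> (σ|s, β, A ∪ {(s, d, t)}): the stack top t
    becomes a dependent of s (the item below it) and is popped."""
    assert len(sigma) >= 2, "right-arc needs at least two items on the stack"
    t = sigma.pop()            # dependent: old stack top
    s = sigma[-1]              # head: new stack top
    arcs.add((s, deprel, t))   # labeled arc (head, label, dependent)
    return sigma, beta, arcs

sigma, beta, arcs = ["ROOT", "had", "effect"], ["on", "markets"], set()
sigma, beta, arcs = right_arc(sigma, beta, arcs, "obj")
print(sigma)  # ['ROOT', 'had']
```

Note that the buffer β is untouched: right_d only pops the stack and records an arc.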
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
[Diagram: the top cell of the stack LSTM is popped]
Figure: The stack's top LSTM is reduced.
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
[Diagram: the t-RNN combines the Head, the Dependent, and the Dependency Relation into a New Head embedding]
Figure: The t-RNN calculates the new head embedding.
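One plausible reading of this composition step, sketched in plain Python: the head, dependent, and relation embeddings are concatenated and passed through a learned nonlinear map to produce the new head embedding. The exact t-RNN form, sizes, and weights in the thesis may differ; everything below is illustrative.

```python
# Minimal sketch of a t-RNN-style composition (an assumed form, not the
# thesis implementation): new_head = tanh(W [head; dep; rel]).
import math
import random

random.seed(0)
d = 4                                          # embedding size (illustrative)
W = [[random.gauss(0, 1) for _ in range(3 * d)] for _ in range(d)]

def compose(head, dep, rel):
    """Return the new head embedding from (head, dependent, relation)."""
    x = head + dep + rel                       # concatenation of the inputs
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in W]

head, dep, rel = [0.1] * d, [0.2] * d, [0.3] * d
new_head = compose(head, dep, rel)
print(len(new_head))  # 4
```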
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
[Diagram: the New Head embedding is fed back into the σ-LSTM]
Figure: The σ-LSTM recalculates its hidden state from the new input.
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})
[Diagram: the updated Tree-stack LSTM after the Right transition]
Figure: The Tree-stack LSTM is ready to predict the next transition.
Final overview of Tree-stack LSTM
[Diagram: σ-LSTM, β-LSTM, and Action-LSTM (A) hidden states, together with t-RNN-composed head embeddings (head word, dependent word, dependency relation), are concatenated (Concat) and fed to an MLP]
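The "Concat → MLP" prediction step in the overview can be sketched as follows. This is a pure-Python toy: the dimensions, layer sizes, and the four-move transition inventory are assumptions for illustration, not the thesis configuration.

```python
# Toy sketch of the final prediction step: concatenate the three LSTM
# hidden states and score transitions with a small MLP.
import random

random.seed(1)
def vec(n): return [random.gauss(0, 1) for _ in range(n)]
def matvec(M, x): return [sum(w * v for w, v in zip(row, x)) for row in M]

h_sigma, h_beta, h_action = vec(8), vec(8), vec(8)
transitions = ["shift", "reduce", "left_d", "right_d"]  # illustrative moves

W1 = [vec(24) for _ in range(16)]              # hidden-layer weights
W2 = [vec(16) for _ in range(len(transitions))]

features = h_sigma + h_beta + h_action         # the "Concat" box
hidden = [max(0.0, z) for z in matvec(W1, features)]   # MLP hidden (ReLU)
scores = matvec(W2, hidden)                    # one score per transition
best = transitions[scores.index(max(scores))]
print(best in transitions)  # True
```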
Overview
1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2. Related Work: Linear Models and their Drawbacks; Neural Network Models
3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5. Conclusion
6. Future Work & Discussions
4 Results & Comparisons
Results & Comparisons

Dataset

CoNLL17:
- Dependency parsing of 81 treebanks in 49 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koc University ranked 7th out of 33 participants (1st among transition-based parsers)

CoNLL18:
- Dependency parsing of 82 treebanks in 57 languages
- All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
- Koc University ranked 16th out of 30 participants (2nd among transition-based parsers)

Changes from CoNLL17 to CoNLL18: (1) train/test split, (2) annotation.
MLP vs Tree-stack LSTM
The CoNLL 2018 committee released comparison results for CoNLL17 and CoNLL18 systems, tested on the same test sets.
MLP vs Tree-stack LSTM
Two possible problems with the official comparison:
1. If the annotation of the treebank has improved, the older parser is handicapped.
2. If the training-test split has changed and old training data are now in the test data, the old parser is unfairly favored.
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare the models:

Lang Code (size)   MLP    Tree-stack
ru taiga (10k)     58.89  60.55
hu szeged (20k)    66.21  68.18
tr imst (50k)      56.78  58.75
ar padt (120k)     67.83  68.14
en ewt (205k)      74.87  75.77
cs cac (473k)      83.39  83.57

Tree-stack LSTM outperforms MLP.
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
MLP Parser
[Diagram: features feed a single MLP]
Figure: Initial model.
Only Action LSTM
[Diagram: only the action-history LSTM feeds the MLP]
Figure: Only action LSTM.
Only β-LSTM
[Diagram: only the β-LSTM over the buffer feeds the MLP]
Figure: Only β-LSTM.
Only σ-LSTM
[Diagram: only the σ-LSTM over the stack feeds the MLP]
Figure: Only σ-LSTM.
Ablation Analysis Results
Lang Code    MLP    Only Action  Only-β  Only-σ
hu szeged    66.21  66.87        66.94   67.03
sv lines     71.12  72.05        72.17   72.45
tr imst      57.12  56.87        57.02   57.12
ar padt      67.83  66.67        66.89   66.92
cs cac       83.89  82.23        83.13   83.17
en ewt       75.54  75.43        75.56   75.67

Table: Comparison between the MLP and the "Only" models.
Ablation of t-RNN
[Diagram: the Tree-stack LSTM with the t-RNN component that composes the head word, the dependent word, and the dependency relation]
Ablation of t-RNN
Comparison of stack-LSTMs with and without the t-RNN:

Lang Code (size)       without t-RNN  with t-RNN
no nynorsklia (3k)     51.78          53.33
ru taiga (11k)         59.13          60.55
gl treegal (15k)       69.76          70.45
hu szeged (20k)        66.12          68.18
sv lines (49k)         74.04          75.46
tr imst (50k)          58.12          58.75
ar padt (120k)         68.04          68.14
en ewt (204k)          74.87          75.77
cs cac (473k)          82.89          83.57
cs pdt (1M)            81.17          81.16

The t-RNN provides a comparative advantage for low-resource languages.
Ablation Analysis
Overall results of the ablation analysis:

Lang        MLP    Only A  Only-β  Only-σ  w/o t-RNN  All
hu szeged   66.21  66.87   66.94   67.03   66.12      68.18
sv lines    71.12  72.05   72.17   74.04   72.17      75.46
tr imst     57.12  56.87   57.02   57.12   58.12      58.75
ar padt     67.83  66.67   66.89   66.92   68.04      68.14
cs cac      83.89  82.23   83.13   83.17   82.89      83.57
en ewt      75.54  75.43   75.56   75.67   74.87      75.77

The full Tree-stack LSTM beats the other model variations.
Ablation Analysis
Conclusions of the Ablation Experiments:

The t-RNN's performance contribution increases as the training size decreases.

The σ-LSTM provides more useful information, independent of dataset size.

Interconnecting the model's components with the t-RNN makes the Tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers).
What does the Morphological Feature Embedding provide?
Contribution of Morph-feat Embeddings
Experimental Settings
We divide the CoNLL18 UD v2.2 dataset into 4 parts based on the number of training tokens per language, to better understand our contributions:

Languages having fewer than 20k tokens

Languages having at least 20k and fewer than 50k tokens

Languages having at least 50k and fewer than 100k tokens

Languages having 100k tokens or more
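The four-way split above can be expressed as a tiny helper. The thresholds follow the slide; the function name and example counts are illustrative.

```python
# Bucket a language by its training-token count, per the four-way split.
def bucket(n_tokens):
    if n_tokens < 20_000:
        return "<20k"
    if n_tokens < 50_000:
        return "20k-50k"
    if n_tokens < 100_000:
        return "50k-100k"
    return ">=100k"

print(bucket(10_479))  # <20k
print(bucket(97_531))  # 50k-100k
```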
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having fewer than 20k training tokens:

Lang code      Morph-Feats  No Morph-Feats  # of tokens
no nynorsklia  51.13        53.33           3,583
ru taiga       58.32        60.55           10,479
sme giella     52.78        53.39           16,385
la perseus     49.93        51.60           18,184
ug udt         52.78        53.39           19,262
sl sst         46.72        48.77           19,473
hu szeged      66.23        68.18           20,166

Not useful for languages having fewer than 20k training tokens.
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code      Morph-Feats  No Morph-Feats  # of tokens
sv lines       72.18        74.81           48,325
fr sequoia     84.36        82.17           50,543
en gum         76.44        75.34           53,686
ko gsd         73.74        72.54           56,687
eu bdt         74.55        73.32           72,974
nl lassysmall  76.70        75.80           75,134
gl ctg         79.02        79.018          79,327
lv lvtb        72.33        72.24           80,666
id gsd         75.76        73.97           97,531

Beneficial for languages with 50k-100k training tokens.
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens:

Lang code   Morph-Feats  No Morph-Feats  # of tokens
fa seraji   81.18        81.12           121,064
bg btb      84.53        84.55           124,336
en ewt      75.77        75.682          204,585
ar padt     68.02        68.14           223,881
de gsd      71.59        71.32           263,804
ca ancora   85.89        85.874          417,587
es ancora   84.99        84.78           444,617
cs cac      83.57        83.63           472,608
cs pdt      81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens.
Static vs Dynamic Oracle Training
Static oracle: transitions follow the gold moves.
Dynamic oracle: transitions follow the predicted moves.
In both cases, the log probability of the gold moves is maximized.
[Diagram: the Tree-stack LSTM architecture (σ-, β-, and Action-LSTMs, t-RNN, Concat, MLP)]
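The distinction can be sketched as a toy training loop: both variants accumulate the negative log probability of the gold move, but the dynamic variant advances the parser with the model's own prediction. Every class and function below is a stand-in for illustration, not the thesis implementation.

```python
import math

class ToyState:
    """Trivial stand-in for a parser configuration: done after N moves."""
    def __init__(self, steps=3):
        self.steps = steps
    def done(self):
        return self.steps == 0
    def apply(self, move):
        return ToyState(self.steps - 1)

def predict(state):          # stand-in model: a fixed move distribution
    return {"shift": 0.6, "right": 0.4}

def gold_oracle(state):      # stand-in oracle: the gold move is "shift"
    return "shift"

def train_sentence(state, dynamic=False):
    """Accumulate -log p(gold move); dynamic=True executes predicted moves."""
    loss = 0.0
    while not state.done():
        probs = predict(state)
        gold = gold_oracle(state)
        loss -= math.log(probs[gold])   # both cases maximize log p(gold)
        move = max(probs, key=probs.get) if dynamic else gold
        state = state.apply(move)       # static: gold move; dynamic: predicted
    return loss

print(round(train_sentence(ToyState()), 3))  # 1.532 (= 3 * -log 0.6)
```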
Static vs Dynamic Oracle Training
Figure: Results are very close for training sets with fewer than 20k tokens.
Static vs Dynamic Oracle Training
Figure: Results are very close for training sets with between 20k and 50k tokens.
Static vs Dynamic Oracle Training
Figure: Results are very close for training sets with more than 50k tokens.
What about languages with fewer than 20k training tokens?
Transfer Learning
There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors, trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language       (1)           (2)    (3)    (4)
af afribooms   not provided  75.46  77.43  78.12
kk ktb         20.19         22.31  21.96  23.86
bxr bdt        7.64          9.76   9.93   8.98
kmr mg         20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4).
Transfer Learning
Conclusions of the Transfer Learning Experiments:

Applying transfer learning with a pre-trained parser is the most beneficial.

Training an LM from scratch does not yield useful word and context vectors.

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017].
Projectivity
Transition-based parsers can only build projective trees 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
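Projectivity can be checked with the standard definition: an arc is projective iff every word between the head and the dependent is a descendant of the head. A minimal sketch, assuming `heads[i]` gives the head index of word i, with index 0 as the root:

```python
# Check whether a dependency tree (given as a head array) is projective.
def is_projective(heads):
    n = len(heads)
    for dep in range(1, n):
        h = heads[dep]
        lo, hi = min(h, dep), max(h, dep)
        for k in range(lo + 1, hi):
            # every word strictly between h and dep must be a descendant of h
            a = k
            while a != 0 and a != h:
                a = heads[a]
            if a != h:
                return False
    return True

print(is_projective([0, 2, 0, 2]))      # True: simple nested tree
print(is_projective([0, 0, 4, 1, 0]))   # False: crossing arcs
```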
Projective vs Non-projective
We compared our model with the best model at different projectivity ratios:

Language      Projectivity (%)  Best (LAS)  Ours (LAS, rank)
grc perseus   90.7              79.39       55.03 (20)
eu bdt        95.13             84.22       74.13 (17)
hu szeged     97.8              82.66       68.18 (14)
da ddt        98.26             86.28       76.40 (17)
en gum        99.6              85.05       76.44 (15)
gl treegal    100               74.25       70.45 (10)
gl ctg        100               82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases. 7

7 From the official results page and our projectivity table.
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing
Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering
Tree-stack LSTM performed better with low resource languages
When the training dataset size increases tree-stack LSTM losses itsadvantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings
Attention Mechanism
Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement
Morphological Features
Finding different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies
Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682
S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US
Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Each embedding initiated by concatenating POS language andmorph-feat embeddings
Omer Kırnap (Koc University) MSc Thesis September 27 2018 81 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Stackrsquos top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure σ-LSTM recalculates its hidden from new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
Final overview of Tree-stack LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing
2 Related WorkLinear Models and their DrawbacksNeural Network Models
3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser
4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning
5 Conclusion6 Future Work amp Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results amp Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results amp Comparisons
Dataset
Dependency parsing of 81
treebanks in 49 languages
All treebanks use standardized
annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 7th out
of 33 participants (1st among
transition based parsers)
Dependency parsing of 82
treebanks in 57 languages
All treebanks use
standardized annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 16th
out of 30 participants (2nd
among transition based
parsers)
CoNLL17 CoNLL181 Traintest split change 2 Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
2 possible problems of official comparison
1 If the annotation of the tree bank is improved the older parser ishandicapped
2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
MLP
Figure Initial model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
LSTM LSTM
Figure Only action LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
LSTM LSTM LSTM
LSTM LSTM MLP
Figure Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
LSTM LSTM
LSTM LSTM MLP
Figure Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692
cs cac 8389 8223 8313 8317
en ewt 7554 7543 7556 7567
Table Comparison between MLP and rdquoOnlyrdquo models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875
ar padt (120k) 6804 6814
en ewt (204k) 7487 7577
cs cac (473k) 8289 8357
cs pdt (1M) 8117 81164
t-RNN provides comparative advantage for low-resourcelanguages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNNrsquos performance contribution increases when the training sizedecreases
σ-LSTM provides more useful information independent from datasetsize
Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k less than 50k tokens
Languages having more than 50k less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having less than 20k training tokens
Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583
ru taiga 5832 6055 10479
sme giella 5278 5339 16385
la perseus 4993 516 18184
ug udt 5278 5339 19262
sl sst 4672 4877 19473
hu szeged 6623 6818 20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having tokens in between 50k and100k
Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325
fr sequoia 8436 8217 50543
en gum 7644 7534 53686
ko gsd 7374 7254 56687
eu bdt 7455 7332 72974
nl lassymal 767 758 75134
gl ctg 7902 79018 79327
lv lvtb 7233 7224 80666
id gsd 7576 7397 97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having more than 100k trainingtokens
Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064
bg btb 8453 8455 124336
en ewt 7577 75682 204585
ar padt 6802 6814 223881
de gsd 7159 7132 263804
ca ancora 8589 85874 417587
es ancora 8499 8478 444617
cs cac 8357 8363 472608
cs pdt 8143 8212 1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle transitions using gold movesDynamic oracle transitions using predicted moves
In both cases logp of gold moves maximized
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and
use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al
2017]3 Using my own word and context vectors trained with different
language but from the same language family4 Applying transfer learning with a pre-trained parser
Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898
kmr mg 2012 2257 2278 2339
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the mostbeneficial
From scratch LM training does not bring useful word and contextvectors
Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition Based Parser can only build projective trees 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
Projective vs Non-projective
We compared our model with the best model for different projectivityratios
Language Projectiviy Best (LAS) Our (LAS)
grc perseus 907 7939 5503 (20)
eu bdt 9513 8422 7413 (17)
hu szeged 978 8266 6818 (14)
da ddt 9826 8628 7640 (17)
en gum 996 8505 7644 (15)
gl treegal 100 7425 7045 (10)
gl ctg 100 8212 7945 (14)
Table Our models performance gap decreases as the projectivity ratio increases
7
7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing
Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering
Tree-stack LSTM performed better with low resource languages
When the training dataset size increases tree-stack LSTM losses itsadvantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings
Attention Mechanism
Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement
Morphological Features
Finding different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies
Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682
S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US
Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
Figure Stackrsquos top LSTM is reduced
Omer Kırnap (Koc University) MSc Thesis September 27 2018 82 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM
LSTM
LSTM
t-RNN
Dependency Relation
Right Transition
Head
Dependent
New Head
Figure t-RNN calculates new head embedding
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure σ-LSTM recalculates its hidden from new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
Final overview of Tree-stack LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing
2 Related WorkLinear Models and their DrawbacksNeural Network Models
3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser
4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning
5 Conclusion6 Future Work amp Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results amp Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results amp Comparisons
Dataset
Dependency parsing of 81
treebanks in 49 languages
All treebanks use standardized
annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 7th out
of 33 participants (1st among
transition based parsers)
Dependency parsing of 82
treebanks in 57 languages
All treebanks use
standardized annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 16th
out of 30 participants (2nd
among transition based
parsers)
CoNLL17 CoNLL181 Traintest split change 2 Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
2 possible problems of official comparison
1 If the annotation of the tree bank is improved the older parser ishandicapped
2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
MLP
Figure Initial model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
LSTM LSTM
Figure Only action LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
LSTM LSTM LSTM
LSTM LSTM MLP
Figure Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
Figure Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code  MLP    Only Action  Only-β  Only-σ
hu_szeged  66.21  66.87        66.94   67.03
sv_lines   71.12  72.05        72.17   72.45
tr_imst    57.12  56.87        57.02   57.12
ar_padt    67.83  66.67        66.89   66.92
cs_cac     83.89  82.23        83.13   83.17
en_ewt     75.54  75.43        75.56   75.67

Table: Comparison between MLP and "Only" models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
Figure: Tree-stack LSTM with t-RNN. The t-RNN composes the head word, dependent word, and dependency relation; its output and the σ-, β-, and action-LSTM states are concatenated and fed to an MLP.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code           without t-RNN  with t-RNN
no_nynorsklia (3k)  51.78          53.33
ru_taiga (11k)      59.13          60.55
gl_treegal (15k)    69.76          70.45
hu_szeged (20k)     66.12          68.18
sv_lines (49k)      74.04          75.46
tr_imst (50k)       58.12          58.75
ar_padt (120k)      68.04          68.14
en_ewt (204k)       74.87          75.77
cs_cac (473k)       82.89          83.57
cs_pdt (1M)         81.17          81.16

t-RNN provides a comparative advantage for low-resource languages.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu_szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv_lines   71.12  72.05   72.17   74.04   72.17      75.46
tr_imst    57.12  56.87   57.02   57.12   58.12      58.75
ar_padt    67.83  66.67   66.89   66.92   68.04      68.14
cs_cac     83.89  82.23   83.13   83.17   82.89      83.57
en_ewt     75.54  75.43   75.56   75.67   74.87      75.77
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNN's performance contribution increases as the training size decreases.
σ-LSTM provides more useful information independent of dataset size.
Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers).
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental Settings: We divide the CoNLL18 UD dataset (v2.2) into 4 parts based on the number of training tokens for each language, to better understand our contributions:
Languages having less than 20k tokens
Languages having more than 20k less than 50k tokens
Languages having more than 50k less than 100k tokens
Languages having 100k tokens or more
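The grouping above can be written as a simple bucketing function (thresholds taken from the list; the function name and label strings are mine, not the thesis code):

```python
def size_bucket(n_tokens):
    """Assign a treebank to one of the four training-size groups
    used in the morph-feat experiments."""
    if n_tokens < 20_000:
        return "<20k"
    elif n_tokens < 50_000:
        return "20k-50k"
    elif n_tokens < 100_000:
        return "50k-100k"
    return ">=100k"

print(size_bucket(10_479))   # ru_taiga  -> '<20k'
print(size_bucket(97_531))   # id_gsd    -> '50k-100k'
print(size_bucket(204_585))  # en_ewt    -> '>=100k'
```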
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no_nynorsklia  51.13        53.33           3,583
ru_taiga       58.32        60.55           10,479
sme_giella     52.78        53.39           16,385
la_perseus     49.93        51.60           18,184
ug_udt         52.78        53.39           19,262
sl_sst         46.72        48.77           19,473
hu_szeged      66.23        68.18           20,166

Not useful for languages having less than 20k training tokens.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv_lines       72.18        74.81           48,325
fr_sequoia     84.36        82.17           50,543
en_gum         76.44        75.34           53,686
ko_gsd         73.74        72.54           56,687
eu_bdt         74.55        73.32           72,974
nl_lassysmall  76.70        75.80           75,134
gl_ctg         79.02        79.018          79,327
lv_lvtb        72.33        72.24           80,666
id_gsd         75.76        73.97           97,531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa_seraji  81.18        81.12           121,064
bg_btb     84.53        84.55           124,336
en_ewt     75.77        75.682          204,585
ar_padt    68.02        68.14           223,881
de_gsd     71.59        71.32           263,804
ca_ancora  85.89        85.874          417,587
es_ancora  84.99        84.78           444,617
cs_cac     83.57        83.63           472,608
cs_pdt     81.43        82.12           1,173,282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle: transitions follow gold moves. Dynamic oracle: transitions follow predicted moves.
In both cases, the log-probability (log p) of the gold moves is maximized.
Figure: Tree-stack LSTM (t-RNN over head word, dependent word, and dependency relation; σ-, β-, and action-LSTM states concatenated and fed to an MLP).
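The contrast can be sketched as a toy training loop (the state, scoring, and oracle interfaces here are illustrative stand-ins, not the thesis code): both modes accumulate the negative log-probability of the gold move; they differ only in which move is executed to advance the parser.

```python
import math, random

def train_sentence(score_moves, gold_move, apply_move, is_final, state,
                   dynamic=False, explore_p=0.5, rng=None):
    """One sentence of oracle training. Both modes maximize log p of the
    gold move; they differ only in which move is *executed*:
      static  -> always the gold move,
      dynamic -> with probability explore_p, the model's own prediction,
                 so the parser learns to recover from its own mistakes."""
    rng = rng or random.Random(0)
    loss = 0.0
    while not is_final(state):
        gold = gold_move(state)
        logp = score_moves(state)           # dict: move -> log-probability
        loss -= logp[gold]                  # NLL of the gold move
        if dynamic and rng.random() < explore_p:
            move = max(logp, key=logp.get)  # follow the parser's prediction
        else:
            move = gold                     # follow the oracle
        state = apply_move(state, move)
    return loss

# toy demo: 3 steps, uniform scores over three transitions
moves = ["shift", "left", "right"]
loss = train_sentence(
    score_moves=lambda s: {m: math.log(1 / 3) for m in moves},
    gold_move=lambda s: "shift",
    apply_move=lambda s, m: s + 1,
    is_final=lambda s: s >= 3,
    state=0)
print(round(loss, 4))  # 3 * -log(1/3) = 3.2958
```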
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
Static vs Dynamic Oracle Training
Figure: Results are very close for training sets with fewer than 20k tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure: Results are very close for training sets between 20k and 50k tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure: Results are very close for training sets with more than 50k tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af_afribooms  not provided  75.46  77.43  78.12
kk_ktb        20.19         22.31  21.96  23.86
bxr_bdt       7.64          9.76   9.93   8.98
kmr_mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial.
From-scratch LM training does not yield useful word and context vectors.
Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017].
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
A transition-based parser can only build projective trees.6
6. Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
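Projectivity can be tested directly on a head array: a tree is projective exactly when no two dependency arcs cross. A minimal O(n²) sketch (not from the thesis; tokens numbered 1..n, head 0 is the root):

```python
def is_projective(heads):
    """heads[i-1] is the index of token i's head; 0 denotes the root.
    Returns True iff no two arcs cross, i.e. the tree is projective."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for (l1, r1) in arcs:
        for (l2, r2) in arcs:
            # Two arcs cross when one arc starts strictly inside the
            # other's span and ends strictly outside it.
            if l1 < l2 < r1 < r2:
                return False
    return True

print(is_projective([0, 3, 1, 3]))     # nested arcs: projective -> True
print(is_projective([2, 0, 4, 2, 3]))  # crossing arcs -> False
```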
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
Projective vs Non-projective
We compared our model with the best model at different projectivity ratios.

Language     Projectivity  Best (LAS)  Our (LAS)
grc_perseus  90.7          79.39       55.03 (20)
eu_bdt       95.13         84.22       74.13 (17)
hu_szeged    97.8          82.66       68.18 (14)
da_ddt       98.26         86.28       76.40 (17)
en_gum       99.6          85.05       76.44 (15)
gl_treegal   100           74.25       70.45 (10)
gl_ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases.7
7. From the official results page and our projectivity table.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusion: We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition-based dependency parsing.
Our tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering.
Tree-stack LSTM performed better on low-resource languages.
As the training dataset size increases, tree-stack LSTM loses its advantage.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.
Attention Mechanism
Applying attention between σ-LSTM, β-LSTM, or action-LSTM states may bring performance improvements.
Morphological Features
Finding a different way to represent morphological features.
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, Association for Computational Linguistics, pages 673-682.
S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: t-RNN calculates the new head embedding (under a right transition, the head, dependent, and dependency relation are composed into the new head representation).
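Concretely, the right_d transition can be sketched on a tuple-based parser state (field names and layout are mine, not the thesis implementation): pop t off the stack and attach it to the new top s with label d.

```python
def right_arc(state, d):
    """right_d: (sigma|s|t, beta, A) -> (sigma|s, beta, A ∪ {(s, d, t)}).
    Pops the stack top t and records the arc s -> t with label d."""
    sigma, beta, arcs = state
    *rest, s, t = sigma                  # sigma|s|t: t is the stack top
    return (rest + [s], beta, arcs | {(s, d, t)})

# toy run on word indices: stack [ROOT=0, 2, 3], buffer [4, 5], no arcs yet
state = ([0, 2, 3], [4, 5], set())
state = right_arc(state, "obj")
print(state)  # ([0, 2], [4, 5], {(2, 'obj', 3)})
```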
Omer Kırnap (Koc University) MSc Thesis September 27 2018 83 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: σ-LSTM recalculates its hidden state from the new input.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
right_d(σ|s|t, β, A) = (σ|s, β, A ∪ {(s, d, t)})

Figure: Tree-stack LSTM is ready to predict the next transition.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
Final overview of Tree-stack LSTM
Figure: Final overview of tree-stack LSTM. The t-RNN composes the head word, dependent word, and dependency relation; its output is concatenated with the σ-, β-, and action-LSTM states and fed to an MLP.
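The decision layer, concatenating the σ-, β-, and action-LSTM states and scoring transitions with an MLP, can be illustrated at the shape level. All dimensions and weights below are illustrative placeholders, not the thesis hyperparameters:

```python
import numpy as np

rng = np.random.default_rng(0)
H = 128          # hidden size of each encoder LSTM (illustrative)
N_MOVES = 75     # e.g. shift + 37 left-arc + 37 right-arc labels

# hidden states of the sigma-, beta-, and action-LSTMs at the current state
h_sigma, h_beta, h_action = (rng.standard_normal(H) for _ in range(3))

# concatenate and score all transitions with a one-hidden-layer MLP
x = np.concatenate([h_sigma, h_beta, h_action])          # shape (3H,)
W1, b1 = rng.standard_normal((256, 3 * H)) * 0.01, np.zeros(256)
W2, b2 = rng.standard_normal((N_MOVES, 256)) * 0.01, np.zeros(N_MOVES)
hidden = np.maximum(0.0, W1 @ x + b1)                    # ReLU layer
scores = W2 @ hidden + b2
logp = scores - np.log(np.sum(np.exp(scores)))           # log-softmax over moves
print(logp.shape)  # (75,)
```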
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1. Introduction: Overview of Dependency Parsing; Transition Based Dependency Parsing
2. Related Work: Linear Models and their Drawbacks; Neural Network Models
3. Model: Language Model; MLP Parser; Tree-stack LSTM Parser
4. Results: MLP vs Tree-stack LSTM; Morphological Feature Embeddings; Static vs Dynamic Oracle Training; Transfer Learning
5. Conclusion
6. Future Work & Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4. Results & Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results amp Comparisons
Dataset
Dependency parsing of 81
treebanks in 49 languages
All treebanks use standardized
annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 7th out
of 33 participants (1st among
transition based parsers)
Dependency parsing of 82
treebanks in 57 languages
All treebanks use
standardized annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 16th
out of 30 participants (2nd
among transition based
parsers)
CoNLL17 CoNLL181 Traintest split change 2 Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
2 possible problems of official comparison
1 If the annotation of the tree bank is improved the older parser ishandicapped
2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
MLP
Figure Initial model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
LSTM LSTM
Figure Only action LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
LSTM LSTM LSTM
LSTM LSTM MLP
Figure Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
LSTM LSTM
LSTM LSTM MLP
Figure Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692
cs cac 8389 8223 8313 8317
en ewt 7554 7543 7556 7567
Table Comparison between MLP and rdquoOnlyrdquo models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875
ar padt (120k) 6804 6814
en ewt (204k) 7487 7577
cs cac (473k) 8289 8357
cs pdt (1M) 8117 81164
t-RNN provides comparative advantage for low-resourcelanguages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNNrsquos performance contribution increases when the training sizedecreases
σ-LSTM provides more useful information independent from datasetsize
Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k less than 50k tokens
Languages having more than 50k less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having less than 20k training tokens
Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583
ru taiga 5832 6055 10479
sme giella 5278 5339 16385
la perseus 4993 516 18184
ug udt 5278 5339 19262
sl sst 4672 4877 19473
hu szeged 6623 6818 20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having tokens in between 50k and100k
Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325
fr sequoia 8436 8217 50543
en gum 7644 7534 53686
ko gsd 7374 7254 56687
eu bdt 7455 7332 72974
nl lassymal 767 758 75134
gl ctg 7902 79018 79327
lv lvtb 7233 7224 80666
id gsd 7576 7397 97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having more than 100k trainingtokens
Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064
bg btb 8453 8455 124336
en ewt 7577 75682 204585
ar padt 6802 6814 223881
de gsd 7159 7132 263804
ca ancora 8589 85874 417587
es ancora 8499 8478 444617
cs cac 8357 8363 472608
cs pdt 8143 8212 1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle transitions using gold movesDynamic oracle transitions using predicted moves
In both cases logp of gold moves maximized
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and
use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al
2017]3 Using my own word and context vectors trained with different
language but from the same language family4 Applying transfer learning with a pre-trained parser
Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898
kmr mg 2012 2257 2278 2339
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the mostbeneficial
From scratch LM training does not bring useful word and contextvectors
Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition Based Parser can only build projective trees 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
Projective vs Non-projective
We compared our model with the best model for different projectivityratios
Language Projectiviy Best (LAS) Our (LAS)
grc perseus 907 7939 5503 (20)
eu bdt 9513 8422 7413 (17)
hu szeged 978 8266 6818 (14)
da ddt 9826 8628 7640 (17)
en gum 996 8505 7644 (15)
gl treegal 100 7425 7045 (10)
gl ctg 100 8212 7945 (14)
Table Our models performance gap decreases as the projectivity ratio increases
7
7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing
Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering
Tree-stack LSTM performed better with low resource languages
When the training dataset size increases tree-stack LSTM losses itsadvantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings
Attention Mechanism
Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement
Morphological Features
Finding different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies
Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682
S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US
Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure σ-LSTM recalculates its hidden from new input
Omer Kırnap (Koc University) MSc Thesis September 27 2018 84 123
Transitions - Right
rightd(σ|s|t βA) = (σ|s βA cup (s d t))
LSTM LSTM LSTM
t-RNN
Right Transition
New Head
Figure Tree-stack LSTM is ready to give new transition
Omer Kırnap (Koc University) MSc Thesis September 27 2018 85 123
Final overview of Tree-stack LSTM
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 86 123
Overview
1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing
2 Related WorkLinear Models and their DrawbacksNeural Network Models
3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser
4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning
5 Conclusion6 Future Work amp Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results amp Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results amp Comparisons
Dataset
Dependency parsing of 81
treebanks in 49 languages
All treebanks use standardized
annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 7th out
of 33 participants (1st among
transition based parsers)
Dependency parsing of 82
treebanks in 57 languages
All treebanks use
standardized annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 16th
out of 30 participants (2nd
among transition based
parsers)
CoNLL17 CoNLL181 Traintest split change 2 Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
2 possible problems of official comparison
1 If the annotation of the tree bank is improved the older parser ishandicapped
2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
MLP
Figure Initial model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
LSTM LSTM
Figure Only action LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
LSTM LSTM LSTM
LSTM LSTM MLP
Figure Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
LSTM LSTM
LSTM LSTM MLP
Figure Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692
cs cac 8389 8223 8313 8317
en ewt 7554 7543 7556 7567
Table Comparison between MLP and rdquoOnlyrdquo models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875
ar padt (120k) 6804 6814
en ewt (204k) 7487 7577
cs cac (473k) 8289 8357
cs pdt (1M) 8117 81164
t-RNN provides comparative advantage for low-resourcelanguages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNNrsquos performance contribution increases when the training sizedecreases
σ-LSTM provides more useful information independent from datasetsize
Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k less than 50k tokens
Languages having more than 50k less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having less than 20k training tokens
Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583
ru taiga 5832 6055 10479
sme giella 5278 5339 16385
la perseus 4993 516 18184
ug udt 5278 5339 19262
sl sst 4672 4877 19473
hu szeged 6623 6818 20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having tokens in between 50k and100k
Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325
fr sequoia 8436 8217 50543
en gum 7644 7534 53686
ko gsd 7374 7254 56687
eu bdt 7455 7332 72974
nl lassymal 767 758 75134
gl ctg 7902 79018 79327
lv lvtb 7233 7224 80666
id gsd 7576 7397 97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having more than 100k trainingtokens
Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064
bg btb 8453 8455 124336
en ewt 7577 75682 204585
ar padt 6802 6814 223881
de gsd 7159 7132 263804
ca ancora 8589 85874 417587
es ancora 8499 8478 444617
cs cac 8357 8363 472608
cs pdt 8143 8212 1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle transitions using gold movesDynamic oracle transitions using predicted moves
In both cases logp of gold moves maximized
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and
use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al
2017]3 Using my own word and context vectors trained with different
language but from the same language family4 Applying transfer learning with a pre-trained parser
Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898
kmr mg 2012 2257 2278 2339
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the mostbeneficial
From scratch LM training does not bring useful word and contextvectors
Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
A transition based parser can only build projective trees.6
6Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
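A tree is projective iff no two of its arcs cross. A small self-contained checker (assumed convention: 1-based tokens with 0 as the artificial root; illustrative, not the thesis code):

```python
def is_projective(heads):
    """Return True iff the dependency tree has no crossing arcs.

    heads[i] is the head of token i+1 (tokens are 1-based, 0 is root).
    """
    arcs = [(min(dep, head), max(dep, head))
            for dep, head in enumerate(heads, start=1)]
    for l1, r1 in arcs:
        for l2, r2 in arcs:
            if l1 < l2 < r1 < r2:  # arcs (l1,r1) and (l2,r2) cross
                return False
    return True
```

For instance, heads = [2, 0, 2] is projective, while heads = [3, 4, 0, 3] contains the crossing arcs (1,3) and (2,4); such trees cannot be built by the basic transition system.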
Projective vs Non-projective
We compared our model with the best model at different projectivity ratios.

Language      Projectivity %   Best (LAS)   Our (LAS)
grc perseus    90.7             79.39        55.03 (20)
eu bdt         95.13            84.22        74.13 (17)
hu szeged      97.8             82.66        68.18 (14)
da ddt         98.26            86.28        76.40 (17)
en gum         99.6             85.05        76.44 (15)
gl treegal    100.0             74.25        70.45 (10)
gl ctg        100.0             82.12        79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases.7

7From the official results page and our projectivity table.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusion:
We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition based dependency parsing.
Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.
Tree-stack LSTM performed better on low-resource languages.
As the training dataset size increases, the tree-stack LSTM loses its advantage.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.
Attention Mechanism
Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.
Morphological Features
Finding different ways to represent morphological features.
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.
S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
Overview
1 IntroductionOverview of Dependency ParsingTransition Based Dependency Parsing
2 Related WorkLinear Models and their DrawbacksNeural Network Models
3 ModelLanguage ModelMLP ParserTree-stack LSTM Parser
4 ResultsMLP vs Tree-stack LSTMMorphological Feature EmbeddingsStatic vs Dynamic Oracle TrainingTransfer Learning
5 Conclusion6 Future Work amp Discussions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 87 123
4 Results amp Comparisons
Omer Kırnap (Koc University) MSc Thesis September 27 2018 88 123
Results amp Comparisons
Dataset
Dependency parsing of 81
treebanks in 49 languages
All treebanks use standardized
annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 7th out
of 33 participants (1st among
transition based parsers)
Dependency parsing of 82
treebanks in 57 languages
All treebanks use
standardized annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 16th
out of 30 participants (2nd
among transition based
parsers)
CoNLL17 CoNLL181 Traintest split change 2 Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
2 possible problems of official comparison
1 If the annotation of the tree bank is improved the older parser ishandicapped
2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
MLP
Figure Initial model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
LSTM LSTM
Figure Only action LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
LSTM LSTM LSTM
LSTM LSTM MLP
Figure Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
LSTM LSTM
LSTM LSTM MLP
Figure Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692
cs cac 8389 8223 8313 8317
en ewt 7554 7543 7556 7567
Table Comparison between MLP and rdquoOnlyrdquo models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code           without t-RNN  with t-RNN
no nynorsklia (3k)  51.78          53.33
ru taiga (11k)      59.13          60.55
gl treegal (15k)    69.76          70.45
hu szeged (20k)     66.12          68.18
sv lines (49k)      74.04          75.46
tr imst (50k)       58.12          58.75
ar padt (120k)      68.04          68.14
en ewt (204k)       74.87          75.77
cs cac (473k)       82.89          83.57
cs pdt (1M)         81.17          81.16

t-RNN provides a comparative advantage for low-resource languages.
Ablation Analysis
Overall results of ablation analysis
Lang        MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu szeged   66.21  66.87   66.94   67.03   66.12      68.18
sv lines    71.12  72.05   72.17   74.04   72.17      75.46
tr imst     57.12  56.87   57.02   57.12   58.12      58.75
ar padt     67.83  66.67   66.89   66.92   68.04      68.14
cs cac      83.89  82.23   83.13   83.17   82.89      83.57
en ewt      75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats the other model variations.
Ablation Analysis
Conclusions of Ablation Experiments
t-RNN's performance contribution increases as the training size decreases.
σ-LSTM provides more useful information independent of dataset size.
Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers).
What does Morphological Feature Embedding provide?
Contribution of Morph-feat Embeddings
Experimental Settings
We divide the CoNLL18 UD 2.2 dataset into 4 parts based on the number of training tokens for each language, to better understand our contributions:
Languages having less than 20k tokens
Languages having more than 20k but less than 50k tokens
Languages having more than 50k but less than 100k tokens
Languages having 100k tokens or more
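The grouping above is straightforward to sketch; a minimal helper (hypothetical name `bucket_by_tokens`) that sorts treebanks into the four training-size bands:

```python
def bucket_by_tokens(token_counts):
    """Group treebanks into the four training-size bands used in the
    experiments. token_counts: dict treebank -> number of training tokens."""
    bands = {"<20k": [], "20k-50k": [], "50k-100k": [], ">=100k": []}
    for treebank, n in sorted(token_counts.items(), key=lambda kv: kv[1]):
        if n < 20_000:
            bands["<20k"].append(treebank)
        elif n < 50_000:
            bands["20k-50k"].append(treebank)
        elif n < 100_000:
            bands["50k-100k"].append(treebank)
        else:
            bands[">=100k"].append(treebank)
    return bands
```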
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no nynorsklia  51.13        53.33             3,583
ru taiga       58.32        60.55            10,479
sme giella     52.78        53.39            16,385
la perseus     49.93        51.60            18,184
ug udt         52.78        53.39            19,262
sl sst         46.72        48.77            19,473
hu szeged      66.23        68.18            20,166

Not useful for languages having less than 20k training tokens.
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having between 50k and 100k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv lines       72.18        74.81           48,325
fr sequoia     84.36        82.17           50,543
en gum         76.44        75.34           53,686
ko gsd         73.74        72.54           56,687
eu bdt         74.55        73.32           72,974
nl lassysmall  76.7         75.8            75,134
gl ctg         79.02        79.018          79,327
lv lvtb        72.33        72.24           80,666
id gsd         75.76        73.97           97,531

Beneficial for languages with 50k-100k training tokens.
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa seraji  81.18        81.12             121,064
bg btb     84.53        84.55             124,336
en ewt     75.77        75.682            204,585
ar padt    68.02        68.14             223,881
de gsd     71.59        71.32             263,804
ca ancora  85.89        85.874            417,587
es ancora  84.99        84.78             444,617
cs cac     83.57        83.63             472,608
cs pdt     81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens.
Static vs Dynamic Oracle Training
Static oracle: training transitions follow the gold moves.
Dynamic oracle: training transitions follow the predicted moves.
In both cases, the log-probability of the gold moves is maximized.

Figure: t-RNN architecture (head word, dependent word, and dependency relation vectors are processed by LSTMs, concatenated, and fed to an MLP)
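The difference between the two regimes shows up only in which transition is executed during training; the loss always targets the gold moves. A hedged sketch (all names are hypothetical, and real dynamic-oracle exploration policies vary):

```python
import random

def next_transition(scores, gold_moves, dynamic, p_explore=0.1):
    """Pick the transition to EXECUTE during training.
    scores: dict move -> model score; gold_moves: set of moves the
    oracle allows from the current parser state.
    Static oracle: always follow a gold move.
    Dynamic oracle: sometimes follow the model's own (possibly wrong)
    prediction, so the parser learns to recover from its errors."""
    predicted = max(scores, key=scores.get)
    if dynamic and (predicted in gold_moves or random.random() < p_explore):
        return predicted
    # otherwise follow the highest-scoring gold move
    return max(gold_moves, key=lambda m: scores.get(m, float("-inf")))
```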
Static vs Dynamic Oracle Training
Figure: Results are very close for training sets with fewer than 20k tokens
Static vs Dynamic Oracle Training
Figure: Results are very close for training sets with between 20k and 50k tokens
Static vs Dynamic Oracle Training
Figure: Results are very close for training sets with more than 50k tokens
How about languages with less than 20k training tokens?
Transfer Learning
There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al., 2017]
3. Using my own word and context vectors, trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af afribooms  not provided  75.46  77.43  78.12
kk ktb        20.19         22.31  21.96  23.86
bxr bdt        7.64          9.76   9.93   8.98
kmr mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4)
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial.
From-scratch LM training does not yield useful word and context vectors.
Our word and context vectors are still more useful than Facebook's [Bojanowski et al., 2017].
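Strategy (4) amounts to warm-starting the parser's parameters from a parser trained on a related language and then fine-tuning. A minimal sketch with a hypothetical name-to-weights dict interface (not the thesis code):

```python
def init_from_pretrained(params, pretrained):
    """Copy shape-compatible weights from a pre-trained parser into a
    freshly initialized one; anything incompatible keeps its fresh init.
    Both arguments map parameter name -> flat weight list."""
    for name, weights in pretrained.items():
        if name in params and len(params[name]) == len(weights):
            params[name] = list(weights)  # copy, then fine-tune as usual
    return params
```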
Projectivity
Transition-based parsers can only build projective trees.

Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
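Projectivity can be checked by looking for crossing arcs: a tree is projective iff no two dependency arcs cross when drawn above the sentence. A small sketch (hypothetical helper; 1-based token indices, head 0 = root):

```python
def is_projective(heads):
    """heads[i-1] is the head of token i (1-based tokens; 0 means root).
    Returns True iff no two dependency arcs cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for l1, r1 in arcs:
        for l2, r2 in arcs:
            # arcs cross when exactly one endpoint of the second arc
            # lies strictly inside the span of the first
            if l1 < l2 < r1 < r2:
                return False
    return True
```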
Projective vs Non-projective
We compared our model with the best model at different projectivity ratios.

Language     Projectivity (%)  Best (LAS)  Ours (LAS)
grc perseus   90.7             79.39       55.03 (20)
eu bdt        95.13            84.22       74.13 (17)
hu szeged     97.8             82.66       68.18 (14)
da ddt        98.26            86.28       76.40 (17)
en gum        99.6             85.05       76.44 (15)
gl treegal   100               74.25       70.45 (10)
gl ctg       100               82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases.

From the official results page and our projectivity table.
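For reference, LAS (the metric reported in the tables above) counts a token as correct only when both its predicted head and its dependency label match the gold annotation:

```python
def las(gold, pred):
    """Labeled Attachment Score, in percent.
    gold, pred: lists of (head, label) pairs, one entry per token."""
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return 100.0 * correct / len(gold)
```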
Conclusions
Conclusion
In conclusion:
We introduced "Context", "Word", and "Morph-feat" embeddings and showed their contribution to transition-based dependency parsing.
Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.
Tree-stack LSTM performed better on low-resource languages.
As the training dataset size increases, tree-stack LSTM loses its advantage.
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.
Attention Mechanism
Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring performance improvements.
Morphological Features
Finding a different way to represent morphological features.
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.
Publications
Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
References
Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Thank you for your attention!

Questions?
4 Results & Comparisons
Results & Comparisons

Dataset

CoNLL17:
Dependency parsing of 81 treebanks in 49 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc University ranked 7th out of 33 participants (1st among transition-based parsers)

CoNLL18:
Dependency parsing of 82 treebanks in 57 languages
All treebanks use standardized annotation: 17 universal part-of-speech tags, 37 universal dependency relations
Koc University ranked 16th out of 30 participants (2nd among transition-based parsers)

Changes from CoNLL17 to CoNLL18: 1. Train/test split change 2. Annotation
MLP vs Tree-stack LSTM
CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
2 possible problems of official comparison
1 If the annotation of the tree bank is improved the older parser ishandicapped
2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
MLP
Figure Initial model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
LSTM LSTM
Figure Only action LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
LSTM LSTM LSTM
LSTM LSTM MLP
Figure Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
LSTM LSTM
LSTM LSTM MLP
Figure Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692
cs cac 8389 8223 8313 8317
en ewt 7554 7543 7556 7567
Table Comparison between MLP and rdquoOnlyrdquo models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875
ar padt (120k) 6804 6814
en ewt (204k) 7487 7577
cs cac (473k) 8289 8357
cs pdt (1M) 8117 81164
t-RNN provides comparative advantage for low-resourcelanguages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNNrsquos performance contribution increases when the training sizedecreases
σ-LSTM provides more useful information independent from datasetsize
Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k less than 50k tokens
Languages having more than 50k less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having less than 20k training tokens
Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583
ru taiga 5832 6055 10479
sme giella 5278 5339 16385
la perseus 4993 516 18184
ug udt 5278 5339 19262
sl sst 4672 4877 19473
hu szeged 6623 6818 20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having tokens in between 50k and100k
Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325
fr sequoia 8436 8217 50543
en gum 7644 7534 53686
ko gsd 7374 7254 56687
eu bdt 7455 7332 72974
nl lassymal 767 758 75134
gl ctg 7902 79018 79327
lv lvtb 7233 7224 80666
id gsd 7576 7397 97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having more than 100k trainingtokens
Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064
bg btb 8453 8455 124336
en ewt 7577 75682 204585
ar padt 6802 6814 223881
de gsd 7159 7132 263804
ca ancora 8589 85874 417587
es ancora 8499 8478 444617
cs cac 8357 8363 472608
cs pdt 8143 8212 1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle transitions using gold movesDynamic oracle transitions using predicted moves
In both cases logp of gold moves maximized
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and
use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al
2017]3 Using my own word and context vectors trained with different
language but from the same language family4 Applying transfer learning with a pre-trained parser
Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898
kmr mg 2012 2257 2278 2339
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the mostbeneficial
From scratch LM training does not bring useful word and contextvectors
Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition Based Parser can only build projective trees 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
Projective vs Non-projective
We compared our model with the best model for different projectivityratios
Language Projectiviy Best (LAS) Our (LAS)
grc perseus 907 7939 5503 (20)
eu bdt 9513 8422 7413 (17)
hu szeged 978 8266 6818 (14)
da ddt 9826 8628 7640 (17)
en gum 996 8505 7644 (15)
gl treegal 100 7425 7045 (10)
gl ctg 100 8212 7945 (14)
Table Our models performance gap decreases as the projectivity ratio increases
7
7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing
Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering
Tree-stack LSTM performed better with low resource languages
When the training dataset size increases tree-stack LSTM losses itsadvantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings
Attention Mechanism
Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement
Morphological Features
Finding different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies
Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682
S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US
Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
Results amp Comparisons
Dataset
Dependency parsing of 81
treebanks in 49 languages
All treebanks use standardized
annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 7th out
of 33 participants (1st among
transition based parsers)
Dependency parsing of 82
treebanks in 57 languages
All treebanks use
standardized annotation
17 universal
part-of-speech tags
37 universal dependency
relations
Koc-University ranked 16th
out of 30 participants (2nd
among transition based
parsers)
CoNLL17 CoNLL181 Traintest split change 2 Annotation
Omer Kırnap (Koc University) MSc Thesis September 27 2018 89 123
MLP vs Tree-stack LSTM
CoNLL 2018 committee released comparison results of CoNLL17 andCoNLL18 systems tested under the same test sets
Omer Kırnap (Koc University) MSc Thesis September 27 2018 90 123
MLP vs Tree-stack LSTM
2 possible problems of official comparison
1 If the annotation of the tree bank is improved the older parser ishandicapped
2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
MLP
Figure Initial model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
LSTM LSTM
Figure Only action LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
LSTM LSTM LSTM
LSTM LSTM MLP
Figure Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
LSTM LSTM
LSTM LSTM MLP
Figure Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692
cs cac 8389 8223 8313 8317
en ewt 7554 7543 7556 7567
Table Comparison between MLP and rdquoOnlyrdquo models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875
ar padt (120k) 6804 6814
en ewt (204k) 7487 7577
cs cac (473k) 8289 8357
cs pdt (1M) 8117 81164
t-RNN provides comparative advantage for low-resourcelanguages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNNrsquos performance contribution increases when the training sizedecreases
σ-LSTM provides more useful information independent from datasetsize
Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k less than 50k tokens
Languages having more than 50k less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having less than 20k training tokens
Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583
ru taiga 5832 6055 10479
sme giella 5278 5339 16385
la perseus 4993 516 18184
ug udt 5278 5339 19262
sl sst 4672 4877 19473
hu szeged 6623 6818 20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having tokens in between 50k and100k
Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325
fr sequoia 8436 8217 50543
en gum 7644 7534 53686
ko gsd 7374 7254 56687
eu bdt 7455 7332 72974
nl lassymal 767 758 75134
gl ctg 7902 79018 79327
lv lvtb 7233 7224 80666
id gsd 7576 7397 97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having more than 100k trainingtokens
Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064
bg btb 8453 8455 124336
en ewt 7577 75682 204585
ar padt 6802 6814 223881
de gsd 7159 7132 263804
ca ancora 8589 85874 417587
es ancora 8499 8478 444617
cs cac 8357 8363 472608
cs pdt 8143 8212 1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle transitions using gold movesDynamic oracle transitions using predicted moves
In both cases logp of gold moves maximized
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and
use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al
2017]3 Using my own word and context vectors trained with different
language but from the same language family4 Applying transfer learning with a pre-trained parser
Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898
kmr mg 2012 2257 2278 2339
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the mostbeneficial
From scratch LM training does not bring useful word and contextvectors
Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition Based Parser can only build projective trees 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
Projective vs Non-projective
We compared our model with the best model for different projectivityratios
Language Projectiviy Best (LAS) Our (LAS)
grc perseus 907 7939 5503 (20)
eu bdt 9513 8422 7413 (17)
hu szeged 978 8266 6818 (14)
da ddt 9826 8628 7640 (17)
en gum 996 8505 7644 (15)
gl treegal 100 7425 7045 (10)
gl ctg 100 8212 7945 (14)
Table Our models performance gap decreases as the projectivity ratio increases
7
7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing
Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering
Tree-stack LSTM performed better with low resource languages
MLP vs Tree-stack LSTM
The CoNLL 2018 committee released comparison results of CoNLL 2017 and CoNLL 2018 systems tested on the same test sets
MLP vs Tree-stack LSTM
Two possible problems with the official comparison:

1. If the annotation of the treebank has been improved, the older parser is handicapped
2. If the training-test split has changed and the old training data are now in the test data, the old parser is favored undeservedly
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code        MLP    Tree-stack
ru_taiga (10k)   58.89  60.55
hu_szeged (20k)  66.21  68.18
tr_imst (50k)    56.78  58.75
ar_padt (120k)   67.83  68.14
en_ewt (205k)    74.87  75.77
cs_cac (473k)    83.39  83.57
Tree-stack LSTM outperforms MLP
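The scores in these tables are LAS (labeled attachment score) percentages. As a reminder, a minimal sketch of how LAS is computed (function and variable names are illustrative, not the official evaluation script):

```python
def las(gold, pred):
    """Labeled attachment score: percentage of words whose predicted
    head AND dependency label both match the gold annotation."""
    assert len(gold) == len(pred)
    correct = sum(1 for (gh, gl), (ph, pl) in zip(gold, pred)
                  if gh == ph and gl == pl)
    return 100.0 * correct / len(gold)

# Each word is (head_index, relation); head 0 denotes the root.
gold = [(2, "amod"), (0, "root"), (2, "obj")]
pred = [(2, "amod"), (0, "root"), (2, "nmod")]
print(round(las(gold, pred), 2))  # 66.67
```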
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
MLP Parser
Figure: Initial model (MLP only)
Only Action LSTM
Figure: Only action LSTM
Only β-LSTM
Figure: Only β-LSTM
Only σ-LSTM
Figure: Only σ-LSTM
Ablation Analysis Results
Lang Code  MLP    Only Action  Only-β  Only-σ
hu_szeged  66.21  66.87        66.94   67.03
sv_lines   71.12  72.05        72.17   72.45
tr_imst    57.12  56.87        57.02   57.12
ar_padt    67.83  66.67        66.89   66.92
cs_cac     83.89  82.23        83.13   83.17
en_ewt     75.54  75.43        75.56   75.67

Table: Comparison between MLP and the "Only" models
Ablation of t-RNN
Figure: t-RNN architecture — head word, dependent word, and dependency relation inputs, LSTM layers, concatenation, and a final MLP
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code           without t-RNN  with t-RNN
no_nynorsklia (3k)  51.78          53.33
ru_taiga (11k)      59.13          60.55
gl_treegal (15k)    69.76          70.45
hu_szeged (20k)     66.12          68.18
sv_lines (49k)      74.04          75.46
tr_imst (50k)       58.12          58.75
ar_padt (120k)      68.04          68.14
en_ewt (204k)       74.87          75.77
cs_cac (473k)       82.89          83.57
cs_pdt (1M)         81.17          81.16
t-RNN provides a comparative advantage for low-resource languages
Ablation Analysis
Overall results of ablation analysis
Lang       MLP    Only A  Only-β  Only-σ  w/o t-RNN  all
hu_szeged  66.21  66.87   66.94   67.03   66.12      68.18
sv_lines   71.12  72.05   72.17   72.45   74.04      75.46
tr_imst    57.12  56.87   57.02   57.12   58.12      58.75
ar_padt    67.83  66.67   66.89   66.92   68.04      68.14
cs_cac     83.89  82.23   83.13   83.17   82.89      83.57
en_ewt     75.54  75.43   75.56   75.67   74.87      75.77

Tree-stack LSTM beats the other model variations
Ablation Analysis
Conclusions of Ablation Experiments
t-RNN's performance contribution increases as the training set size decreases

σ-LSTM provides more useful information, independent of dataset size

Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers)
What does Morphological Feature Embedding provide?
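Before the results, a minimal sketch of one common way to embed a UD FEATS string: sum one vector per "Feature=Value" pair. Names, dimensions, and the random initialization are illustrative assumptions, not the thesis implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8
# One vector per "Feature=Value" pair, created on demand.
feat_vectors = {}

def embed_feats(feats, dim=DIM):
    """Embed a UD FEATS string like 'Case=Nom|Number=Sing' as the
    sum of per-feature vectors; '_' (no features) maps to zeros."""
    vec = np.zeros(dim)
    if feats == "_":
        return vec
    for pair in feats.split("|"):
        if pair not in feat_vectors:
            feat_vectors[pair] = rng.normal(size=dim)
        vec += feat_vectors[pair]
    return vec

v = embed_feats("Case=Nom|Number=Sing")
print(v.shape)  # (8,)
```

Summing makes the representation order-independent and lets unseen feature combinations share statistics with their individual features.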
Contribution of Morph-feat Embeddings
Experimental Settings: We divide the CoNLL18 UD dataset (version 2.2) into 4 parts based on the number of training tokens for each language, to better understand our contributions:

Languages having less than 20k tokens

Languages having more than 20k and less than 50k tokens

Languages having more than 50k and less than 100k tokens

Languages having 100k tokens or more
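The four-way split above can be sketched as a simple bucketing function (thresholds from the slide; the function name is illustrative):

```python
def size_bucket(n_tokens):
    """Assign a treebank to one of the four experimental groups
    by its number of training tokens."""
    if n_tokens < 20_000:
        return "<20k"
    elif n_tokens < 50_000:
        return "20k-50k"
    elif n_tokens < 100_000:
        return "50k-100k"
    return ">=100k"

print(size_bucket(10_479))  # <20k
print(size_bucket(97_531))  # 50k-100k
```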
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
no_nynorsklia  51.13        53.33           3,583
ru_taiga       58.32        60.55           10,479
sme_giella     52.78        53.39           16,385
la_perseus     49.93        51.60           18,184
ug_udt         52.78        53.39           19,262
sl_sst         46.72        48.77           19,473
hu_szeged      66.23        68.18           20,166

Not useful for languages having less than 20k training tokens
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having between 50k and 100k tokens

Lang code      Morph-Feats  no Morph-Feats  # of tokens
sv_lines       72.18        74.81           48,325
fr_sequoia     84.36        82.17           50,543
en_gum         76.44        75.34           53,686
ko_gsd         73.74        72.54           56,687
eu_bdt         74.55        73.32           72,974
nl_lassysmall  76.7         75.8            75,134
gl_ctg         79.02        79.018          79,327
lv_lvtb        72.33        72.24           80,666
id_gsd         75.76        73.97           97,531

Beneficial for languages with 50k-100k training tokens
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens

Lang code  Morph-Feats  no Morph-Feats  # of tokens
fa_seraji  81.18        81.12           121,064
bg_btb     84.53        84.55           124,336
en_ewt     75.77        75.682          204,585
ar_padt    68.02        68.14           223,881
de_gsd     71.59        71.32           263,804
ca_ancora  85.89        85.874          417,587
es_ancora  84.99        84.78           444,617
cs_cac     83.57        83.63           472,608
cs_pdt     81.43        82.12           1,173,282

Neutral for languages having more than 100k training tokens
Static vs Dynamic Oracle Training
Static oracle: transitions follow the gold moves. Dynamic oracle: transitions follow the predicted moves.

In both cases, the log-probability of the gold moves is maximized.

Figure: t-RNN architecture — head word, dependent word, and dependency relation inputs, LSTM layers, concatenation, and a final MLP
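The difference between the two regimes can be sketched in a single training step (the parser interface, toy model, and move names are illustrative assumptions): both maximize the log-probability of the oracle's gold move, but only dynamic-oracle training advances the parser state with the model's own prediction, exposing it to its own mistakes.

```python
import math

def train_step(state, model_probs, oracle, dynamic):
    """One transition step: the loss always targets the oracle's gold
    move, but a dynamic oracle advances with the model's prediction."""
    probs = model_probs(state)              # move -> probability
    gold = oracle(state)                    # oracle's best move here
    loss = -math.log(probs[gold])           # maximize log p(gold)
    taken = max(probs, key=probs.get) if dynamic else gold
    return taken, loss

# Toy setup: two moves, a model that currently prefers the wrong one.
model = lambda s: {"SHIFT": 0.6, "LEFT": 0.4}
oracle = lambda s: "LEFT"

print(train_step(None, model, oracle, dynamic=False)[0])  # LEFT
print(train_step(None, model, oracle, dynamic=True)[0])   # SHIFT
```

The loss is identical in both branches; only the state the parser moves to (and therefore the states seen later in training) differs.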
Static vs Dynamic Oracle Training
Figure: Results are very close for training sets with fewer than 20k tokens
Static vs Dynamic Oracle Training
Figure: Results are very close for training sets between 20k and 50k tokens
Static vs Dynamic Oracle Training
Figure: Results are very close for training sets with more than 50k tokens
How about languages with less than 20k training tokens?
Transfer Learning
There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors, trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language      (1)           (2)    (3)    (4)
af_afribooms  not provided  75.46  77.43  78.12
kk_ktb        20.19         22.31  21.96  23.86
bxr_bdt       7.64          9.76   9.93   8.98
kmr_mg        20.12         22.57  22.78  23.39

Table: LAS values for strategies (1), (2), (3), and (4)
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial

Training an LM from scratch on very limited data does not yield useful word and context vectors

Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017]
Projectivity
A transition-based parser can only build projective trees.

Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
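A dependency tree is projective when no two arcs cross. A minimal check, assuming 1-based word indices with head 0 for the root (a generic sketch, not the thesis code):

```python
def is_projective(heads):
    """heads[i] is the head of word i+1 (1-based); 0 means root.
    Two arcs cross if exactly one endpoint of one arc lies
    strictly inside the span of the other."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for i, (l1, r1) in enumerate(arcs):
        for l2, r2 in arcs[i + 1:]:
            if l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1:
                return False
    return True

print(is_projective([2, 3, 0, 3, 6, 4]))  # True  (all arcs nested)
print(is_projective([3, 4, 0, 3]))        # False (arcs 3->1 and 4->2 cross)
```

A transition-based parser with standard shift/reduce moves can only produce head lists for which this check returns True, which is why the projectivity ratio of a treebank matters in the next comparison.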
Projective vs Non-projective
We compared our model with the best model across different projectivity ratios.

Language     Projectivity  Best (LAS)  Our (LAS)
grc_perseus  90.7          79.39       55.03 (20)
eu_bdt       95.13         84.22       74.13 (17)
hu_szeged    97.8          82.66       68.18 (14)
da_ddt       98.26         86.28       76.40 (17)
en_gum       99.6          85.05       76.44 (15)
gl_treegal   100           74.25       70.45 (10)
gl_ctg       100           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases (from the official results page and our projectivity table)
Conclusions
Conclusion
In conclusion: We introduced "Context", "Word", and "Morph-feat" embeddings and showed their contribution to transition-based dependency parsing

Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering

Tree-stack LSTM performed better on low-resource languages

As the training dataset size increases, tree-stack LSTM loses its advantage
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.
Attention Mechanism
Applying attention over σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.
Morphological Features
Finding different ways to represent morphological features.
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.
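The attention idea above can be sketched as standard dot-product attention over a sequence of LSTM hidden states (a hypothetical sketch of the proposed future direction, not part of the thesis model; shapes and names are illustrative):

```python
import numpy as np

def attend(query, states):
    """Dot-product attention: weight each state by softmax(q . s)
    and return the weighted sum as a context vector."""
    scores = states @ query                  # (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                 # softmax over time steps
    return weights @ states                  # (dim,)

rng = np.random.default_rng(0)
states = rng.normal(size=(5, 8))  # e.g., 5 sigma-LSTM hidden states
query = rng.normal(size=8)        # e.g., current parser-state vector
context = attend(query, states)
print(context.shape)  # (8,)
```

The resulting context vector could be concatenated with the fixed last-state summary before the final MLP, letting the parser weight earlier stack, buffer, or action states instead of relying only on the final hidden state.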
Publications
Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
References
Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Thank you for your attention
Questions
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
MLP vs Tree-stack LSTM
2 possible problems of official comparison
1 If the annotation of the tree bank is improved the older parser ishandicapped
2 If the training-test split has changed and the old training data arenow in test data the old parser is favored undeservedly
Omer Kırnap (Koc University) MSc Thesis September 27 2018 91 123
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
MLP
Figure Initial model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
LSTM LSTM
Figure Only action LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
LSTM LSTM LSTM
LSTM LSTM MLP
Figure Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
LSTM LSTM
LSTM LSTM MLP
Figure Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692
cs cac 8389 8223 8313 8317
en ewt 7554 7543 7556 7567
Table Comparison between MLP and rdquoOnlyrdquo models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875
ar padt (120k) 6804 6814
en ewt (204k) 7487 7577
cs cac (473k) 8289 8357
cs pdt (1M) 8117 81164
t-RNN provides comparative advantage for low-resourcelanguages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNNrsquos performance contribution increases when the training sizedecreases
σ-LSTM provides more useful information independent from datasetsize
Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k less than 50k tokens
Languages having more than 50k less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having less than 20k training tokens
Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583
ru taiga 5832 6055 10479
sme giella 5278 5339 16385
la perseus 4993 516 18184
ug udt 5278 5339 19262
sl sst 4672 4877 19473
hu szeged 6623 6818 20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having tokens in between 50k and100k
Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325
fr sequoia 8436 8217 50543
en gum 7644 7534 53686
ko gsd 7374 7254 56687
eu bdt 7455 7332 72974
nl lassymal 767 758 75134
gl ctg 7902 79018 79327
lv lvtb 7233 7224 80666
id gsd 7576 7397 97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having more than 100k trainingtokens
Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064
bg btb 8453 8455 124336
en ewt 7577 75682 204585
ar padt 6802 6814 223881
de gsd 7159 7132 263804
ca ancora 8589 85874 417587
es ancora 8499 8478 444617
cs cac 8357 8363 472608
cs pdt 8143 8212 1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle transitions using gold movesDynamic oracle transitions using predicted moves
In both cases logp of gold moves maximized
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and
use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al
2017]3 Using my own word and context vectors trained with different
language but from the same language family4 Applying transfer learning with a pre-trained parser
Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898
kmr mg 2012 2257 2278 2339
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the mostbeneficial
From scratch LM training does not bring useful word and contextvectors
Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition Based Parser can only build projective trees 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
Projective vs Non-projective
We compared our model with the best model for different projectivityratios
Language Projectiviy Best (LAS) Our (LAS)
grc perseus 907 7939 5503 (20)
eu bdt 9513 8422 7413 (17)
hu szeged 978 8266 6818 (14)
da ddt 9826 8628 7640 (17)
en gum 996 8505 7644 (15)
gl treegal 100 7425 7045 (10)
gl ctg 100 8212 7945 (14)
Table Our models performance gap decreases as the projectivity ratio increases
7
7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing
Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering
Tree-stack LSTM performed better with low resource languages
When the training dataset size increases tree-stack LSTM losses itsadvantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings
Attention Mechanism
Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement
Morphological Features
Finding different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies
Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682
S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US
Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
MLP vs Tree-stack LSTM
Experiments with the same train-test datasets to compare models
Lang Code MLP Tree-stackru taiga (10k) 5889 6055hu szeged (20k) 6621 6818tr imst (50k) 5678 5875ar padt (120k) 6783 6814en ewt (205k) 7487 7577cs cac (473k) 8339 8357
Tree-stack LSTM outperforms MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 92 123
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
MLP
Figure Initial model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
LSTM LSTM
Figure Only action LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
LSTM LSTM LSTM
LSTM LSTM MLP
Figure Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
LSTM LSTM
LSTM LSTM MLP
Figure Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692
cs cac 8389 8223 8313 8317
en ewt 7554 7543 7556 7567
Table Comparison between MLP and rdquoOnlyrdquo models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875
ar padt (120k) 6804 6814
en ewt (204k) 7487 7577
cs cac (473k) 8289 8357
cs pdt (1M) 8117 81164
t-RNN provides comparative advantage for low-resourcelanguages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNNrsquos performance contribution increases when the training sizedecreases
σ-LSTM provides more useful information independent from datasetsize
Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k less than 50k tokens
Languages having more than 50k less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having less than 20k training tokens
Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583
ru taiga 5832 6055 10479
sme giella 5278 5339 16385
la perseus 4993 516 18184
ug udt 5278 5339 19262
sl sst 4672 4877 19473
hu szeged 6623 6818 20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having tokens in between 50k and100k
Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325
fr sequoia 8436 8217 50543
en gum 7644 7534 53686
ko gsd 7374 7254 56687
eu bdt 7455 7332 72974
nl lassymal 767 758 75134
gl ctg 7902 79018 79327
lv lvtb 7233 7224 80666
id gsd 7576 7397 97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having more than 100k trainingtokens
Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064
bg btb 8453 8455 124336
en ewt 7577 75682 204585
ar padt 6802 6814 223881
de gsd 7159 7132 263804
ca ancora 8589 85874 417587
es ancora 8499 8478 444617
cs cac 8357 8363 472608
cs pdt 8143 8212 1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle transitions using gold movesDynamic oracle transitions using predicted moves
In both cases logp of gold moves maximized
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and
use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al
2017]3 Using my own word and context vectors trained with different
language but from the same language family4 Applying transfer learning with a pre-trained parser
Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898
kmr mg 2012 2257 2278 2339
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the mostbeneficial
From scratch LM training does not bring useful word and contextvectors
Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition Based Parser can only build projective trees 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
Projective vs Non-projective
We compared our model with the best model for different projectivityratios
Language Projectiviy Best (LAS) Our (LAS)
grc perseus 907 7939 5503 (20)
eu bdt 9513 8422 7413 (17)
hu szeged 978 8266 6818 (14)
da ddt 9826 8628 7640 (17)
en gum 996 8505 7644 (15)
gl treegal 100 7425 7045 (10)
gl ctg 100 8212 7945 (14)
Table Our models performance gap decreases as the projectivity ratio increases
7
7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Ablation Analysis of Tree-stack LSTM
An evolution from MLP to Tree-stack LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 93 123
MLP Parser
Figure: Initial model (MLP).
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
Figure: Only action LSTM.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
Figure: Only β-LSTM.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
Figure: Only σ-LSTM.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code    MLP     Only Action   Only-β   Only-σ
hu szeged    66.21   66.87         66.94    67.03
sv lines     71.12   72.05         72.17    72.45
tr imst      57.12   56.87         57.02    57.12
ar padt      67.83   66.67         66.89    66.92
cs cac       83.89   82.23         83.13    83.17
en ewt       75.54   75.43         75.56    75.67

Table: Comparison between MLP and "Only" models (LAS).
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
[Diagram: the t-RNN composes the head word, dependent word, and dependency-relation vectors; its output is concatenated with the σ-, β-, and action-LSTM states and fed to an MLP.]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
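The diagram above composes each new arc's head word, dependent word, and dependency-relation vectors with a recurrent cell. A minimal sketch of one such update in plain Python (the tanh cell and the names `W`, `U` are illustrative assumptions, not the thesis's exact parameterization):

```python
import math

def t_rnn_step(h_prev, head_vec, dep_vec, rel_vec, W, U):
    # Concatenate the three arc features, then apply a simple
    # recurrent update: h = tanh(W [head; dep; rel] + U h_prev).
    x = head_vec + dep_vec + rel_vec      # list concatenation
    h_new = []
    for i in range(len(h_prev)):
        s = sum(W[i][j] * xj for j, xj in enumerate(x))
        s += sum(U[i][j] * hj for j, hj in enumerate(h_prev))
        h_new.append(math.tanh(s))
    return h_new
```

The resulting hidden state summarizes the partial tree built so far, which is what the ablation above measures.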
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code            without t-RNN   with t-RNN
no nynorsklia (3k)   51.78           53.33
ru taiga (11k)       59.13           60.55
gl treegal (15k)     69.76           70.45
hu szeged (20k)      66.12           68.18
sv lines (49k)       74.04           75.46
tr imst (50k)        58.12           58.75
ar padt (120k)       68.04           68.14
en ewt (204k)        74.87           75.77
cs cac (473k)        82.89           83.57
cs pdt (1M)          81.17           81.16
t-RNN provides a comparative advantage for low-resource languages.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang         MLP     Only A   Only-β   Only-σ   w/o t-RNN   all
hu szeged    66.21   66.87    66.94    67.03    66.12       68.18
sv lines     71.12   72.05    72.17    72.45    74.04       75.46
tr imst      57.12   56.87    57.02    57.12    58.12       58.75
ar padt      67.83   66.67    66.89    66.92    68.04       68.14
cs cac       83.89   82.23    83.13    83.17    82.89       83.57
en ewt       75.54   75.43    75.56    75.67    74.87       75.77
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNN's performance contribution increases as the training set size decreases.
σ-LSTM provides more useful information, independent of dataset size.
Interconnecting the model's components with t-RNN makes the tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers).
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental Settings
We divide the CoNLL18 UD v2.2 dataset into 4 parts, based on the number of training tokens for each language, to better understand our contributions:
Languages having less than 20k tokens
Languages having more than 20k and less than 50k tokens
Languages having more than 50k and less than 100k tokens
Languages having 100k tokens or more
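The four bins above can be written down as a tiny helper (thresholds taken from the slide; the function name is mine):

```python
def size_bucket(n_train_tokens):
    # Bucket a treebank by its number of training tokens,
    # following the four groups used in the morph-feat experiments.
    if n_train_tokens < 20_000:
        return "<20k"
    if n_train_tokens < 50_000:
        return "20k-50k"
    if n_train_tokens < 100_000:
        return "50k-100k"
    return ">=100k"
```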
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens.

Lang code       Morph-Feats   no Morph-Feats   # of tokens
no nynorsklia   51.13         53.33            3583
ru taiga        58.32         60.55            10479
sme giella      52.78         53.39            16385
la perseus      49.93         51.6             18184
ug udt          52.78         53.39            19262
sl sst          46.72         48.77            19473
hu szeged       66.23         68.18            20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having tokens in between 50k and 100k.

Lang code       Morph-Feats   no Morph-Feats   # of tokens
sv lines        72.18         74.81            48325
fr sequoia      84.36         82.17            50543
en gum          76.44         75.34            53686
ko gsd          73.74         72.54            56687
eu bdt          74.55         73.32            72974
nl lassysmall   76.7          75.8             75134
gl ctg          79.02         79.018           79327
lv lvtb         72.33         72.24            80666
id gsd          75.76         73.97            97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens.

Lang code   Morph-Feats   no Morph-Feats   # of tokens
fa seraji   81.18         81.12            121064
bg btb      84.53         84.55            124336
en ewt      75.77         75.682           204585
ar padt     68.02         68.14            223881
de gsd      71.59         71.32            263804
ca ancora   85.89         85.874           417587
es ancora   84.99         84.78            444617
cs cac      83.57         83.63            472608
cs pdt      81.43         82.12            1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle: transitions follow the gold moves. Dynamic oracle: transitions follow the model's predicted moves.
In both cases, the log-probability of the gold moves is maximized.
[Diagram: the tree-stack LSTM architecture with t-RNN, repeated from the earlier ablation slide.]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
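A static oracle can be read directly off the gold heads: it emits the one gold transition sequence for a sentence. A self-contained sketch for the arc-standard system (my own minimal implementation, not the thesis code, and the thesis's transition system may differ; it assumes a projective gold tree, 1-indexed tokens, and 0 as root). A dynamic oracle would instead score the optimal moves from states the model itself reaches:

```python
from collections import Counter

def static_oracle(heads):
    # heads[i-1] is the gold head of token i (0 = root).
    n = len(heads)
    head_of = {d: heads[d - 1] for d in range(1, n + 1)}
    pending = Counter(heads)              # unattached dependents per head
    stack, buf, moves = [0], list(range(1, n + 1)), []
    while buf or len(stack) > 1:
        if len(stack) >= 2:
            s0, s1 = stack[-1], stack[-2]
            # LEFT: s1 is a dependent of s0 (s1 is not the root symbol)
            if s1 != 0 and head_of[s1] == s0:
                stack.pop(-2)
                pending[s0] -= 1
                moves.append("LEFT")
                continue
            # RIGHT: s0 depends on s1 and has collected all its dependents
            if head_of[s0] == s1 and pending[s0] == 0:
                stack.pop()
                pending[s1] -= 1
                moves.append("RIGHT")
                continue
        moves.append("SHIFT")
        stack.append(buf.pop(0))
    return moves
```

For "news had effect" with gold heads [2, 0, 2], this yields SHIFT, SHIFT, LEFT, SHIFT, RIGHT, RIGHT.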
Static vs Dynamic Oracle Training
Figure: Results are very close for training sets with fewer than 20k tokens.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure: Results are very close for training sets between 20k and 50k tokens.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure: Results are very close for training sets with more than 50k tokens.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al. 2017]
3. Using my own word and context vectors, trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language       (1)            (2)     (3)     (4)
af afribooms   not provided   75.46   77.43   78.12
kk ktb         20.19          22.31   21.96   23.86
bxr bdt        7.64           9.76    9.93    8.98
kmr mg         20.12          22.57   22.78   23.39

Table: LAS values for strategies (1), (2), (3), and (4).
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial.
From-scratch LM training does not yield useful word and context vectors.
Our word and context vectors are still more useful than Facebook's [Bojanowski et al. 2017].
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition-based parsers can only build projective trees. 6
6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
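Projectivity is easy to test: a tree is non-projective iff two of its arcs cross. A minimal check (my own helper, assuming 1-indexed tokens with head 0 for the root):

```python
def is_projective(heads):
    # heads[i-1] is the head of token i (0 = root).
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for l1, r1 in arcs:
        for l2, r2 in arcs:
            # Two arcs cross when one endpoint of the second lies strictly
            # inside the first arc's span and the other lies outside it.
            if l1 < l2 < r1 < r2:
                return False
    return True
```

A transition-based parser of the kind described here can only produce trees for which this check returns True.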
Projective vs Non-projective
We compared our model with the best model for different projectivity ratios.

Language      Projectivity (%)   Best (LAS)   Our (LAS)
grc perseus   90.7               79.39        55.03 (20)
eu bdt        95.13              84.22        74.13 (17)
hu szeged     97.8               82.66        68.18 (14)
da ddt        98.26              86.28        76.40 (17)
en gum        99.6               85.05        76.44 (15)
gl treegal    100                74.25        70.45 (10)
gl ctg        100                82.12        79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases. 7

7 From the official results page and our projectivity table.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusion:
We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition-based dependency parsing.
Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.
Tree-stack LSTM performed better on low-resource languages.
When the training dataset size increases, tree-stack LSTM loses its advantage.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.
Attention Mechanism
Applying attention between the σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.
Morphological Features
Finding different ways to represent morphological features.
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function; other losses (e.g., CRF) may solve this problem.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673–682.
S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
MLP Parser
MLP
Figure Initial model
Omer Kırnap (Koc University) MSc Thesis September 27 2018 94 123
Only Action LSTM
LSTM LSTM
Figure Only action LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
LSTM LSTM LSTM
LSTM LSTM MLP
Figure Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
LSTM LSTM
LSTM LSTM MLP
Figure Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692
cs cac 8389 8223 8313 8317
en ewt 7554 7543 7556 7567
Table Comparison between MLP and rdquoOnlyrdquo models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875
ar padt (120k) 6804 6814
en ewt (204k) 7487 7577
cs cac (473k) 8289 8357
cs pdt (1M) 8117 81164
t-RNN provides comparative advantage for low-resourcelanguages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNNrsquos performance contribution increases when the training sizedecreases
σ-LSTM provides more useful information independent from datasetsize
Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k less than 50k tokens
Languages having more than 50k less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having less than 20k training tokens
Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583
ru taiga 5832 6055 10479
sme giella 5278 5339 16385
la perseus 4993 516 18184
ug udt 5278 5339 19262
sl sst 4672 4877 19473
hu szeged 6623 6818 20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having tokens in between 50k and100k
Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325
fr sequoia 8436 8217 50543
en gum 7644 7534 53686
ko gsd 7374 7254 56687
eu bdt 7455 7332 72974
nl lassymal 767 758 75134
gl ctg 7902 79018 79327
lv lvtb 7233 7224 80666
id gsd 7576 7397 97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having more than 100k trainingtokens
Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064
bg btb 8453 8455 124336
en ewt 7577 75682 204585
ar padt 6802 6814 223881
de gsd 7159 7132 263804
ca ancora 8589 85874 417587
es ancora 8499 8478 444617
cs cac 8357 8363 472608
cs pdt 8143 8212 1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle transitions using gold movesDynamic oracle transitions using predicted moves
In both cases logp of gold moves maximized
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and
use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al
2017]3 Using my own word and context vectors trained with different
language but from the same language family4 Applying transfer learning with a pre-trained parser
Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898
kmr mg 2012 2257 2278 2339
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the mostbeneficial
From scratch LM training does not bring useful word and contextvectors
Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition Based Parser can only build projective trees 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
Projective vs Non-projective
We compared our model with the best model for different projectivityratios
Language Projectiviy Best (LAS) Our (LAS)
grc perseus 907 7939 5503 (20)
eu bdt 9513 8422 7413 (17)
hu szeged 978 8266 6818 (14)
da ddt 9826 8628 7640 (17)
en gum 996 8505 7644 (15)
gl treegal 100 7425 7045 (10)
gl ctg 100 8212 7945 (14)
Table Our models performance gap decreases as the projectivity ratio increases
7
7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing
Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering
Tree-stack LSTM performed better with low resource languages
When the training dataset size increases tree-stack LSTM losses itsadvantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings
Attention Mechanism
Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement
Morphological Features
Finding different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies
Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682
S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US
Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
Only Action LSTM
LSTM LSTM
Figure Only action LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 95 123
Only β-LSTM
LSTM LSTM LSTM
LSTM LSTM MLP
Figure Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
LSTM LSTM
LSTM LSTM MLP
Figure Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692
cs cac 8389 8223 8313 8317
en ewt 7554 7543 7556 7567
Table Comparison between MLP and rdquoOnlyrdquo models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code without t-RNN with t-RNNno nynorsklia (3k) 5178 5333ru taiga (11k) 5913 6055gl treegal (15k) 6976 7045hu szeged (20k) 6612 6818sv lines (49k) 7404 7546tr imst (50k) 5812 5875
ar padt (120k) 6804 6814
en ewt (204k) 7487 7577
cs cac (473k) 8289 8357
cs pdt (1M) 8117 81164
t-RNN provides comparative advantage for low-resourcelanguages
Omer Kırnap (Koc University) MSc Thesis September 27 2018 100 123
Ablation Analysis
Overall results of ablation analysis
Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNNrsquos performance contribution increases when the training sizedecreases
σ-LSTM provides more useful information independent from datasetsize
Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k less than 50k tokens
Languages having more than 50k less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having less than 20k training tokens
Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583
ru taiga 5832 6055 10479
sme giella 5278 5339 16385
la perseus 4993 516 18184
ug udt 5278 5339 19262
sl sst 4672 4877 19473
hu szeged 6623 6818 20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having tokens in between 50k and100k
Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325
fr sequoia 8436 8217 50543
en gum 7644 7534 53686
ko gsd 7374 7254 56687
eu bdt 7455 7332 72974
nl lassymal 767 758 75134
gl ctg 7902 79018 79327
lv lvtb 7233 7224 80666
id gsd 7576 7397 97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having more than 100k trainingtokens
Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064
bg btb 8453 8455 124336
en ewt 7577 75682 204585
ar padt 6802 6814 223881
de gsd 7159 7132 263804
ca ancora 8589 85874 417587
es ancora 8499 8478 444617
cs cac 8357 8363 472608
cs pdt 8143 8212 1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle transitions using gold movesDynamic oracle transitions using predicted moves
In both cases logp of gold moves maximized
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and
use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al
2017]3 Using my own word and context vectors trained with different
language but from the same language family4 Applying transfer learning with a pre-trained parser
Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898
kmr mg 2012 2257 2278 2339
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the mostbeneficial
From scratch LM training does not bring useful word and contextvectors
Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition Based Parser can only build projective trees 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
Projective vs Non-projective
We compared our model with the best model for different projectivityratios
Language Projectiviy Best (LAS) Our (LAS)
grc perseus 907 7939 5503 (20)
eu bdt 9513 8422 7413 (17)
hu szeged 978 8266 6818 (14)
da ddt 9826 8628 7640 (17)
en gum 996 8505 7644 (15)
gl treegal 100 7425 7045 (10)
gl ctg 100 8212 7945 (14)
Table Our models performance gap decreases as the projectivity ratio increases
7
7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing
Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering
Tree-stack LSTM performed better with low resource languages
When the training dataset size increases tree-stack LSTM losses itsadvantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings
Attention Mechanism
Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement
Morphological Features
Finding different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies
Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682
S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US
Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
Only β-LSTM
LSTM LSTM LSTM
LSTM LSTM MLP
Figure Only β-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 96 123
Only σ-LSTM
LSTM LSTM
LSTM LSTM MLP
Figure Only σ-LSTM
Omer Kırnap (Koc University) MSc Thesis September 27 2018 97 123
Ablation Analysis Results
Lang Code MLP Only Action Only-β Only-σhu szeged 6621 6687 6694 6703sv lines 7112 7205 7217 7245tr imst 5712 5687 5702 5712ar padt 6783 6667 6689 6692
cs cac 8389 8223 8313 8317
en ewt 7554 7543 7556 7567
Table Comparison between MLP and rdquoOnlyrdquo models
Omer Kırnap (Koc University) MSc Thesis September 27 2018 98 123
Ablation of t-RNN
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 99 123
Ablation of t-RNN

Comparison of stack-LSTMs with and without t-RNN:

Lang Code            without t-RNN   with t-RNN
no nynorsklia (3k)   51.78           53.33
ru taiga (11k)       59.13           60.55
gl treegal (15k)     69.76           70.45
hu szeged (20k)      66.12           68.18
sv lines (49k)       74.04           75.46
tr imst (50k)        58.12           58.75
ar padt (120k)       68.04           68.14
en ewt (204k)        74.87           75.77
cs cac (473k)        82.89           83.57
cs pdt (1M)          81.17           81.164

t-RNN provides a comparative advantage for low-resource languages.
Ablation Analysis

Overall results of the ablation analysis:

Lang        MLP     Only-A   Only-β   Only-σ   w/o t-RNN   all
hu szeged   66.21   66.87    66.94    67.03    66.12       68.18
sv lines    71.12   72.05    72.17    74.04    72.17       75.46
tr imst     57.12   56.87    57.02    57.12    58.12       58.75
ar padt     67.83   66.67    66.89    66.92    68.04       68.14
cs cac      83.89   82.23    83.13    83.17    82.89       83.57
en ewt      75.54   75.43    75.56    75.67    74.87       75.77

The full tree-stack LSTM beats all other model variations.
Ablation Analysis

Conclusions of the ablation experiments:

- t-RNN's performance contribution increases as the training set size decreases.
- σ-LSTM provides more useful information, independent of dataset size.
- Interconnecting the model's components with the t-RNN makes the tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers).
What does Morphological Feature Embedding provide?
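Before the experiments, a quick sketch of what a morphological feature embedding could look like. This is my own illustrative construction, not the thesis code: embed each UD `Key=Value` pair separately and sum them, so unseen combinations of seen features still receive a vector:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8
feat_vecs = {}  # one vector per "Key=Value" pair, created on demand

def morph_feat_embedding(feats):
    """Embed a UD FEATS string such as 'Case=Nom|Number=Sing'.
    The per-pair vectors here are random stand-ins for learned embeddings."""
    vec = np.zeros(DIM)
    if feats == "_":  # UD uses '_' when a word carries no features
        return vec
    for pair in feats.split("|"):
        if pair not in feat_vecs:
            feat_vecs[pair] = rng.normal(size=DIM)
        vec = vec + feat_vecs[pair]
    return vec

e_sing = morph_feat_embedding("Case=Nom|Number=Sing")
e_plur = morph_feat_embedding("Case=Nom|Number=Plur")
# Both embeddings share the Case=Nom component; they differ only in Number.
```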
Contribution of Morph-feat Embeddings

Experimental settings: we divide the CoNLL18 UD dataset (v2.2) into 4 parts based on the number of training tokens for each language, to better understand our contributions:

- Languages having less than 20k tokens
- Languages having more than 20k but less than 50k tokens
- Languages having more than 50k but less than 100k tokens
- Languages having 100k tokens or more
Contribution of Morph-feat Embeddings

Morph-feat experiments for languages having less than 20k training tokens:

Lang code       Morph-Feats   no Morph-Feats   # of tokens
no nynorsklia   51.13         53.33            3,583
ru taiga        58.32         60.55            10,479
sme giella      52.78         53.39            16,385
la perseus      49.93         51.60            18,184
ug udt          52.78         53.39            19,262
sl sst          46.72         48.77            19,473
hu szeged       66.23         68.18            20,166

Not useful for languages having less than 20k training tokens.
Contribution of Morph-feat Embeddings

Morph-feat experiments for languages having between 50k and 100k training tokens:

Lang code       Morph-Feats   no Morph-Feats   # of tokens
sv lines        72.18         74.81            48,325
fr sequoia      84.36         82.17            50,543
en gum          76.44         75.34            53,686
ko gsd          73.74         72.54            56,687
eu bdt          74.55         73.32            72,974
nl lassysmall   76.70         75.80            75,134
gl ctg          79.02         79.018           79,327
lv lvtb         72.33         72.24            80,666
id gsd          75.76         73.97            97,531

Beneficial for languages with 50k-100k training tokens.
Contribution of Morph-feat Embeddings

Morph-feat experiments for languages having more than 100k training tokens:

Lang code   Morph-Feats   no Morph-Feats   # of tokens
fa seraji   81.18         81.12            121,064
bg btb      84.53         84.55            124,336
en ewt      75.77         75.682           204,585
ar padt     68.02         68.14            223,881
de gsd      71.59         71.32            263,804
ca ancora   85.89         85.874           417,587
es ancora   84.99         84.78            444,617
cs cac      83.57         83.63            472,608
cs pdt      81.43         82.12            1,173,282

Neutral for languages having more than 100k training tokens.
Static vs Dynamic Oracle Training

Static oracle: transitions follow the gold moves. Dynamic oracle: transitions follow the predicted moves. In both cases, the log-probability of the gold moves is maximized.

[Figure: the tree-stack LSTM architecture with the t-RNN component]
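The two regimes can be sketched as one training loop over a sentence. This is an illustrative toy with stand-in `state`, `model`, and `oracle` objects, not the thesis implementation:

```python
import random

def train_sentence(state, model, oracle, dynamic=False, follow_model_p=0.1):
    """Static oracle: always apply a gold move, so the parser only visits
    gold-derived states.  Dynamic oracle: occasionally apply the model's
    own prediction, so training states resemble test-time states.
    In both regimes the loss maximizes log p(gold move)."""
    loss = 0.0
    while not state.is_final():
        gold_moves = oracle(state)               # set of zero-cost moves
        scores = model.score(state)              # move -> log-probability
        gold = max(gold_moves, key=lambda m: scores[m])
        loss -= scores[gold]                     # maximize log p(gold)
        if dynamic and random.random() < follow_model_p:
            move = max(scores, key=scores.get)   # follow the model
        else:
            move = gold                          # follow the oracle
        state.apply(move)
    return loss

# Toy stand-ins, just to make the loop runnable:
class ToyState:
    def __init__(self, steps): self.left = steps
    def is_final(self): return self.left == 0
    def apply(self, move): self.left -= 1

class ToyModel:
    def score(self, state):
        return {"SHIFT": -0.1, "LEFT-ARC": -2.0, "RIGHT-ARC": -3.0}

loss = train_sentence(ToyState(5), ToyModel(), lambda s: {"SHIFT", "LEFT-ARC"})
```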
Static vs Dynamic Oracle Training

[Figure: results are very close for training sets with fewer than 20k tokens]
Static vs Dynamic Oracle Training

[Figure: results are very close for training sets between 20k and 50k tokens]
Static vs Dynamic Oracle Training

[Figure: results are very close for training sets with more than 50k tokens]
How about languages with less than 20k training tokens?
Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch.
2. Using Facebook's word vectors to train a parser [Bojanowski et al., 2017].
3. Using my own word and context vectors, trained on a different language from the same language family.
4. Applying transfer learning with a pre-trained parser.

Language       (1)            (2)     (3)     (4)
af afribooms   not provided   75.46   77.43   78.12
kk ktb         20.19          22.31   21.96   23.86
bxr bdt        7.64           9.76    9.93    8.98
kmr mg         20.12          22.57   22.78   23.39

Table: LAS values for strategies (1), (2), (3), and (4).
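What strategy (4) could look like mechanically: a minimal warm-start sketch under my own assumptions about the parameter layout (dicts of arrays, an `"emb"` embedding matrix); the thesis setup surely differs. Compatible tensors are copied wholesale, and embedding rows are copied only for word forms the two vocabularies share:

```python
import numpy as np

def init_from_pretrained(target, source, tgt_vocab, src_vocab):
    """Warm-start a low-resource parser from a pre-trained one (sketch).
    `target`/`source` map parameter names to numpy arrays;
    `*_vocab` map word forms to embedding-row indices."""
    for name, tensor in source.items():
        if name != "emb" and name in target and target[name].shape == tensor.shape:
            target[name] = tensor.copy()            # copy whole tensor
    for word, i in tgt_vocab.items():
        if word in src_vocab:                       # shared word forms only
            target["emb"][i] = source["emb"][src_vocab[word]]
    return target

# Tiny example with hypothetical parameters and vocabularies:
src = {"emb": np.arange(6.0).reshape(3, 2), "mlp": np.ones((2, 2))}
tgt = {"emb": np.zeros((2, 2)), "mlp": np.zeros((2, 2))}
tgt = init_from_pretrained(tgt, src, {"kedi": 0, "ev": 1}, {"kedi": 2})
# tgt["mlp"] is copied wholesale; only the "kedi" row lands in tgt["emb"].
```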
Transfer Learning

Conclusions of the transfer learning experiments:

- Applying transfer learning with a pre-trained parser is the most beneficial strategy.
- Training an LM from scratch on very limited data does not yield useful word and context vectors.
- Our word and context vectors are still more useful than Facebook's [Bojanowski et al., 2017].
Projectivity

A transition-based parser can only build projective trees.

[Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf]
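The projectivity constraint is easy to check: a tree is projective iff no two arcs cross. A small illustrative checker (`heads[i]` gives the head of token `i`, with head 0 denoting the artificial root):

```python
def is_projective(heads):
    """Return True iff the dependency tree has no crossing arcs.
    heads[i] is the head of token i (tokens 1..n; heads[0] is unused)."""
    arcs = [(min(d, heads[d]), max(d, heads[d])) for d in range(1, len(heads))]
    for i, j in arcs:
        for k, l in arcs:
            if i < k < j < l:   # arc (k, l) starts inside (i, j), ends outside
                return False
    return True

# "Economic news had little effect": fully projective.
print(is_projective([None, 2, 3, 0, 5, 3]))   # True
# Arcs (1, 3) and (2, 4) cross: non-projective.
print(is_projective([None, 3, 4, 0, 3]))      # False
```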
Projective vs Non-projective

We compared our model against the best model at different projectivity ratios:

Language      Projectivity (%)   Best (LAS)   Ours (LAS)
grc perseus   90.70              79.39        55.03 (20)
eu bdt        95.13              84.22        74.13 (17)
hu szeged     97.80              82.66        68.18 (14)
da ddt        98.26              86.28        76.40 (17)
en gum        99.60              85.05        76.44 (15)
gl treegal    100                74.25        70.45 (10)
gl ctg        100                82.12        79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases. (From the official results page and our projectivity table.)
Conclusions

In conclusion, we introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition-based dependency parsing.

- Our tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.
- The tree-stack LSTM performed better on low-resource languages.
- When the training dataset size increases, the tree-stack LSTM loses its advantage.
Future Research Directions

End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism
Applying attention between the σ-LSTM, β-LSTM, or action-LSTM states may bring a performance improvement.

Morphological Features
Finding different ways to represent morphological features.

Dynamic Oracle and Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.
Publications

- Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
- Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
References

- Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Association for Computational Linguistics, pages 673-682.
- S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.
- Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Thank you for your attention!

Questions?
da ddt 9826 8628 7640 (17)
en gum 996 8505 7644 (15)
gl treegal 100 7425 7045 (10)
gl ctg 100 8212 7945 (14)
Table Our models performance gap decreases as the projectivity ratio increases
7
7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing
Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering
Tree-stack LSTM performed better with low resource languages
When the training dataset size increases tree-stack LSTM losses itsadvantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings
Attention Mechanism
Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement
Morphological Features
Finding different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies
Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682
S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US
Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
Ablation of t-RNN
Comparison of stack-LSTMs with and without t-RNN
Lang Code            without t-RNN   with t-RNN
no nynorsklia (3k)   51.78           53.33
ru taiga (11k)       59.13           60.55
gl treegal (15k)     69.76           70.45
hu szeged (20k)      66.12           68.18
sv lines (49k)       74.04           75.46
tr imst (50k)        58.12           58.75
ar padt (120k)       68.04           68.14
en ewt (204k)        74.87           75.77
cs cac (473k)        82.89           83.57
cs pdt (1M)          81.17           81.164
t-RNN provides a comparative advantage for low-resource languages
Ablation Analysis
Overall results of ablation analysis
Lang        MLP     Only-A   Only-β   Only-σ   w/o t-RNN   all
hu szeged   66.21   66.87    66.94    67.03    66.12       68.18
sv lines    71.12   72.05    72.17    74.04    72.17       75.46
tr imst     57.12   56.87    57.02    57.12    58.12       58.75
ar padt     67.83   66.67    66.89    66.92    68.04       68.14
cs cac      83.89   82.23    83.13    83.17    82.89       83.57
en ewt      75.54   75.43    75.56    75.67    74.87       75.77
Tree-stack LSTM beats other model variations
Ablation Analysis
Conclusions of Ablation Experiments
t-RNN's performance contribution increases as the training set size decreases
σ-LSTM provides more useful information, independent of dataset size
Interconnecting the model's components with t-RNN makes tree-stack LSTM more powerful for low-resource languages (ranked 10th overall and 2nd among transition-based parsers)
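The t-RNN ablated above composes a head word, a dependent word, and their dependency relation into a single subtree vector. A minimal numpy sketch of that composition idea, with a single tanh layer standing in for the LSTMs of the real model (all names, sizes, and the example sentence are illustrative, not the thesis implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # embedding size, illustrative

# Hypothetical embedding tables for words and dependency relations.
word_emb = {w: rng.normal(size=D) for w in ["had", "effect", "news"]}
rel_emb = {r: rng.normal(size=D) for r in ["obj", "nsubj"]}

W = rng.normal(size=(D, 3 * D)) / np.sqrt(3 * D)  # stand-in for the LSTMs

def compose(head_vec, dep_vec, relation):
    """Fold a dependent and its relation into the head's representation."""
    x = np.concatenate([head_vec, dep_vec, rel_emb[relation]])
    return np.tanh(W @ x)

# Building "news <-nsubj- had -obj-> effect" bottom-up: after each arc the
# head's vector reflects the partially built subtree.
had_vec = compose(word_emb["had"], word_emb["effect"], "obj")
had_vec = compose(had_vec, word_emb["news"], "nsubj")
print(had_vec.shape)  # (8,)
```

Because the head's vector is updated after every attachment, later transitions see a representation of the whole subtree rather than just the head word.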
What does Morphological Feature Embedding provide?
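Concretely, a morph-feat embedding turns a UD FEATS string such as Case=Nom|Number=Sing into a vector. One simple scheme, sketched below (an illustration of the idea, not necessarily the exact thesis model), looks up a vector per feature=value pair and sums them, so words sharing a feature share parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
D = 4                 # per-feature embedding size, illustrative
feat_emb = {}         # hypothetical embedding table, filled lazily

def morph_feat_vector(feats):
    """Embed a UD FEATS string such as 'Case=Nom|Number=Sing'."""
    if feats == "_":  # UD writes '_' when a word has no features
        return np.zeros(D)
    vecs = []
    for pair in feats.split("|"):
        if pair not in feat_emb:
            feat_emb[pair] = rng.normal(size=D)
        vecs.append(feat_emb[pair])
    return np.sum(vecs, axis=0)

v_sing = morph_feat_vector("Case=Nom|Number=Sing")
v_plur = morph_feat_vector("Case=Nom|Number=Plur")
# Both vectors share the Case=Nom component and differ only in Number.
```

The tables that follow measure whether adding such a vector to the parser's input helps, as a function of training-set size.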
Contribution of Morph-feat Embeddings
Experimental Settings: We divide the CoNLL18 UD dataset (version 2.2) into 4 parts based on the number of training tokens for each language, to better understand our contributions:
Languages having less than 20k tokens
Languages having more than 20k less than 50k tokens
Languages having more than 50k less than 100k tokens
Languages having 100k tokens or more
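The split above can be sketched as a simple bucketing function. Token counts are taken from the tables that follow; note the slides place a few borderline treebanks (e.g. sv lines at 48k) in the adjacent bucket, so the strict thresholds here are one reasonable reading, not the exact procedure:

```python
def bucket_of(n_tokens):
    """Assign a treebank to one of the four training-size buckets."""
    if n_tokens < 20_000:
        return "<20k"
    if n_tokens < 50_000:
        return "20k-50k"
    if n_tokens < 100_000:
        return "50k-100k"
    return ">=100k"

# Token counts from the experiment tables below.
treebanks = {"no nynorsklia": 3_583, "ru taiga": 10_479,
             "eu bdt": 72_974, "cs pdt": 1_173_282}
buckets = {}
for name, n in treebanks.items():
    buckets.setdefault(bucket_of(n), []).append(name)
print(buckets)
```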
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having less than 20k training tokens
Lang code       Morph-Feats   no Morph-Feats   # of tokens
no nynorsklia   51.13         53.33            3583
ru taiga        58.32         60.55            10479
sme giella      52.78         53.39            16385
la perseus      49.93         51.6             18184
ug udt          52.78         53.39            19262
sl sst          46.72         48.77            19473
hu szeged       66.23         68.18            20166
Not useful for languages having less than 20k training tokens
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having between 50k and 100k training tokens
Lang code       Morph-Feats   no Morph-Feats   # of tokens
sv lines        72.18         74.81            48325
fr sequoia      84.36         82.17            50543
en gum          76.44         75.34            53686
ko gsd          73.74         72.54            56687
eu bdt          74.55         73.32            72974
nl lassysmall   76.7          75.8             75134
gl ctg          79.02         79.018           79327
lv lvtb         72.33         72.24            80666
id gsd          75.76         73.97            97531
Beneficial for languages with 50k-100k training tokens
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens
Lang code    Morph-Feats   no Morph-Feats   # of tokens
fa seraji    81.18         81.12            121064
bg btb       84.53         84.55            124336
en ewt       75.77         75.682           204585
ar padt      68.02         68.14            223881
de gsd       71.59         71.32            263804
ca ancora    85.89         85.874           417587
es ancora    84.99         84.78            444617
cs cac       83.57         83.63            472608
cs pdt       81.43         82.12            1173282
Neutral for languages having more than 100k training tokens
Static vs Dynamic Oracle Training
Static oracle: transitions using gold moves
Dynamic oracle: transitions using predicted moves
In both cases, the log-probability of gold moves is maximized
Figure: t-RNN architecture (head word, dependent word, and dependency relation encoded by LSTMs, concatenated, and fed to an MLP)
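The two regimes can be contrasted in a schematic training loop. This is a toy sketch, not the thesis implementation: the predictor and move names are made up, and a full dynamic oracle would also recompute the best reachable gold move after each mistake, which is omitted here. The point is that only the applied move differs; both regimes maximize log p(gold move | state):

```python
import math, random

def train_sentence(gold_moves, predict, oracle="static"):
    """One sentence of oracle training (schematic).

    Both regimes accumulate -log p(gold move | state); they differ only
    in which move is applied to advance the parser state.
    """
    state, loss = [], 0.0
    for gold in gold_moves:
        move, p_gold = predict(state, gold)
        loss += -math.log(p_gold)
        # static: follow the gold move; dynamic: follow the prediction.
        # (A full dynamic oracle would recompute the best gold move
        # reachable from the new, possibly erroneous state.)
        state.append(gold if oracle == "static" else move)
    return state, loss

random.seed(0)
def predict(state, gold):   # toy model: agrees with gold 70% of the time
    return (gold if random.random() < 0.7 else "WRONG", 0.7)

gold = ["SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC"]
s_static, loss_static = train_sentence(gold, predict, "static")
s_dynamic, _ = train_sentence(gold, predict, "dynamic")
# s_static always replays the gold sequence; s_dynamic can contain
# errors, exposing the model to states it will meet at test time.
```

Exposure to its own erroneous states is the usual motivation for dynamic-oracle training; the plots that follow show how much that mattered in practice.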
Static vs Dynamic Oracle Training
Figure: Results are very close for training sets with fewer than 20k tokens
Static vs Dynamic Oracle Training
Figure: Results are very close for training sets between 20k and 50k tokens
Static vs Dynamic Oracle Training
Figure: Results are very close for training sets with more than 50k tokens
How about languages with less than 20k training tokens?
Transfer Learning
There are 4 possible types of transfer learning:
1 Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2 Using Facebook's word vectors to train a parser [Bojanowski et al 2017]
3 Using my own word and context vectors, trained on a different language from the same language family
4 Applying transfer learning with a pre-trained parser
Language       (1)            (2)     (3)     (4)
af afribooms   not provided   75.46   77.43   78.12
kk ktb         20.19          22.31   21.96   23.86
bxr bdt        7.64           9.76    9.93    8.98
kmr mg         20.12          22.57   22.78   23.39
Table: LAS values for strategies (1), (2), (3) and (4)
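The difference between the embedding-only strategies (1)-(3) and strategy (4) is what gets copied before fine-tuning. A small sketch with made-up dictionary layouts (illustrative only, not the actual parser code):

```python
def init_parser(target_vocab, pretrained=None, transfer="embeddings"):
    """Initialize parser parameters, optionally warm-starting from a source model."""
    params = {"word_emb": {w: [0.0] for w in target_vocab},
              "network": "random-init"}
    if pretrained is None:
        return params
    # Strategies (1)-(3): copy word/context vectors for shared vocabulary.
    for w in target_vocab:
        if w in pretrained["word_emb"]:
            params["word_emb"][w] = pretrained["word_emb"][w]
    # Strategy (4): additionally warm-start the whole trained network.
    if transfer == "parser":
        params["network"] = pretrained["network"]
    return params

source = {"word_emb": {"kitap": [0.3], "ev": [0.9]}, "network": "trained-weights"}
p_emb = init_parser(["kitap", "su"], source, transfer="embeddings")
p_parser = init_parser(["kitap", "su"], source, transfer="parser")
```

Strategy (4) reuses everything the source parser learned about transition scoring, which is consistent with it winning on three of the four low-resource treebanks above.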
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial
From-scratch LM training does not bring useful word and context vectors
Our word and context vectors are still more useful than Facebook's [Bojanowski et al 2017]
Projectivity
Transition-based parsers can only build projective trees
6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
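Projectivity has a simple operational test: a tree is projective iff no two dependency arcs cross when drawn above the sentence. A small checker (head indices are 1-based, 0 marks the root, as in CoNLL-U):

```python
def is_projective(heads):
    """heads[i] is the head of token i+1; 0 denotes the root."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for a1, b1 in arcs:
        for a2, b2 in arcs:
            # Two arcs cross iff exactly one endpoint of one arc
            # lies strictly inside the span of the other.
            if a1 < a2 < b1 < b2:
                return False
    return True

print(is_projective([2, 0, 2]))     # True: 1 <- 2 -> 3, no crossing arcs
print(is_projective([3, 4, 0, 3]))  # False: arcs (1,3) and (2,4) cross
```

Running such a check over a treebank gives the projectivity ratios compared on the next slide.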
Projective vs Non-projective
We compared our model with the best model for different projectivity ratios
Language      Projectivity   Best (LAS)   Our (LAS)
grc perseus   90.7           79.39        55.03 (20)
eu bdt        95.13          84.22        74.13 (17)
hu szeged     97.8           82.66        68.18 (14)
da ddt        98.26          86.28        76.40 (17)
en gum        99.6           85.05        76.44 (15)
gl treegal    100            74.25        70.45 (10)
gl ctg        100            82.12        79.45 (14)
Table: Our model's performance gap decreases as the projectivity ratio increases
7 From the official results page and our projectivity table
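For reference, the LAS numbers throughout these tables are labeled attachment scores: the fraction of tokens that receive both the correct head and the correct dependency label (UAS drops the label requirement). A minimal computation:

```python
def attachment_scores(gold, pred):
    """gold, pred: one (head, deprel) pair per token, in sentence order."""
    n = len(gold)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n
    las = sum(g == p for g, p in zip(gold, pred)) / n
    return uas, las

gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "iobj")]  # right head, wrong label
uas, las = attachment_scores(gold, pred)
print(uas, round(las, 4))  # 1.0 0.6667  (label errors hurt LAS, not UAS)
```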
Conclusions
Conclusion
In conclusion:
We introduced "Context, Word, and Morph-feat" embeddings and showed their contribution to transition-based dependency parsing
Our Tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering
Tree-stack LSTM performed better on low-resource languages
As the training dataset size increases, tree-stack LSTM loses its advantage
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.
Attention Mechanism
Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement
Morphological Features
Finding different ways to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g. CRF) may solve this problem.
Publications
Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
References
Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, pages 673–682. Association for Computational Linguistics.
S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Thank you for your attention
Questions?
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
Ablation Analysis
Overall results of ablation analysis
Lang MLP Only A Only-β Only-σ wot-RNN allhu szeged 6621 6687 6694 6703 6612 6818sv lines 7112 7205 7217 7404 7217 7546tr imst 5712 5687 5702 5712 5812 5875ar padt 6783 6667 6689 6692 6804 6814cs cac 8389 8223 8313 8317 8289 8357en ewt 7554 7543 7556 7567 7487 7577
Tree-stack LSTM beats other model variations
Omer Kırnap (Koc University) MSc Thesis September 27 2018 101 123
Ablation Analysis
Conclusions of Ablation Experiments
t-RNNrsquos performance contribution increases when the training sizedecreases
σ-LSTM provides more useful information independent from datasetsize
Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k less than 50k tokens
Languages having more than 50k less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having less than 20k training tokens
Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583
ru taiga 5832 6055 10479
sme giella 5278 5339 16385
la perseus 4993 516 18184
ug udt 5278 5339 19262
sl sst 4672 4877 19473
hu szeged 6623 6818 20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having tokens in between 50k and100k
Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325
fr sequoia 8436 8217 50543
en gum 7644 7534 53686
ko gsd 7374 7254 56687
eu bdt 7455 7332 72974
nl lassymal 767 758 75134
gl ctg 7902 79018 79327
lv lvtb 7233 7224 80666
id gsd 7576 7397 97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having more than 100k trainingtokens
Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064
bg btb 8453 8455 124336
en ewt 7577 75682 204585
ar padt 6802 6814 223881
de gsd 7159 7132 263804
ca ancora 8589 85874 417587
es ancora 8499 8478 444617
cs cac 8357 8363 472608
cs pdt 8143 8212 1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle transitions using gold movesDynamic oracle transitions using predicted moves
In both cases logp of gold moves maximized
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and
use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al
2017]3 Using my own word and context vectors trained with different
language but from the same language family4 Applying transfer learning with a pre-trained parser
Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898
kmr mg 2012 2257 2278 2339
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the mostbeneficial
From scratch LM training does not bring useful word and contextvectors
Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition Based Parser can only build projective trees 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
Projective vs Non-projective
We compared our model with the best model for different projectivityratios
Language Projectiviy Best (LAS) Our (LAS)
grc perseus 907 7939 5503 (20)
eu bdt 9513 8422 7413 (17)
hu szeged 978 8266 6818 (14)
da ddt 9826 8628 7640 (17)
en gum 996 8505 7644 (15)
gl treegal 100 7425 7045 (10)
gl ctg 100 8212 7945 (14)
Table Our models performance gap decreases as the projectivity ratio increases
7
7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing
Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering
Tree-stack LSTM performed better with low resource languages
When the training dataset size increases tree-stack LSTM losses itsadvantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings
Attention Mechanism
Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement
Morphological Features
Finding different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies
Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682
S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US
Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
Ablation Analysis
Conclusions of Ablation Experiments
t-RNNrsquos performance contribution increases when the training sizedecreases
σ-LSTM provides more useful information independent from datasetsize
Interconnecting modelrsquos component with t-RNN makes tree-stackLSTM more powerful for low-resource languages (ranked 10th of alland 2nd among transition based parsers)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 102 123
What does Morphological Feature Embedding provide
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k less than 50k tokens
Languages having more than 50k less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having less than 20k training tokens
Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583
ru taiga 5832 6055 10479
sme giella 5278 5339 16385
la perseus 4993 516 18184
ug udt 5278 5339 19262
sl sst 4672 4877 19473
hu szeged 6623 6818 20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having tokens in between 50k and100k
Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325
fr sequoia 8436 8217 50543
en gum 7644 7534 53686
ko gsd 7374 7254 56687
eu bdt 7455 7332 72974
nl lassymal 767 758 75134
gl ctg 7902 79018 79327
lv lvtb 7233 7224 80666
id gsd 7576 7397 97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having more than 100k trainingtokens
Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064
bg btb 8453 8455 124336
en ewt 7577 75682 204585
ar padt 6802 6814 223881
de gsd 7159 7132 263804
ca ancora 8589 85874 417587
es ancora 8499 8478 444617
cs cac 8357 8363 472608
cs pdt 8143 8212 1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle transitions using gold movesDynamic oracle transitions using predicted moves
In both cases logp of gold moves maximized
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and
use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al
2017]3 Using my own word and context vectors trained with different
language but from the same language family4 Applying transfer learning with a pre-trained parser
Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898
kmr mg 2012 2257 2278 2339
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the mostbeneficial
From scratch LM training does not bring useful word and contextvectors
Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition Based Parser can only build projective trees 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
Projective vs Non-projective
We compared our model with the best model for different projectivityratios
Language Projectiviy Best (LAS) Our (LAS)
grc perseus 907 7939 5503 (20)
eu bdt 9513 8422 7413 (17)
hu szeged 978 8266 6818 (14)
da ddt 9826 8628 7640 (17)
en gum 996 8505 7644 (15)
gl treegal 100 7425 7045 (10)
gl ctg 100 8212 7945 (14)
Table Our models performance gap decreases as the projectivity ratio increases
7
7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing
Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering
Tree-stack LSTM performed better with low resource languages
When the training dataset size increases tree-stack LSTM losses itsadvantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings
Attention Mechanism
Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement
Morphological Features
Finding different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies
Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682
S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US
Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
What does Morphological Feature Embedding provide
Omer Kırnap (Koc University) MSc Thesis September 27 2018 103 123
Contribution of Morph-feat Embeddings
Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k less than 50k tokens
Languages having more than 50k less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having less than 20k training tokens
Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583
ru taiga 5832 6055 10479
sme giella 5278 5339 16385
la perseus 4993 516 18184
ug udt 5278 5339 19262
sl sst 4672 4877 19473
hu szeged 6623 6818 20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having between 50k and 100k training tokens
Lang code    Morph-Feats   no Morph-Feats   # of tokens
sv lines     72.18         74.81            48,325
fr sequoia   84.36         82.17            50,543
en gum       76.44         75.34            53,686
ko gsd       73.74         72.54            56,687
eu bdt       74.55         73.32            72,974
nl lassymal  76.7          75.8             75,134
gl ctg       79.02         79.018           79,327
lv lvtb      72.33         72.24            80,666
id gsd       75.76         73.97            97,531
Beneficial for languages with 50k-100k training tokens
Contribution of Morph-feat embeddings
Morph-feat experiments for languages having more than 100k training tokens
Lang code   Morph-Feats   no Morph-Feats   # of tokens
fa seraji   81.18         81.12            121,064
bg btb      84.53         84.55            124,336
en ewt      75.77         75.682           204,585
ar padt     68.02         68.14            223,881
de gsd      71.59         71.32            263,804
ca ancora   85.89         85.874           417,587
es ancora   84.99         84.78            444,617
cs cac      83.57         83.63            472,608
cs pdt      81.43         82.12            1,173,282
Neutral for languages having more than 100k training tokens
Static vs Dynamic Oracle Training
Static oracle: transitions using gold moves
Dynamic oracle: transitions using predicted moves
In both cases, the log-probability of the gold moves is maximized
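The two regimes differ only in which move is executed after the loss is computed. A self-contained sketch; the model, oracle, and parser-state interfaces here are stand-ins, not the thesis implementation:

```python
import math
import random

def train_sentence(model, oracle, state, dynamic=False, explore_p=1.0):
    """One training pass over a sentence; returns the negative log-likelihood
    of the gold moves. A static oracle always follows a gold move; a dynamic
    oracle may follow the model's own (possibly wrong) prediction."""
    loss = 0.0
    while not state.is_final():
        probs = model.predict(state)          # P(transition | state)
        gold = oracle.zero_cost_moves(state)  # gold moves for this state
        loss -= math.log(sum(probs[m] for m in gold))  # maximize log p(gold)
        if dynamic and random.random() < explore_p:
            move = max(probs, key=probs.get)  # follow the prediction
        else:
            move = gold[0]                    # follow a gold move
        state.apply(move)
    return loss
```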
Figure t-RNN: the LSTM states of the head word and the dependent word are concatenated and fed to an MLP that outputs the dependency relation
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
How about languages with less than 20k training tokens?
Transfer Learning
There are 4 possible types of transfer learning:
1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al., 2017]
3. Using my own word and context vectors, trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language       (1)            (2)     (3)     (4)
af afribooms   not provided   75.46   77.43   78.12
kk ktb         20.19          22.31   21.96   23.86
bxr bdt        7.64           9.76    9.93    8.98
kmr mg         20.12          22.57   22.78   23.39

Table LAS values for strategies (1), (2), (3), and (4)
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial.
Training an LM from scratch does not yield useful word and context vectors.
Our word and context vectors are still more useful than Facebook's [Bojanowski et al., 2017].
Projectivity
Transition-based parsers can only build projective trees 6
6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
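Projectivity can be checked by looking for crossing arcs. A standard sketch (my illustration, not from the thesis): `heads[i]` is the 1-based head index of word `i+1`, with 0 denoting the root.

```python
# A tree is projective iff no two dependency arcs cross when drawn above
# the sentence (the root is treated as position 0).
def is_projective(heads):
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for l1, r1 in arcs:
        for l2, r2 in arcs:
            if l1 < l2 < r1 < r2:  # strictly interleaved endpoints: crossing
                return False
    return True

a = is_projective([2, 0, 2])     # arcs nest: projective
b = is_projective([3, 4, 0, 3])  # arcs (1,3) and (2,4) cross: non-projective
```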
Projective vs Non-projective
We compared our model with the best model for different projectivity ratios
Language      Projectivity   Best (LAS)   Our (LAS)
grc perseus   90.7           79.39        55.03 (20)
eu bdt        95.13          84.22        74.13 (17)
hu szeged     97.8           82.66        68.18 (14)
da ddt        98.26          86.28        76.40 (17)
en gum        99.6           85.05        76.44 (15)
gl treegal    100            74.25        70.45 (10)
gl ctg        100            82.12        79.45 (14)
Table Our model's performance gap decreases as the projectivity ratio increases 7
7 From the official results page and our projectivity table
Conclusions
Conclusion
In conclusion:
We introduced "Context", "Word", and "Morph-feat" embeddings and showed their contribution to transition-based dependency parsing
Our Tree-stack LSTM outperformed the MLP by removing hand-crafted feature engineering
Tree-stack LSTM performed better on low-resource languages
As the training dataset size increases, tree-stack LSTM loses its advantage
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.
Attention Mechanism
Applying attention between the σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.
Morphological Features
Finding different ways to represent morphological features.
Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.
Publications
Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
References
Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1. Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Thank you for your attention
Questions?
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
Contribution of Morph-feat Embeddings
Experimental SettingsWe divide Conll18 UD dataset 22 into 4 parts based on number oftraining tokens for each language to better understand our contributions
Languages having less than 20k tokens
Languages having more than 20k less than 50k tokens
Languages having more than 50k less than 100k tokens
Languages having 100k tokens or more
Omer Kırnap (Koc University) MSc Thesis September 27 2018 104 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having less than 20k training tokens
Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583
ru taiga 5832 6055 10479
sme giella 5278 5339 16385
la perseus 4993 516 18184
ug udt 5278 5339 19262
sl sst 4672 4877 19473
hu szeged 6623 6818 20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having tokens in between 50k and100k
Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325
fr sequoia 8436 8217 50543
en gum 7644 7534 53686
ko gsd 7374 7254 56687
eu bdt 7455 7332 72974
nl lassymal 767 758 75134
gl ctg 7902 79018 79327
lv lvtb 7233 7224 80666
id gsd 7576 7397 97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having more than 100k trainingtokens
Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064
bg btb 8453 8455 124336
en ewt 7577 75682 204585
ar padt 6802 6814 223881
de gsd 7159 7132 263804
ca ancora 8589 85874 417587
es ancora 8499 8478 444617
cs cac 8357 8363 472608
cs pdt 8143 8212 1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle transitions using gold movesDynamic oracle transitions using predicted moves
In both cases logp of gold moves maximized
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and
use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al
2017]3 Using my own word and context vectors trained with different
language but from the same language family4 Applying transfer learning with a pre-trained parser
Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898
kmr mg 2012 2257 2278 2339
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the mostbeneficial
From scratch LM training does not bring useful word and contextvectors
Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition Based Parser can only build projective trees 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
Projective vs Non-projective
We compared our model with the best model for different projectivityratios
Language Projectiviy Best (LAS) Our (LAS)
grc perseus 907 7939 5503 (20)
eu bdt 9513 8422 7413 (17)
hu szeged 978 8266 6818 (14)
da ddt 9826 8628 7640 (17)
en gum 996 8505 7644 (15)
gl treegal 100 7425 7045 (10)
gl ctg 100 8212 7945 (14)
Table Our models performance gap decreases as the projectivity ratio increases
7
7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing
Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering
Tree-stack LSTM performed better with low resource languages
When the training dataset size increases tree-stack LSTM losses itsadvantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings
Attention Mechanism
Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement
Morphological Features
Finding different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies
Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682
S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US
Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having less than 20k training tokens
Lang code Morph-Feats no Morph-Feats of tokensno nynorsklia 5113 5333 3583
ru taiga 5832 6055 10479
sme giella 5278 5339 16385
la perseus 4993 516 18184
ug udt 5278 5339 19262
sl sst 4672 4877 19473
hu szeged 6623 6818 20166
Not useful for languages having less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 105 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having tokens in between 50k and100k
Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325
fr sequoia 8436 8217 50543
en gum 7644 7534 53686
ko gsd 7374 7254 56687
eu bdt 7455 7332 72974
nl lassymal 767 758 75134
gl ctg 7902 79018 79327
lv lvtb 7233 7224 80666
id gsd 7576 7397 97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having more than 100k trainingtokens
Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064
bg btb 8453 8455 124336
en ewt 7577 75682 204585
ar padt 6802 6814 223881
de gsd 7159 7132 263804
ca ancora 8589 85874 417587
es ancora 8499 8478 444617
cs cac 8357 8363 472608
cs pdt 8143 8212 1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle transitions using gold movesDynamic oracle transitions using predicted moves
In both cases logp of gold moves maximized
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and
use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al
2017]3 Using my own word and context vectors trained with different
language but from the same language family4 Applying transfer learning with a pre-trained parser
Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898
kmr mg 2012 2257 2278 2339
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the mostbeneficial
From scratch LM training does not bring useful word and contextvectors
Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition Based Parser can only build projective trees 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
Projective vs Non-projective
We compared our model with the best model for different projectivityratios
Language Projectiviy Best (LAS) Our (LAS)
grc perseus 907 7939 5503 (20)
eu bdt 9513 8422 7413 (17)
hu szeged 978 8266 6818 (14)
da ddt 9826 8628 7640 (17)
en gum 996 8505 7644 (15)
gl treegal 100 7425 7045 (10)
gl ctg 100 8212 7945 (14)
Table Our models performance gap decreases as the projectivity ratio increases
7
7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing
Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering
Tree-stack LSTM performed better with low resource languages
When the training dataset size increases tree-stack LSTM losses itsadvantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings
Attention Mechanism
Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement
Morphological Features
Finding different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies
Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682
S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US
Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having tokens in between 50k and100k
Lang code Morph-Feats no Morph-Feats of tokenssv lines 7218 7481 48325
fr sequoia 8436 8217 50543
en gum 7644 7534 53686
ko gsd 7374 7254 56687
eu bdt 7455 7332 72974
nl lassymal 767 758 75134
gl ctg 7902 79018 79327
lv lvtb 7233 7224 80666
id gsd 7576 7397 97531
Beneficial for languages with 50k-100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 106 123
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having more than 100k trainingtokens
Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064
bg btb 8453 8455 124336
en ewt 7577 75682 204585
ar padt 6802 6814 223881
de gsd 7159 7132 263804
ca ancora 8589 85874 417587
es ancora 8499 8478 444617
cs cac 8357 8363 472608
cs pdt 8143 8212 1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle transitions using gold movesDynamic oracle transitions using predicted moves
In both cases logp of gold moves maximized
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and
use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al
2017]3 Using my own word and context vectors trained with different
language but from the same language family4 Applying transfer learning with a pre-trained parser
Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898
kmr mg 2012 2257 2278 2339
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the mostbeneficial
From scratch LM training does not bring useful word and contextvectors
Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition Based Parser can only build projective trees 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
Projective vs Non-projective
We compared our model with the best model for different projectivityratios
Language Projectiviy Best (LAS) Our (LAS)
grc perseus 907 7939 5503 (20)
eu bdt 9513 8422 7413 (17)
hu szeged 978 8266 6818 (14)
da ddt 9826 8628 7640 (17)
en gum 996 8505 7644 (15)
gl treegal 100 7425 7045 (10)
gl ctg 100 8212 7945 (14)
Table Our models performance gap decreases as the projectivity ratio increases
7
7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing
Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering
Tree-stack LSTM performed better with low resource languages
When the training dataset size increases tree-stack LSTM losses itsadvantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings
Attention Mechanism
Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement
Morphological Features
Finding different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies
Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682
S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US
Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
Contribution of Morph-feat embeddings
Morp-feat experiments for languages having more than 100k trainingtokens
Lang code Morph-Feats no Morph-Feats of tokensfa seraji 8118 8112 121064
bg btb 8453 8455 124336
en ewt 7577 75682 204585
ar padt 6802 6814 223881
de gsd 7159 7132 263804
ca ancora 8589 85874 417587
es ancora 8499 8478 444617
cs cac 8357 8363 472608
cs pdt 8143 8212 1173282
Neutral for languages having more than 100k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 107 123
Static vs Dynamic Oracle Training
Static oracle transitions using gold movesDynamic oracle transitions using predicted moves
In both cases logp of gold moves maximized
t-RNN
Head word
Dependent word Dependency Relation
LSTM LSTM LSTM LSTM LSTM
LSTM LSTM A
Concat
MLP
Omer Kırnap (Koc University) MSc Thesis September 27 2018 108 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and
use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al
2017]3 Using my own word and context vectors trained with different
language but from the same language family4 Applying transfer learning with a pre-trained parser
Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898
kmr mg 2012 2257 2278 2339
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the mostbeneficial
From scratch LM training does not bring useful word and contextvectors
Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition Based Parser can only build projective trees 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
Projective vs Non-projective
We compared our model with the best model for different projectivityratios
Language Projectiviy Best (LAS) Our (LAS)
grc perseus 907 7939 5503 (20)
eu bdt 9513 8422 7413 (17)
hu szeged 978 8266 6818 (14)
da ddt 9826 8628 7640 (17)
en gum 996 8505 7644 (15)
gl treegal 100 7425 7045 (10)
gl ctg 100 8212 7945 (14)
Table Our models performance gap decreases as the projectivity ratio increases
7
7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing
Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering
Tree-stack LSTM performed better with low resource languages
When the training dataset size increases tree-stack LSTM losses itsadvantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction

End-to-End Training
Systems that are jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.

Attention Mechanism
Applying attention between σ-LSTM, β-LSTM, or Action-LSTM states may bring a performance improvement.

Morphological Features
Finding a different way to represent morphological features.

Dynamic Oracle vs Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.

Omer Kırnap (Koc University) MSc Thesis September 27, 2018 119 / 123
Publications

Omer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap, Berkay Furkan Onder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Omer Kırnap (Koc University) MSc Thesis September 27, 2018 120 / 123
References

Marco Kuhlmann, Carlos Gomez-Rodriguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, Association for Computational Linguistics, pages 673-682.

S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.

Omer Kırnap (Koc University) MSc Thesis September 27, 2018 121 / 123
Thank you for your attention

Omer Kırnap (Koc University) MSc Thesis September 27, 2018 122 / 123

Questions?

Omer Kırnap (Koc University) MSc Thesis September 27, 2018 123 / 123
Static vs Dynamic Oracle Training

Static oracle: transitions follow the gold moves. Dynamic oracle: transitions follow the predicted moves.

In both cases, log p of the gold moves is maximized.

[Figure: t-RNN predicting the dependency relation from the head word and dependent word via stacked LSTM layers, a Concat layer, and an MLP]

Omer Kırnap (Koc University) MSc Thesis September 27, 2018 108 / 123
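The two regimes differ only in which state sequence the trainer visits; in both, the loss maximizes log p of the oracle-optimal moves. A pseudocode sketch (the parser/oracle interface names here are hypothetical, not the thesis implementation):

```
state = initial_state(sentence)
loss  = 0
while not final(state):
    scores = parser.log_probs(state)          # one score per transition
    gold   = oracle.optimal_moves(state)      # moves keeping the best reachable tree
    loss  -= max(scores[t] for t in gold)     # maximize log p of gold moves
    if dynamic_oracle and explore():
        state = apply(state, argmax(scores))  # dynamic: follow the parser's prediction
    else:
        state = apply(state, gold[0])         # static: follow the gold move
```

Under the dynamic oracle the parser is trained on states its own mistakes produce, which is why it can help at test time, when no gold moves are available.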
Static vs Dynamic Oracle Training

Figure: Results are very close for treebanks with fewer than 20k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27, 2018 109 / 123
Static vs Dynamic Oracle Training

Figure: Results are very close for treebanks with between 20k and 50k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27, 2018 110 / 123
Static vs Dynamic Oracle Training

Figure: Results are very close for treebanks with more than 50k training tokens

Omer Kırnap (Koc University) MSc Thesis September 27, 2018 111 / 123
How about languages with fewer than 20k training tokens?

Omer Kırnap (Koc University) MSc Thesis September 27, 2018 112 / 123
Transfer Learning

There are 4 possible types of transfer learning:

1. Using very limited data to train an LM for word and context vectors, and using them to train a parser from scratch
2. Using Facebook's word vectors to train a parser [Bojanowski et al., 2017]
3. Using my own word and context vectors, trained on a different language from the same language family
4. Applying transfer learning with a pre-trained parser

Language       (1)            (2)     (3)     (4)
af_afribooms   not provided   75.46   77.43   78.12
kk_ktb         20.19          22.31   21.96   23.86
bxr_bdt        7.64           9.76    9.93    8.98
kmr_mg         20.12          22.57   22.78   23.39

Table: LAS values for strategies (1), (2), (3), and (4)

Omer Kırnap (Koc University) MSc Thesis September 27, 2018 113 / 123
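Strategy (4) amounts to initializing a new parser from a pre-trained one and fine-tuning on the low-resource treebank. A minimal sketch with models represented as plain name-to-parameter dicts and a hypothetical `word_emb` prefix for language-specific tables (both are assumptions, not the thesis code):

```python
def transfer_parameters(pretrained, target, skip_prefixes=("word_emb",)):
    """Copy shared parameters from a pre-trained parser into a new one.

    Language-specific tables (named here by the hypothetical prefix
    'word_emb') keep their fresh initialization; every other parameter
    present in both models is copied from the donor.
    """
    for name, value in pretrained.items():
        if name in target and not name.startswith(skip_prefixes):
            target[name] = value
    return target
```

After this initialization, training proceeds as usual on the target treebank, so the shared layers start from a useful point instead of random weights.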
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens less than 20k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 109 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and
use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al
2017]3 Using my own word and context vectors trained with different
language but from the same language family4 Applying transfer learning with a pre-trained parser
Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898
kmr mg 2012 2257 2278 2339
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the mostbeneficial
From scratch LM training does not bring useful word and contextvectors
Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition Based Parser can only build projective trees 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
Projective vs Non-projective
We compared our model with the best model for different projectivityratios
Language Projectiviy Best (LAS) Our (LAS)
grc perseus 907 7939 5503 (20)
eu bdt 9513 8422 7413 (17)
hu szeged 978 8266 6818 (14)
da ddt 9826 8628 7640 (17)
en gum 996 8505 7644 (15)
gl treegal 100 7425 7045 (10)
gl ctg 100 8212 7945 (14)
Table Our models performance gap decreases as the projectivity ratio increases
7
7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing
Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering
Tree-stack LSTM performed better with low resource languages
When the training dataset size increases tree-stack LSTM losses itsadvantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings
Attention Mechanism
Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement
Morphological Features
Finding different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies
Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682
S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US
Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens in between 20k and 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 110 123
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and
use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al
2017]3 Using my own word and context vectors trained with different
language but from the same language family4 Applying transfer learning with a pre-trained parser
Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898
kmr mg 2012 2257 2278 2339
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the mostbeneficial
From scratch LM training does not bring useful word and contextvectors
Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition Based Parser can only build projective trees 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
Projective vs Non-projective
We compared our model with the best model for different projectivityratios
Language Projectiviy Best (LAS) Our (LAS)
grc perseus 907 7939 5503 (20)
eu bdt 9513 8422 7413 (17)
hu szeged 978 8266 6818 (14)
da ddt 9826 8628 7640 (17)
en gum 996 8505 7644 (15)
gl treegal 100 7425 7045 (10)
gl ctg 100 8212 7945 (14)
Table Our models performance gap decreases as the projectivity ratio increases
7
7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing
Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering
Tree-stack LSTM performed better with low resource languages
When the training dataset size increases tree-stack LSTM losses itsadvantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings
Attention Mechanism
Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement
Morphological Features
Finding different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies
Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682
S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US
Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
Static vs Dynamic Oracle Training
Figure Results are very close for training tokens more than 50k
Omer Kırnap (Koc University) MSc Thesis September 27 2018 111 123
How about languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and
use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al
2017]3 Using my own word and context vectors trained with different
language but from the same language family4 Applying transfer learning with a pre-trained parser
Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898
kmr mg 2012 2257 2278 2339
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the mostbeneficial
From scratch LM training does not bring useful word and contextvectors
Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition Based Parser can only build projective trees 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
Projective vs Non-projective
We compared our model with the best model for different projectivityratios
Language Projectiviy Best (LAS) Our (LAS)
grc perseus 907 7939 5503 (20)
eu bdt 9513 8422 7413 (17)
hu szeged 978 8266 6818 (14)
da ddt 9826 8628 7640 (17)
en gum 996 8505 7644 (15)
gl treegal 100 7425 7045 (10)
gl ctg 100 8212 7945 (14)
Table Our models performance gap decreases as the projectivity ratio increases
7
7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing
Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering
Tree-stack LSTM performed better with low resource languages
When the training dataset size increases tree-stack LSTM losses itsadvantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings
Attention Mechanism
Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement
Morphological Features
Finding different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies
Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682
S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US
Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
How about languages with less than 20k training tokens
Omer Kırnap (Koc University) MSc Thesis September 27 2018 112 123
Transfer Learning
There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and
use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al
2017]3 Using my own word and context vectors trained with different
language but from the same language family4 Applying transfer learning with a pre-trained parser
Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898
kmr mg 2012 2257 2278 2339
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the mostbeneficial
From scratch LM training does not bring useful word and contextvectors
Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition Based Parser can only build projective trees 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
Projective vs Non-projective
We compared our model with the best model for different projectivityratios
Language Projectiviy Best (LAS) Our (LAS)
grc perseus 907 7939 5503 (20)
eu bdt 9513 8422 7413 (17)
hu szeged 978 8266 6818 (14)
da ddt 9826 8628 7640 (17)
en gum 996 8505 7644 (15)
gl treegal 100 7425 7045 (10)
gl ctg 100 8212 7945 (14)
Table Our models performance gap decreases as the projectivity ratio increases
7
7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
In conclusionWe introduced ldquoContext Word and Morph-featrdquo embeddings and showedtheir contribution in transition based dependency parsing
Our Tree-stack LSTM outperformed MLP by removing hand-craftedfeature engineering
Tree-stack LSTM performed better with low resource languages
When the training dataset size increases tree-stack LSTM losses itsadvantage
Omer Kırnap (Koc University) MSc Thesis September 27 2018 118 123
Future Research Direction
End-to-End Training
Systems that are jointly trained for tokenization morphological taggingand dependency parsing performed better Some are also jointly trained alanguage model together with pre-trained embeddings
Attention Mechanism
Applying attention in between σ-LSTM states or β-LSTM orAction-LSTM may bring performance improvement
Morphological Features
Finding different way to represent morphological features
Dynamic Oracle vs Beam Training
Although I tried both of them I could not obtain performanceimprovement There may be convergence problems with our loss functionand another losses (CRF) may solve this problem
Omer Kırnap (Koc University) MSc Thesis September 27 2018 119 123
Publications
Omer Kırnap Erenay Dayanık and Deniz Yuret 2018 Tree-stackLSTM in Transition Based Dependency Parsing In Proceedings ofthe CoNLL 2018 Shared Task Multilingual Parsing from Raw Text toUniversal Dependencies
Omer Kırnap Berkay Furkan Onder and Deniz Yuret 2017 Parsingwith Context Embeddings In Proceedings of the CoNLL 2017 SharedTask Multilingual Parsing from Raw Text to Universal Dependencies
Omer Kırnap (Koc University) MSc Thesis September 27 2018 120 123
References
Marco Kuhlmann Carlos Gomez-Rodriguez and Giorgio Satta 2011Dynamic programming algorithms for transition-based dependencyparsers In Proceedings of the 49th Annual Meeting of theAssociation for Computational Linguistics Human LanguageTechnologies-Volume 1 Association for Computational Linguisticspages 673682
S Kbler R McDonald and J Nivre 2009 Dependency parsingMorgan amp Claypool US
Chris Dyer Miguel Ballesteros Wang Ling Austin Matthews andNoah A Smith 2015 Transition based dependency parsing withstack long-short term memory CoRR abs150508075
Omer Kırnap (Koc University) MSc Thesis September 27 2018 121 123
Thank you for your attention
Omer Kırnap (Koc University) MSc Thesis September 27 2018 122 123
Questions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 123 123
- Introduction
-
- Overview of Dependency Parsing
- Transition Based Dependency Parsing
-
- Related Work
-
- Linear Models and their Drawbacks
- Neural Network Models
-
- Model
-
- Language Model
- MLP Parser
- Tree-stack LSTM Parser
-
- Results
-
- MLP vs Tree-stack LSTM
- Morphological Feature Embeddings
- Static vs Dynamic Oracle Training
- Transfer Learning
-
- Conclusion
- Future Work amp Discussions
-
Transfer Learning
There are 4 possible types of transfer learning1 Using very limited data to train LM for word and context vectors and
use them to train a parser from scratch2 Using Facebookrsquos word vectors to train a parser [Bojanowski et al
2017]3 Using my own word and context vectors trained with different
language but from the same language family4 Applying transfer learning with a pre-trained parser
Language (1) (2) (3) (4)af afribooms not provided 7546 7743 7812kk ktb 2019 2231 2196 2386bxr bdt 764 976 993 898
kmr mg 2012 2257 2278 2339
Table LAS values for strategies (1) (2) (3) and (4)
Omer Kırnap (Koc University) MSc Thesis September 27 2018 113 123
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the mostbeneficial
From scratch LM training does not bring useful word and contextvectors
Our word and context vectors are still more useful than Facebookrsquos[Bojanowski et al 2017]
Omer Kırnap (Koc University) MSc Thesis September 27 2018 114 123
Projectivity
Transition Based Parser can only build projective trees 6
6Figure fromhttpstplingfiluuse sarakurser5LN455-2014lectures5LN455-F8pdf
Omer Kırnap (Koc University) MSc Thesis September 27 2018 115 123
Projective vs Non-projective
We compared our model with the best model for different projectivityratios
Language Projectiviy Best (LAS) Our (LAS)
grc perseus 907 7939 5503 (20)
eu bdt 9513 8422 7413 (17)
hu szeged 978 8266 6818 (14)
da ddt 9826 8628 7640 (17)
en gum 996 8505 7644 (15)
gl treegal 100 7425 7045 (10)
gl ctg 100 8212 7945 (14)
Table Our models performance gap decreases as the projectivity ratio increases
7
7From official results page and our projectivity tableOmer Kırnap (Koc University) MSc Thesis September 27 2018 116 123
Conclusions
Omer Kırnap (Koc University) MSc Thesis September 27 2018 117 123
Conclusion
Transfer Learning
Conclusions of Transfer Learning Experiments
Applying transfer learning with a pre-trained parser is the most beneficial.
Training a language model from scratch does not yield useful word and context vectors.
Our word and context vectors are still more useful than Facebook's [Bojanowski et al., 2017].
Projectivity
A transition-based parser can only build projective trees. 6
6 Figure from https://stp.lingfil.uu.se/~sara/kurser/5LN455-2014/lectures/5LN455-F8.pdf
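As an illustrative sketch (not part of the thesis), projectivity of a dependency tree given as a head array can be checked by testing whether any two arcs cross:

```python
def is_projective(heads):
    """Check projectivity of a dependency tree.

    heads[i] is the head of token i+1 (tokens are 1-based; head 0 is the
    artificial root). A tree is projective iff no two of its arcs cross.
    """
    # Represent each arc by its span endpoints (smaller index first).
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for l1, r1 in arcs:
        for l2, r2 in arcs:
            # Two arcs cross if exactly one endpoint of the second
            # lies strictly inside the span of the first.
            if l1 < l2 < r1 < r2:
                return False
    return True

print(is_projective([2, 0, 2]))        # True: a simple projective chain
print(is_projective([3, 3, 0, 5, 1]))  # False: arc (1,5) crosses the root arc to token 3
```

Transition systems such as arc-standard can only produce head arrays for which this check returns True.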
Projective vs Non-projective
We compared our model with the best-performing model across different projectivity ratios.

Language        Projectivity (%)   Best (LAS)   Ours (LAS)
grc perseus          90.70           79.39       55.03 (20)
eu bdt               95.13           84.22       74.13 (17)
hu szeged            97.80           82.66       68.18 (14)
da ddt               98.26           86.28       76.40 (17)
en gum               99.60           85.05       76.44 (15)
gl treegal          100.00           74.25       70.45 (10)
gl ctg              100.00           82.12       79.45 (14)

Table: Our model's performance gap decreases as the projectivity ratio increases. 7
7 From the official results page and our projectivity table.
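The LAS gaps implied by the table can be recomputed directly from the reported scores (a sketch; treebank codes are joined with underscores here purely as identifiers):

```python
# (treebank, projectivity %, best LAS, our LAS), as reported in the table above
rows = [
    ("grc_perseus",  90.70, 79.39, 55.03),
    ("eu_bdt",       95.13, 84.22, 74.13),
    ("hu_szeged",    97.80, 82.66, 68.18),
    ("da_ddt",       98.26, 86.28, 76.40),
    ("en_gum",       99.60, 85.05, 76.44),
    ("gl_treegal",  100.00, 74.25, 70.45),
    ("gl_ctg",      100.00, 82.12, 79.45),
]
# LAS gap between the best system and ours, per treebank
gaps = {name: round(best - ours, 2) for name, _, best, ours in rows}
print(gaps["grc_perseus"])  # 24.36 -- largest gap, least projective treebank
print(gaps["gl_ctg"])       # 2.67  -- smallest gap, fully projective treebank
```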
Conclusions
Conclusion
In conclusion:
We introduced "context", "word", and "morph-feat" embeddings and showed their contribution to transition-based dependency parsing.
Our tree-stack LSTM outperformed the MLP while removing hand-crafted feature engineering.
The tree-stack LSTM performed better on low-resource languages.
As the training dataset size increases, the tree-stack LSTM loses its advantage.
Future Research Direction
End-to-End Training
Systems jointly trained for tokenization, morphological tagging, and dependency parsing performed better. Some also jointly train a language model together with pre-trained embeddings.
Attention Mechanism
Applying attention among the σ-LSTM, β-LSTM, or action-LSTM states may bring a performance improvement.
Morphological Features
Finding different ways to represent morphological features.
Dynamic Oracle vs. Beam Training
Although I tried both of them, I could not obtain a performance improvement. There may be convergence problems with our loss function, and other losses (e.g., CRF) may solve this problem.
Publications
Ömer Kırnap, Erenay Dayanık, and Deniz Yuret. 2018. Tree-stack LSTM in Transition Based Dependency Parsing. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Ömer Kırnap, Berkay Furkan Önder, and Deniz Yuret. 2017. Parsing with Context Embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
References
Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, pages 673–682. Association for Computational Linguistics.
S. Kübler, R. McDonald, and J. Nivre. 2009. Dependency Parsing. Morgan & Claypool, US.
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. CoRR abs/1505.08075.
Thank you for your attention
Questions