A Generalized Parsing Framework Based On
Computational Paninian Grammar
A thesis submitted to IIIT-Hyderabad
in partial fulfillment of the requirements
for the degree of
Doctor of Philosophy
in Computational Linguistics
Samar Husain
200522004
July, 2011
INTERNATIONAL INSTITUTE OF INFORMATION TECHNOLOGY
Hyderabad, India
Certificate
It is certified that the work contained in this thesis, titled ‘A Generalized Parsing Framework based on
Computational Paninian Grammar’ by SAMAR HUSAIN, has been carried out under my supervision and
is not submitted elsewhere for a degree.
Date:
Supervisors: Prof. Rajeev Sangal and Dr. Dipti Misra Sharma
Language Technologies Research Center,
IIIT, Hyderabad
Signed: _______________________
Signed: _______________________
Acknowledgment
I would like to thank my supervisors Dr. Rajeev Sangal and Dr. Dipti Misra Sharma for their constant
support and guidance throughout this work. The work has benefited from the suggestions/criticisms of
anonymous reviewers at various conferences. I would also like to thank past and present research students
(especially, Bharat Ram Ambati, Phani Gadde, Meher Vijay and Pujitha Gade) at LTRC, IIIT-Hyderabad,
who have collaborated with me on different aspects of this work and are co-authors in various papers. I
have also benefited from discussions with Sudheer Kolachina, Prashanth Mannem, Sriram Venkatapathy,
Joakim Nivre, Owen Rambow, and Rajesh Bhatt. Parts of this work have been presented at various
workshops such as ‘TCS NLP Winter School’ (Dec-Jan, 2007-08, IIIT-Hyderabad), ‘IASNLP08’ (May-
June 2008, IIIT-Hyderabad), ‘CGMIL’ (June 2008, IIIT-Hyderabad), ‘Dependency Parsing Workshop’
(June 2009, Univ. of Colorado, Boulder); discussions, comments and questions by participants at these
talks have also contributed to the improvement of the work.
Thanks are due to my parents for understanding the pressures of completing a PhD and for always being
extremely positive. Their prayers and love were instrumental in getting me through. Many thanks to
Arafat Ahsan, Monis Raja Khan, Ashwini Vaidya and Raina Khare for always being there for me.
Without their friendship and support it would not have been possible to persevere. Special thanks to Anil
Kumar Singh. He was my first mentor at LTRC and was instrumental in my decision to attempt a PhD
and later to stay the course. Finally, thanks to all the writers/poets who came to the rescue on those dark,
futile nights and taught me a thing or two about life.
Abstract
In this work, I present a generalized dependency parsing scheme using Computational Paninian
Grammar (CPG). This is done by incorporating the grammatical notions of CPG into both constraint based
parsing and data-driven parsing, where they are reflected in design decisions, constraint formulation,
feature selection, etc.
In the constraint based setup I extend an existing parsing paradigm for CPG to cover additional
language phenomena. A layered parsing approach is motivated; in particular, two stages based on the
notion of clause are introduced in parsing. I show how different grammatical constructs are parsed at
appropriate stages. Different types of constraints, namely licensing, eliminative and preferential, are
introduced to help negotiate language variability with generic grammatical notions. This setup is then
integrated with insights from graph-based dependency parsing and labeling for the task of
prioritization. This constraint based system (GH-CBP) is illustrated using Hindi, Telugu and Bangla, and
has been evaluated for Hindi and Telugu.
I then show how the insights gained from building GH-CBP can be applied to data driven approaches.
This is done by (a) incorporating targeted features during the training process, (b) introducing
‘linguistically constrained’ modularity during the parsing process, and (c) exploring ‘linguistically
rich’ graph-based parsing. I finally discuss the error analysis and make concluding observations.
Contents
Chapter Page
1. Introduction ……………………………………………………………………………….. 1
1.1. Approach ……………………………………………………………………….. 1
1.2. Brief Outline …………………………………………………………………… 2
2. Dependency Grammar Formalism ……………………………………………….. 5
2.1. Some Dependency Grammar Formalisms ……………………………………… 7
2.1.1. Extensible Dependency Grammar (XDG) ………………………............ 8
2.1.2. Constraint Grammar (CG) ………………………………………............. 8
2.1.3. Functional Generative Description (FGD) ……………………………… 9
2.2. Computational Paninian Grammar (CPG) ………………………………........... 9
3. Dependency Parsing ……………………………………………………….............. 16
3.1. Constraint Based Parsing ……………………………………………………..... 17
3.2. Data-Driven Parsing …………………………………………………………..... 18
3.3. Constraint Based Parsing for Indian languages (CBP) ………………………… 19
4. A Two Stage Generalized Hybrid Constraint Based Parser (GH-CBP) ……….. 22
4.1. Parsing in layers ………………………………………………………………... 22
4.1.1. Chunk as minimal parsing unit ………………………………………...... 23
4.1.2. Clause as minimal parsing unit ……………………………….………..... 24
4.1.2.1. Two Stages ………………………………………………………….. 25
4.1.2.1.1. Status of _ROOT_ ………………………………………….. 29
4.1.2.1.2. Partial Parse …………………………………………………. 29
4.1.3. A layered architecture …………………………………………………... 30
4.2. Constraints ……………………………………………………………………... 32
4.2.1. Licensing Constraints ………………………………………………........ 32
4.2.1.1. H-constraints ………………………………………………............... 33
4.2.1.2. Meta-constraints …………………………………………………….. 34
4.2.1.2.1. Feature Unification ………………………………………….. 35
4.2.1.2.2. Demand Status Transformation ……………………….......... 38
4.2.1.2.3. Revision ……………………………………………………... 41
4.2.1.2.4. Look-ahead …………………………………………………. 44
4.2.2. Eliminative Constraints ………………………………………………..... 46
4.2.3. Preferential Constraints ………………………………………………..... 48
4.3. GH-CBP framework …………………………………………………………… 50
4.3.1. Parsing as Constraint Satisfaction ………………………………………. 51
4.3.2. Prioritization …………………………………………………………….. 52
4.3.3. Fail-safe Parse …………………………………………………………… 55
4.3.4. Algorithm ……………………………………………………………….. 56
4.4. Results ………………………………………………………………………….. 57
5. Incorporating Insights from GH-CBP in Data Driven Dependency Parsing ….. 60
5.1. Parsers: Malt and MST ……………………………………………………….... 61
5.2. Data …………………………………………………………………………...... 61
5.3. Incorporating targeted features during training ………………………………… 62
5.3.1. Morphological Features …………………………………………………. 62
5.3.2. Local Morphosyntactic Features ………………………………………… 63
5.3.3. Clausal Features …………………………………………………………. 64
5.3.4. Minimal Semantics Features ……………………………………………. 65
5.3.5. Results …………………………………………………………………… 66
5.4. Linguistically Constrained Modularity ………………………………………… 67
5.4.1. Chunk Based Parsing ………………………………………………….... 67
5.4.1.1. Chunk as Hard Constraint …………………………………………… 67
5.4.1.2. Chunk as Soft Constraint …………………………………………… 68
5.4.1.3. Results ……………………………………………………………..... 69
5.4.2. Clausal Parsing ………………………………………………………….. 70
5.4.2.1. 2-stage parsing ……………………………………………………… 72
5.4.2.2. Two-Stage Parsing with Hard Constraints ………………………..... 73
5.4.2.2.1. Strategy 1 ………………………………………………….... 73
5.4.2.2.2. Strategy 2 ………………………………………………….... 76
5.4.2.2.3. Handling relative clause constructions in 2-Hard-S2 and 2-Hard-S1 ………………………………….. 79
5.4.2.3. Two-Stage Parsing with Soft Constraints …………………………… 79
5.4.2.4. Results ………………………………………………………………. 80
5.5. Linguistically Rich Graph-Based Parsing ……………………………………… 80
5.5.1. Constraint Graph ………………………………………………………… 81
5.5.2. Experimental Setup ……………………………………………………... 82
5.5.3. Experiments …………………………………………………………….. 82
5.5.4. Results …………………………………………………………………... 86
6. Rounding up ……………………………………………………………………….. 87
6.1. GH-CBP ……………………………………………………………………….. 87
6.1.1. Errors ……………………………………………………………………. 88
6.1.1.1. Error analysis of Prioritization …………………………………….... 90
6.2. Data driven parsing …………………………………………………………….. 91
6.2.1. Use of Targeted Features ……………………………………………….. 91
6.2.2. Chunk Based Parsing ………………………………………………….... 95
6.2.3. Clause Based Parsing ………………………………………………….... 96
6.2.4. Errors …………………………………………………………………… 100
6.2.5. Causes of Errors ………………………………………………………... 100
6.3. Linguistically Rich Graph-Based Parsing …………………………………….. 101
6.4. General Observations …………………………………………………………. 102
7. Conclusion ………………………………………………………………………… 105
APPENDIX I : Dependency Tagset ……………………………………………. 106
APPENDIX II : Chunk Tagset ………………………………………………….. 109
APPENDIX III : POS Tagset ……………………………………………………... 110
APPENDIX IV : MaltParser Features ………………………………………….... 111
APPENDIX V : MaltParser Features (2nd stage parser) ………………………. 113
APPENDIX VI : MSTParser Features ………………………………………….. 114
APPENDIX VII : MaxEnt labeler Features ……………………………………… 115
Bibliography ………………………………………………………………………….. 116
List of Figures
Figure Page
1.1 Parsing in layers ……………………………………………………………………. 3
2.1a Phrase Structure ……………………………………………………………………. 5
2.1b Dependency Structure ……………………………………………………………... 6
2.2 Levels of representation/analysis in the Computational Paninian Grammar ……….. 10
2.3 CPG dependency analysis of sentence 2.4 ………………………………………… 14
2.4 CPG dependency analysis of sentence 2.5 ………………………………………… 14
2.5 CPG dependency analysis of sentence 2.6 ………………………………………… 15
3.1 Constraint Graph for sentence 3.1 …………………………………………………. 20
3.2 Solution parses for sentence 3.1 ……………………………………………………. 21
4.1a Example 4.1 with POS tags ………………………………………………………... 24
4.1b Example 4.1 with Chunk boundaries ………………………………………………. 24
4.1c Example 4.1 with chunk head and vibhakti features ………………………………. 24
4.2 Chunk heads as dependency tree nodes …………………………………………… 24
4.3a 1st stage output for example 4.1 ……………………………………………………. 26
4.3b 2nd stage final parse for example 4.1 ……………………………………………….. 26
4.4 Parse outputs for sentence 4.3 ……………………………………………………... 26
4.5 Stage 1 and Stage 2 outputs for sentence 4.3 ……………………………………… 27
4.6 Some inter-clausal structures ………………………………………………………. 28
4.7 Partial parse for sentence 4.5 ..................................................................................... 30
4.8a POS tagged and chunked sentence ………………………………………………… 31
4.8b Partial parse tree after stage 1 of GH-CBP ………………………………………… 31
4.8c Parse tree after stage 2 of GH-CBP ………………………………………………... 31
4.8d Complete parse after intra-chunk dependencies identification ……………………. 31
4.9a Constraint Graph …………………………………………………………………... 32
4.9b Constraint Network ………………………………………………………………... 32
4.9c 2nd stage CG for sentence 4.7 ……………………………………………………… 33
4.10 CG for example 4.8 …………………………………………………………………37
4.11 CG for example 4.9 ……………………………………………………………….. 37
4.12 Dependency tree for example 4.15 ……………………………………………….. 41
4.13 CG for example 4.15 ……………………………………………………………… 41
4.14 Variable property of coordinating conjunction …………………………………….. 42
4.15 Revision of CG …………………………………………………………………….. 42
4.16 Revision of CG with labels ………………………………………………………… 43
4.17 Revision of CG for example 4.15 ………………………………………………….. 43
4.18 2nd stage CG for example 4.16 …………………………………………………….. 44
4.19 Look-ahead constraint applied to sentence 4.17 …………………………………… 45
4.20 Look-ahead constraint applied to sentence 4.19 …………………………………… 46
4.21 CG for example 4.19 ……………………………………………………………….. 47
4.22 Possible wrong trees for example 4.19 …………………………………………… 48
4.23 Solution parses for example 4.19 …………………………………………………... 48
4.24 Prioritizing of example 4.19 solution parses ……………………………………….. 49
4.25 Schematic design of GH-CBP ……………………………………………………... 51
4.26 Context over which S-constraints can be specified …………………………………54
4.27 Failsafe parse for example 4.20 ……………………………………………………. 56
5.1 Improvement of different features over MST baseline …………………………… 66
5.2 Chunk as Hard Constraint ………………………………………………………….. 68
5.3 Chunk as Soft Constraint ………………………………………………………….. 69
5.4 Dependency label distribution …………………………………………………….. 70
5.5 Arc length and relation type …………………………………………………………71
5.6 Depth and relation type …………………………………………………………….. 71
5.7a Original Gold input ………………………………………………………………… 73
5.7b 1st stage converted tree …………………………………………………………….. 73
5.8 Stage2 training input. Partial trees converted into a single node …………………... 74
5.9 Strategy I (2-Hard-S1) ………………………………………………………………75
5.10 Strategy II (2-Hard-S2). Input to 2nd stage is a partial parse ……………………… 76
5.11 2nd stage initialization using the 1st stage parse shown in Fig. 4a …………………. 77
5.12 Parse output for sentence 5.5 ………………………………………………………. 77
5.13 Parse output and 2nd stage initialization for sentence 5.6 ………………………….. 78
5.14 Constraint graph for sentence 5.7 ………………………………………………….. 82
5.15 Unlabeled attachment accuracies ….………………………………………………….. 86
6.1 Some intra-clausal non-projective structures ……………………………………… 88
6.2 Improvement of different features over MST baseline …………………………….. 92
6.3 Effect of clausal feature on arc length in MSTParser ……………………………… 94
6.4 LAS at arc-length (1-10) for Baseline, 2-Soft and 2-Hard ....................................... 98
6.5 LAS at depth (1-7) for Baseline, 2-Soft and 2-Hard ................................................ 99
List of Tables
Table Page
2.1 Some salient properties of CPG …………………………………………………… 11
3.1 Basic karaka frame for khaa ‘eat’ …………………………………………………. 19
4.1 Division of relations in the two stages ……………………………………………. 28
4.2 Basic demand frame for Hindi verb de ‘give’ ……………………………………. 33
4.3 Basic demand frame for Telugu verb tin ‘eat’ …………………………………….. 33
4.4 Hindi passive TAM transformation frame …………………………………………. 34
4.5 Transformation frame for kara ……………………………………………………………. 34
4.6 Agreement pattern: Simple declarative TAM …………………………………….... 35
4.7 Agreement pattern: Inabilitative TAM …………………………………………….. 35
4.8 Agreement pattern: Obligational TAM ……………………………………………. 36
4.9 Agreement pattern: Perfective TAM ……………………………………………… 36
4.10 Basic demand frame for Bangla verb khaa ‘eat’ ………………………………….. 36
4.11 Final frame for Telugu verb tin ‘eat’ with inablitative TAM ……………………... 37
4.12 Final demand frame for de ‘give’ after passive transformation ………………….. 39
4.13 Final demand frame for de ‘give’ after kara transformation ……………………… 39
4.14 Transformation frame for te_holo ………………………………………………………. 39
4.15 Final demand frame for Bangla verb khaa ‘eat’ after te_holo transformation ……. 40
4.16 Transformation frame for Telugu TAM tu ……………………………………………... 40
4.17 Final demand frame for Telugu verb tinta ‘eat’ after tu transformation ………….. 40
4.18 Demand frame for subordinating conjuncts ……………………………………… 44
4.19 Grammatical notions handled via Meta-constraints ……………………………… 46
4.20 Basic demand frame for khaa ‘eat’ ……………………………………………….. 47
4.21 Oracle scores with GH-CBP for unprioritized parses …………………………… 58
4.22 Intra-clausal and inter-clausal relation results …………………………………… 58
4.23 Results after various prioritization strategies …………………………..………… 59
5.1 Results for chunk modularity ……………………………………………………... 69
5.2 Overall parsing accuracy …………………………………………………………. 80
5.3 Experiment 1 valid arcs ……………………………………..……………………. 83
5.4 Experiment 2 and 3 valid arcs …………………………………………………… 84
5.5 Experiment 4 valid arcs …………….……………………………………………. 84
5.6 Experiment 5 and 6 valid arcs ……………………………………………………… 85
6.1 Unseen verbs and common argument structure errors ……………………………. 89
6.2 Intra-clausal performance in Hindi ………………………………….……………. 89
6.3 MaxEnt performance for attachment and label identification …….……………… 91
6.4 Most frequent confusions …………………………………………………………. 93
6.5 Effect of clausal features on parser performance …………………………………. 94
6.6 Effect of minimal semantics on some relations …………………………………… 95
6.7 Precision values for UAS with respect to arc length ……………………………….96
6.8 Accuracy for intra- and inter-clausal dependency relations ………………………. 97
6.9 Accuracy for relative clause construction …………………………………………. 97
6.10 Label identification comparison between Baseline and 2-Hard ………………….. 97
6.11 Advantages of different features/methods ………………………………………… 99
6.12 Comparison of baseline and Experiment 8 accuracies ............................................ 101
6.13 Accuracy distribution over POS ............................................................................. 102
Chapter 1
1. Introduction
Parsing of natural language text has been explored extensively since the 1990s. Most of the early
parsers were developed for English or other fixed word order languages. Over the past decade,
parsing of languages other than English has been taken up by the wider CL/NLP research
community. Parsing these morphologically rich, free word order (MoR-FWO) languages is a
challenging task. The challenges arise from the non-configurational nature of MoR-FWO languages,
which leads to complex and distributed syntactic cues that must be used to
identify various syntactic relations. There has been a recent surge in addressing parsing in such
languages (for example, Czech, Turkish, Hindi, etc.) (Nivre et al., 2007b; Hall et al., 2007; Nivre
and McDonald, 2008; Nivre, 2009; Tsarfaty and Sima'an, 2008; Seddah et al., 2009; Gadde et al.,
2010; Husain et al., 2009; Eryigit et al., 2008; Goldberg and Elhadad, 2009; Tsarfaty et al., 2010;
Mannem et al., 2009b). But in spite of this, parsing accuracies for these languages still lag behind
those for a fixed word order language like English.
Constraint based parsers are known to have the power to capture non-trivial language
generalizations and are well suited to handling the complex phenomena found in MoR-FWO
languages. On the downside, they suffer from robustness, ambiguity resolution and efficiency
issues (Nivre, 2005b). Data driven parsers, on the other hand, are efficient, deterministic and
quite robust. However, they are generally not good at capturing the complex linguistic
generalizations necessary to parse MoR-FWO languages. In this work, we explore both these
methods and investigate how insights from Computational Paninian Grammar (CPG) can inform
both constraint based and data-driven parsing to improve their accuracies, and also how these
parsing paradigms can benefit from one another.
1.1 Approach
Our approach to parsing can be stated as follows:
a) A constraint based approach that incorporates grammatical notions from CPG is used to
build a generalized parser. This parser extends the existing constraint based parsing paradigm
for CPG to cover additional language phenomena. In particular, it introduces two
stages and generalized constraints in parsing. Hindi, Telugu and Bangla have been used
to illustrate the generalized parser. The implemented parser has been tested for Hindi and
Telugu.
b) Insights gained from building the above constraint parser are used in data-driven
dependency parsing.
c) Insights from graph-based dependency parsing and labeling are used to rank the parses
obtained from the constraint parser mentioned in (a).
1.2 Brief outline
In this work we begin by motivating the use of Computational Paninian Grammar (CPG) for
Indian language (IL) dependency parsing. We then discuss important grammatical notions in
CPG. These notions have a direct bearing on the design of the proposed parsing framework. They
are:
a) Aakaankshaa ‘requirement of a head’
b) Karaka ‘participants in an action’
c) Abhihita ‘marked by the verb’
d) Vibhakti ‘nominal and verbal inflections’
e) Tiganta ‘finiteness’
The proposed generalized hybrid constraint based parser (GH-CBP) uses linguistically
motivated modularity, functional constraints, and insights from graph-based parsing and labeling.
Linguistic modularity comes from treating chunks and clauses as minimal parsing units. This leads
to a layered parsing scheme in which the parsing task is modularized. Figure 1.1 succinctly shows
these layers.
Figure 1.1. Parsing in layers
(a) POS tagged and chunked sentence,
(b) partial parse tree after stage 1 of GH-CBP (c) parse tree after stage 2 of GH-CBP,
(d) complete parse after intra-chunk dependencies identification
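The chunk modularity shown in layers (a)-(d) can be sketched in code. The representation below is a hypothetical simplification of my own (the chunk tags follow common Hindi chunking conventions, and the head-finding heuristic is an approximation, not the thesis procedure): a POS tagged sentence is grouped into chunks, and only the chunk heads become nodes for the inter-chunk parsing stages.

```python
# Hypothetical sketch: a POS tagged, chunked Hindi sentence; each chunk is
# (chunk tag, [(word, POS), ...]). Sentence: 'bacce ne seba khaayaa'
# ("the child ate an apple").
chunks = [
    ("NP",  [("bacce", "NN"), ("ne", "PSP")]),   # noun chunk + case marker
    ("NP",  [("seba", "NN")]),
    ("VGF", [("khaayaa", "VM")]),                # finite verb chunk
]

def chunk_heads(chunks):
    """Pick one head word per chunk; only these become nodes for the
    inter-chunk parsing stages (approximation: the last non-PSP word)."""
    heads = []
    for tag, words in chunks:
        content = [w for w, pos in words if pos != "PSP"]
        heads.append(content[-1])
    return heads
```

Intra-chunk dependencies (e.g. the attachment of ne inside its chunk) are then recovered in the final layer, as in (d) above.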
The role of constraints is central to GH-CBP. There are three types of constraints:
a) Licensing constraints (CL)
b) Eliminative constraints (CE)
c) Preferential constraints (CP)
These constraints have distinct functions and are incorporated into the parsing process in
different ways. CL is incorporated as a constraint graph (CG) for an input sentence. A CG
constrains the space of permissible dependency structures that the parser will eventually explore
in order to get a solution. CE are incorporated via integer programming and provide us with the
solution parses. CP, on the other hand, are used to prioritize the parses and select the best parse.
These constraints are reflected as features that are used in a labeling classification model.
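To make the role of CL concrete, here is a minimal sketch (the function, frame format and labels below are my own illustrative simplifications, not the GH-CBP implementation) of how licensing via demand frames yields a constraint graph that is much smaller than the complete labeled graph:

```python
# Illustrative sketch: a licensing constraint graph (CL) admits only arcs that
# some demand frame licenses, instead of all O(n^2) labeled head-dependent pairs.

def build_constraint_graph(words, demand_frames):
    """words: list of (form, POS); demand_frames: POS -> labels it licenses.
    Returns candidate arcs as (head index, label, dependent index)."""
    arcs = []
    for h, (_, h_pos) in enumerate(words):
        for d, _ in enumerate(words):
            if h == d:
                continue
            for label in demand_frames.get(h_pos, ()):
                arcs.append((h, label, d))
    return arcs

# Toy frame: only the verb licenses karta (k1) and karma (k2) dependents.
frames = {"VM": ("k1", "k2")}
sent = [("raama", "NN"), ("phala", "NN"), ("khaataa", "VM")]
cg = build_constraint_graph(sent, frames)  # candidate arcs, all verb-headed
```

Eliminative and preferential constraints would then operate over this candidate space, rather than over the full labeled graph.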
The task of labeling itself, along with some notions from graph based parsing, is used to prioritize
the multiple parses. GH-CBP is intended to allow generic, transparent and principled handling of
various syntactic constructions in Indian languages. We illustrate the framework using three
Indian languages (IL), namely, Hindi, Bangla and Telugu. It has been tested for Hindi and
Telugu.
After describing GH-CBP, we apply these insights to data driven approaches. This is
done by:
a) Incorporating targeted features during the training process: We systematically explore
which features are crucial for data-driven dependency parsing. Morphological, local
morphosyntactic, clausal and semantic features are tried. Their effectiveness is analyzed and
the best set of features is narrowed down.
b) Introducing ‘linguistically constrained’ modularity during the parsing process: The notions
of chunk and clause that were used in GH-CBP are now used to investigate whether they can help
improve parsing accuracies. We also investigate the optimal strategy for incorporating this
modularity during the parsing process.
c) Exploring ‘linguistically rich’ graph-based parsing: We integrate a graph based parsing
method with the licensing constraints used in GH-CBP. We investigate whether the parsing accuracy of
a graph based data driven parser can be improved by providing it a constraint graph instead of
a complete graph during the derivation step. Through a series of experiments we formulate
the constraint graph that gives us the best accuracy.
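Point (c) can be sketched as follows. This is a simplified stand-in, not the actual parser: real graph-based decoding uses the Chu-Liu/Edmonds algorithm over learned arc scores, while here a greedy per-word head choice over invented scores illustrates how restricting candidate arcs to a constraint graph changes the output:

```python
# Simplified stand-in for graph-based decoding (greedy per-word head choice,
# invented scores); 'allowed' plays the role of the constraint graph.

def decode(n, scores, allowed=None):
    """Choose the best-scoring head for each of words 1..n (0 is the root).
    scores: (head, dep) -> float; allowed: set of permitted (head, dep)."""
    heads = {}
    for dep in range(1, n + 1):
        candidates = [(s, h) for (h, d), s in scores.items()
                      if d == dep and (allowed is None or (h, d) in allowed)]
        heads[dep] = max(candidates)[1]
    return heads

scores = {(0, 1): 0.2, (2, 1): 0.9, (0, 2): 0.8, (1, 2): 0.3}
full = decode(2, scores)                              # search the complete graph
pruned = decode(2, scores, allowed={(0, 1), (0, 2)})  # search a constraint graph
```

With the complete graph the model attaches word 1 under word 2; with the pruned graph both words must attach to the root, showing how the constraint graph overrides the raw scores.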
Finally, we discuss the error analysis of all the approaches. The results show that we are, for the most
part, able to significantly improve on the baseline performance. The error analysis points to patterns that
exist in parsing languages such as Hindi and Telugu. The experiments flesh out the parsing
complexity of different language phenomena.
Chapter 2
2. Dependency Grammar Formalism
It has been suggested that MoR-FWO languages can be handled more efficiently using the
dependency based framework than the constituency based one (Hudson, 1984; Shieber, 1985;
Mel'Cuk, 1988; Bharati et al., 1995a). The use of the dependency formalism for various NLP/CL
tasks, especially parsing (Nivre et al., 2007b), has increased manifold over the last
decade. Consequently, most of the parsers for MoR-FWO languages are dependency based. The
basic difference between a constituency based representation and a dependency representation is
the lack of non-terminal nodes in the latter. Figures 2.1(a) and 2.1(b) show a simplified
phrase structure and a dependency structure, respectively, for example 2.1.
(2.1) Abhay ate a mango.
Figure 2.1(a). Phrase Structure
Figure 2.1(b). Dependency Structure
Formally, for an input S and a relation set R, a dependency tree is a well-formed labeled
digraph G = (V, A) that is a directed tree originating out of node w0 and has the spanning node set
V = VS,
where S = w0, w1, …, wn is the set of all words in a sentence;
R = {r1, …, rm} is a finite set of possible dependency relation types that can hold
between any two words in a sentence;
V ⊆ {w0, w1, …, wn};
VS = {w0, w1, …, wn} is the spanning node set that contains all the nodes in a sentence;
A ⊆ V × R × V.
If (wi, r, wj) ∈ A, then (wi, r′, wj) ∉ A for all r′ ≠ r. If this restriction is not followed, the
result is a multi-digraph. Multi-digraphs are the dependency representations
used in multi-stratal dependency theories.
Some of the properties of the dependency tree G are:¹
a) Root property: There does not exist wi ∈ V such that wi → w0.
b) G always satisfies the spanning property over the words of the sentence, which states that
V = VS.
c) G always satisfies the connectedness property, which states that for all wi, wj ∈ V, it is the
case that wi ↔* wj. That is, there exists a path between every word pair when the
direction of the arcs is ignored.
d) G satisfies the single head property, which states that for all wi, wj ∈ V, if wi → wj then
there does not exist wi′ ∈ V such that i′ ≠ i and wi′ → wj. That is, each word in a
dependency tree is a dependent of at most one head.
e) G satisfies the acyclicity property, which states that for all wi, wj ∈ V, if wi → wj then it is
not the case that wj →* wi. That is, the dependency tree does not contain any cycles.
f) G satisfies the arc size property, which states that |A| = |V| − 1.

¹ The notations in this chapter and in the subsequent chapters have been adapted from Kübler et al. (2009).
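The properties above can be checked mechanically. The following is a small sketch of my own (not from the thesis), encoding A as (head, label, dependent) triples with 0 standing for w0 and generic placeholder labels:

```python
# Sketch: A is a set of (head, label, dependent) triples over nodes 0..n,
# where 0 is the artificial root w0.

def is_well_formed(n, arcs):
    """Check the root, single-head, arc-size, acyclicity and connectedness
    properties for a candidate dependency tree over nodes 0..n."""
    deps = [d for (_, _, d) in arcs]
    if 0 in deps:                               # root property: no head for w0
        return False
    if sorted(deps) != list(range(1, n + 1)):   # single head + spanning
        return False
    if len(arcs) != n:                          # arc size: |A| = |V| - 1
        return False
    head = {d: h for (h, _, d) in arcs}
    for node in range(1, n + 1):                # acyclicity + connectedness:
        seen, cur = set(), node                 # every node must reach w0
        while cur != 0:
            if cur in seen:
                return False                    # cycle found
            seen.add(cur)
            cur = head[cur]
    return True

# Example 2.1, 'Abhay ate a mango': ate(2) hangs off the root; Abhay(1) and
# mango(4) depend on ate; a(3) depends on mango.
tree = {(0, "root", 2), (2, "subj", 1), (2, "obj", 4), (4, "det", 3)}
```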
The properties associated with a dependency formalism, such as representation, level of analysis,
status of word order, etc., are not sacrosanct. Other than the fact
that most dependency formalisms stipulate binary asymmetric relations between words in
a sentence, the formalisms can differ on myriad fronts. Some of the popular modern dependency
grammars known in the literature are the theory of structural syntax developed by Tesniere
(1959), Word Grammar (WG) (Hudson, 1984, 1990, 2007), Functional Generative Description
(FGD) (Sgall et al., 1986), Dependency Unification Grammar (DUG) (Hellwig, 1986, 2003),
Meaning-Text Theory (MTT) (Mel’cuk, 1988), and Lexicase (Starosta, 1988). In addition,
constraint-based theories of dependency grammar have been quite popular. Some of the popular
ones are Constraint Dependency Grammar (CDG) (Maruyama, 1990; Harper and Helzerman,
1995; Menzel and Schroder, 1998) and its descendant Weighted Constraint Dependency
Grammar (WCDG) (Schroder, 2002), Functional Dependency Grammar (FDG) (Tapanainen and
Jarvinen, 1997; Jarvinen and Tapanainen, 1998), which in turn is inspired by Constraint
Grammar (CG) (Karlsson, 1990; Karlsson et al., 1995), and most recently Extensible Dependency
Grammar (XDG) (Debusmann et al., 2004; Duchier and Debusmann, 2001). A synthesis of
dependency grammar and categorial grammar is found in the framework of Dependency
Grammar Logic (DGL) (Kruijff, 2001).
2.1 Some Dependency Grammar Formalisms
Below we briefly discuss some dependency grammar formalisms.
2.1.1 Extensible Dependency Grammar (XDG)
Extensible Dependency Grammar (XDG) is a general framework for dependency grammar, with
multiple levels of linguistic representations called dimensions, e.g. grammatical function, word
order, predicate-argument structure, scope structure, information structure and prosodic structure.
It is articulated around a graph description language for multi-dimensional attributed labeled
graphs. An XDG grammar is a constraint that describes the valid linguistic signs as n-
dimensional attributed labeled graphs, i.e. n-tuples of graphs sharing the same set of attributed
nodes, but having different sets of labeled edges. All aspects of these signs are stipulated
explicitly by principles: the class of models for each dimension, additional properties that they
must satisfy, and how one dimension must relate to another.
XDG models syntactic analysis as one of the many possible dimensions that are analyzed as a
lexicalized multi-dimensional configuration problem. It is inspired by Topological dependency
grammar (TDG) (Duchier and Debusmann, 2001) and is formulated as its generalization. Each of
these dimensions represents a different linguistic description. Each lexical entry simultaneously
constrains all dimensions. XDG describes the well-formedness conditions of an analysis by the
interaction of principles and the lexicon. The principles stipulate restrictions on one or more of
the dimensions, and are controlled by the feature structures assigned to the nodes from the lexicon
(Debusmann et al., 2004).
2.1.2 Constraint Grammar (CG)
Constraint Grammar (Karlsson, 1990, 1995) is a language independent formalism for surface-
oriented, morphology-based parsing of unrestricted text. All relevant structure in this grammar is
assigned from morphology to syntax. The constraints discard as many alternatives as possible, the
optimum being a fully disambiguated sentence with one label for each word, with the condition
that no genuine ambiguities are obliterated. The goal of the grammar is to yield a perfect
analysis. The other important aim of CG is to demonstrate that descriptively reasonable and
practically efficient parsing grammars can be designed that are based on pure surface
generalizations.
2.1.3 Functional Generative Description (FGD)
Functional Generative Description (FGD) (Sgall et al., 1986) is a dependency stratificational
grammar formalism that treats the sentence as a system of interlinked layers: phonological,
morphematical, morphonological, analytical (surface syntax) and tectogrammatical (deep syntax).
It not only specifies surface structures of the given sentences, but also translates them into their
underlying representations. These representations (called tectogrammatical representations,
denoted TRs) are intended as an appropriate input for a procedure of semantico-pragmatic
interpretation in the sense of intentional semantics (see Hajičová et al. 1998). Since TRs are, at
least in principle, disambiguated, it is possible to understand them as rendering linguistic (literal)
meaning as opposed to providing figurative meaning.
FGD forms the theoretical basis of the Prague Dependency Treebank (PDT) (Hajičová,
2002). PDT has a three-level structure: the morphological layer at the lowest level; the middle
level, called the analytical level, with surface syntactic annotation using dependency syntax;
and the highest level of annotation, called the tectogrammatical level, or the level of linguistic
meaning, which is based on FGD. The layers differ in the type of nodes, labels and the structure
stipulated for the final analysis at that layer (Hajičová, 2002).
2.2 Computational Paninian Grammar (CPG)
Computational Paninian Grammar (CPG) is a dependency grammatical model proposed by Bharati
et al. (1995a). CPG considers information as central to the study of language. When a writer (or a
speaker) uses language to convey some information to the reader (or the listener), he codes the
information in the language string. Similarly, when a reader (or a listener) receives a language
string, he extracts the information coded in it. CPG is primarily concerned with:
(a) how the information is coded and
(b) how it can be extracted.
Two levels of representation can be readily seen in language use: one, the actual language
string (or sentence); two, what the speaker has in his mind. The latter can also be called the
meaning. Computational Paninian Grammar has two other important levels: the karaka level and the
vibhakti level.
--- semantic level (what the speaker
| has in mind)
.
.
|
--- karaka level
|
|
--- vibhakti level
|
|
--- surface level (written sentence)
Figure 2.2. Levels of representation/analysis in the Computational Paninian Grammar
The surface level is the uttered or the written sentence. The vibhakti level is the level at which
there are local word groups together with case endings, preposition or postposition markers. The
vibhakti level abstracts away from many minor (including orthographic and idiosyncratic)
differences among languages. The topmost level relates to what the speaker has in his mind. This
may be considered to be the ultimate meaning level that the speaker wants to convey. Between
this level and vibhakti level is the karaka level. It includes karaka relations and a few additional
relations such as taadaarthya (or purpose). One can imagine several levels between the karaka
and the ultimate level (shown as a pair of dots between karaka and semantic level in Figure 2.2),
each containing more semantic information. Thus, the karaka level is one in a series of levels, but
one which has a relationship to semantics on the one hand and to syntax on the other.
At the karaka level, we have karaka relations and verb-verb relations, etc. Karaka relations are
syntactico-semantic (or semantico-syntactic) relations between the verbs and other related
constituents (typically nouns) in a sentence. This is the level of semantics that is important
syntactically and is reflected in the surface form of the sentence(s).
CPG treats a sentence as a set of modifier-modified relations. A sentence is supposed to have a
primary modified which is generally the main verb of the sentence. The elements modifying the
verb participate in the action specified by the verb. The participant relations with the verb are
called karaka. The notion of karaka will incorporate the ‘local’ semantics (Rambow et al., 2003)
of the verb in a sentence, while also taking cues from the surface-level morphosyntactic
information (Vaidya et al., 2009).
It is easy to see that this analysis is a dependency based analysis (Kiparsky and Staal, 1969;
Shastri, 1973), with the verb as the root of the tree and its arguments as its children.
The labels on the edges between a child-parent pair show the relationship between them. Some of
the salient properties of CPG are shown in Table 2.1.
Property               Value
Representation         Tree
Node                   Lexical/Phrasal
Dependency label       Syntactico-semantic
Non-projective trees   Allowed
Layers                 Surface: Morphological,
                       Vibhakti: Local Morphosyntax,
                       Karaka: Dependency
Table 2.1. Some salient properties of CPG
The following grammatical notions in CPG are used later to develop a generalized parsing
framework.
(1) aakaankshaa
The verb’s core requirements for it to be meaningful are its aakaankshaa. It can be roughly
translated as the ‘argument structure’ of the verb. For example, a verb like ‘put’ requires someone
to put something somewhere; similarly, a verb like ‘eat’ requires someone to eat something. As
expected, different verbs have different aakaankshaa.
(2) karaka
Karakas are the participants in the action specified by the verb. These relations, as mentioned
earlier, are syntactico-semantic in nature, in that they are syntactically grounded but also convey
some meaning. There are six basic karakas, namely:
k1: karta: the most independent participant in the action
k2: karma: the one most desired by the karta
k3: karana: instrument which is essential for the action to take place
k4: sampradaan: recipient of the action
k5: apaadaan: movement away from a source
k7: adhikarana: location of the action in time and space
In example 2.2, abhay is the k1 and kemeraa is the k2.
(2.2) abhay ne kemeraa rakhaa hai
‘Abhay’ ERG ‘camera’ ‘keep’ ‘is’
‘Abhay has kept the camera.’
In addition to the above relations, many others have been proposed as part of the overall CPG
framework (Begum et al., 2008a; Bharati et al., 2009d). The most important of these relations
are listed in APPENDIX I.
(3) Abhihita
The notion of abhihita signifies the karaka expressed by a verbal TAM (tense, aspect and
modality). This can be roughly translated as ‘agreement’. So for example, the main verb can point
to the karta karaka by agreeing with it. In example 2.3, the main verb rakhataa hai agrees with
abhay in gender, number and person.
(2.3) abhay t.v rakhataa hai
‘Abhay-M.Sg.3rd’ ‘T.V.’ ‘keeps-M.Sg.3rd’ ‘is-M.Sg.3rd’
‘Abhay keeps a T.V.’
(4) Vibhakti (Sup, Ting)
Vibhakti is an abstract concept used for signifying the case markings on nouns and the tense,
aspect and modality of verbs. The former is called sup and the latter is called ting. In example 2.2,
the ne postposition of abhay and the yaa_hai TAM of rakhaa hai are the sup and ting
respectively.
(5) Tiganta
Tiganta is the word in an utterance (sentence) that bears a ting. Tiganta conveys the notion of
finiteness of a verb. It is an important concept, as the analysis of a sentence and the search for the
different karakas start with a finite verb and can be thought of as being within its scope.
(6) Yogyataa
Yogyataa can be roughly translated as the semantic selectional restriction of the verb. For instance, a
verb like ‘eat’ necessitates the presence of an ‘animate being’ that has the capacity to eat something
that is ‘eatable’.
Many Indian languages share a common set of properties (Emeneau, 1956; Krishnamurthi,
1986; Masica, 1993; Steever, 1998; Comrie, 1989). CPG has been used to analyze some of these
languages successfully (Begum et al., 2008a; Vempty et al., 2010; Husain, 2009; Husain et al., 2010).
Some such properties are:
a) Free word order
b) Rich morphology / Case marking
c) Participles
d) Relative-correlatives
e) Correlation between verbal TAM and subject/object case-marking.
Examples 2.4(a-f) show an example sentence for Hindi, where (2.4a) shows the words in the canonical
order, and the remaining examples show some of the word order variants of (2.4a).
(2.4) a. malaya ne sameera ko kitaaba dii
‘Malay’ ERG ‘Sameer’ DAT ‘book’ ‘gave’
‘Malay gave the book to Sameer’ (S-IO-DO-V)2
b. malaya ne kitaaba sameera ko dii (S-DO-IO-V)
c. sameera ko malaya ne kitaaba dii (IO-S-DO-V)
d. sameera ko kitaaba malaya ne dii (IO-DO-S-V)
2 S=Subject; IO=Indirect Object; DO=Direct Object; V=Verb; ERG=Ergative; DAT=Dative
e. kitaaba malaya ne sameera ko dii (DO-S-IO-V)
f. kitaaba sameera ko malaya ne dii (DO-IO-S-V)
Hindi also has a rich case marking system, although case marking is not obligatory. For
example, in (2.4), while the subject and indirect object are marked explicitly for the ergative3
(ERG) and dative (DAT) cases respectively, the direct object is unmarked for the accusative.
Figure 2.3 gives the CPG dependency analysis of example 2.4(a). Figures 2.4 and 2.5 show the
CPG analyses of a Telugu example and a Bangla example respectively.
Figure 2.3. CPG dependency analysis of sentence 2.4
(2.5) ramadu aapela winnalaikapoiyadu [Telugu]
‘Ram-masculine’ ‘Apple’ ‘eat-could-not’
‘Ram could not eat an apple’
Figure 2.4. CPG dependency analysis of sentence 2.5
(2.6) aami aapela khaai [Bangla]
‘I-1st_person’ ‘apple’ ‘eat-1st_person-present’
‘I ate an apple’
3 Hindi is split-ergative. The ergative marker appears on the subject of a transitive verb with perfect morphology.
Figure 2.5. CPG dependency analysis of sentence 2.6
The parsers described in this work will for the most part use the CPG analysis (Begum et al., 2008a;
Bharati et al., 2009d). CPG is used because:
a) It has been used to analyze various Indian languages such as Hindi, Telugu, Bangla,
Marathi, etc. A parser built using the notions in CPG will automatically benefit from its
grammatical devices in accounting for the grammatical structures of these languages.
b) Dependency treebanks for various Indian languages such as Hindi, Urdu, Telugu and
Bangla are being built using CPG. Both the constraint-based parser and the data-driven parser
learn some parameters from a treebank.
Chapter 3
3. Dependency Parsing
Dependency parsing can be broadly divided into grammar-driven and data-driven parsing (Caroll,
2000). Most of the modern grammar-driven dependency parsers parse by eliminating the parses
which do not satisfy the given set of constraints. They view parsing as a constraint-satisfaction
problem. Data-driven parsers, on the other hand, use a corpus to induce a probabilistic model for
disambiguation (Nivre, 2005; and the references therein).
A dependency parsing model M comprises a set of constraints Γ that define the space of
permissible dependency structures, a set of parameters λ, and a parsing algorithm h.
M = (Γ, λ, h)
Γ maps an arbitrary sentence S and dependency type set R to a set of well-formed
dependency trees Gs. Additionally, it can encode more complex mechanisms that further
limit the space of permissible structures:
Γ = (Σ, R, C)
where Σ is the set of terminal symbols (here, words), R is the label set, and C is the set of
constraints. These constraints restrict the dependencies between words and the possible heads
of a word in well-defined ways.
For data-driven parsing, the learning phase tries to construct the parameters λ. The parameters are
generally learned from an annotated treebank that contains dependency trees. In grammar-driven
parsing, λ is either null or uniform; it is not learnt automatically from a treebank. After
defining the parsing model, one needs a parsing algorithm to solve the parsing problem. That is,
given a set of constraints Γ, parameters λ and a new sentence S, how does the system find the
most appropriate dependency tree G for that sentence?
G = h(Γ, λ, S)
Based on the type of parsing strategy chosen, h will take on different forms.
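The model schema above can be sketched as a small Python structure for illustration. The class and field names below are illustrative only; they are not part of CPG or of any parser described in this work.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

# A dependency tree as a map: dependent index -> (head index, label).
Tree = Dict[int, Tuple[int, str]]

@dataclass
class Grammar:
    """Gamma = (Sigma, R, C): vocabulary, label set, constraint set."""
    sigma: Set[str]
    labels: Set[str]
    constraints: List[Callable[[List[str], Tree], bool]] = field(default_factory=list)

    def permits(self, sentence: List[str], tree: Tree) -> bool:
        # A tree is well-formed iff every constraint in C holds for it.
        return all(c(sentence, tree) for c in self.constraints)

@dataclass
class Model:
    """M = (Gamma, lambda, h)."""
    grammar: Grammar
    params: Dict     # lambda: null/uniform for grammar-driven, learned for data-driven
    algorithm: Callable  # h(Gamma, lambda, S) -> tree

    def parse(self, sentence: List[str]) -> Tree:
        return self.algorithm(self.grammar, self.params, sentence)
```

For a grammar-driven parser, `params` would simply be empty and `algorithm` an eliminative search; a data-driven parser would carry learned weights in `params`.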
3.1 Constraint Based Parsing
Grammar-driven constraint-based dependency parsing is based on the notion of eliminative
parsing, where sentences are analyzed by successively eliminating representations that violate
constraints until only valid representations remain. One of the first parsing systems based on this
idea is the CG framework (Karlsson, 1990; Karlsson et al., 1995), which uses underspecified
dependency structures represented as syntactic tags and disambiguated by a set of constraints
intended to exclude ill-formed analyses. In CDG (Maruyama, 1990), this idea is extended to
complete dependency structures by generalizing the notion of tag to pairs consisting of a syntactic
label and an identifier of the head node. This kind of modeling is important for many different
approaches to dependency parsing, since it provides a way to reduce the parsing problem to a
tagging or classification problem. This line has been explored by the extended CDG framework
of Harper and Helzerman (1995) and the FDG system (Tapanainen and Jarvinen, 1997; Jarvinen
and Tapanainen, 1998), where the latter is a development of CG that combines eliminative
parsing with a non-projective dependency grammar inspired by Tesniere (1959).
In the eliminative approach, parsing is viewed as a constraint satisfaction problem, where any
analysis satisfying all the constraints of the grammar is a valid analysis. For a fully defined
constraint satisfaction problem, we need to specify the variables, their domains and the set of
constraints that need to be satisfied:
(1) Set of variables: S = w0, w1, w2 … wn represents the set of lexical items in a sentence.
(2) The domain of a variable wi is the set {wj | 0 ≤ j ≤ n and j ≠ i} (the possible heads of the word).
(3) A set of constraints that define the permissible values for the variables.
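The eliminative view of parsing can be illustrated with a brute-force constraint-satisfaction sketch over the variables and domains just defined. Real systems avoid this exponential enumeration through local consistency or constraint programming; the function and constraint names here are illustrative.

```python
from itertools import product
from typing import Callable, Dict, List

Assignment = Dict[int, int]  # word index -> head index (0 is the root)

def eliminative_parse(n: int,
                      constraints: List[Callable[[Assignment], bool]]) -> List[Assignment]:
    """Enumerate head assignments for words 1..n (node 0 is the root) and
    keep only those satisfying every constraint. Exponential in n; shown
    only to make the CSP formulation concrete."""
    analyses = []
    # Domain of word i: every other node, including the root, except itself.
    domains = [[j for j in range(n + 1) if j != i] for i in range(1, n + 1)]
    for heads in product(*domains):
        assignment = {i + 1: h for i, h in enumerate(heads)}
        if all(c(assignment) for c in constraints):
            analyses.append(assignment)
    return analyses

def single_root(a: Assignment) -> bool:
    # Exactly one word is attached directly to the root.
    return sum(1 for h in a.values() if h == 0) == 1

def acyclic(a: Assignment) -> bool:
    # Following head links from any word must reach the root.
    for start in a:
        seen, node = set(), start
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = a[node]
    return True
```

For a two-word sentence, only the two tree-shaped assignments survive the two constraints above.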
Constraint satisfaction in general is NP-complete, which means that to ensure reasonable
efficiency in practice one has to use controlled heuristics. Early versions of this approach used
local consistency (Maruyama, 1990; Harper et al., 1995), which attains polynomial worst-case
complexity by only considering local information in the application of constraints. In the more
recently developed XDG framework (Duchier, 1999; Debusmann et al., 2004), the problem is
addressed by using constraint programming to solve the satisfaction problem defined by the
grammar for a given input string. The XDG framework also introduces several levels of
representation, arguing that constraints can be simplified by isolating different aspects of the
grammar, such as Immediate Dominance (ID) and Linear Precedence (LP), and having constraints
that relate the different levels to each other (Duchier and Debusmann, 2001; Debusmann et al., 2004).
From the point of view of parsing unrestricted natural language text, parsing as constraint
satisfaction can be problematic in two ways. First, for a given input string, there may be no
analysis satisfying all constraints, which leads to a robustness problem. Secondly, there may be
more than one analysis, which leads to a problem of disambiguation. Menzel and Schroder (1998)
extend the CDG framework of Maruyama (1990) with graded, or weighted, constraints,
assigning a weight w (0.0 ≤ w ≤ 1.0) to each constraint to indicate how serious a violation of
that constraint is (where 0.0 is the most serious). In this extended framework, later developed into
WCDG (Schroder, 2002), the best analysis for a given input string is the analysis that minimizes
the total weight of violated constraints. The more recent versions of this approach use a
transformation-based approach, which successively tries to improve the analysis by transforming
one solution into another guided by the observed constraint violations in the current solution.
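The graded-constraint idea can be made concrete with a small scoring sketch. The penalty scheme below (a cost of 1 − w per violated constraint, so that w = 0.0 constraints are the most expensive to violate) is an illustrative simplification, not WCDG's actual scoring function.

```python
from typing import Callable, Dict, List, Tuple

Analysis = Dict[int, int]  # word index -> head index
# A graded constraint: (weight in [0.0, 1.0], predicate); lower weight = more serious.
Graded = Tuple[float, Callable[[Analysis], bool]]

def violation_cost(analysis: Analysis, constraints: List[Graded]) -> float:
    """Sum a penalty of (1 - w) for every violated constraint, so that
    violating a hard (w = 0.0) constraint costs the most."""
    return sum(1.0 - w for w, holds in constraints if not holds(analysis))

def best_analysis(candidates: List[Analysis],
                  constraints: List[Graded]) -> Analysis:
    # The best analysis is the one with the lowest total violation cost;
    # unlike strict CSP, an analysis violating soft constraints survives.
    return min(candidates, key=lambda a: violation_cost(a, constraints))
```

Because every candidate receives some score, the parser can always return its least-bad analysis, addressing the robustness problem noted above.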
3.2 Data-Driven Parsing
A data-driven approach to parsing primarily makes use of machine learning from annotated
data in order to parse new sentences. More precisely, such methods are called supervised data-
driven methods. There are two main problems: (a) the learning problem, which is the task of learning
a parsing model from a representative sample of sentence structures (training data), and (b) the
parsing problem (or inference/decoding problem), which is the task of applying the learned model
to the analysis of a new sentence. Consequently, data-driven methods differ in the type of parsing
model, the algorithm used to learn the model from data, and the algorithm used to parse a new
sentence. The two major classes of approaches are transition-based and graph-based
data-driven methods. Transition-based methods start by defining a transition system for mapping
a sentence to its dependency graph. The learning problem is to induce a model for predicting the
next state transition, given the transition history, and the parsing problem is to construct the optimal
transition sequence for the input sentence, given the induced model. Graph-based methods instead
define a space of candidate dependency graphs for a sentence. The learning problem is to induce
a model for assigning scores to the candidate dependency graphs, and the parsing problem is to
find the highest-scoring dependency graph for the input sentence, given the induced model
(Kubler et al., 2009).
Data-driven dependency parsing was first established by Eisner (1996) using graph-based
methods; the transition-based approach was first explored by Kudo and Matsumoto (2002) and
Yamada and Matsumoto (2003). The two parsing methods that we use in this work are MaltParser
(Nivre et al., 2007a), a transition-based system, and MSTParser (McDonald et al., 2005b), a graph-
based system.
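A minimal sketch of the transition-based idea, using greedy arc-standard transitions over an unlabeled tree. The `score` callable stands in for the learned model that predicts the next transition. This illustrates the general method only; it is not MaltParser's actual implementation.

```python
from typing import Callable, Dict, List

def arc_standard_parse(n: int, score: Callable) -> Dict[int, int]:
    """Greedy arc-standard parsing over nodes 0..n (0 is the artificial root).
    score(action, stack, buffer) plays the role of the induced model; the
    highest-scoring permissible transition is taken at each step."""
    stack: List[int] = [0]
    buffer: List[int] = list(range(1, n + 1))
    heads: Dict[int, int] = {}
    while buffer or len(stack) > 1:
        actions = []
        if buffer:
            actions.append("SHIFT")
        if len(stack) >= 2 and stack[-2] != 0:   # the root never gets a head
            actions.append("LEFT-ARC")
        if len(stack) >= 2:
            actions.append("RIGHT-ARC")
        act = max(actions, key=lambda a: score(a, stack, buffer))
        if act == "SHIFT":
            stack.append(buffer.pop(0))
        elif act == "LEFT-ARC":                  # attach stack[-2] under stack[-1]
            dep = stack.pop(-2)
            heads[dep] = stack[-1]
        else:                                    # RIGHT-ARC: attach stack[-1] under stack[-2]
            dep = stack.pop()
            heads[dep] = stack[-1]
    return heads
```

With a scorer that encodes the gold tree (an oracle), the parser reconstructs that tree; in practice the scorer is a classifier trained on treebank-derived transition sequences.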
3.3 Constraint Based Parsing for CPG
Constraint-based parsing using integer programming has been successfully tried for Indian
languages (Bharati et al., 1993, 1995a, 1995b, 2002). Under this scheme the parser exploits
the syntactic cues present in a sentence and forms a constraint graph based on the generalizations
present. It then translates the constraint graph into an integer programming problem. Bipartite
graph matching is then used to find the solutions. The solutions to the problem provide all
possible parses for the sentence.
As part of this framework, a mapping is specified between karaka relations and postpositions.
In CPG this mapping is given by a grammatical structure called the basic karaka frame4. It
specifies whether a karaka is mandatory or optional and what vibhakti (postposition/suffix) it
would take. Basic karaka frame given in Table 3.1 correctly derives sentence (3.1).
(3.1) baccaa haatha se kelaa khaataa hei
‘child’ ‘hand’ INST ‘banana’ ‘eats’ ‘is’
‘The child eats the banana with his hand.’
karaka        vibhakti        presence
karta (k1)    0               mandatory
karma (k2)    ko or 0         mandatory
karana (k3)   se or dvaara    optional
Table 3.1. Basic karaka frame for khaa ‘eat’
This mapping between karakas and vibhakti depends on the verb and its tense, aspect, and
modality (TAM) label. The mapping is represented by two structures: the basic karaka frame and
karaka frame transformations.
4 The original formulation of CPG uses ‘karaka frame’, the text that will follow will use the term ‘demand frame’
instead.
The basic karaka frame for a verb or a class of verbs gives the
mapping for the TAM label called basic. It specifies the vibhakti permitted for the applicable
karaka relations for a verb when the verb has the basic TAM label. Table 3.1 gives the frame for
the verb khaa ‘eat’ when it takes the default TAM label taa hei (which corresponds to the present
indefinite). For other TAM labels there are karaka frame transformation rules. Thus, for a given
verb with some TAM label, appropriate karaka frame can be obtained using its basic karaka
frame and the transformation rule depending on its TAM label (Bharati et al. 1995a). For
example, if the verb takes the yaa TAM, the transformation rule associated with this TAM will
modify the vibhakti of the karta to ne. With this rule we can account for sentences like (3.2):
(3.2) bacce ne haatha se kelaa khaayaa
‘child’ ERG ‘hand’ INST ‘banana’ ‘ate-PERF’
‘The child ate the banana with his hand.’
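The basic-frame-plus-transformation mechanism can be illustrated directly from Table 3.1 and the yaa rule just described. The dictionary encoding is an illustrative simplification of the frame format.

```python
# Basic demand frame for khaa 'eat' (from Table 3.1).
basic_frame = {
    "k1": {"vibhakti": ["0"], "presence": "mandatory"},
    "k2": {"vibhakti": ["ko", "0"], "presence": "mandatory"},
    "k3": {"vibhakti": ["se", "dvaara"], "presence": "optional"},
}

# Transformation rules keyed by TAM label: under the perfective 'yaa' TAM,
# the karta's vibhakti becomes the ergative 'ne'.
tam_rules = {
    "yaa": {"k1": {"vibhakti": ["ne"]}},
}

def transform_frame(frame, tam):
    """Derive the karaka frame for a verb under a given TAM label by
    applying that TAM's transformation rule to the basic frame."""
    derived = {k: dict(v) for k, v in frame.items()}  # copy each karaka entry
    for karaka, change in tam_rules.get(tam, {}).items():
        derived[karaka].update(change)
    return derived
```

Deriving the frame for khaa with the yaa TAM yields ne for the karta while leaving the other karakas untouched, which is exactly the difference between sentences (3.1) and (3.2).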
A demand group is an element which makes demands; for example, verbs make demands for
their karakas through demand frames. These demands are satisfied by source groups. A source
group becomes a potential candidate for a verb only after it satisfies the vibhakti specification
mentioned in the verb’s karaka frame. This can be shown in the form of a constraint graph. The nodes
of the graph are the word groups, and there is an arc labeled by the appropriate karaka relation
from a verb group to a selected source group. In Figure 3.1, all such source groups are nouns.
Figure 3.1. Constraint Graph for sentence 3.1.
The constraint graph for sentence (3.1) is shown in Figure 3.1. Note that each arc in a CG not
only has a dependency label associated with it but also has the necessity information of the
relation ([m]andatory or [o]ptional). A parse is a sub-graph of the constraint graph thus formed,
containing all the nodes of the constraint graph and satisfying some conditions. A constraint
graph is converted into an integer programming problem by introducing, for every arc from node
i to node j labeled by karaka k in the constraint graph, a variable x_{i,j,k}. The
variables take their values as 0 or 1. Figure 3.2 shows the solution sub-graphs for
sentence (3.1).
(a) Solution 1 sub-graph (b) Solution 2 sub-graph
Figure 3.2. Solution parses for sentence 3.1
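The conversion of a constraint graph into a 0/1 program can be sketched as follows. For illustration, the 0/1 assignments are enumerated by brute force rather than handed to an integer programming solver, and only two of the conditions on a valid parse (one head per node; each mandatory karaka filled exactly once) are checked; the function name and arc encoding are illustrative.

```python
from itertools import product

def solve_constraint_graph(arcs, nodes):
    """arcs: list of (demand, source, karaka, necessity) with necessity
    'm'andatory or 'o'ptional; nodes: the source groups needing a head.
    Each arc gets a 0/1 variable; a parse is a subset of arcs in which
    every source node has exactly one head and every mandatory karaka
    of each demand group is filled exactly once."""
    solutions = []
    for bits in product([0, 1], repeat=len(arcs)):
        chosen = [a for a, b in zip(arcs, bits) if b]
        heads = [j for (_, j, _, _) in chosen]
        if sorted(heads) != sorted(nodes):      # exactly one head per node
            continue
        ok = True
        for (i, j, k, nec) in arcs:
            if nec == "m":
                filled = sum(1 for (i2, _, k2, _) in chosen if i2 == i and k2 == k)
                if filled != 1:
                    ok = False
        if ok:
            solutions.append(chosen)
    return solutions
```

Encoding the constraint graph of sentence (3.1) this way yields exactly the two solution sub-graphs of Figure 3.2: k1 and k2 can each attach to either zero-marked noun, while haatha is only reachable through the k3 arc.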
Chapter 4
4. A Two Stage Generalized Hybrid Constraint Based Parser
(GH-CBP)
As mentioned in chapter 1, we incorporate grammatical notions from CPG to build a generalized
parsing framework. In this chapter we use the following notions from CPG and incorporate them
in a constraint-based parsing system to build the generalized hybrid constraint based parser
(GH-CBP).
(1) Aakaankshaa ‘requirement of a head’
(2) Karaka ‘participants in an action’
(3) Abhihita ‘marked by the verb’
(4) Vibhakti ‘nominal and verbal inflections’
(5) Tiganta ‘finiteness’
These notions have been explained earlier in chapter 2. Later in this chapter, we will also
incorporate insights from graph-based dependency parsing to rank the parses obtained from
GH-CBP.
4.1 Parsing in layers
Relations that exist between pairs of nodes of a dependency tree can signify various functions. A
verb-noun relation, for example, is different from a relation between a noun and its postposition.
Often, one class of relations is mutually exclusive of another. In this section we
introduce two linguistic domains with which we classify dependency labels into
different types. The first domain is the notion of a chunk, which will help us distinguish between
local and non-local dependency relations. On similar lines, the domain of a clause will help us
distinguish between intra-clausal and inter-clausal relations. Distinguishing the different types of
dependency relations allows us to cater to each type individually during the parsing process.
This modularity is similar in spirit to works such as Harris (1962) and Joshi and
Hopely (1999).
4.1.1 Chunk as Minimal Parsing Unit
Chunks in languages such as Hindi, Bangla, etc. capture two things:
a) Local Word Groups (LWG)
b) Local dependencies (for example, relations between an adjective and a noun)
The notion of vibhakti in these languages can be captured using the concept of LWG. In such
languages the case markers on nouns and the tense, aspect and modality (TAM) markers on the
verb are lexicalized and appear as separate words. These case and TAM markers play an
important role in identifying various dependency relations. By making them part of the verb or
the noun chunk we can easily localize such morphosyntactic information.
Such elements inside a chunk that are important for dependency parsing can be made
available through the chunk head’s feature structure. In Figure 4.1(c) we see that the case and TAM
markers (ne and yaa_thaa respectively) have been percolated to the chunk level. Head percolation
makes all the relevant information needed to parse the sentence available at the chunk
head’s feature structure. Morphological features (including agreement features) can be similarly
made available.
(4.1) raama ne khaanaa khaayaa thaa
‘Ram’ ERG ‘food’ ‘eat’ ‘was’
‘Ram ate the food.’
Figure 4.1. (a) Example 4.1 with POS tags, (b) With Chunk boundaries,
(c) With chunk head and vibhakti features
Many other local modifications, such as adjectives modifying a noun, have no effect on the global
dependency structure. Therefore, these elements, along with function elements, can be made part
of a chunk. In general, all the nominal inflections and nominal modifications (an adjective modifying a
noun, etc.) are treated as part of a noun chunk; similarly, verbal inflections and auxiliaries are treated
as part of the verb chunk (Bharati et al., 2006).
In GH-CBP, for languages such as Hindi, Bangla, etc., chunks are treated as the minimal parsing
units and relations in a dependency tree represent relations between chunk heads. The relations
inside the chunks are made explicit in a post-processing step. For agglutinative languages such as
Telugu, the chunk does not capture the notion of vibhakti (as the information is available via
suffixes); rather it is used simply to ignore local dependencies. Figure 4.2 shows the dependency
tree for example sentence (4.1). Notice the absence of ne and thaa, as they become part of their
respective chunks. A node in such a tree corresponds to a chunk head.
Figure 4.2. Chunk heads as dependency tree nodes.
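Head percolation at the chunk level can be sketched as below. The chunk encoding and the POS tags (NN, NNP, VM, PSP, VAUX) are illustrative simplifications; in particular, a real system would recover the full TAM sequence (e.g. yaa_thaa) through morphological analysis rather than from the auxiliary words alone.

```python
def percolate(chunk):
    """Build a chunk head's feature structure by percolating the vibhakti
    (case marker or auxiliary sequence) of the function words up to the head.
    A chunk is given as a list of (word, pos) pairs; the first content word
    (noun or main verb) is taken as the head, and postpositions/auxiliaries
    contribute the vibhakti. '0' marks an unmarked chunk."""
    content_pos = {"NN", "NNP", "VM"}
    head = next(w for w, p in chunk if p in content_pos)
    markers = [w for w, p in chunk if p in {"PSP", "VAUX"}]
    return {"head": head, "vibhakti": "_".join(markers) or "0"}
```

Applied to the chunks of example (4.1), the noun chunk raama ne yields head raama with vibhakti ne, while the bare chunk khaanaa yields the unmarked value 0, mirroring Figure 4.1(c).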
4.1.2 Clause as Minimal Parsing Unit
Tiganta in CPG conveys the notion of finiteness of a verb. It is an important concept, as the
analysis of a sentence and the search for the different karakas start with a finite verb. We posit the
notion of a clause to demarcate the scope of a finite verb. This is captured in GH-CBP by treating
a clause as a minimal parsing unit. Similar to what we saw in the previous section, once a
minimal parsing unit has been identified, we can again divide the dependency relations into two
classes, but this time on the basis of the clause.
4.1.2.1 Two Stages
Treating a clause as a minimal parsing unit in GH-CBP leads to a two-stage analysis of an input
sentence. In the 1st stage only intra-clausal dependency relations are extracted. The 2nd stage then
tries to handle more complex inter-clausal relations such as those involved in constructions of
coordination and subordination between clauses. To illustrate this, let us consider example (4.2).
(4.2) mai ghar gayaa kyomki mai bimaar thaa
’I’ ’home’ ’went’ ’because’ ’I’ ’sick’ ‘was’
‘I went home because I was sick’
Figure 4.3a shows the 1st stage analysis of sentence 4.2. Both the matrix clause mai ghar gayaa
and the subordinate clause are shown parsed, with their respective heads attached to _ROOT_.
The subordinating conjunction kyomki is also seen attached to the _ROOT_ and not to the matrix
clause. The dependency tree thus obtained in the 1st stage is partial. In the 2nd stage the
relationship between the two clauses is identified. The 2nd stage parse for (4.2) is shown in
Figure 4.3b. Under no condition does the 2nd stage modify the parse sub-trees obtained from the
1st stage. The 2nd stage only tries to establish relations between the clauses, thereby giving the
complete dependency analysis.
Figure 4.3. (a): 1st stage output, (b): 2nd stage final parse for example 4.2
As another instance of two stage parsing take example (4.3), a relative-correlative
construction.
(4.3) [ jo ladakaa vahaan baithaa hai ] vaha meraa bhaaii hai
‘which’ ‘boy’ ’there’ ’sitting’ ’is’ ’that’ ’my’ ’brother’ ‘is’
‘The boy who is sitting there is my brother’
(a): Output of Stage 1 (b): Output of Stage 2
Figure 4.4. Parse outputs for sentence 4.3
Trees corresponding to the outputs of the 1st stage and the 2nd stage are shown in Figure 4.4. During
the 1st stage the parser tries to find all the dependency relations of both the finite clauses vaha meraa
bhaai hai and jo ladakaa vahaan baithaa hai. Having done that, in the 2nd stage it tries to identify
the referent in the main clause with which the co-referent jo in the relative clause corefers. The root
of the relative clause gets attached to this element with an ‘nmod__relc’ (relative clause relation).
The two-stage analysis of sentences holds for languages other than Hindi as well. Figure 4.5 shows
the two-stage analysis for sentence 4.4, a Telugu complement clause construction.
(4.4) wulasi golilu mAnesiMxi ani ramA ceVppiMxi [Telugu]
‘Tulasi’ ‘tablets’ ‘stopped using’ ‘that’ ‘Rama’ ‘told’
‘Rama told that Tulasi stopped using tablets.’
(a) (b)
Figure 4.5. Stage 1 and Stage 2 outputs for sentence 4.4
Coordination of clauses also provides a situation where two-stage parsing is useful. Figure 4.6
shows some prototypical inter-clausal structures that are parsed in the 2nd stage.
Figure 4.6. Some inter-clausal structures. T1: Coordination structure, T2: Subordination structure,
T3: Relative clause structure. (CCP is a conjunct chunk, VG is a finite verb chunk, NP is a noun
chunk. ’ccof’ is the conjunct relation, ‘nmod__relc’ is relative clause relation)
Table 4.1 below shows the division of different relations in the two stages.
Stage Relations
Stage I
(Intra-clausal)
i. Argument structure of the finite verb
ii. Argument structure of the non-finite verb
iii. Adjuncts of finite and the non-finite verb
iv. Noun modifications
v. Adjectival modifications
vi. Non-clausal coordination
Stage II
(Inter-clausal)
i. Clausal coordination
ii. Clausal subordination
iii. Clausal complement
iv. Relative-Correlative construction
Table 4.1. Division of relations in the two stages
For all the experiments described in this work, the following definition of clause is used:
‘A clause is a group of words containing a single finite verb and its dependents’.
We note here that these dependents cannot themselves be finite verbs. Also, subordinating
conjunctions and finite-verb coordinating conjunctions are not treated as part of a clause.
Therefore, a sentence such as ‘John said that he will come late’ has 3 units: (1) John said, (2)
that, and (3) he will come late. Similarly, ‘John ate his food and he went shopping’ has 3 units:
(1) John ate his food, (2) and, and (3) he went shopping.
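The clause definition above suggests a simple segmentation into minimal units, sketched below for illustration. This toy splitter treats a fixed set of conjunctions as units of their own and the stretches between them as clauses; an actual clause identifier would of course work from finite verbs and their dependents.

```python
def clausal_units(tokens, conjunctions=frozenset({"that", "and", "because"})):
    """Split a token list into the minimal units used by the 2nd stage:
    each conjunction forms a unit of its own, and the stretches between
    conjunctions are the clausal units."""
    units, current = [], []
    for tok in tokens:
        if tok in conjunctions:
            if current:
                units.append(current)
            units.append([tok])       # the conjunction is its own unit
            current = []
        else:
            current.append(tok)
    if current:
        units.append(current)
    return units
```

On the two example sentences above, the splitter produces the same 3-unit segmentations given in the text.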
Let us now define the input to the 2nd stage more precisely. Let T be the complete tree that
should be output by the 2nd stage parser and let G be the subgraph of T that is input to the 2nd
stage. Then G should satisfy the following constraint: if the arc x → y is in G, then, for every z
such that y → z is in T, y → z is also in G. In other words, if an arc is included in the 1st stage
partial parse, the complete subtree under the dependent must also be included. This constraint
holds for all the 2nd stage constructions with the exception of relative clauses.
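The constraint on G can be checked mechanically. The sketch below represents both T and G as sets of (head, dependent) arcs; the encoding is illustrative.

```python
def valid_stage1_input(T, G):
    """Check the closure constraint on the 1st-stage output G relative to
    the full tree T (both sets of (head, dependent) arcs): if x -> y is in
    G, then every arc y -> z of T must also be in G, i.e. G contains the
    complete subtree below every dependent it attaches."""
    for (x, y) in G:
        for (h, z) in T:
            if h == y and (h, z) not in G:
                return False
    return True
```

For instance, with T containing 0 → 3, 3 → 1 and 1 → 2, a G containing 3 → 1 but not 1 → 2 violates the constraint, since the subtree under node 1 is incomplete.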
4.1.2.1.1 Status of ‘_ROOT_’
An artificial dummy node named _ROOT_ becomes the head of the dependency trees at both
stages. This is done so that all the categories, including verbs, are handled in the same way and
the eliminative constraints (explained shortly) act consistently across all the categories. The only
exception now is _ROOT_, for which we have the constraint that at the end of the 2nd stage it should
have only one outgoing arc (and of course no incoming arc). By introducing _ROOT_ we are able
to attach all unprocessed nodes to it. _ROOT_ ensures that the output we get after each stage is a
tree.
4.1.2.1.2 Partial Parse
The _ROOT_ node takes all the unattached nodes as its children. If for some reason the parser
is unable to analyze a clause, the parses for the other clauses are still produced and those sub-trees are
shown attached to the _ROOT_. Figure 4.7 shows the 1st stage parse for sentence 4.5, where the
first clause raama ghara gayaa is correctly analyzed, but the other clause usane so gayaa, being
ungrammatical, was not parsed successfully and attaches directly to the _ROOT_. Since this is a
1st stage parse, aura is also seen attached to the _ROOT_.
(4.5) raama ghara gayaa aura usane so gayaa
‘Ram’ ‘home’ ‘went’ ‘and’ ‘he-ERG’ ‘sleep’ ‘went’
‘Ram went home and he slept’
Figure 4.7. Partial parse for sentence 4.5.
4.1.3 A Layered Architecture
It is clear that by treating the chunk and the clause as minimal parsing units, GH-CBP divides the task
of dependency parsing into layers, wherein specific tasks are broken down into smaller linguistically
motivated sub-tasks.
i. The first sub-task is part-of-speech tagging and chunking, along with morphological
analysis, which is treated as a pre-processing step before the task of dependency parsing.
ii. Parse the POS tagged and chunked input in two stages. The parser first tries to extract
intra-clausal dependency relations and builds clausal sub-trees. In the 2nd stage the clausal
sub-trees are connected to form the complete dependency tree.
What is implied here is that the decisions taken at (ii), i.e. establishing the relations
between chunk heads, are more or less independent of the dependencies between words
inside a chunk. As discussed earlier, there are different kinds of dependencies between
elements inside a chunk. For example, a noun-adjective relation is different from a noun-
31
postposition/suffix relation. While establishing relations between chunk heads, the
properties of the head in the form of suffix/postposition/auxiliaries are available and other
chunk elements can be safely neglected in step (ii).
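The layered architecture above can be summarized as a simple pipeline, with each layer passed in as an independent module. This is only a structural sketch; the component interfaces are hypothetical.

```python
def parse_sentence(tokens, tagger, chunker, stage1, stage2, intra_chunk):
    """Wire the layers of GH-CBP together: POS tagging and chunking as
    pre-processing, two parsing stages over chunk heads, and intra-chunk
    expansion as post-processing. Each component is an independent callable,
    reflecting the claim that the layers are largely independent."""
    tagged = tagger(tokens)            # (i)   POS tags + morphology
    chunks = chunker(tagged)           # (i)   chunk boundaries and heads
    partial = stage1(chunks)           # (ii)  intra-clausal sub-trees
    tree = stage2(partial)             # (ii)  inter-clausal attachment
    return intra_chunk(tree, chunks)   # (iii) expand to a word-level tree
```

Because each layer only consumes the previous layer's output, any component (for instance the 2nd stage parser) can be replaced without touching the others.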
Figure 4.8. (a) POS tagged and chunked sentence,
(b) partial parse tree after stage 1 of GH-CBP (c) parse tree after stage 2 of GH-CBP,
(d) complete parse after intra-chunk dependencies identification
Figure 4.8(a-d) shows the output of each of the previously discussed layers for example (4.6). In
the dependency trees (b) and (c), each node is a chunk head. After removing the chunks in (d)
each node is a lexical item of the sentence.
(4.6) mohana ne tebala para apani kitaaba rakhii Ora vaha so gayaa
’Mohan’ ‘ERG’ ‘table’ ‘on’ ‘his’ ‘book’ ‘kept’ ‘and’ ‘he’ ‘sleep’ ‘PRFT’
‘Mohan placed his book on the table and slept’
4.2 Constraints
Constraints in the GH-CBP framework perform distinct functions. These constraints are:
a) Licensing constraints (CL)
b) Eliminative constraints (CE)
c) Preferential constraints (CP)
4.2.1 Licensing Constraints (CL)
Licensing constraints (CL) help in forming a constraint graph (CG) for a sentence. A CG
constrains the space of permissible dependency structures that the parser will eventually explore
in order to get a solution. An arc between two words is added to a CG only if it is licensed by CL.
An arc in a CG, in addition to providing the relation, also gives the necessity information
([m]andatory or [o]ptional) of the relation. Contrast this with constraint systems (Maruyama,
1990; Harper et al., 1995; Schroder, 2002) that employ constraint propagation and therefore start
with a complete graph, a constraint network, in which the nodes represent the variables (i.e.
words) and the arcs the constraints. Figures 4.9(a) and (b) respectively show the CG and the
constraint network for the earlier example (3.1), while Figure 4.9(c) shows the 2nd stage CG for
example sentence 4.7. Note that a 2nd stage CG will have far fewer nodes than a 1st stage CG.
(a) Constraint Graph (b) Constraint Network
Figure 4.9
(4.7) bacce ne kelaa khaayaa aura so gayaa
’child’ ERG ‘banana’ ‘eat’ ‘and’ ‘sleep’ ‘went’
‘The child ate the banana and slept’
Figure 4.9(c). 2nd stage CG for sentence 4.7
Licensing constraints discussed in the later parts of this section encode the grammatical notions of
CPG.
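A CG of this kind can be represented as a set of labeled candidate arcs, each carrying the necessity flag; the following is a minimal sketch in which the class and method names are assumptions for illustration:

```python
# Sketch: a constraint graph as a set of candidate arcs, each carrying
# its dependency relation and its necessity ([m]andatory / [o]ptional).
from collections import namedtuple

Arc = namedtuple("Arc", "head label dependent necessity")

class ConstraintGraph:
    def __init__(self, words):
        self.words = list(words)
        self.arcs = []

    def license(self, head, label, dependent, necessity):
        """Add an arc only when a licensing constraint (CL) permits it."""
        self.arcs.append(Arc(head, label, dependent, necessity))

    def mandatory(self, head, label):
        """All candidate arcs for a mandatory demand of `head`."""
        return [a for a in self.arcs
                if a.head == head and a.label == label and a.necessity == "m"]

# Example 4.7: the verb khaayaa licenses k1/k2 candidates.
cg = ConstraintGraph(["bacce", "ne", "kelaa", "khaayaa", "aura", "so", "gayaa"])
cg.license("khaayaa", "k1", "bacce", "m")
cg.license("khaayaa", "k2", "kelaa", "m")
```

The parser's eventual search is then confined to sub-graphs of such a CG, rather than to the complete graph over all word pairs.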
4.2.1.1 H-Constraints
Hard constraints (H-constraints) encode the notion of aakaankshaa in the form of a demand
frame. They also encode the TAM-postposition mapping in a structure called transformation
frame. H-constraints reflect the lexical aspect of the grammar; if they are violated, the sentence becomes ungrammatical.
Table 4.2 shows a demand frame for the Hindi verb de ‘give’. We say that de is a demand group that makes demands using its demand frame. A demand frame states the constraints associated with the arguments of the demand group: the vibhakti (postposition/suffix) and the category of each argument, and whether the argument is ‘mandatory’ or ‘optional’. The relation field shows that the relevant candidate is a child; alternatively, it could be a parent.
karaka | vibhakti | Category | Presence | Relation
karta (k1) | 0 | noun | mandatory | child
karma (k2) | ko or 0 | noun | mandatory | child
sampradaan (k4) | ko | noun | mandatory | child
Table 4.2. Basic demand frame for Hindi verb de ‘give’
karaka | vibhakti | Category | Presence | Relation
karta (k1) | 0 | noun | mandatory | child
karma (k2) | 0 or ni | noun | mandatory | child
Table 4.3. Basic demand frame for Telugu verb tin ‘eat’
A demand frame for a verb reflects its argument structure when it occurs with a specific TAM
(tense, aspect and modality) marker. All the demand frames for Hindi verbs have been formed
with the taa_hei (present indefinite) TAM. So, the frame shown in Table 4.2 shows the argument
structure of de when it is used as ‘detaa hei’.
In Hindi (as well as in other Indian languages such as Bangla and Telugu), a change in the TAM marker can affect the vibhakti (nominal suffix) of an argument. Such alternations are encoded in transformation frames. Table 4.4 shows the transformation frame for the passive alternation in Hindi. The frame shows the revised vibhakti that the arguments should take; it also shows that the ‘presence’ status of the arguments has been reversed. Transformation frames have an ‘operation’ field; in Table 4.4 it is ‘update’. For certain other TAMs such as kara (shown in Table 4.5), it can also be ‘delete’ or ‘insert’. These operations are used by the ‘Demand status transformation’ meta-constraint (described in Section 4.2.1.2) to handle various verbal alternations.
karaka | vibhakti | Category | Presence | Relation | Operation
karta (k1) | se or ke_dvaaraa | . | optional | . | update
karma (k2) | 0 | . | mandatory | . | update
Table 4.4. Hindi passive TAM transformation frame.
karaka | vibhakti | Category | Presence | Relation | Operation
vmod | FINITE | v_fin | mandatory | parent | insert
k1 | - | - | - | - | delete
k2 | . | . | optional | child | update
Table 4.5. Transformation frame for kara
4.2.1.2 Meta-Constraints
Meta-constraints allow principled handling of various syntactic constructions while forming a
CG. They are overarching language independent constraints that encode certain linguistic
generalizations. While forming the CG, meta-constraints also use the H-constraints described in
the previous section. Grammatical notions like agreement, control, passives, gapping and verbal alternations are accounted for via these constraints. There are four meta-constraints; they are illustrated below using Hindi, Telugu and Bangla. While the first two constraints are declarative, the last two are more procedural in nature.
a) Feature unification,
b) Demand status transformation,
c) Revision,
d) Look-ahead.
4.2.1.2.1 Feature Unification
Feature unification is a generic constraint wherein an arc between two words is added to a CG if their attribute-values unify. Feature unification can be instantiated in various forms; agreement feature unification is one such instantiation. This encodes the notion of abhihita from CPG. The actual manifestation of agreement of a finite verb with its karakas will vary from one language to another. The agreement also varies depending on the TAM of the verb, as shown in Tables 4.6 to 4.9.
Language | Agreement features
Hindi | Nominative k1 {gender, number, person}; Nominative k2 (if non-nominative k1) {gender, number, person}
Telugu | Nominative k1 {gender, number, person}
Bangla | Nominative k1 {person}
Table 4.6. Agreement pattern: Simple declarative TAM
Language | Agreement features
Hindi | Nominative k2 {gender, number, person}
Telugu | Nominative k1 {gender, number, person}
Bangla | Default {3rd person}
Table 4.7. Agreement pattern: Inabilitative TAM
Language | Agreement features
Hindi | Nominative k2 {gender, number, person}
Telugu | Nominative k2 {gender, number, person}
Bangla | Default {3rd person}
Table 4.8. Agreement pattern: Obligational TAM
Language | Agreement features
Hindi | Nominative k2 {gender, number, person}
Telugu | Nominative k1 {gender, number, person}
Bangla | Nominative k1 {person}
Table 4.9. Agreement pattern: Perfective TAM
Tables 4.6-4.9 show the agreement patterns in the context of the simple declarative, inabilitative, obligational and perfective TAMs in Telugu, Bangla and Hindi. Figure 4.10 shows the use of the agreement constraint for (4.8), a Bangla sentence with a simple declarative verb. Similarly, Figure 4.11 shows the use of the agreement constraint for (4.9), a Telugu sentence with a verb in the inabilitative TAM.
(4.8) aami aapela khaai [Bangla]
‘I-1st_person’ ‘apple’ ‘eat-1st_person_present’
‘I eat an apple.’
karaka | vibhakti | Category | Presence | Relation
karta (k1) | 0 | noun | mandatory | child
karma (k2) | 0 or ke | noun | mandatory | child
Table 4.10. Basic demand frame for Bangla verb khaa ‘eat’
Figure 4.10. CG for example 4.8
(a) CG without agreement constraint, (b) CG with the constraint.
(4.9) ramadu aapela winnalaikapoiyadu [Telugu]
‘Ram-masculine’ ‘Apple’ ‘eat-could-not’
‘Ram could not eat an apple’
karaka | vibhakti | Category | Presence | Relation
karta (k1) | 0 | noun | mandatory | child
karma (k2) | 0 or ni | noun | mandatory | child
Table 4.11. Final frame for Telugu verb tin ‘eat’ with inabilitative TAM.
(a) CG without agreement constraint, (b) CG with the constraint.
Figure 4.11. CG for example 4.9
We see in Figures 4.10 and 4.11 that the application of the agreement constraint can deem some of the arcs in a CG illegal, thereby removing such arcs. However, using agreement feature unification as a CL might not always be possible. Two cases when this happens are:
a) When a certain agreement pattern in a language is not precise enough to isolate a relation; for example, in Bangla when k1 is in the 3rd person, in Telugu when k1 is feminine, or in Hindi when the TAM is simple declarative.
b) Due to robustness issues; for example, when the performance of the morphological analyzer for a language is not very high, or when the text is known to have grammatical errors.
In such cases, agreement feature unification can instead be treated as CP (Preferential Constraints; explained in Section 4.2.3).
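The agreement-based licensing check amounts to plain attribute-value unification; the feature representation below is an assumption for illustration, not the parser's internal format:

```python
# Sketch: an arc is licensed only when the verb's agreement features
# unify with those of the candidate argument (cf. example 4.8).
def unify(verb_feats, cand_feats, checked):
    """True iff every checked feature either matches or is unspecified."""
    for f in checked:
        v, c = verb_feats.get(f), cand_feats.get(f)
        if v is not None and c is not None and v != c:
            return False
    return True

# Bangla simple declarative TAM checks only {person} (Table 4.6).
verb = {"person": "1"}      # khaai 'eat-1st_person_present'
candidates = {"aami": {"person": "1"},      # 'I'
              "aapela": {"person": "3"}}    # 'apple'
k1_candidates = [w for w, f in candidates.items()
                 if unify(verb, f, checked=("person",))]
```

As in Figure 4.10(b), only aami survives as a k1 candidate once the constraint is applied.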
4.2.1.2.2 Demand Status Transformation
Demand status transformation accounts for various changes that are required in the demand frames in order to handle phenomena like verbal alternations, control, gapping, passives, relative clauses, etc. In many Indian languages these phenomena have lexical manifestations (in the form of TAM markers, etc.). In such cases the basic frame is transformed using this constraint before the frame is used to construct the CG. The constraint uses a transformation frame and transforms the status of a relation stipulated in the basic demand frame based on the operation in the transformation frame.
Consider the basic frame of the Hindi verb de ‘give’, the passive transformation frame, and the kara participle transformation frame from Section 4.2.1.1. The basic de frame cannot account for the case marking on the nouns in examples 4.10 and 4.11, because these sentences use de with a TAM different from the present indefinite TAM with which the basic frame was built.
(4.10) khilaunaa bacce ke dvaaraa abhay ko diyaa gayaa
’toy’ ’child’ GEN ‘by’ ‘Abhay‘ DAT ‘given’ ‘was’
‘The toy was given to Abhay by the child.’
(4.11) bacca abhay ko khilaunaa dekara so gayaa
’child’ ’Abhay’ DAT ‘toy’ ‘having given’ ‘sleep’ ‘went’
‘Having given the toy to Abhay the child slept.’
For these sentences, demand status transformation transforms the basic frame using the appropriate transformation frames to obtain the frames that are then used to build the correct CG. Tables 4.12 and 4.13 show the de frames after the application of the demand status transformation constraint.
karaka | vibhakti | Category | Presence | Relation
karta (k1) | se or ke_dvaaraa | noun | optional | child
karma (k2) | 0 | noun | mandatory | child
sampradaan (k4) | ko | noun | mandatory | child
Table 4.12. Final demand frame for de ‘give’ after passive transformation.
The lightly shaded cells show the modifications.
karaka | vibhakti | Category | Presence | Relation
karma (k2) | ko or 0 | noun | optional | child
sampradaan (k4) | ko | noun | mandatory | child
vmod | FINITE | v_fin | mandatory | parent
Table 4.13. Final demand frame for de ‘give’ after kara transformation.
The lightly shaded cells show the modifications. The original k1 demand was deleted.
TAM-based transformations can similarly account for perfective and obligational constructions. Other TAMs signifying control, such as nA, ta_huA, etc., are handled similarly to the kara transformation. Table 4.10 earlier showed the basic frame for the Bangla verb khaa. Table 4.14 shows the transformation frame for the obligational TAM te_holo in Bangla, and the final frame that accounts for sentence 4.12 is shown in Table 4.15.
(4.12) aamaake aapela khete holo [Bangla]
‘I-Acc’ ‘apple’ ‘eat’ ‘had to’
‘I had to eat an apple.’
karaka | vibhakti | Category | Presence | Relation | Operation
karta (k1) | ke | . | . | . | update
Table 4.14. Transformation frame for te_holo
karaka | vibhakti | Category | Presence | Relation
karta (k1) | ke | noun | mandatory | child
karma (k2) | 0 or ke | noun | mandatory | child
Table 4.15. Final demand frame for Bangla verb khaa ‘eat’ after te_holo transformation
The lightly shaded cells show the modifications.
Similarly, Table 4.17 shows the final frame for the Telugu verb tin ‘eat’ needed to account for its participle usage tintu in sentence 4.13. The basic frame for this verb was shown earlier in Table 4.3. The constraint was applied using the participle transformation frame shown in Table 4.16.
(4.13) nenu tintu intiki vellanu [Telugu]
‘I’ ‘while eating’ ‘home-ACC’ ‘went’
karaka | vibhakti | Category | Presence | Relation | Operation
vmod | FINITE | v_fin | mandatory | parent | insert
k1 | - | - | - | - | delete
k2 | . | . | optional | child | update
Table 4.16. Transformation frame for Telugu TAM tu
karaka | vibhakti | Category | Presence | Relation
vmod | FINITE | v_fin | mandatory | parent
karma (k2) | 0 or ni | noun | optional | child
Table 4.17. Final demand frame for Telugu verb tin ‘eat’ after tu transformation
The lightly shaded cells show the modifications. The original k1 demand was deleted.
Relative clauses (in languages such as Hindi, Bangla, etc.) and gapping, on the other hand, are not analyzed via TAMs. In such cases the constraint is triggered not by a TAM but by some other lexical item. For instance, in the case of gapping shown in example (4.14), the constraint relaxes the requirements of the second verb so ‘sleep’ and foresees a potential gapping due to the presence of the lexical item signifying the coordinating conjunction aura ‘and’.
(4.14) bacce ne kelaa khaayaa aura so gayaa
’child’ ERG ‘banana’ ‘eat’ ‘and’ ‘sleep’ ‘went’
‘The child ate the banana and slept’
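The update/delete/insert operations on a basic frame can be sketched over demand frames stored as dictionaries; the frame encoding below mirrors the table columns but is an illustrative assumption, not the parser's data structure:

```python
# Sketch: applying a transformation frame to a basic demand frame.
# A '.' in a transformation cell means "leave the field unchanged".
def transform(basic, trans):
    frame = {rel: dict(fields) for rel, fields in basic.items()}  # copy
    for rel, row in trans.items():
        op = row["operation"]
        if op == "delete":
            frame.pop(rel, None)
        elif op == "insert":
            frame[rel] = {k: v for k, v in row.items() if k != "operation"}
        elif op == "update":
            for field, val in row.items():
                if field != "operation" and val != ".":
                    frame[rel][field] = val
    return frame

# Basic frame for Hindi de 'give' (Table 4.2), passive frame (Table 4.4).
de = {"k1": {"vibhakti": "0", "presence": "mandatory"},
      "k2": {"vibhakti": "ko or 0", "presence": "mandatory"},
      "k4": {"vibhakti": "ko", "presence": "mandatory"}}
passive = {"k1": {"vibhakti": "se or ke_dvaaraa", "presence": "optional",
                  "operation": "update"},
           "k2": {"vibhakti": "0", "presence": "mandatory",
                  "operation": "update"}}
de_passive = transform(de, passive)
```

The result reproduces the shape of Table 4.12: k1 becomes optional with se/ke_dvaaraa, k2 takes 0, and k4 is untouched.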
4.2.1.2.3 Revision
In the previous sections we saw how H-constraints along with demand status transformation
helped in forming CGs for different kinds of constructions. So for example, the frames shown in
Tables 4.12 and 4.13 could account for passives and participles respectively. Notice that in all the
H-constraints we saw earlier, the demands are constrained on vibhakti and category. Having
noted this, let us now try to analyze sentence 4.15.
(4.15) raama aura sitaa ne khaanaa khaayaa
’Ram’ ’and’ ’Sita’ ERG ’food’ ‘ate’
‘Ram and Sita ate food.’
The final analysis of example 4.15 is shown in Figure 4.12. We see that the coordinating conjunct aura is attached to the main verb with the relation ‘k1’. It is clear that we cannot get the correct CG that leads to the parse shown in Figure 4.12 using the frame for khaayaa, simply because the frame does not demand a ‘k1’ conjunct. Instead, it demands a ‘k1’ noun with the ne postposition (sitaa satisfies this constraint in this sentence). This CG is shown in Figure 4.13.
Figure 4.12. Dependency tree for example 4.15
Figure 4.13. CG for example 4.15
A coordinating conjunction can potentially take any lexical category as its child and bear that child's properties. This means that all heads, such as verbs, nouns, adjectives and even conjunctions, can in turn take a coordinating conjunct as their child. Figure 4.14 shows this clearly. The revision meta-constraint handles all the constructions where a coordinating conjunct is a potential child of a head in a CG.
Figure 4.14. Variable property of coordinating conjunction.
To accomplish this, the following general principle is followed:
For any node becoming a potential child of a coordinating conjunct, its existing parents can also become parents of the conjunct. For example, in Figure 4.15, after node 2 is identified as a potential child of a coordinating conjunct (node 3), its existing parent 0 also becomes a potential parent of 3 (shown as a dashed arc).
(a) CG before revision (b) CG after revision
Figure 4.15. Revision of CG. Node 3 is a coordinating conjunct.
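The general revision principle can be sketched over unlabeled candidate arcs (a simplification that ignores the label-selection variation for verbal heads discussed below):

```python
# Sketch of the revision principle: once a node is a potential child of
# a coordinating conjunct, the node's existing potential parents also
# become potential parents of the conjunct (cf. Figure 4.15).
def revise(arcs, conjuncts):
    """arcs: set of (parent, child) candidate pairs."""
    revised = set(arcs)
    for conj in conjuncts:
        for parent, child in arcs:
            if parent == conj:                   # child of the conjunct
                for p2, c2 in arcs:
                    if c2 == child and p2 != conj:
                        revised.add((p2, conj))  # parent inherited by conjunct
    return revised

# Figure 4.15: 0 -> 2 exists and 3 (the conjunct) -> 2;
# revision adds the dashed arc 0 -> 3.
arcs = {(0, 1), (0, 2), (3, 2), (3, 4)}
revised = revise(arcs, conjuncts={3})
```

Applying the function to the Figure 4.15 configuration adds exactly the dashed arc and leaves all original candidates in place.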
The above principle is used with slight variations for different heads. In the case of a verbal head, the label of the incoming arc into the conjunct is governed by the vibhakti of the right child of the conjunct. This can be seen in Figure 4.16.
(a) CG before revision (b) CG after revision
Figure 4.16. Revision of CG with labels. The head of the right child (node 2) is a verb (node 0).
Figure 4.17 shows the CG for example (4.15). For ease of exposition we do not show the necessity information on the arcs. Notice that the incoming arc into aura is labeled k1 due to the presence of the ne postposition on its right child sitaa. Notice also that the left child of the conjunct is not used to decide the incoming arc label.
(a) CG before revision (b) CG after revision
Figure 4.17. Revision of CG for example 4.15.
Example 4.16 shows a sentence where a subordinating conjunction takes a coordinating conjunct as its child. For subordinating conjunctions the original formulation of the revision meta-constraint is used.
(4.16) raama ghar gayaa kyomki use bhook lagi thi aura vaha bimaar
‘Ram’ ‘home’ ‘went’ ‘because’ ‘he’ ‘hunger’ ‘had’ PAST ‘and’ ‘he’ ‘sick’
bhi tha
‘also’ ‘was’
‘Ram went home because he was hungry and was sick as well.’
The subordinating conjunction kyomki ‘because’ in the 2nd stage CG should take the coordinating conjunct aura ‘and’ as its child. During 2nd stage CG formation, to begin with, ‘because’ will only take a single finite clause as its child (cf. Table 4.18), but the revision constraint ensures that the CG contains an incoming arc from ‘because’ into ‘and’. Figure 4.18(a) shows the CG when the revision principle is not applied, while 4.18(b) shows the one obtained after it is applied. For simplicity, the figure only shows the finite verb children and the conjuncts.
karaka | vibhakti | Category | Presence | Relation
ccof | FINITE | v_fin | mandatory | child
rh | FINITE | v_fin | mandatory | parent
Table 4.18. Demand frame for subordinating conjuncts.
Figure 4.18. 2nd stage CG for example 4.16
(a) Revision not applied, (b) Revision applied
In Figure 4.18(a), ‘and’ takes V2 and V3 as its potential children. Since ‘because’ also takes V2 as its child, the revision constraint ensures that ‘and’ also becomes a potential child of ‘because’ (Figure 4.18(b)).
4.2.1.2.4 Look-Ahead
Whenever the search for a potential child/parent while forming a CG is greedy, there is a chance of missing a valid candidate: in a greedy search the first potential candidate that satisfies some H-constraint is chosen. Greedy search is the default strategy for most demands during the 2nd stage. It is also used for non-verbal demand groups (e.g. nominal and adjectival modifications) in the 1st stage. This strategy, although effective and very efficient, does not work in two cases:
(a) The head of a paired connective occurring as a potential child
(b) A nominal with a genitive marker occurring as a potential parent/child
Example 4.17 shows a Hindi construction with the paired connective agara-to. A greedy search for the subordinating conjunction kyomki using the demand frame shown in Table 4.18 will never allow kyomki to take to as a potential child. In such a case, look-ahead is required to construct the correct CG. This is shown in Figure 4.19 using a dashed arc. The head of the paired connective (here to) triggers this constraint.
(4.17) kyomki agara tuma aaoge to mai bhi aaungaa
‘because’ ‘if’ ‘you’ ‘come’ ‘then’ ‘I’ ‘also’ ‘will come’
‘Because if you come, I’ll also come.’
Figure 4.19. Look-ahead constraint applied to sentence 4.17
A similar example for Bangla is given as 4.18.
(4.18) kaarona jodi tuni asho taahole aamio aashbo. [Bangla]
‘Because’ ‘if’ ‘you’ ‘come’ ‘then’ ‘I’ ‘will come’
The look-ahead constraint is also needed when a nominal with a genitive case marker becomes a potential parent/child. The search for potential candidates for all non-verbal demands in the 1st stage is also greedy, and therefore in example 4.19 only billi becomes a potential parent for the adjectival participle bhaagte hue. Without look-ahead, getting the correct relation between bhaagte hue and bacce is not possible. The constraint extends the search till the potential final head of billi. This is shown in Figure 4.20 (a), (b).
(4.19) abhay ne bhaagte hue billi ke bacce ko pakada liya
‘Abhay’ ERG ‘running’ ‘cat’ GEN ‘child’ ACC catch PAST
‘Abhay caught the running kitten’
Figure 4.20. Look-ahead constraint applied to sentence 4.19
(a) CG without look-ahead constraint, (b) CG after look-ahead constraint
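The difference between greedy search and look-ahead over a genitive chain can be sketched as follows; the tags and the simple chain-following rule are illustrative assumptions:

```python
# Sketch: greedy search picks the first nominal to the right as the
# participle's parent; look-ahead follows genitive (GEN) chains to
# their final head (cf. example 4.19: billi ke bacce).
def find_parent(words, start, look_ahead=False):
    """words: list of (form, case) pairs; start: index of the participle."""
    i = start + 1
    while i < len(words):
        form, case = words[i]
        if case == "GEN" and look_ahead:
            i += 1            # skip past the genitive to its head
            continue
        return form
    return None

# abhay ne [bhaagte_hue] billi-GEN bacce-ACC ...
sent = [("abhay", "ERG"), ("bhaagte_hue", "PART"),
        ("billi", "GEN"), ("bacce", "ACC")]
greedy = find_parent(sent, 1)                  # stops at billi
ahead = find_parent(sent, 1, look_ahead=True)  # reaches bacce
```

Greedy search stops at billi, as in Figure 4.20(a); look-ahead reaches bacce, the candidate needed for the correct CG of Figure 4.20(b).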
Meta-constraint | Grammatical notion
Feature unification | Agreement
Demand status transformation | Control, passives, relative clauses, verbal alternations, gapping
Revision | Coordinating conjuncts as children
Look-ahead | Paired connectives, genitive chains
Table 4.19. Grammatical notions handled via meta-constraints.
Table 4.19 summarizes the various grammatical generalizations (in Hindi, Urdu, Telugu and Bangla) accounted for by the four meta-constraints. Some constructions, such as the missing copula in Bangla and Telugu and missing genitive case markers in Telugu, are still not handled by the parser. For Telugu these issues have been discussed in Vempaty et al. (2010).
4.2.2 Eliminative Constraints (CE)
Unlike the licensing constraints (CL) discussed in Section 4.2.1, which are used to construct a CG, eliminative constraints (CE) are used by the parser to obtain the solution parse by eliminating the arcs in the CG that violate them. There are four such constraints:
i. For each of the mandatory karakas in a demand frame for each head, there should be
exactly one outgoing edge labeled by the karaka from the head,
ii. For each of the desirable or optional karakas in a demand frame for each head, there
should be at most one outgoing edge labeled by the karaka from the head,
iii. There should be exactly one incoming arc to a child.
iv. There should be no cycles between two nodes.
These constraints ensure that the final parse is a tree. More than this, they connect with the H-constraints (via the constraint graph); if that were not so, the solutions obtained would be spurious. Consider again sentence 4.1, repeated below as 4.19.
(4.19) baccaa haatha se kelaa khaataa hei
’child’ ‘hand’ INST ‘banana’ ‘eats’ ‘is’
‘The child eats the banana with his hand.’
Figure 4.21 shows the constraint graph (CG) for sentence 4.19, with all the potential candidate nodes that the verb (khaa) demands. The demand frame required to form the CG in Figure 4.21 is shown in Table 4.20. A parse is a sub-graph of the CG containing all the nodes of the CG and satisfying CE.
Figure 4.21. CG for example 4.19.
karaka | vibhakti | Presence
karta (k1) | ne | mandatory
karma (k2) | ko or 0 | mandatory
karana (k3) | se or dvaara | optional
Table 4.20. Basic demand frame for khaa ‘eat’
Although one can easily derive possible trees from the CG shown above, some of them, as given in Figure 4.22, are clearly wrong parses and show the disconnect between the lexical demands and the tree well-formedness assumption alone. In effect, CE ensures that the demand frame requirements align with the candidate elements during the derivation. The correct solutions for the above sentence are shown in Figure 4.23.
Figure 4.22. Possible wrong trees for example 4.19
(Notice the multiple presence of same labels that make the parses wrong)
Figure 4.23. Solution parses for example 4.19.
Figure 4.23 shows that there can be multiple parses satisfying CE. This indicates the ambiguity in the sentence when only this limited knowledge base is considered. Stated another way, the H-constraints and the meta-constraints that help in constructing the CG are insufficient to rule out multiple analyses of a given sentence; more knowledge (semantics, other preferences, etc.) is required to curtail the ambiguities. For all such constructions the core parser produces multiple parses.
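The way CE carves the solution parses out of a CG can be sketched by brute-force enumeration; the candidate arcs below are a toy encoding loosely based on example 4.19, with the candidates assumed for illustration rather than taken from the actual CG:

```python
# Sketch: enumerate sub-graphs of a small CG and keep those that
# satisfy CE -- exactly one arc per mandatory karaka, at most one per
# optional karaka, and exactly one incoming arc per child.
from itertools import combinations

def solutions(candidates, mandatory, optional):
    """candidates: (head, label, child) arcs licensed by CL."""
    all_children = {c for _, _, c in candidates}
    sols = []
    for n in range(1, len(candidates) + 1):
        for sub in combinations(candidates, n):
            labels = [(h, l) for h, l, _ in sub]
            children = [c for _, _, c in sub]
            ok = all(labels.count(d) == 1 for d in mandatory)       # (i)
            ok = ok and all(labels.count(d) <= 1 for d in optional) # (ii)
            ok = ok and len(children) == len(set(children))         # (iii)
            ok = ok and set(children) == all_children               # all nodes
            if ok:
                sols.append(sub)
    return sols

# Toy CG: both 0-vibhakti nominals compete for k1 and k2.
cands = [("khaa", "k1", "baccaa"), ("khaa", "k1", "kelaa"),
         ("khaa", "k2", "baccaa"), ("khaa", "k2", "kelaa"),
         ("khaa", "k3", "haatha")]
parses = solutions(cands,
                   mandatory={("khaa", "k1"), ("khaa", "k2")},
                   optional={("khaa", "k3")})
```

Under this toy encoding exactly two sub-graphs survive, mirroring the two solution parses of Figure 4.23, with baccaa and kelaa swapping the k1 and k2 roles.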
4.2.3 Preferential Constraints (CP)
Soft constraints (S-constraints) are constraints that are used in a language as preferences; they reflect the preferences a language has towards various linguistic phenomena. These preferential constraints are used to prioritize the parses and select the best parse. Some of the S-constraints that have been tried out are:
a) Order of the arguments,
b) Relative position of arguments with respect to the verb,
c) Agreement,
d) Distance from head,
e) General graph properties.
Figure 4.24 shows how the order of arguments can be used as a prioritization strategy. Here the two parses of sentence 4.19, shown earlier in Figure 4.23, are prioritized using the order of ‘k1’ and ‘k2’, where (k1, k2) is preferred over (k2, k1). Such S-constraints can be used for ranking by penalizing a parse for each constraint it violates and finally choosing the parse that is least penalized. This strategy is similar to the one used in weighted constraint grammar parsing (WCGP) (Schröder, 2002; Foth and Menzel, 2006) and Optimality Theory (Prince and Smolensky, 1993; Aissen, 1999).
Figure 4.24. Prioritizing of example 4.19 solution parses.
Prioritization based on the order of ‘k1’ and ‘k2’. (k1, k2) is preferred over (k2, k1)
The tree inside the rectangle is the correct parse.
The other way is to use these constraints as features, learn their associated weights from a
dependency treebank, use them to score the output parses and select the parse with the best score.
This is similar to the work on ranking in phrase structure parsing (Collins and Koo, 2005; Shen et
al., 2003) and graph-based parsing (McDonald et al., 2005a). We use this latter strategy. We note
here that the constraint weights used in WCGP are determined by a grammar writer.
4.3 GH-CBP Framework
Figure 4.25 shows the overall parsing scheme of GH-CBP. The GH-CBP model M_GH-CBP comprises a dependency grammar Γ, a set of parameters λ and a parsing algorithm h.
M_GH-CBP = (Γ, λ, h) (I)
Γ maps an arbitrary sentence S to a constraint graph CG. CG constrains the space of
permissible dependency structures for a given sentence.
Γ = (Σ, R, C) (II)
C = {CL, CE, CP} (III)
CL = {H-constraints, Meta-constraints} (IV)
CP = {S-constraints} (V)
Σ is the set of words, R is the label set, and CL is the set of licensing constraints used to construct a CG; we saw them earlier in the form of the H-constraints and the meta-constraints. CE is the set of eliminative constraints employed by the parsing algorithm, discussed in the next section, and CP is the set of preferential constraints used for prioritization.
As will be seen in Section 4.3.2, λ(i, r, j) is the probability of relation r on arc i → j given a set of preferential constraints CP. This probability is learnt automatically from a dependency treebank; we elaborate on this further in Section 4.3.2.
λ(i, r, j) = P(r_{i,j} | CP) (VI)
Figure 4.25. Schematic design of GH-CBP.
4.3.1 Parsing as Constraint Satisfaction
In a constraint dependency approach, parsing is defined as a constraint satisfaction problem. For a
fully defined constraint satisfaction problem, we need to specify the variables, their domains and
the set of constraints that need to be satisfied:
(1) Set of variables: X = {x0, x1, x2, …, xe}, where each variable represents an arc in the constraint graph (CG), e ≤ k(n² − n), n is the number of vertices and k is the number of dependency labels.
(2) Domain of each variable: {0, 1}.
(3) Set of eliminative constraints: CE
In GH-CBP, constraint satisfaction is done using bipartite graph matching and integer programming (IP) (Bharati et al., 1993, 1995a, 1995b). This is done by associating a variable with every edge in the CG (say, x_{i,k,j} for an edge from node i to node j labeled k). A parse is an assignment of 1 to those variables whose corresponding arcs are in the parse sub-graph, and 0 to those that are not. The cost function to be minimized is the sum of all the variables. The following IP equalities and inequalities ensure the four constraints stated above:
For each head i, for each of its mandatory karakas k:
M_{i,k} : Σ_j x_{i,k,j} = 1 (VII)
For each head i, for each of its optional or desirable karakas k:
O_{i,k} : Σ_j x_{i,k,j} ≤ 1 (VIII)
For each child j:
S_j : Σ_{i,k} x_{i,k,j} = 1 (IX)
For each head i and its potential child j:
C_{i*j} : x_{i*j} + x_{j*i} ≤ 1 (X)
Note that M_{i,k} stands for the equation formed given a head i (the verb khaa in example 4.19) and karaka k. Thus, there will be as many equations as there are combinations of i and k. Also, C_{i*j} is formed even if the relation between node i and node j is indirect.
4.3.2 Prioritization
It was clear from Sections 4.2.1 and 4.2.2 that the core parser, which uses CL and CE, can produce multiple parses. We noted earlier that we use CP (the S-constraints) as features, associate weights with them, use them to score the output parses, and select the parse with the best score. The score of a dependency parse tree t = (V, A) in most graph-based parsing systems (Kübler et al., 2009) is
Score(t) = Score(V, A) ∈ R (XI)
where V and A are the sets of vertices and arcs. This score signifies how likely it is that a particular tree is the correct analysis of a sentence S. Many systems assume the above score to factor through the scores of subgraphs of t. Thus, the above score becomes
Score(t) = Σ_{α ∈ α_t} λ_α (XII)
where α is a subgraph, α_t is the relevant set of subgraphs in t and λ_α is a real-valued parameter. If one follows the arc-factored model for scoring a dependency tree (Kübler et al., 2009), as we do, the above score becomes:
Score(t) = Σ_{(i,r,j) ∈ A} λ(i, r, j) (XIII)
In (XIII) the score is parameterized over the arcs of the dependency tree. Since we are interested in using this scoring function for ranking, our ranking function R should select the parse that has the maximum score amongst all the parses Φ produced by the core parser.
R(Φ, λ) = argmax_{t=(V,A) ∈ Φ} Score(t) = argmax_{t=(V,A) ∈ Φ} Σ_{(i,r,j) ∈ A} λ(i, r, j) (XIV)
One of the most basic methods that we tried for ranking is based on voting. In this method, which we call ‘Rank-Voting’, λ(i, r, j) corresponds to the count of relation r on arc i → j amongst all the parses in Φ. In all the other methods λ(i, r, j) represents a probability, and it is therefore more natural to multiply the arc parameters instead of summing them.
R(Φ, λ) = argmax_{t=(V,A) ∈ Φ} Score(t) = argmax_{t=(V,A) ∈ Φ} Π_{(i,r,j) ∈ A} λ(i, r, j) (XV)
Here, λ(i, r, j) is simply the probability of relation r on arc i → j given some preferential constraints (CP). This probability is currently obtained using a MaxEnt model (Ratnaparkhi, 1998). So,
λ(i, r, j) = P(r_{i,j} | CP) (XVI)
If A denotes the set of all dependency labels and B denotes the set of all S-constraints, then MaxEnt ensures that p maximizes the entropy
H(p) = − Σ_{x ∈ E} p(x) log p(x) (XVII)
where x = (a, b), a ∈ A, b ∈ B and E = A × B. Note that, since we are not parsing but prioritizing, our features can have a wider context than in the arc-factored model, where the feature function associated with an arc parameter consists only of the features associated with that specific arc. Figure 4.26 shows the context over which the various S-constraints can be specified (the S-constraints were described in Section 4.2.3). These S-constraints are reflected as features used in MaxEnt. The features for which the model gave the best performance are given below. Note that the actual feature pool was much larger, and some features, like that for agreement, did not get selected.
(1) Root, POS tag, Chunk tag, suffix of the current node and its parent
(2) Suffix of the grandparent, Conjoined suffix of current node and head
(3) Root, Chunk Tag, Suffix, Morph category of the 1st right sibling
(4) Suffix, Morph category of the 1st left sibling
(5) Dependency relations between the first two, right and left sibling and the head
(6) Dependency relation between the grandparent and head
(7) Dependency relation between the current node and its child
(8) A binary feature to signify if a k1 already exists for this head
(9) A binary feature to signify if a k2 already exists for this head
(10) Distance from a non-finite head
Figure 4.26. Context over which S-constraints can be specified. Node i is the parent of node j. l-s1 corresponds to the 1st left sibling, r-s1 to the 1st right sibling, gp is the grandparent of node j, ch is a child of node j. r1-r6 are dependency relations.
The ranking function can differ based on how one obtains the probability of the relation on arc i → j. Since we are ranking labeled dependency trees, the first way, shown in (XVI), is to use the probability of the label r found in the labeled dependency parse. But we can also use the probability of the label on arc i → j as predicted by the MaxEnt model. A third obvious way is to take the weighted average of the two. (XVIII) and (XIX) show these other two options.
λ(i, r, j) = P(rm_{i,j} | CP) (XVIII)
where rm is the relation on arc i → j predicted by the model.
λ(i, r, j) = ( P(r_{i,j} | CP) + P(rm_{i,j} | CP) ) / 2 (XIX)
When the ranker uses (XVI) we call it ‘Ranking with Parser Relation probability’ (Rank-PR); the other two are called ‘Ranking with Model Relation probability’ (Rank-MR) and ‘Ranking with Weighted Relation probability’ (Rank-WR).
In a similar vein, λ(i, r, j) in (XV) can correspond to the probability of the attachment i → j being a valid attachment. In this formulation the relation r becomes inconsequential. This probability can be obtained by using MaxEnt as a binary classifier. The ranking model based on this parameter is called ‘Ranking with Attachment’ (Rank-Attach).
Further, we can combine Rank-Attach with the models described based on (XVI), (XVIII)
and (XIX). In such a strategy we first select the k-best parses using Rank-Attach and then re-rank
them using Rank-PR, Rank-MR or Rank-WR. This order can also be reversed.
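The product-based ranking of (XV) can be sketched in a few lines; the arc probabilities below are made up for illustration, not model outputs:

```python
# Sketch: Rank-PR style ranking -- score each parse as the product of
# its arc probabilities λ(i, r, j) and return the argmax (eq. XV).
from math import prod

def rank(parses, arc_prob):
    """parses: list of arc lists; arc_prob: dict (head, rel, child) -> P."""
    def score(parse):
        # Unseen arcs get a tiny floor probability instead of zero.
        return prod(arc_prob.get(arc, 1e-9) for arc in parse)
    return max(parses, key=score)

# Two competing parses of the khaa example; probabilities assumed.
p1 = [("khaa", "k1", "baccaa"), ("khaa", "k2", "kelaa"),
      ("khaa", "k3", "haatha")]
p2 = [("khaa", "k1", "kelaa"), ("khaa", "k2", "baccaa"),
      ("khaa", "k3", "haatha")]
probs = {("khaa", "k1", "baccaa"): 0.8, ("khaa", "k2", "kelaa"): 0.7,
         ("khaa", "k1", "kelaa"): 0.2, ("khaa", "k2", "baccaa"): 0.3,
         ("khaa", "k3", "haatha"): 0.9}
best = rank([p1, p2], probs)
```

Replacing `arc_prob` with arc counts over Φ (and `prod` with `sum`) would give Rank-Voting instead.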
4.3.3 Fail-Safe Parse
It is also possible that the core parser is unable to produce any parse for a sentence. In such a case, some simple heuristics are used in an attempt to produce a reasonable parse. One such heuristic (shown in Figure 4.27) is to attach every nominal to the first finite verb occurring to its right with an underspecified ‘vmod’ (verb modifier) relation.
(4.20) raama ghara gayaa aura usane so gayaa
‘Ram’ ‘home’ ‘went’ ‘and’ ‘he-ERG’ ‘sleep’ ‘went’
‘Ram went home and he slept’
Figure 4.27. Failsafe parse for example 4.20
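The fail-safe heuristic can be sketched as follows (a minimal sketch; the token representation and the 'NN'/'VF' tags are simplified, hypothetical stand-ins for the actual chunk and morph information):

```python
# Fail-safe heuristic: attach every nominal to the first finite verb
# occurring to its right with the underspecified 'vmod' relation.
# Tokens are (index, pos) pairs; 'NN' marks nominals, 'VF' finite verbs.

def fail_safe_parse(tokens):
    arcs = []
    for i, (idx, pos) in enumerate(tokens):
        if pos != 'NN':
            continue
        # find the first finite verb to the right of this nominal
        for jdx, jpos in tokens[i + 1:]:
            if jpos == 'VF':
                arcs.append((jdx, 'vmod', idx))  # (head, label, dependent)
                break
    return arcs
```

For a sentence shaped like example 4.20 (nominal, nominal, finite verb, conjunction, nominal, finite verb), each nominal attaches to the nearest finite verb on its right.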
4.3.4 Algorithm
GH-CBP runs in two stages. Parsing in the two stages is identical; only the CG differs. CG construction in both stages is essentially the same, except that the CG in the second stage has very few nodes, and the heads in the second stage differ from those in the 1st stage. Based on their different linguistic demands, there are 4 types of nodes:
1. Nodes that look for their children, for example, verbs and coordinating conjunctions,
2. Nodes that look for their parent, for example, adverbs, adjectives, relative clause markers, etc.,
3. Nodes that look for both parent and children, for example, subordinating conjunctions and non-finite verbs, and
4. Nodes that look for neither parent nor children, for example, nouns.
Below we give the algorithm for GH-CBP and for the construction of a CG.
GH_CBP(S, Γ, λ)
    CG = Construct_CG(S, Γ)
    Φ = Constraint_Satisfaction(CG)
    If (Φ)
        P = Rank(Φ, λ)
    Else
        P = Fail_Safe(CG)

Construct_CG(S, Γ)
    CG = S
    foreach w ∈ S
        If (IsHead(w))
            Frames = H-constraint(w)
            foreach Frame ∈ Frames
                Frame = Demand_status_transformation(Frame)
                CG = Find_children(w, Frame, CG, S)
                CG = Look_ahead(w, Frame, CG, S)
    CG = Revision(CG)
Currently, only the final parses are ranked. On average the parser gives around 50 parses per sentence. We found that the oracle accuracy with a limit of 300 parses is almost equivalent to considering all the parse outputs. The efficiency of the parser depends primarily
on the following two factors:
a) Number of head and demand frame combinations
b) Search space for a head
It is easy to see that there is a tradeoff between efficiency and performance.
4.4 Results
We evaluated GH-CBP6 for Hindi and Telugu. We used the treebank data from the ICON2010 tools contest (Husain et al., 2010). For Hindi, the training set had 3000 sentences; the development and test sets had 500 and 321 sentences respectively. The coarse-grained tagset (see APPENDIX I) was used during evaluation. For Telugu, the training set had 1500 sentences; the development and test sets had 150 sentences each. Performance is shown in terms of unlabeled attachment (UAS), label
6 GH-CBP (version 1.5)
(LA) and labeled attachment (LAS) accuracy7. In Table 4.21, GH-CBP′′ gives the oracle score when the first 300 unprioritized parses are considered. The oracle parse for a sentence is the best available parse among all the parses of that sentence, obtained by selecting the parse closest to the gold parse. The oracle accuracy gives the upper bound of the parser accuracy and some idea about its coverage.
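Selecting the oracle parse can be sketched as follows (a minimal sketch; parses are represented as lists of (dependent, head, label) triples over the same words in the same order, a hypothetical simplification of the actual parse objects):

```python
# Pick the oracle parse: the candidate whose arcs agree most with gold.
# Each parse is a list of (dependent, head, label) triples, one per word,
# in the same order as the gold parse.

def las(parse, gold):
    """Labeled attachment score: fraction of words with correct head+label."""
    correct = sum(1 for a, g in zip(parse, gold) if a == g)
    return correct / len(gold)

def oracle(parses, gold, k=300):
    """Best available parse among the first k candidate parses."""
    return max(parses[:k], key=lambda p: las(p, gold))
```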
                          Hindi                   Telugu
                    UAS    LAS    LA        UAS    LAS    LA
GH-CBP′′
(Oracle parse)     88.50  79.12  80.96     84.14  65.33  66.60
Table 4.21. Oracle scores with GH-CBP for unprioritized parses (k=300)
                     Hindi                   Telugu
               UAS    LAS    LA        UAS    LAS    LA
Intra-clausal 88.68  77.27  79.54     85.11  57.92  59.87
Inter-clausal 87.78  86.23  86.2      82.09  79.63  79.63
Table 4.22. Intra-clausal and inter-clausal relation results
We see that the oracle UAS and LAS shown in Table 4.21 are very good, which shows that the coverage of GH-CBP is high. Table 4.22 shows that the intra-clausal LAS in both Hindi and Telugu is lower than the inter-clausal LAS. We will see in chapter 6 that this is mainly because of the unavailability of frames for various heads.
Table 4.23 shows the accuracies after prioritization. The best parse accuracies are lower than the oracle scores because of ranking errors. Rank-WR gives the best accuracy for Hindi. As discussed in section 4.3.2, in Rank-WR we score the parse using the average of the parser relation probability and the model relation probability. Using Rank-Attach alone, or as a reranking strategy over Rank-WR, did not improve the accuracy. The average score of Rank-WR and Rank-Attach, however, gave the best accuracy for Telugu. One reason why we think Rank-Attach fails to outperform Rank-WR is that the Rank-WR model is richer than Rank-Attach. Considering this, it is interesting to notice that for Telugu Rank-Attach does help; in fact, the best model for Telugu involves Rank-Attach. We think this is mainly because the Telugu treebank is smaller than the Hindi one. Additionally, owing to the lower average sentence length in Telugu, the total number of attachments and labels available for training MaxEnt is also smaller. In such a scenario an underspecified model such as Rank-Attach seems like a better option than a rich model like Rank-WR. We analyze the prioritization results further in chapter 6.
7 LAS/UAS/LA = percentage of words assigned correct head+label/head/label
                                           Hindi                   Telugu
                                     UAS    LAS    LA        UAS    LAS    LA
Rank-Voting                         83.96  66.12  69.45     81.18  55.60  57.72
Rank-PR                             84.07  71.40  74.87     82.24  59.20  61.10
Rank-MR                             83.54  67.01  70.51     81.40  56.03  58.56
Rank-WR                             84.25  71.43  74.97     82.03  59.20  60.89
Rank-Attach                         83.65  68.85  71.82     83.51  57.08  58.56
Rank-WR (k-best=100) +
  Rank-Attach                       83.79  69.24  72.25     83.51  57.08  58.56
Rank-WR (k-best=100) +
  Rank-Attach (if > 1 parse
  with best score)                  84.25  71.43  74.97     82.03  59.20  60.89
Rank-Attach (k-best=100) +
  Rank-WR                           84.21  71.40  74.94     82.03  59.20  60.89
Average (Rank-WR + Rank-Attach)     83.89  70.58  73.66     83.30  59.62  61.10
Table 4.23. Results after various prioritization strategies8
8 Other classification methods such as CRF and SVM were also tried out, but MaxEnt gave us the best accuracy. The scoring function with probability sum (XIV) gave us better results than the probability product (XV) on the development data, and therefore (XIV) is used to report the results on the test set.
Chapter 5
5. Incorporating Insights from GH-CBP in Data Driven Dependency
Parsing
Data driven parsing for MoR-FWO languages (such as Czech, Hindi, Turkish, etc.) has not
reached the performance obtained for English (Nivre et al., 2007a; Hall et al., 2007; Husain 2009;
Tsafarty et al., 2010). This low performance can be broadly attributed to
(a) Non-configurational nature of these languages
(b) Inherent limitations in the parsing algorithms
(c) Limited amounts of annotated data
There have been many attempts to tackle (a) and (b). Some such recent works are (Nivre and
McDonald, 2008; Zhang and Clark, 2008; Nivre, 2009; Tsarfaty and Sima'an, 2008; Gadde et al.,
2010; Husain et al., 2009; Eryigit et al., 2008, Goldberg and Elhedad, 2009, Martins et al., 2009;
Koo and Collins, 2010). The work described in this chapter also addresses similar issues. We introduce insights from building GH-CBP into data-driven parsing and investigate their effects on parser performance. This is done in the following ways:
1. Incorporating targeted features during training (Section 5.3): Constraints that have
proven to be crucial for identifying various dependency relations in GH-CBP are identified
and used as appropriate features. These features can be broadly divided into four classes:
a. Morphological
b. Local morphosyntactic
c. Clausal
d. Minimal semantics
2. Linguistically constrained modularity (Section 5.4): This is done by using the chunk and the clause as basic parsing units. Different ways are explored to incorporate chunks and clauses during the parsing process. Broadly, the notion of a chunk or a clause can be used during parsing as something that is fixed vs. as something that provides extra information. In other words, they can be treated either as a hard constraint or as a soft constraint.
3. Linguistically rich graph-based parsing (Section 5.5): In MSTParser (McDonald et al., 2005b), a graph-based data-driven parser, a complete graph is used to extract a spanning tree that forms the final parse. In GH-CBP, a constraint graph (CG) is a structure that shows all possible relations between heads and their children; the CG is used to get the output parse. In this work, we make MSTParser use the CG instead of a complete graph.
5.1 Parsers: Malt and MST
For the experiments reported in this chapter we use two data-driven parsers: MaltParser (Nivre et al., 2007a) and MSTParser (McDonald et al., 2005b).
Malt uses the arc-eager parsing algorithm (Nivre, 2006). History-based feature models are used for predicting the next parser action (Black et al., 1992). Support vector machines are used for mapping histories to parser actions (Kudo and Matsumoto, 2002). It uses graph transformations to handle non-projective trees (Nivre and Nilsson, 2005a).
MSTParser uses an implementation of the Chu-Liu-Edmonds algorithm (Chu and Liu, 1965; Edmonds, 1967) to find the maximum spanning tree. It uses online large-margin learning as the learning algorithm (McDonald et al., 2005a). Both Malt and MST allow for arbitrary combinations of features as part of the feature model.
Unless otherwise specified, in all the experiments in this section we use arc-eager parsing and SVM training for MaltParser9. For MSTParser10 we use the non-projective algorithm, order=2 and training k=5. The feature files given in APPENDIX IV and VI are used for Malt and MST respectively. The other available parser settings in MaltParser and MSTParser were also experimented with, but they fared worse than the above settings.
5.2 Data
All the experiments are conducted on Hindi. We use the dependency treebank released as part of the ICON2010 tools contest (Husain et al., 2010). The treebank data uses the CPG framework for annotation (Begum et al., 2008a). The analysis of a sentence is a dependency tree with syntactico-semantic labels. The annotation scheme allows for non-projective trees and, apart from dependency annotation, also has POS, chunk and morphological information. The training data had 2,973 sentences; the development and test sets had 543 and 321 sentences respectively.
9 MaltParser (version 1.3.1)
10 MSTParser (version 0.4b)
5.3 Incorporating Targeted Features During Training
In chapter 3 we saw constraints that were used to account for different types of relations in Hindi,
Telugu and Bangla. These constraints incorporated important grammatical notions from CPG in
GH-CBP. These constraints can be used as linguistic features in data-driven parsing. In this
section we will discuss all such features that help improve the parsing accuracy.
5.3.1 Morphological Features
Throughout the previous chapter we noted the importance of various morphological features in
the task of dependency parsing. Incorporating these features is the obvious first step. Morph
output has the following information:
a) Root: Root form of the word
b) Category: Coarse-grained POS
c) Gender: Masculine/Feminine/Neuter
d) Number: Singular/Plural
e) Person: First/Second/Third person
f) Case: Oblique/Direct case
g) Vibhakti: Vibhakti of the word
Take raama in example 5.1; its morph information comprises root = ‘raama’, category = ‘noun’, gender = ‘masculine’, number = ‘singular’, person = ‘third’, case = ‘direct’, vibhakti = ‘0’. Similarly, khaayaa ‘ate’ has the following morph information: root = ‘khaa’, category = ‘verb’, gender = ‘masculine’, number = ‘singular’, person = ‘third’, case = ‘direct’, vibhakti = ‘yaa’.
Through a series of experiments, the most crucial morph features were selected. Root, case and
vibhakti turn out to be the most important features. Note that the agreement features (such as
gender, number and person) were not selected in the best setting. This anomaly is discussed in the
next chapter.
(5.1) raama ne ek seba khaayaa
’Ram’ ERG ‘one’ ‘apple’ ‘ate’
‘Ram ate an apple’
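The morph information above can be represented, for instance, as simple feature dictionaries from which the selected features (root, case, vibhakti) are extracted (a minimal sketch; the field names mirror the list above but the representation itself is hypothetical):

```python
# Morph analyses for the words of example 5.1, as feature dictionaries.
MORPH = {
    'raama':   {'root': 'raama', 'category': 'noun', 'gender': 'masculine',
                'number': 'singular', 'person': 'third',
                'case': 'direct', 'vibhakti': '0'},
    'khaayaa': {'root': 'khaa', 'category': 'verb', 'gender': 'masculine',
                'number': 'singular', 'person': 'third',
                'case': 'direct', 'vibhakti': 'yaa'},
}

# Root, case and vibhakti turned out to be the most useful morph features.
SELECTED = ('root', 'case', 'vibhakti')

def morph_features(word):
    """Extract only the selected morph features for a word."""
    analysis = MORPH[word]
    return {f: analysis[f] for f in SELECTED}
```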
5.3.2 Local Morphosyntactic Features
Local morphosyntactic features correspond to all the parsing-relevant local linguistic features. As we saw earlier, these were captured via the notion of chunk, where the nominal and verbal vibhaktis were percolated to the head of the chunk. The features used to encode local morphosyntax are:
a) Type of the chunk
b) Head/non-head of the chunk
c) Chunk boundary information
d) Distance to the end of the chunk
e) Vibhakti computation for the head of the chunk
In example 5.1 there are two noun chunks and one verb chunk. raama and seba are the heads of the noun chunks, and khaayaa is the head of the verb chunk. We follow the standard IOB11 notation for chunk boundaries: raama, eka and khaayaa are at the beginning (B) of their respective chunks, while ne and seba are inside (I) them. raama is at distance 1 from the end of its chunk and ne is at distance 0. Given a chunk and a morph feature like vibhakti for the individual words inside that chunk, the relevant features for the head of the chunk can be computed automatically. The vibhakti feature of the head can, for example, represent the postposition/case-marking in the case of a noun chunk, or the tense, aspect and modality (TAM) information in the case of a verb chunk. Take (5.2) as a case in point:
(5.2) (raama/NNP ne/PREP)_NP (seba/NN)_NP (khaa/VFM liyaa/VAUX)_VGF
‘Ram’ ERG ‘apple’ ‘eat’ PRFT
‘Ram ate an apple’
The vibhakti computation for khaa, which is the head of the VGF chunk, will be ‘0_yaa’ and
is formed by concatenating the vibhakti of the main verb khaa with that of its auxiliary liyaa.
11 Inside, Outside, Beginning of the chunk
Similarly, the vibhakti computation for raama, which is the head of the NP chunk, will be ‘ne’. This feature turns out to be very important. This, as was discussed in chapter 2, is because in Hindi (and many other Indian languages) there is a direct linguistic correlation between the verb vibhakti and the noun vibhakti (case and postpositions) that appears on k1 or k2. This was also captured by one of the meta-constraints in section 4.2.1.2.2. In (5.2), for example, khaa and liyaa together provide the past perfective aspect for the verb khaanaa ‘to eat’. Since Hindi is split-ergative, the subject of a transitive verb takes an ergative case marker when the verb is past perfective. Similar correlations between case markers and TAM have been discussed earlier.
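The vibhakti computation for a chunk head described above can be sketched as follows (a minimal sketch; chunks are lists of (word, vibhakti) pairs, and the noun-chunk rule of taking the postposition's vibhakti is an assumption inferred from the examples):

```python
# Compute the vibhakti feature of a chunk head. For verb chunks the
# vibhaktis of the main verb and its auxiliaries are concatenated
# (e.g. '0' + 'yaa' -> '0_yaa'); for noun chunks the postposition's
# vibhakti is used if present (e.g. 'ne'), else '0'.

def head_vibhakti(chunk, chunk_type):
    """chunk: list of (word, vibhakti) pairs, head first."""
    if chunk_type.startswith('V'):
        return '_'.join(v for (w, v) in chunk)
    # noun chunk: take the last non-default vibhakti (the postposition)
    vibs = [v for (w, v) in chunk if v not in ('', '0')]
    return vibs[-1] if vibs else '0'
```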
5.3.3 Clausal Features
Clause was motivated as a linguistically meaningful minimal parsing unit in chapter 3. Clausal
features were used to incorporate the notion of tiganta. We posited the notion of a clause to
demarcate the scope of a finite verb. It is evident from our previous discussions that most of the
dependents of words in a clause appear inside the same clause; in other words the dependencies
of the words in a clause are mostly localized within the clause boundary (more on this in section
5.4.2).
In the dependency parsing task, a parser has to disambiguate between several words in the
sentence to find the parent/child of a particular word. Clausal features can help the parser to
reduce the search space when it is trying to do this. The search space of the parser can be reduced
by a large extent if we solve a relatively small problem of identifying the clauses. Interestingly, it
has been shown recently that most of the non-projective cases in Hindi are inter-clausal (Mannem
et al., 2009a). Identifying clausal boundaries, therefore, should prove helpful in parsing non-projective structures. The same holds true for many long-distance dependencies. For this set of experiments, two clausal features were used:
a) Clause boundary information,
b) Clausal head information
These clausal features are obtained using an intra-clausal (1st stage) parser. A 1st stage parser only parses individual clauses and attaches these clausal sub-trees to an artificial _ROOT_ (more about the 1st stage parser in section 5.4.2). A 1st stage parse is similar to the ones discussed in the previous chapter. Once such a parse is obtained, the subtree span provides the clause boundary information and the subtree head provides the head information.
5.3.4 Minimal Semantic Features
The three types of features discussed so far, namely morphological, morphosyntactic and clausal, capture different linguistic realities. For example, morph features along with chunk features help in identifying the verbal arguments, while clausal features constrain the search space, thereby helping to identify the right attachment site. But for some sentences none of these features might prove to be helpful; sentence (5.3) is a case in point.
(5.3) raama seba khaataa hai
‘Ram’ ‘apple’ ‘eat’ ‘is’
‘Ram eats apple’
In (5.3), both raama and seba have a ø post-position and therefore there are no explicit morphosyntactic cues that tell us the appropriate relation between raama and khaataa hai, or between seba and khaataa hai. Compare this with (5.4), where seba is followed by a postposition that helps us identify the object/theme of the event ‘eat’:
(5.4) raama seba ko khaataa hai
‘Ram’ ‘apple’ ‘ACC’ ‘eat’ ‘is’
‘Ram eats apple’
It should also be noted that the agreement features do not help in (5.3) either, since both the nominals raama and seba are masculine. Neither does word order help, as Hindi is a free-word-order language and a sentence like seba raama khaataa hai conveys the same meaning. In such cases, where surface-based information fails, semantic features assist in disambiguating the relations, thus aiding parsing. In (5.3), for example, the information that raama is a ‘human/animate being’ or that seba is ‘inanimate’ proves to be crucial; in fact, correct parsing of (5.3) is only possible if this semantic information is available. Such semantic features capture the notion of yogyata discussed in chapter 2.
Not all semantic features contribute to identifying the dependency relations; similarly, not all dependency relations benefit from semantic features. So an optimal set of semantic features should be used, based on their positive contribution to dependency parsing. As a first step in this direction, we use the following semantic features.
a) Human
b) Non-human
c) Inanimate
d) Time
e) Place
f) Abstract
g) Rest
These semantic features have been manually marked in the treebank. For this experiment we do not obtain these features automatically; rather, we use the gold features directly. In that sense, this experiment is more illustrative than practical.
5.3.5 Results
Figure 5.1 shows the relative importance of these features over the baseline12 LAS of MSTParser. It is clear that all the features discussed earlier improve the performance significantly. Similar improvements have also been obtained for MaltParser. We revisit these results in the next chapter and make some observations. Note that in Figure 5.1, the clausal and semantic features presume the presence of L-Morph features.
Figure 5.1. Improvement of different features over MST baseline.
Morph: Morphological, L-Morph: Local Morphosyntax, Sem: Minimal Semantics
12 The Baseline is obtained using only the Basic Unigram and Bigram features (APPENDIX VI)
5.4 Linguistically constrained modularity
In this section we will explore different methods to use chunk and clause as minimal parsing units
during data-driven dependency parsing.
5.4.1 Chunk based parsing
As mentioned in the previous chapter, the notion of chunk can help distinguish local dependency relations from global dependencies. We noticed that most of these local intra-chunk relations do not affect the overall dependency structure. This led us to modularize parsing into identifying inter-chunk relations first and then identifying intra-chunk relations. In this section we illustrate two methods of incorporating this modularity in data-driven dependency parsing.
5.4.1.1 Chunk as Hard Constraint (H-C)
In this method the dependency parser first marks the relations between elements inside a chunk. The inter-chunk relations are then identified, forming the dependency tree for the sentence. During inter-chunk parsing, the intra-chunk context is used as features but is not modified. Hence, in this setup the chunk is treated as a hard constraint. This is shown in Figure 5.2.
The intra-chunk dependency relations are easier to predict than the inter-chunk relations. In the case of Hindi these intra-chunk dependency labels can be predicted from POS tags using a small set of rules. The labeled attachment score of this system over the input data, using gold-standard POS and chunk tags, is 95.32%.
Figure 5.2. Chunk as Hard Constraint
5.4.1.2 Chunk as Soft Constraint (S-C)
In this method chunk information is used as features during parsing. This means that both local (intra-chunk) and global (inter-chunk) relations are identified together. Hence, in this setup the chunk is treated as a soft constraint. This is shown in Figure 5.3. We discussed this earlier in section 5.3.2. The following chunk features were used:
a) Type of the chunk
b) Head/non-head of the chunk
c) Chunk boundary information
d) Distance to the end of the chunk
e) Vibhakti computation for the head of the chunk
Figure 5.3. Chunk as Soft Constraint.
5.4.1.3 Results
Table 5.1 shows the results for both the methods and the baseline. The baseline setting does not use the notion of chunk; it only uses POS and morphological features. Note that, unlike the other results mentioned in this thesis, these results are for the complete parse. Experiments were conducted using 1165 sentences from the Hindi dependency treebank (Begum et al., 2008a). The average length of these sentences is 17.4 words/sentence and 8.6 chunks/sentence. We trained both the parsers on 1000 sentences and tested them on 165 sentences. These results are fleshed out in the next section.
                 Malt                  MST + MaxEnt
             UAS   LAS   LA         UAS   LAS   LA
Baseline     90.4  81.7  84.1       90.0  80.9  83.9
H-C          92.4  84.4  86.3       92.7  84.0  86.2
S-C          91.8  84.0  86.2       92.0  81.8  83.8
Table 5.1. Results for chunk modularity.
MST + MaxEnt: MST unlabelled trees with MaxEnt labeler
H-C: Chunks as Hard constraints, S-C: Chunks as Soft constraints
5.4.2 Clausal Parsing
Clause as a minimal parsing unit during parsing was motivated in chapter 3 using the notion of
tiganta (or verb vibhakti). In most cases, a clause demarcates the scope of a finite verb. Such a definition of clause in data-driven parsing brings out some interesting correlations of inter- and intra-clausal relations with relation type, depth, arc length and non-projectivity. Previous work on Malt and MST has shown that these properties have direct repercussions on parser accuracy. In this section we first correlate these properties with the notion of clause and then explore two methods of incorporating it in the parsing process.
We first note that certain dependency relations are more likely to occur between the elements inside a clause, while a different set of relations is more likely across clauses. We also note that the notion of clause can be correlated with short-distance and long-distance dependencies. Figure 5.4 shows the distribution of dependency labels with respect to clause type (intra-clausal vs. inter-clausal) in the Hyderabad dependency treebank (Begum et al., 2008a; Husain et al., 2010). For ease of exposition, Figure 5.4 only shows the labels with considerable coverage, together amounting to 93% of all dependency label occurrences. We can see clearly that many labels like k1, r6, etc. are overwhelmingly intra-clausal relations, while others like nmod-relc, ccof, etc. have an inter-clausal bias.
Figure 5.4. Dependency label distribution.
Figure 5.5 shows that short-distance dependencies are mostly intra-clausal, whereas long-distance dependencies tend to be inter-clausal. It is clear from Figures 5.4 and 5.5 that there is a clear correlation between labels and relation type on the one hand, and between arc length and relation type on the other. Further, there is a correlation between inter- vs. intra-clausal relations with respect to the depth of relations as well. Figure 5.6 shows that low-depth dependencies are both inter-clausal (in the case of complex sentences involving coordination, relative clauses, embeddings, etc.) and intra-clausal (simple sentences). It also shows that the percentage of inter-clausal relations decreases with increasing depth.
Figure 5.5. Arc length and relation type
Figure 5.6. Depth and relation type
Finally, there is a correlation between clause and non-projectivity: 70% of the non-projective
relations are inter-clausal (Mannem et al., 2009a).
Properties such as relation type, arc length, depth, and non-projectivity are known to have specific effects on errors in data-driven dependency parsing (McDonald and Nivre, 2007). Therefore, it is worth exploring the effect of the clause (when treated as a minimal unit) on dependency parsing accuracy. We will see in this section that this amounts to parsing individual clauses separately. As described in chapter 3, for all the experiments the following definition of clause is used:
‘A clause is a group of words containing a single finite verb and its dependents.’
More precisely, let T be the complete dependency tree of a sentence, and let G be a clausal subgraph of T. Then an arc x → y in G is a valid arc if (a) x is a finite verb; (b) y is not a finite verb; and (c) there is no z such that y → z, where z is a finite verb and y is a conjunct.
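The validity condition above can be written as a small predicate (a minimal sketch; the tree is a set of (head, dependent) arcs, and the finite-verb and conjunct tests are hypothetical set lookups):

```python
# Check whether an arc x -> y is a valid intra-clausal arc: x is a finite
# verb, y is not, and y is not a conjunct directly dominating a finite
# verb. 'tree' is a set of (head, dependent) arcs; 'finite' and 'conjunct'
# are sets of node ids.

def valid_clausal_arc(x, y, tree, finite, conjunct):
    if x not in finite or y in finite:
        return False
    if y in conjunct:
        # condition (c): y must not dominate a finite verb
        if any(h == y and d in finite for (h, d) in tree):
            return False
    return True
```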
5.4.2.1 2-stage parsing
In section 5.3.3 we saw that clausal information can be incorporated as features during parsing. In this section we explore another method of bringing in clausal information. The basic idea here is essentially the same as the constraint-based two-stage parsing discussed in chapter 4. We first parse intra-clausal relations and then parse inter-clausal relations. We explore two ways of doing this:
a) Use the 1st stage output as something that is fixed, i.e. as a hard constraint. This is exactly how the 1st stage is treated in GH-CBP.
b) Use the 1st stage output as a soft constraint. This means that the 1st stage relations can in principle be changed during 2nd stage parsing.
Since the 1st stage parser aims to parse only clauses, the original treebank needs to be modified to prepare the training data. We introduce a special dummy node named _ROOT_ which becomes the head of the sentence. All the clauses are connected to this dummy node with a dummy relation. In effect, we remove all the inter-clausal relations. The steps for this conversion are:
a) Add a dummy _ROOT_ node to the gold standard tree.
b) Find all sub-trees that have a ccof13 or nmod__relc14 relation with their parent.
c) Find all sub-trees where a relation exists between two VGFs (finite verb chunks).
d) Attach those sub-trees and the respective parents to the new node _ROOT_, with a dummy relation.
The input to the 2nd stage, for all the methods, can be defined more precisely as follows. Let T be the complete tree that should be output by the 2nd stage parser and let G be the subgraph of T that is input to the second stage. Then G should satisfy the following constraint: if the arc x → y is in G, then, for every z such that y → z is in T, y → z is also in G.
In other words, if an arc is included in the 1st stage partial parse, the complete subtree under the dependent must also be included. Unless this constraint is satisfied, there are trees that the second-stage parser (discussed in 5.4.2.2) cannot construct.
13 Conjunct relation (only finite clauses are considered here)
14 Noun modifier of the type relative clause
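The subtree-closure constraint on the 2nd stage input can be checked mechanically (a minimal sketch; trees are sets of (head, dependent) arcs, a hypothetical simplification):

```python
# Verify that a 1st-stage partial parse G is a valid 2nd-stage input with
# respect to the full tree T: whenever an arc x -> y is in G, every child
# arc of y in T must also be in G. Applied to every arc of G, this forces
# the complete subtree under each dependent to be present.

def valid_stage2_input(G, T):
    for (x, y) in G:
        for (h, d) in T:
            if h == y and (h, d) not in G:
                return False   # a child of y in T is missing from G
    return True
```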
At the end of the conversion, the parses in the treebank are converted to ‘1st stage parses’, i.e., the parse trees one would get at the end of the 1st stage. The settings used while training MST/Malt for stage 1 have already been discussed in Section 5.1.
Figure 5.7. (a) Original gold input; (b) 1st stage converted tree
5.4.2.2 2-stage parsing with hard constraints (2-Hard)
The partial parse obtained from the 1st stage becomes the input to the 2nd stage, in which these partial subtrees are related using appropriate relations. We can perform this two-stage parsing in two ways:
(a) By treating each partial subtree as a single node in the second stage, or
(b) By allowing the parser to accept partial trees as input.
5.4.2.2.1 Strategy I (2-Hard-S1)
For (a), the training data for the 2nd stage is obtained by converting the gold parse to suit the 2nd stage’s needs. Since we know that after the 1st stage all the parsed clauses are attached to _ROOT_ (Figure 5.7b), we replace those sub-trees with their respective heads and learn the relations between them in the 2nd stage. This is depicted in Figure 5.8. The head of each sub-tree is a placeholder for the whole clause parsed in the 1st stage. Though we take only the head of each subtree, we provide features that are characteristic of the sub-tree, thereby representing the whole sub-tree through that single node. This helps the parser learn only the inter-clausal relations.
Figure 5.8. Stage2 training input. Partial trees converted into a single node.
Note here that we make the assumption that inter-clausal relations exist only between the heads (in most cases, a finite verb) of various clauses or, in the case of multiple conjuncts, between the conjuncts themselves. There are exceptions to this (for example, the relative clause construction). This means that constructions like relative clauses will have to be handled separately.
Taking the head of a 1st stage parsed sub-tree as a single node representative of the entire sub-tree requires using extra features while training for the 2nd stage. To do this, one must judiciously isolate properties of the root as well as of subtree-internal nodes. The features can be broadly summarized as follows:
a) Structural features of the sub-tree, such as width, depth, branching, total number of nodes, etc., and
b) The characteristics of the nodes (including the root), such as morph features, POS/chunk tags, arc label, domination (parent, grandparent, great-grandparent) features, sibling features, valency, etc.
For the present set of experiments we decided to use the same features15 as those of integrated parsing. This has been done to make the comparison between the performances of 2-stage parsing and integrated parsing unbiased. Figure 5.9 shows this setup.
Figure 5.9. Strategy I (2-Hard-S1). Input to the 2nd stage is a partial tree converted into a single node.
15 This will include POS/chunk tags, morph features, suffix info., etc.
Figure 5.10. Strategy II (2-Hard-S2). Input to the 2nd stage is a partial parse
5.4.2.2.2 Strategy II (2-Hard-S2)
In this strategy, the 2nd stage MaltParser takes as input the partial 1st stage trees and establishes relationships between clauses (and conjunctions) (Figure 5.10). The 1st stage predictions are mutually exclusive of the 2nd stage predictions and cannot be overridden in the 2nd stage. However, they can be used as features in the 2nd stage predictions. This means that the 2nd stage MaltParser gets initialized with only those nodes that are attached to _ROOT_ in the first-stage parse (cf. Figure 5.7b). Figure 5.11 below shows the initial configuration of the 2nd stage Malt for sentence 5.5; the input is the 1st stage parse shown in Figure 5.12a.
(5.5) mai ghar gayaa kyomki mai bimaar thaa
’I’ ’home’ ’went’ ’because’ ’I’ ’sick’ ‘was’
‘I went home because I was sick’
Figure 5.11. 2nd stage initialization using the 1st stage parse shown in Figure 5.12(a)
(a) 1st stage output (b) 2nd stage final parse
Figure 5.12. Parse output for sentence 5.5
The 1st stage and 2nd stage parsers cater to different types of constructions. Recalling the constraint on the 2nd stage input, we note that, given such a constraint, a relative clause (though a subordinate clause) cannot be handled in the 2nd stage and will have to be handled separately. We explain the problem of handling the relative clause in the 2nd stage using sentence (5.6).
(5.6) vaha vahaan waba puhuchaa jaba sab jaa chuke the
’He’ ’there’ ’when-REL’ ’reached’ ’when-COREL’ ’everyone’ ‘go’ ‘had’
‘He reached there when everyone had left’
Figure 5.13. Parse output and 2nd stage initialization for sentence 5.6
Figure 5.13(a) shows the 1st stage output of a relative clause construction in a standard 2-stage setup. Both the relative clause and the matrix clause are attached to _ROOT_; the analysis of these clauses is complete. In the 2nd stage the relation between these two clauses is established (Figure 5.13b). Recall that we initialize the 2nd stage of 2-Hard-S2 with the children of _ROOT_, which in this case are the finite verbs of the two clauses (Figure 5.13c). Now recall the constraint on the input of the 2nd stage in 2-Hard-S2; given this constraint, the 2nd stage can only establish a relation between the two verbs and not, as is correct, between the relative clause verb and a noun dependent on the matrix verb. The noun ‘waba’ is not present in the input buffer and can never be considered as a head of ‘jaa’. For this reason, 2-Hard-S2 handles relative clauses through a separate classifier after the 1st stage. This parse is then fed into the 2nd stage.
5.4.2.2.3 Handling relative clause constructions in 2-Hard-S2 and 2-Hard-S1
As discussed in the previous sections, both 2-Hard-S1 and 2-Hard-S2 are unable to parse relative clause constructions. To handle such constructions we use the following steps:
(a) Identify relative clauses at the end of the 1st stage,
(b) Identify the head (noun) of the relative clause using a Maximum Entropy (MaxEnt) model (Ratnaparkhi, 1998),
(c) Attach the relative clause to its head,
(d) Use this updated parse as input to the 2nd stage.
The identification of relative clauses at the end of the 1st stage is rule based and depends on the presence of relative pronouns such as jo, jaba, jisa, etc. This system has an accuracy of around 94%.
The MaxEnt model used to identify the correct head uses the following features:
a) Lexical item of the NP's head,
b) POS tag of the NP's head,
c) Direction of the NP from the relative clause (1, -1),
d) Distance of the NP from the relative clause verb (normalized to 4, 8, 12, 16, 20, 24),
e) Specific cue: presence of an item from the list ["taba","tyoM","vEsa","vaha","vahAM","vahAz"] in either the lexical item or the lemma of the word and its chunk members.
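A minimal sketch of how such a feature vector might be assembled is given below. The function and field names are ours, and the actual MaxEnt implementation and feature encoding may differ.

```python
CUES = ["taba", "tyoM", "vEsa", "vaha", "vahAM", "vahAz"]

def bucket(distance, buckets=(4, 8, 12, 16, 20, 24)):
    """Normalize an absolute distance to the smallest covering bucket."""
    for b in buckets:
        if abs(distance) <= b:
            return b
    return buckets[-1]

def head_features(np_lex, np_pos, np_index, relcl_verb_index, chunk_words):
    """Features (a)-(e) for one candidate NP head of a relative clause."""
    return {
        "lex": np_lex,                                    # (a) lexical item
        "pos": np_pos,                                    # (b) POS tag
        "dir": 1 if np_index > relcl_verb_index else -1,  # (c) direction
        "dist": bucket(np_index - relcl_verb_index),      # (d) bucketed distance
        "cue": any(w in CUES for w in chunk_words),       # (e) specific cue
    }
```

For example, a candidate NP 'vaha' three words to the left of the relative clause verb would get direction -1, distance bucket 4, and a positive cue feature.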
5.4.2.3 2-stage parsing with soft constraints (2-Soft)
We can, instead of treating the output of the first-stage parser as hard constraints for the 2nd stage parser, treat it as soft constraints by simply defining features over the arcs produced in the 1st stage and making a complete parse in the 2nd stage. Technically, this is the same technique that Nivre and McDonald (2008) used to integrate Malt and MST, called guided parsing or parser stacking. In 2-Soft we ‘guide’ Malt with a 1st stage parse by Malt. The additional features added to the 2nd-stage parser during 2-Soft parsing encode the decisions by the 1st-stage parser concerning potential arcs and labels considered by the 2nd stage parser, in particular, arcs involving the word currently on top of the stack and the word currently at the head of the input buffer. For more details on the guide features for MaltParser, see Nivre and McDonald (2008).
Note again that, unlike the standard two-stage setup, the 1st stage relations can now be overridden during the 2nd stage (because we are guiding), and unlike the standard guided parsing setup, a parser guides with only 1st stage relations. Unlike two-stage parsing, guided parsing parses the complete sentence twice: the results from one parser are used to extract features that guide the second parser. In two-stage parsing, different components of a sentence are parsed in two stages.
5.4.2.4 Results
Table 5.2 shows the performance of the different parsers with 5-fold cross-validation on the Hindi data described in section 5.2. 2-Hard-S2 [16] and 2-Soft both perform better than the baseline. The UAS, LAS and LA for the baseline were 88.82, 75.02 and 77.80 respectively. The differences between the baseline and the two parsers were statistically significant. Significance is calculated using McNemar’s test (p <= 0.05). These tests were made with MaltEval (Nilsson and Nivre, 2008).
UAS LAS LA
Baseline 88.82 75.02 77.80
2-Hard-S2 89.13 75.65 78.73
2-Soft 88.92 75.24 78.00
Table 5.2. Overall parsing accuracy (5-fold cross-validation)
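As a rough sketch of how such a paired significance test works: McNemar's test looks only at the tokens where the two parsers disagree. The counts below are made up for illustration, and MaltEval's exact procedure may differ in detail.

```python
def mcnemar_chi2(only_a_correct, only_b_correct):
    """McNemar's chi-square statistic with continuity correction.
    Inputs are the counts of tokens that exactly one of the two
    parsers got right; tokens both got right or wrong cancel out."""
    b, c = only_a_correct, only_b_correct
    return (abs(b - c) - 1) ** 2 / (b + c)

def significant(b, c, critical=3.841):  # chi-square, 1 df, p = 0.05
    return mcnemar_chi2(b, c) > critical

# Made-up disagreement counts for illustration:
print(significant(60, 30))  # -> True
```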
Finer analysis of the cross-validation results shown in Table 5.2 brings out some interesting facts.
These are discussed in the next chapter.
5.5 Linguistically rich Graph-based parsing
In MSTParser (McDonald et al., 2005a, 2005b), a graph-based data-driven parser, a complete graph is used to extract a spanning tree during derivation. MSTParser’s learning model uses a large-margin algorithm, which optimizes the parameters of the model to maximize the score margin between the correct dependency graph and all incorrect dependency graphs for every sentence in a training set. The learning procedure is global. Unlike MaltParser (and other transition based systems, see Kubler et al., 2009), MSTParser considers only a limited history of parser decisions during training. McDonald and Nivre (2007) characterize in detail the specific error patterns in MSTParser. Recent works such as Sagae and Lavie (2006), Nivre and McDonald (2008), Zhang and Clark (2008), and Koo and Collins (2010) have tried to improve parsing accuracy either by integrating the two parsers (via stacking, etc.) or by introducing better learning models.

16. The feature file for the 2nd stage parser in 2-Hard-S2 is given in APPENDIX V.
In this section we investigate whether parsing accuracy using MSTParser can be improved by providing it a constraint graph instead of a complete graph during the derivation step. While we do not change the learning phase, it will be interesting to see what effect certain linguistic knowledge alone can have on the overall accuracy. A constraint graph is formed using the linguistic knowledge of the constraint based parsing system discussed in section 4.3. Through a series of experiments we formulate the constraint graph that gives us the best accuracy. These experiments show that some of the previous MSTParser errors can be corrected consistently. They also show the limitations of the proposed approach.
5.5.1 Constraint Graph
The constraint system discussed divides the task of parsing into intra-clausal and inter-clausal stages. At each stage, demand frames (mainly for verbs and conjunctions) for various heads are used to construct a constraint graph. The parser currently uses close to 536 demand frames. The constraint graph is then converted into an integer programming problem to get the parse at each stage. Consider example (5.7).
(5.7) bacce ne kelaa khaayaa aura so gayaa
’child’ ERG ‘banana’ ‘eat’ ‘and’ ‘sleep’ ‘went’
‘The child ate the banana and slept’
Figure 5.14 shows the 1st stage and the 2nd stage Constraint graph (CG) for (5.7). Note that the arcs in the 1st stage CG are localized to individual clauses. The _ROOT_ node is required in order to get the partial parse at the end of the 1st stage. Also note that in the 2nd stage only the inter-clausal relations are considered (here finite verbs and a conjunct).
The CG for each sentence provides the linguistic knowledge that we will use in the various experiments in this section. We can use this information in two ways:
a) Complete CG (or individual stage CG) can be used during the derivation,
b) CG (complete or stage specific) can be used selectively to prune out certain arcs in the complete graph while retaining others.
We note that although the CG also provides arc labels, for all our experiments we are only concerned with the attachment information. This is because the spanning tree extraction algorithm in MSTParser uses an unlabeled graph; MSTParser uses a separate classifier to label the trees.
Figure 5.14. Constraint graph for sentence 5.7.
5.5.2 Experimental Setup
All the experiments are conducted on Hindi. We use the dependency treebank described in section 5.2. The MSTParser described in section 5.1 was modified so that it can use a CG during derivation. Experiments were first conducted using training and development data; only after the experimental setup was frozen was the test data used.
5.5.3 Experiments
For an input S = w0, w1, …, wn, i.e. the set of all words in a sentence, let GS be the complete graph and CGS be the constraint graph provided by the constraint parser. Let N = {w0, w1, …, wn} be the set of vertices in GS. AG = N x N and ACG ⊆ N x N are the sets of arcs in the two graphs. An arc between wi and wj, written (wi,wj), signifies wi as the parent of wj. Let X be the set of all nodes which occur as a child in ACG. Also let C be the set of all vertices which are conjuncts, V the set of all vertices which are verbs, K the set of all vertices which are nouns, P the set of all vertices that have a case-marker/post-position, and J the set of adjectives.
The set of arcs which will be pruned from the complete graph in experiment 1 is shown in Table 5.3. This means that all the arcs in G will be pruned except the ones present in the CG.
For y in X:
    For x in S:
        If ∃! (x,y) in ACG:
            Remove (x,y) from AG
Table 5.3. Experiment 1 valid arcs
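Reading the guard '∃! (x,y) in ACG' as 'no arc (x,y) exists in ACG', which matches the prose that all arcs of G are pruned except the ones present in the CG, experiment 1 can be sketched in a few lines. The set-based representation is our own simplification.

```python
def prune_exp1(complete_arcs, cg_arcs):
    """Experiment 1: for every node that appears as a child in the
    constraint graph, keep only the CG-licensed incoming arcs; arcs
    into nodes the CG says nothing about are left untouched."""
    constrained_children = {y for (_, y) in cg_arcs}
    return {(x, y) for (x, y) in complete_arcs
            if y not in constrained_children or (x, y) in cg_arcs}

# Tiny illustration: 4 nodes, CG licenses 0->2 and 2->1 only.
nodes = range(4)
complete = {(x, y) for x in nodes for y in nodes if x != y}
pruned = prune_exp1(complete, {(0, 2), (2, 1)})
```

In this toy example, nodes 1 and 2 keep only their CG arcs, while nodes 0 and 3, about which the CG is silent, keep all their incoming arcs from the complete graph.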
The parser in experiment 1 (E1) outperformed the baseline accuracy (more details in section 5.5.4). Further analysis showed that the pruning based on E1, although useful, also had a negative effect: it pruned out many potentially valid arcs that would otherwise have been considered by MSTParser. Through experiments 2-8 we explore whether we can minimize such invalid pruning. We do this by systematically considering parts of the CG and using only those parts for pruning G.
Experiment 2 (Table 5.4, 1st row) begins by focusing on child nodes with post-positions. Also incorporated are the conjunction heads. Since a CG is formed based on explicit linguistic cues, it makes sense to base our decisions on cases where concrete information is available. Experiment 3 (Table 5.4, 2nd row) uses similar conditions, except that the constraint on nodes with post-positions applies only to noun children. By doing this we are trying to identify the most appropriate information in the CG.
Experiment 2:
For y in X:
    For x in S:
        If ∃! (x,y) in ACG:
            Remove (x,y) from AG
        If ∃ (x,y) in ACG and y∉P and x∉C:
            Remove (x,y) from AG

Experiment 3:
For y in X:
    For x in S:
        If ∃! (x,y) in ACG:
            Remove (x,y) from AG
        If ∃ (x,y) in ACG and y∈K and y∉P and x∉C:
            Remove (x,y) from AG

Table 5.4. Experiment 2 and 3 valid arcs
For y in X:
    For x in S:
        If ∃! (x,y) in ACG:
            Remove (x,y) from AG
        If ∃ (x,y) in ACG and y∈K and y∉P and x∈V:
            Remove (x,y) from AG
Table 5.5. Experiment 4 valid arcs
Experiment 4 is a constrained version of Experiment 3. It is interesting to note that in experiment 4 we are trying to prune out invalid arcs related to the argument structure of a verb (x∈V). Using the CG only for verbal arguments with a case-marker captures the various verbal alternations manifested via case-markings.
Experiment 5:
For y in X:
    For x in S:
        If ∃! (x,y) in ACG:
            Remove (x,y) from AG
        If ∃ (x,y) in ACG and y∈K and y∉P and x∈V:
            If ∃! (z,y) in ACG and (z∈C or z∈J):
                Remove (x,y) from AG

Experiment 6:
For y in X:
    For x in S:
        If ∃! (x,y) in ACG:
            Remove (x,y) from AG
        If ∃ (x,y) in ACG and y∈K and y∉P and x∉C:
            If ∃! (z,y) in ACG and (z∈C or z∈J):
                Remove (x,y) from AG

Table 5.6. Experiment 5 and 6 valid arcs
Experiments 5 and 6 extend experiments 4 and 3 respectively by introducing an exception whereby a noun child y with no case-marker is considered only if there exists some other potential conjunction/adjectival head for y. Owing to the free word order of Hindi, identifying the head of a noun with no case-marker is a rather difficult task. In spite of their availability, many robust generalizations (that help disambiguate relations with nouns with no case-markings), such as agreement, remain unexploited during training (Ambati et al., 2010). In these experiments, therefore, we are trying to ensure that the ambiguity of correct heads for nouns with no post-position is not resolved by the CG.
Experiments 2-6 only catered to verbal, conjunction or adjectival heads. Experiments 7 and 8 extend 5 and 6 to handle nominal predicate heads. We note here that this information is not obtained from the CG and is being treated as a heuristic rather than as having linguistic validity. The constraint parser has very limited coverage for nominal predicates and therefore we cannot rely on it for this kind of information. The heuristic considers a possible attachment between two consecutive nouns and does not remove such arcs from G.
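The consecutive-noun heuristic can be sketched as follows. This is our own formulation of the description above, and the variable names are illustrative.

```python
def protect_consecutive_nouns(arcs_to_prune, nouns):
    """Experiments 7-8 heuristic: an arc between two consecutive nouns
    may be a nominal-predicate attachment, so it is never pruned.
    nouns: set of noun positions; arcs are (head_pos, child_pos)."""
    return {(x, y) for (x, y) in arcs_to_prune
            if not (x in nouns and y in nouns and abs(x - y) == 1)}
```

Applied after the CG-based pruning conditions, this keeps every adjacent noun-noun arc available to MSTParser regardless of what the CG says about those nodes.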
5.5.4 Results
Figure 5.15 shows the results for all the experiments. The baseline UAS was 88.66 and the best result was obtained in experiment 8 with a UAS of 89.31, an increase of 0.65%. There was also an increase of 0.45% in LAS. All the results were statistically significant using McNemar’s test. We will discuss these results further in chapter 6.
Figure 5.15. Unlabeled attachment accuracies
Chapter 6
6. Rounding up
In this chapter we present an error analysis of all the approaches discussed in the previous chapters, and make relevant observations along the way.
6.1 GH-CBP
In chapter 4 we illustrated the GH-CBP framework and described how many notions from CPG were incorporated into its design. These design decisions have repercussions for different aspects of the parser. Below we briefly discuss some of them:
a) Generic Framework: Any grammar driven parser negotiates the need to balance generic
grammatical notions with language variability. This in turn reflects on its overall
performance. In GH-CBP the distinction between language specific constraints (H-
constraints) and other constraints in the form of meta-constraints, eliminative and preferential
constraints helps it cater to language variability on the one hand and generic grammatical
notions on the other hand, in a controlled way.
b) Efficiency: GH-CBP uses modularity and localization to achieve efficiency. The notion of chunks and clauses, leading to layered parsing, helps keep the search space from exploding. The use of licensing constraints to form a constraint graph (instead of a constraint network) constrains the space of permissible dependency structures for a given sentence.
c) Robustness: GH-CBP always gives some parse. The provision of producing a partial parse
along with a failsafe mechanism helps in handling unknown constructions.
d) Ambiguity resolution: GH-CBP uses a statistical prioritization technique to rank the output parses. The basic idea of scoring and obtaining the weights is derived from graph-based parsing and labeling techniques.
Figure 6.1. Some intra-clausal non-projective structures
[NP: Noun chunk, CCP: Conjunction chunk, VGF: Finite verb chunk,
NN[GEN]: Noun chunk with a genitive marker, VGNN: Verbal noun chunk]
e) Non-projective structures: The IP formulation allows for non-projective parsing (Riedel and Clarke, 2006). GH-CBP handles most of the non-projective structures in Hindi (Mannem et al., 2009a). Such constructions include: (a) Relative-Corelative constructions, (b) Extraposed relative clause constructions, (c) Paired connectives, (d) Shared arguments splitting the non-finite clause, etc. However, some non-projective sentences, such as (a) and (c) shown in Figure 6.1, can pose problems, as the relevant feature required to identify the correct attachment site in such cases is often semantic in nature.
f) Handling complex structures: Complex linguistic cues (local and global) can easily be encoded as constraints. Relevant constraints can disambiguate contextually similar entities by tapping into hard-to-learn features.
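The arc-factored scoring idea behind the prioritization in (d) above can be sketched as follows. This is a simplified illustration: the score tables stand in for the MaxEnt attachment and label classifiers, and the function names are ours.

```python
def parse_score(arcs, attach_score, label_score):
    """Score a candidate parse as the sum of per-arc attachment and
    label scores, as in arc-factored graph-based parsing.
    arcs: iterable of (head, child, label) triples."""
    return sum(attach_score[(h, c)] + label_score[(h, c, lab)]
               for h, c, lab in arcs)

def rank_parses(candidates, attach_score, label_score):
    """Rank candidate parses by score, best first. Python's sort is
    stable, so parses with equal scores keep their original order and
    'take the first best parse' falls out naturally."""
    return sorted(candidates, reverse=True,
                  key=lambda arcs: parse_score(arcs, attach_score, label_score))
```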
6.1.1 Errors
The performance of GH-CBP is affected by the following:
a) Small lexicon (linguistic demands of various heads): One of the main reasons for the low LAS is that the total number of linguistic demand frames the parser currently uses is very small. These demand frames cover verbs, conjuncts and predicative adjectives. GH-CBP currently uses only around 536 frames for Hindi (Begum et al., 2008b) and 460 frames for Telugu. Considering this, the comparatively low LAS is not surprising. Table 6.1 shows the total number of unseen verbs and the most frequent errors. For unseen verbs a default strategy of using a prototypical frame is employed; not surprisingly, this does not always work well. This is reflected in Table 6.1, where the most frequent errors are in the argument structure of a verb, such as k1, k2, and k7. Related to this is the problem of correctly identifying complex predicates (CP) (Noun-Verb, Adjective-Verb). Many generalizations like passivization, agreement, etc. work differently for CPs; therefore, identifying them becomes crucial. Automatic identification of such predicates is a challenging task, as most diagnostics present in the literature are behavioral (Butt, 1995; Verma, 1993). The parser currently handles some CPs that have honaa ‘be’ and karnaa ‘do’ as light verbs. There has been some initial work towards automatically inducing verb frames for Hindi and Telugu (Kolachina et al., 2010) and towards automatic identification of complex predicates in Hindi (Venkatapathy et al., 2005; Begum et al., 2011). These works can prove very useful in the future.
Total verbs: 650 (seen: 591, unseen: 59)

Error   Percentage
k1      9.09
k2      15.55
k7      9.86
Table 6.1. Unseen verbs and some common argument structure errors in the Hindi test data
Intra-clausal relations UAS LAS LA
Verb arguments 94.16 82.13 84.08
Non verb arguments 83.10 72.33 74.93
Table 6.2. Intra-clausal performance in Hindi
Table 6.2 shows the performance on intra-clausal relations. We see that, compared to verb arguments, the accuracies for non-verbal relations are lower. This is because the number of frames for predicative adjectives and nominals is small.
b) Morphological errors and ambiguous TAMs: A small portion of errors are caused when the morphological analyzer gets the root form of a verb wrong. In such a case, the parser picks an incorrect verb frame. Also, in Hindi, certain TAM (tense, aspect and modality) labels are ambiguous, which affects the correct application of the demand status transformation.
c) Unhandled constructions: The parser does not handle some constructions very well and gets them wrong frequently when they appear in a sentence. These constructions are:
a. Apposition
b. Long distance intra-clausal coordination
Some constructions, such as the missing copula in Bangla and Telugu and missing genitive case markers in Telugu, are still not handled by the parser.
6.1.1.1 Error analysis of Prioritization
The average number of output parses for each sentence is around 50. It was noticed that the differences between these parses were minimal, which makes ranking them a non-trivial task. The average attachment difference among the various parses with respect to the oracle parse of a sentence was 0.69; similarly, the average label difference was 1.65. The closeness between parses is expected from a constraint based parser whose output parses are only those that do not violate any Licensing and Eliminative constraints. In other words, most of the output parses are linguistically very sound. Of course, this linguistic soundness is restricted to morpho-syntax and does not extend to semantics, because the H-constraints do not as yet incorporate any semantics into the parser. Considering this, the error analysis does not throw up any big surprises. The main reasons why the LAS suffers during prioritization can be attributed to:
a) Lack of explicit post-positions or presence of ambiguous ones: Errors because of this manifest themselves at different places and can lead to attachment errors. A few common cases are finite and non-finite argument sharing, confusion between finite and non-finite arguments, adjectival participles, etc. It was also noted that the most frequent errors are for those arguments of the verb that have no postposition. Consequently, relations such as ‘k1’, ‘k2’, ‘k7’ and ‘vmod’ show very high confusion.
b) Multiple best-score parses: It is possible that more than one parse finally gets the best score. Out of 321 Hindi test sentences, 58 had multiple best-score parses. This is partly caused by the above reason, but it also reflects the accuracy of the labeler. Table 6.3 shows the accuracy of the MaxEnt classifier for detecting labels and attachments in Hindi and Telugu. As the accuracy of the labeler increases, this problem will lessen. Currently, we select only the first parse amongst all the parses with equal score.
Accuracy
Hindi Telugu
Attachments 80.13 80.80
Labels 87.19 76.78
Table 6.3. MaxEnt performance for attachment and label identification
6.2 Data-driven Parsing
The advantages and disadvantages of the two data-driven parsers that were used for the various experiments have been extensively explored (McDonald and Nivre, 2007). Briefly, Malt is better at identifying short distance dependencies, while MST is better for long distance dependencies and the root. This pattern, as it turns out, is a direct repercussion of the algorithms used by these parsers: the greedy, deterministic Malt makes better use of local features, while the graph based MST makes better use of global features during training.
In sections 6.2.1 - 6.2.3 we discuss the results of the experiments described in the previous
chapter.
6.2.1 Use of targeted features in data-driven parsing
The four types of features that we discussed in chapter 5 benefit both Malt and MST. These features provide the necessary information for establishing different relations. Figure 6.2 shows the relative improvement of these features over the MSTParser baseline.
We noted in chapter 5 that root, case and vibhakti get selected to give the best performance. The selection of these morph features is quite intuitive: the use of root helps in reducing sparsity, whereas case and vibhakti provide important structural cues for identifying relations. However, gender, number and person led to a decrease in accuracy in both Malt and MST. Agreement patterns in ILs are not always straightforward, and such non-local patterns present in the language are not being picked up by the parsers during learning. This has also been noted for other languages, such as Hebrew (Goldberg and Elhadad, 2009). Some recent parsing models, such as Relational-realizational parsing (Tsarfaty and Sima'an, 2008), have tried to overcome this problem in the context of data driven parsing.
For all the experiments, the conjoined feature of parent and child vibhakti proved very beneficial. Recall that there is a TAM-vibhakti mapping (more precisely, a TAM-sup mapping) for many dependency relations. The conjoined feature captures this mapping and therefore helps in improving the overall performance. Incorporating this feature in Malt is straightforward, but for MST we modified the original code to get this feature working (APPENDIX VI). Table 6.4 shows the confusion matrix for some important labels. The confusion occurs mainly because of absent or ambiguous postpositions or TAM. Other than the conjoined features, the features capturing the properties of a partially built tree in the case of Malt proved to be immensely useful.
Figure 6.2. Improvement of different features over the MST baseline.
Morph: Morphological, L-Morph: Local Morphosyntax, Sem: Minimal Semantics.
Note that clausal and semantic features presume the presence of L-Morph features.
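What conjoining amounts to can be sketched minimally as below. The '|' encoding is illustrative; MaltParser expresses conjoined features through its feature specification, and the MST modification in APPENDIX VI presumably does something similar internally.

```python
def conjoined_vibhakti_feature(parent_vib, child_vib):
    """Conjoin the parent's vibhakti (a TAM marker for verbs) with the
    child's vibhakti into one atomic feature value, so the learner can
    weight a specific TAM-vibhakti pair rather than each part alone."""
    return parent_vib + "|" + child_vib
```

For instance, the perfective TAM on a verb and the ergative ‘ne’ on its agent co-occur systematically in Hindi; a conjoined value such as "yA|ne" lets the learner reward exactly that pairing.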
Label   Confusion
k1      k2 (270), nmod (232), pof (158), k1s (104)
k2      pof (219), k1 (216), k4 (125), k2s (120)
pof     k2 (297), k1 (197), k1s (135)
k7      k7p (363)
k7p     k7 (209)
main    ccof (178), nmod__relc (150), vmod (94), k2 (86)
r6      r6-k1 (156), r6-k2 (117)
Table 6.4. Most frequent confusions (label and the labels it is most confused with, with counts).
We first experimented with giving only the clause inclusion (boundary) information to each node (Table 6.5). This feature helps the parser reduce its search space during parsing decisions. Then, we provided only the head and non-head information (whether or not a node is the head of the clause). The head/non-head information helps in handling complex sentences that have more than one clause, where each verb in the sentence has its own argument structure. Surprisingly, we did not achieve the best performance by using both boundary and head information. We suspect that this is not the final word and that further optimization might help here. The use of clause boundary features in MSTParser helps more with getting the attachments right than the labels. Figure 6.3 shows the effect of clausal features with respect to arc length. We can see that as the arc length increases, the effect of this feature becomes more pronounced.
MSTParser
                                 UAS    LAS
Clause Boundary                  89.97  76.55
Clausal Head                     89.27  75.35
Clause Boundary + Clausal Head   89.47  75.92
Table 6.5. Effect of clausal features on parser performance.
Figure 6.3. Effect of clausal feature on arc length in MSTParser.
Minimal semantics, as discussed earlier, helps in capturing certain semantic selectional restrictions of a verb. We found that it helped in disambiguating mostly those relations which occur with null postpositions. This is interesting because in such cases all the other features discussed earlier fail to work. Some such relations are shown in Table 6.6. Notice that these relations were also seen in Table 6.4, showing that they are error prone.
MSTParser MaltParser
Baseline Min. Sem Baseline Min. Sem
k1 69.2 74.6 77.5 79.9
k2 52.2 59.6 62.9 64.1
pof 54.6 69.2 55.8 61.4
k7t 56.2 67.8 59.2 69.0
Table 6.6. Effect of minimal semantics on some relations.
6.2.2 Chunk Based Parsing
Chunk based parsing, with chunks either as hard or as soft constraints, makes the notion of vibhakti visible. For languages such as Hindi and Bangla this becomes crucial, as the surface syntactic cues necessary to identify certain relations are distributed. Note that it is the notion of vibhakti that makes the conjoined features (which gave us a big jump in accuracy) effective.
Using chunk information as hard or as soft constraints has different advantages. Marking relations only between chunk heads makes the parser ignore local details, thereby making many decisions easy. Some such patterns are:
1. (NN)_NP (PSP NN PSP)_NP
2. (DEM NN PSP PRT)_NP
3. (NN RDP)_NP (VM VAUX)_VGF
4. (VM VAUX)_VGF (CC)_CCP … (VM)_VGF
Identification of chunk boundaries for most of these patterns is not difficult. In Table 6.7 we see that for short distance dependencies, using chunks as hard constraints gives better accuracy. As the arc length increases, however, using chunks as soft constraints seems the better option. We note that the use of chunk information, either as hard or as soft constraints, improves performance over the baseline (cf. Table 5.1).
             Malt            MST + MaxEnt
             H-C     S-C     H-C     S-C
to_root      69.90   64.07   94.32   73.76
1            99.48   99.41   99.20   98.89
2            95.65   95.32   92.42   94.94
3 to 6       90.40   91.21   89.26   89.49
7 & above    81.72   83.73   84.55   85.47
Table 6.7. Precision values for UAS with respect to arc length.
H-C: Chunk as hard constraint, S-C: Chunk as soft constraint
6.2.3 Clause Based Parsing
Results in Table 5.2 show that the increase in LAS and LA (of 2-Hard-S2 and 2-Soft) is higher than their increase in UAS with respect to the baseline. In Table 6.8 we see that the improvement of 2-Hard-S2 over the Baseline is consistent across the board for both inter-clausal and intra-clausal LA and LAS. These experiments show a clear pattern in the cases where parsing benefits from 2-Hard-S2. These benefits are spread over both the 1st stage and the 2nd stage. The cases are:
a) Better identification of some relations with deviant/ambiguous postpositions in the 1st stage. For example, when ‘se’ appears for beneficiary/cause instead of its default usage for instrument. Table 6.10 shows the label identification for some frequent postpositions. Notice that these postpositions are either null (0) or ambiguous.
b) Improved handling of non-finite verbs in the 1st stage.
c) Better handling of NULL nodes in the 2nd stage. Most NULL nodes are cases of ellipsis where a syntactic head such as a finite verb or a conjunct is missing. Most of these cases fall into the 2nd stage and are better handled there.
d) Better handling of some 2nd stage specific constructions, e.g. clausal complements.
e) Improved handling of relative clauses (cf. Table 6.9).
                Intra-clausal UAS   Intra-clausal LAS   Inter-clausal UAS   Inter-clausal LAS
Baseline Malt   89.05               72.18               87.98               85.43
2-Hard-S2       88.87               72.36               90.10               87.71
Table 6.8. Accuracy for intra- and inter-clausal dependency relations.
LAS LA
Precision Recall Precision Recall
Baseline Malt 66.67 21.29 67.86 21.67
2-Hard-S2 38.79 34.22 83.19 73.38
Table 6.9. Accuracy for relative clause construction.
Closely related to the above points is the performance of 2-Hard-S2 and 2-Soft with respect to arc length, depth and non-projectivity. It is known that Malt suffers on dependencies closer to the root (at low depth) due to error propagation, and on long distance dependencies because of local feature optimization (McDonald and Nivre, 2007). In other words, for Malt, depth and errors are negatively correlated while arc-distance and errors are positively correlated.
Postposition   Baseline   2-Hard-S2
0
me
para
GEN
ko
se
Table 6.10. Label identification comparison between Baseline and 2-Hard-S2 for ambiguous postpositions; the mark signifies better performance. 0 and GEN signify null postposition and genitive postpositions respectively.
Figure 6.4 shows the LAS of relations at various arc lengths for 2-Hard-S2, 2-Soft and Baseline. Note that as the arc length increases, the advantage of 2-Hard-S2 becomes more pronounced. Figure 6.5 shows the performance of relations at different depths. By distinguishing intra-clausal structures from inter-clausal structures, the 2-Hard-S2 setup works with shallower trees. This effect is clearly seen in Figure 6.5, where at low depth 2-Hard-S2 outperforms the Baseline.
Cases (c), (d) and (e) above reflect this. Cases (a) and (b), on the other hand, show that 2-Hard-S2 also affects 1st stage performance by learning verbal arguments (both complements and adjuncts) better. It is known that MaltParser has a rich feature representation, but with increasing sentence length its performance suffers due to error propagation. By treating a clause as a parsing unit we reduce this error propagation, as the features are exploited properly.
It was found that 2-Hard-S2 did not help in reducing the non-projective relations. As the Baseline, 2-Hard-S2 and 2-Soft all use the Arc-Eager parsing algorithm, they fare equally badly in handling non-projectivity. There were some sentences where the non-projectivity was removed in the 1st stage but the non-projective arc reappeared in the 2nd stage; this happened in the case of paired connective constructions (cf. Mannem et al., 2009a). We are yet to investigate whether non-projective parsing in the 2nd stage might prove beneficial in such cases.
Figure 6.4. LAS at arc-length for Baseline, 2-Soft and 2-Hard-S2. The numbers above the bars
represent the % of relations at respective arc lengths.
Figure 6.5. LAS at depth for Baseline, 2-Soft and 2-Hard-S2. The numbers above the bars
represent the % of relations at respective depths.
In this section we have pointed out some of the consistent trends noticed in our experiments. Table 6.11 below summarizes the main advantages of the different data-driven methods discussed in this section.
Method: Benefits
Morphological features: a) Captures the inflectional cues necessary for identifying different relations
Clausal Features: a) Helps in identifying the root of the tree; b) Helps in better handling of long distance relations
Minimal Semantics: a) Captures semantic selectional restrictions (needed precisely when surface cues fail)
Chunk Parsing and Local morphosyntax: a) Captures the notion of vibhakti; b) Helps in capturing the postposition-TAM mapping; c) Helps in reducing attachment mistakes
Clausal Parsing: a) Better learning of intra-clausal deviant relations; b) Better handling of participles; c) Better handling of long distance relations
Table 6.11. Advantages of different features/methods
6.2.4 Errors
Some of the main error sources during data-driven parsing were:
(a) Argument structure in simple sentences,
(b) Embedded clause constructions (participles and relative clauses),
(c) Coordination,
(d) Complex predicates.
6.2.5 Causes of Errors
The main causes of the errors discussed in the previous section are:
(a) Errors due to improper learning:
i. Label bias: This happens when the same feature is associated with two independent
classes (labels), leading the parser to choose the more frequent label. In some such
cases, use of minimal semantic information helped the parser make the correct
decision. Clausal parsing also reduces some such errors through better learning
(cf. Table 6.10).
ii. Distributed features: An important instance of this is the agreement pattern. It was
noted in Section 6.2.1 that of all the morph features only root, case, and suffix
proved helpful. Other features such as gender, number and person, which are useful
for agreement, did not get selected during feature selection.
iii. Difficulty in making linguistic generalizations: That a verb will have a single mandatory
argument such as k1 or k2 is not learnt properly; there are instances where a single
verb is assigned more than one k1.
(b) Long-distance dependencies: Like non-projectivity, long-distance dependencies can be a
major source of errors. Clausal features and clausal parsing helped us reduce some such
errors.
(c) Non-projectivity: Around 10% of all the arcs in the Hindi treebank are non-projective
(Mannem et al., 2009a). Most of these are inter-clausal, and the use of clausal features was
helpful in identifying some. But non-projectivity still remains a source of error.
(d) Genuine ambiguities: The correct decision is difficult in certain sentences because there are
no precise cues that can be exploited. Some such instances are:
i. Lack of post-positions,
ii. Ambiguous post-positions,
iii. Attachment of adjectival participles,
iv. Arguments of participles,
v. Appositions.
(e) Small corpus size: One potential reason for the low performance could be the small training
data. However, Hall et al. (2007) have shown that for many languages small data size is not a
crucial factor in determining parsing performance. The training data of 3,000 sentences
is still small, and it is likely that many problems will diminish as the data grows.
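The notion of non-projectivity discussed above can be made concrete with a small check: an arc is non-projective if some token lying strictly between the head and the dependent attaches outside that span. A minimal illustrative sketch over a toy head-array representation (not the treebank tooling used in our experiments; the artificial root is position 0):

```python
def nonprojective_arcs(heads):
    """heads[i-1] = head of token i (1-based tokens; 0 is the artificial root).
    Return the dependents whose incoming arc is non-projective: some token
    strictly between head and dependent attaches outside the [head, dep] span."""
    bad = []
    for dep, head in enumerate(heads, start=1):
        lo, hi = min(dep, head), max(dep, head)
        for k in range(lo + 1, hi):          # tokens strictly inside the span
            if heads[k - 1] < lo or heads[k - 1] > hi:
                bad.append(dep)
                break
    return bad
```

For a projective tree such as `[2, 0, 2]` the list is empty; for crossing arcs such as `[3, 4, 0, 3]` the crossed dependents are reported.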
6.3 Linguistically rich graph based parsing
Table 6.12 shows that the improvement in accuracy is spread across different kinds of
relations. Both inter-clausal and intra-clausal relations benefit. Within a clause, both the
argument structure of a verb and other relations are better identified in Experiment 8 (described
in 5.3.3) when compared to the baseline. There was a consistent improvement in the analysis of
certain phenomena:
a) Intra-clausal coordinating conjunctions as dependents. These may appear either as arguments
of the verb or as children of non-verbal heads.
b) Better handling of arguments of non-finite verbs.
c) Better handling of clausal arguments and relative clauses.
Relations                        Baseline           Experiment 8
                                 UAS      LAS       UAS      LAS
Intra-clausal                    87.86    73.63     88.30    74.00
Verbal args* (intra-clausal)     89.46    72.28     90.02    72.68
Non-args (intra-clausal)         86.25    75.00     86.57    75.32
Inter-clausal                    91.85    91.53     93.29    92.33
Table 6.12. Comparison of baseline and Experiment 8 accuracies. (*args: arguments)
A similar pattern is seen in Table 6.13: head attachment accuracy increases for almost all POS
tags, except for adjectival attachments.
As mentioned earlier, the constraint graph is originally formed using the linguistic knowledge
of the constraint-based system, so for our experiments the coverage of this knowledge is
crucial. Our experiments show that while the coverage of verbal and conjunction heads is
good, knowledge of other heads such as predicative nouns and adjectives is lacking. As
mentioned earlier, the constraint parser currently uses close to 536 demand frames. It would be
interesting to see how grammar extraction methods for Hindi (Kolachina et al., 2010) can be
combined with our approach to enlarge the knowledge base currently in use.
POS tag            %Instances    Baseline    Experiment 8
Noun               64            89          90
Finite verb        15            94          95
Non-finite verb    3             83          83
Gerund             5             86          88
Conjunction        7             75          77
Adjective          4             98          95
Table 6.13. Accuracy distribution over POS tags
6.4 General observation
MaltParser with clausal modularity outperforms the variant that uses only the local
morphosyntactic features. The linguistically rich MSTParser performs better than the one with
clausal features, while in UAS the MSTParser with clausal features scores highest. We note that
all these variants of MST and Malt outperform their baseline counterparts.
We find that both MSTParser and MaltParser outperform GH-CBP17. It is clear that there is
still a lot of scope for improvement in GH-CBP: its oracle score (GH-CBP'') for LAS is
considerably higher than both MST and Malt, and better ranking methods can reduce the gap
between them. These results show that, in spite of the positive effects of important features,
linguistic modularity, etc., the overall performance of all the parsers is still low. This is
particularly true for LAS. The error analysis discussed in the previous sections clearly shows that
17 We note that it is not possible to directly compare the performance of GH-CBP with Malt and MST
currently because of the difference in the granularities of the dependency tagsets: GH-CBP was tested on a
coarse-grained tagset, whereas the data-driven parsers were tested on a fine-grained tagset. Nevertheless,
the relative comparison still holds.
(a) ambiguous post-positions, and
(b) lack of post-positions
are amongst the main reasons for this. Table 6.4 shows this clearly: labels such as k1, k2,
pof, k7t, etc. take the greatest hit in accuracy. In the previous chapter we mentioned that use of
minimal semantics can help cases such as (a) and (b), and Table 6.6 illustrates that this is in
fact the case: these high-confusion relations can benefit significantly from minimal semantics.
However, automatic identification of such semantic tags is not a trivial task, and recent
attempts to do so for Hindi have not been very promising (Kosaraju et al., 2010). Knowledge of
the verb class or verb frame can also affect the overall accuracy dramatically; its effect on the
(linguistically rich) MSTParser is quite apparent, since the constraint graphs used as input to
MSTParser are, as we know, formed using such knowledge18. Trying to use such automatically
induced grammatical knowledge in the future should be a productive enterprise. There have
been some recent attempts at, for example, automatic identification of complex predicates,
which has a positive effect on parsing accuracy (Begum et al., 2011). Such knowledge is
applicable not only to verbal heads but to other non-verbal heads as well. We have seen
previously that GH-CBP currently has very little information about predicative nominal and
adjectival heads, so use of such frames should help.
Closely related to the task of grammar induction is automatic identification of a verb's class
or frame. Currently GH-CBP uses all the available frames of a verb during derivation. If one
could select the correct frame before derivation, the total number of output parses would be
consistently reduced, which would indirectly benefit prioritization. This task of frame/class
selection can be simplified if one ascertains only the valency of a head; in this formulation, the
task becomes similar to supertagging (Bangalore and Joshi, 1999). Such information should also
help data-driven parsing.
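The frame-selection idea can be pictured as a simple filter over a verb's demand frames: if a supertagger-style classifier predicts only the valency of a head, every frame of a different arity can be discarded before derivation. A hypothetical sketch (the frame entries and the verb are invented for illustration; GH-CBP's actual demand frames are far richer than argument-label tuples):

```python
# Hypothetical demand frames: each frame lists the mandatory karaka relations.
FRAMES = {
    "de": [("k1", "k2", "k4"),   # ditransitive reading of 'give'
           ("k1", "k2")],        # transitive reading
}

def select_frames(verb, predicted_valency):
    """Keep only the frames whose number of mandatory arguments matches
    the valency predicted by a supertagger-style classifier."""
    return [f for f in FRAMES.get(verb, []) if len(f) == predicted_valency]
```

With a correct valency prediction, only one of the two frames survives, so the derivation explores fewer parses and prioritization has less work to do.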
Overall, the UAS for all the parsers is high, showing that most of the language's structures
are identified successfully. For the data-driven parsers, non-projective structures still remain a
challenge. Other major error sources are the ambiguous constructions mentioned in Section 6.2.5;
perhaps an increase in data will help resolve such ambiguities. The data-driven parsers currently
do not learn many linguistic generalizations, such as agreement, the single-subject constraint, etc. Such
18 Note that the constraint graph, along with capturing the knowledge of demand frames, also incorporates
clausal scope.
generalizations are frequent in some complex patterns that exist between verbal heads and their
children. Some recent work has been able to incorporate these successfully (Ambati, 2010).
Chapter 7
7. Conclusion
In this work we successfully incorporated various grammatical notions from Computational
Paninian Grammar (CPG) to build a generalized parsing framework. This was first done using a
constraint-based paradigm in which elements from CPG led to a layered parsing architecture. The
proposed generalized hybrid constraint-based parsing system (GH-CBP) uses different types of
constraints to account for language variability while maintaining its generic nature.
In particular, we described how grammatical notions such as control, passives, gapping, verbal
alternations, agreement, subordination, coordination, etc. can be handled in GH-CBP for Indian
languages such as Hindi, Telugu and Bangla. We then integrated this setup with a ranking
strategy inspired by graph-based parsing/labeling that prioritized the parses of the core constraint
parser. GH-CBP was evaluated for Hindi and Telugu, and the results show good coverage of the parser.
In Chapter 5 we described ways in which the insights from GH-CBP can be used in data-
driven parsing. This was successfully done using (a) linguistically motivated features, (b)
linguistically constrained modularity, and (c) linguistically rich graph-based parsing. These
experiments point to various crucial factors that help in improving the parsing accuracy. In
particular, they show the importance of four types of linguistic features, namely morphological,
local morphosyntactic, clausal and semantic. The second set of experiments showed
that using clausal and chunk information to modularize the parsing process helps. Using
linguistically rich graph-based parsing, we successfully used the knowledge of the constraint
parser in data-driven parsing. All these experiments led to statistically significant improvements
over the baseline systems. We finally discussed in detail the results of both the constraint-based
and data-driven parsing experiments and made explicit certain trends that flesh out the positives
and negatives in our work.
APPENDIX I: Dependency Tagset
No.  Tag        Name and description
1.   k1         karta (doer/agent/subject): Karta is defined as the 'most independent' of all
                the karakas (participants).
2.   k1s        vidheya karta (karta samanadhikarana): Noun complement of karta.
3.   k2         karma (object/patient): Karma is the locus of the result implied by the verb root.
4.   k2p        Goal, destination: The destination or goal is also taken as a karma; k2p is a
                subtype of karma (k2). The goal or destination where an action of motion ends
                is a k2p.
5.   k2s        karma samanadhikarana (object complement): The object complement.
6.   k3         karana (instrument): karana karaka denotes the instrument of an action expressed
                by a verb root. The activity of the karana helps in achieving the main action.
7.   k4         sampradaana (recipient): Sampradana karaka is the recipient/beneficiary of an
                action; the person/object for whom the karma is intended.
8.   k4a        anubhava karta (experiencer): The experiencer/perceiver in perception verbs such
                as 'seem', 'appear', etc.
9.   k5         apaadaana (source): apadana karaka indicates the source of the activity, i.e. the
                point of departure. A noun denoting the point of separation for a verb expressing
                an activity that involves movement away is apadana.
10.  k7         vishayaadhikarana (location): Location in time, place or an abstract domain.
11.  r6         shashthi (possessive): The genitive/possessive relation which holds between two
                nouns.
12.  r6-k1,     karta or karma of a conjunct verb (complex predicate).
     r6-k2
13.  rh         hetu (cause-effect): The reason or cause of an activity.
14.  rt         taadarthya (purpose): The purpose of an action.
15.  nmod__relc, jjmod__relc, rbmod__relc
                Relative clauses, jo-vo constructions.
16.  nmod       Noun modifier (including participles): An underspecified relation employed to
                show general noun modification without going into a finer type.
17.  vmod       Verb modifier: Another underspecified tag; for some relations a finer subtype is
                not yet possible, so they are annotated with this slightly underspecified tag.
18.  jjmod      Modifier of an adjective.
19.  pof        'Part of' relation: Part of units such as conjunct verbs.
21.  ccof       'Conjunct of' relation: Coordination and subordination.
The above coarse-grained tagset was used in the ICON10 tools contest on IL dependency parsing
(Husain et al., 2010). For the complete tagset, see:
http://ltrc.iiit.ac.in/MachineTrans/research/tb/DS-guidelines/DS-guidelines-ver2-28-05-09.pdf
APPENDIX II: Chunk Tagset
No. Chunk Type Tag Name
1 Noun Chunk NP
2 Finite Verb Chunk VGF
3 Non-finite Verb Chunk VGNF
4 Infinitival Verb Chunk VGINF
5 Verb Chunk (Gerund) VGNN
6 Adjectival Chunk JJP
7 Adverb Chunk RBP
8 Chunk for Negatives NEGP
9 Conjuncts CCP
10 Chunk Fragments FRAGP
11 Miscellaneous BLK
For complete description, see the guidelines:
http://ltrc.iiit.ac.in/MachineTrans/publications/technicalReports/tr031/posguidelines.pdf
APPENDIX III: POS Tagset
No. Category Tag Name
1 Noun NN
2 NLoc NST
3 Proper Noun NNP
4 Pronoun PRP
5 Demonstrative DEM
6 Verb-finite VM
7 Verb Aux VAUX
8 Adjective JJ
9 Adverb RB
10 Post position PSP
11 Particles RP
12 Conjuncts CC
13 Question Words WQ
14 Quantifiers QF
15 Cardinal QC
16 Ordinal QO
17 Classifier CL
18 Intensifier INTF
19 Interjection INJ
20 Negation NEG
21 Quotative UT
22 Sym SYM
23 Compounds *C
24 Reduplicative RDP
25 Echo ECH
26 Unknown UNK
For foreign/unknown words, the POS tagger may give the tag “UNK”.
For complete description, see the guidelines:
http://ltrc.iiit.ac.in/MachineTrans/publications/technicalReports/tr031/posguidelines.pdf
APPENDIX IV: MaltParser Features
<featuremodels>
<featuremodel name="nivreeager">
<feature>InputColumn(FORM, Stack[0])</feature>
<feature>InputColumn(FORM, Input[0])</feature>
<feature>InputColumn(POSTAG, Stack[0])</feature>
<feature>InputColumn(POSTAG, Input[0])</feature>
<feature>InputColumn(POSTAG, Input[1])</feature>
<feature>InputColumn(POSTAG, Input[2])</feature>
<feature>InputColumn(POSTAG, Input[3])</feature>
<feature>InputColumn(POSTAG, Stack[1])</feature>
<feature>InputColumn(POSTAG, pred(Stack[0]))</feature>
<feature>InputColumn(POSTAG, head(Stack[0]))</feature>
<feature>InputColumn(POSTAG, ldep(Input[0]))</feature>
<feature>InputColumn(CPOSTAG, Stack[0])</feature>
<feature>InputColumn(CPOSTAG, Input[0])</feature>
<feature>InputColumn(CPOSTAG, Input[1])</feature>
<feature>InputColumn(CPOSTAG, ldep(Input[0]))</feature>
<feature>InputColumn(FORM, ldep(Input[0]))</feature>
<feature>InputColumn(FORM, Input[1])</feature>
<feature>InputColumn(LEMMA, Stack[0])</feature>
<feature>InputColumn(LEMMA, Input[0])</feature>
<feature>InputColumn(LEMMA, Input[1])</feature>
<feature>OutputColumn(DEPREL, rdep(Stack[0]))</feature>
<feature>OutputColumn(DEPREL, lsib(rdep(Stack[0])))</feature>
<feature>Split(InputColumn(FEATS, Stack[0]),\|)</feature>
<feature>Split(InputColumn(FEATS, Input[0]),\|)</feature>
<feature>Merge(InputColumn(POSTAG, Stack[0]), InputColumn(POSTAG, Input[0]))</feature>
<feature>Merge(InputColumn(FEATS, Stack[0]), OutputColumn(DEPREL, Stack[0]))</feature>
</featuremodel>
</featuremodels>
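For readers unfamiliar with the feature specification language above: Split breaks the pipe-separated FEATS column into atomic feature values, while Merge conjoins two values into a single feature. A rough Python re-implementation of the idea (an illustrative sketch, not MaltParser's actual code; the `~` join character is an arbitrary choice):

```python
def split_feats(feats_column, sep="|"):
    """Mimic Split(InputColumn(FEATS, ...), \\|): break the pipe-separated
    FEATS string into individual atomic feature values."""
    return feats_column.split(sep)

def merge(value_a, value_b):
    """Mimic Merge(f1, f2): conjoin two feature values into one, so the
    learner can exploit their combination directly."""
    return f"{value_a}~{value_b}"
```

For example, splitting a FEATS value like `cat-n|gend-m|num-sg` yields three atomic features, and merging a POSTAG with a DEPREL yields one conjoined feature.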
APPENDIX V: MaltParser Features (2nd stage of 2-Hard-S2)
<?xml version="1.0" encoding="UTF-8"?>
<featuremodels>
<featuremodel name="nivreeager">
<feature>InputColumn(FORM, Stack[0])</feature>
<feature>InputColumn(FORM, Input[0])</feature>
<feature>InputColumn(POSTAG, Stack[0])</feature>
<feature>InputColumn(POSTAG, Input[0])</feature>
<feature>InputColumn(POSTAG, Input[1])</feature>
<feature>InputColumn(POSTAG, Input[3])</feature>
<feature>InputColumn(POSTAG, Stack[1])</feature>
<feature>InputColumn(POSTAG, pred(Stack[0]))</feature>
<feature>InputColumn(CPOSTAG, Stack[0])</feature>
<feature>InputColumn(FORM, Input[1])</feature>
<feature>InputColumn(LEMMA, Stack[0])</feature>
<feature>InputColumn(LEMMA, Input[0])</feature>
<feature>InputColumn(LEMMA, Input[1])</feature>
<feature>OutputColumn(DEPREL, Stack[0])</feature>
<feature>OutputColumn(DEPREL, lsib(rdep(Stack[0])))</feature>
<feature>Merge(InputColumn(POSTAG, Stack[0]), InputColumn(FORM, Stack[0]))</feature>
<feature>OutputColumn(DEPREL, ldep(Input[1]))</feature>
</featuremodel>
</featuremodels>
APPENDIX VI: MSTParser Features19
Basic Unigram Features
p-word, p-pos
p-word
p-pos
c-word, c-pos
c-word
c-pos
Basic Bigram Features
p-word, p-pos,c-word, c-pos
p-word, c-word, c-pos
p-pos, c-word, c-pos
p-word, p-pos, c-pos
p-word, p-pos, c-word
p-pos, c-pos
Basic Unigram Features + label
Basic Unigram Features + FEATS
Basic Unigram Features + p-FEATS
Basic Unigram Features + c-FEATS
Conjoined (Incorporated after modifying MSTParser)
Basic Unigram Features + FEATS + label
Basic Unigram Features + p-FEATS + label
Basic Unigram Features + c-FEATS + label
19 p-*: parent features
c-*: child features
FEATS: features in the FEATS column in the CoNLL format.
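The conjoined templates listed above pair the basic unigram features with FEATS and the arc label in single feature strings. The following sketch shows how such conjoined features might be instantiated as flat strings (the template names and separators are placeholders for illustration, not MSTParser's internal representation):

```python
def conjoined_features(p_word, p_pos, c_word, c_pos, feats, label):
    """Instantiate the 'Basic Unigram Features + FEATS + label' templates
    as flat feature strings, in the spirit of MSTParser's templates."""
    unigrams = [
        ("pw|pp", f"{p_word}|{p_pos}"),  # parent word + parent POS
        ("pw", p_word),
        ("pp", p_pos),
        ("cw|cp", f"{c_word}|{c_pos}"),  # child word + child POS
        ("cw", c_word),
        ("cp", c_pos),
    ]
    # Conjoin each unigram template with the FEATS value and the arc label.
    return [f"{name}|feats|lab={val}|{feats}|{label}" for name, val in unigrams]
```

Each arc thus contributes six conjoined features, letting the learner weight, say, a child POS together with its case marking and dependency label as one unit.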
APPENDIX VII: MaxEnt Labeler Features
Dependency Tree Nodes
Current node
Parent node
Right-most left sibling
Left-most right sibling
Children
Features
Lexical item
Root form of the word
Part-of-speech tag
Coarse POS tag
Vibhakti markers
Direction of the dependency arc
Number of siblings
Number of children
Difference in positions of node and its parent
POS list from dependent to tree’s root through the dependency path
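The node and feature inventory above can be read as a feature-extraction routine over a dependency tree. A simplified sketch covering a few of the listed features (the dictionary-based node representation is invented for illustration; the actual labeler operates on the treebank's CoNLL fields):

```python
def labeler_features(node, parent):
    """Extract some of the listed MaxEnt labeler features for one node.
    Nodes are dicts with 'idx' (linear position), 'form', 'pos', 'children'."""
    d = node["idx"] - parent["idx"]
    return {
        "form": node["form"],                       # lexical item
        "pos": node["pos"],                         # part-of-speech tag
        "parent_pos": parent["pos"],                # parent node's POS
        "arc_direction": "right" if d > 0 else "left",
        "num_children": len(node["children"]),
        "dist_to_parent": abs(d),                   # position difference
    }
```

For a dependent at position 3 whose parent is at position 5, the arc direction is "left" and the distance is 2.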
Bibliography
S. Abney. 1996. Part-of-Speech Tagging and Partial Parsing. In K. Church, S. Young, and G.
Bloothooft, editors, Corpus-Based Methods in Language and Speech. Kluwer Academic
Publishers.
S. Abney. 1997. Partial Parsing via Finite-State Cascades. Natural Language Engineering,
2(4):337–344.
J. Aissen. 1999. Markedness and subject choice in Optimality Theory. Natural Language and
Linguistic Theory 17:673–711.
B. R. Ambati. 2010. Importance of linguistic constraints in statistical dependency parsing. In
Proceedings of ACL 2010 Student Research Workshop (SRW), Uppsala, Sweden.
B. Ambati, S. Husain, J. Nivre and R. Sangal. 2010a. On the Role of Morphosyntactic Features in
Hindi Dependency Parsing. In Proceedings of NAACL-HLT 2010 workshop on Statistical
Parsing of Morphologically Rich Languages (SPMRL 2010), Los Angeles, CA.
B. Ambati, S. Husain, S. Jain, D. M. Sharma and R. Sangal. 2010b. Two methods to incorporate
'local morphosyntactic' features in Hindi dependency parsing. In Proceedings of NAACL-HLT
2010 workshop on Statistical Parsing of Morphologically Rich Languages (SPMRL 2010) Los
Angeles, CA.
B. Ambati, P. Gade, G.S.K. Chaitanya and S. Husain. 2009. Effect of Minimal Semantics on
Dependency Parsing. In RANLP09 student paper workshop.
G. Attardi and F. Dell’Orletta. 2008. Chunking and Dependency Parsing. LREC Workshop on
Partial Parsing: Between Chunking and Deep Parsing. Marrakech, Morocco.
S. Bangalore and A. K. Joshi. 1999. Supertagging: an approach to almost parsing. Computational
Linguistics.
R. Begum, K. Jindal, A. Jain, S. Husain and D. M. Sharma. 2011. Identification of Conjunct
Verbs in Hindi and Its Effect on Parsing Accuracy In Proceedings of the 12th CICLing, Tokyo,
Japan.
R. Begum, S. Husain, A. Dhwaj, D. M. Sharma, L. Bai and R. Sangal. 2008a. Dependency
Annotation Scheme for Indian Languages. In Proceedings of The Third International Joint
Conference on Natural Language Processing (IJCNLP). Hyderabad, India.
R. Begum, S. Husain, D. M. Sharma and L. Bai. 2008b. Developing Verb Frames in Hindi. In
Proceedings of The Sixth International Conference on Language Resources and Evaluation
(LREC). Marrakech, Morocco.
A. Bharati, S. Husain, D. M. Sharma and R. Sangal. 2009a. Two stage constraint based hybrid
approach to free word order language dependency parsing. In Proceedings of the 11th
International Conference on Parsing Technologies (IWPT). Paris.
A. Bharati, S. Husain, M. Vijay, K. Deepak, D. M. Sharma and R. Sangal. 2009b. Constraint
Based Hybrid Approach to Parsing Indian Languages. In Proceedings of the 23rd Pacific Asia
Conference on Language, Information and Computation (PACLIC 23). Hong Kong
A. Bharati, M. Gupta, V. Yadav, K. Gali, D.M. Sharma. 2009c. Simple Parser for Indian
Languages in a Dependency Framework. In Proc. of the Third Linguistic Annotation Workshop
at 47th ACL and 4th IJCNLP.
A. Bharati, D. M. Sharma, S. Husain, L. Bai, R. Begam and R. Sangal. 2009d. AnnCorra:
TreeBanks for Indian Languages, Guidelines for Annotating Hindi TreeBank.
http://ltrc.iiit.ac.in/MachineTrans/research/tb/DS-guidelines/DS-guidelines-ver2-28-05-09.pdf
A. Bharati, S. Husain, B. Ambati, S. Jain, D. M. Sharma and R. Sangal. 2008a. Two semantic
features make all the difference in Parsing accuracy. In Proceedings of the 6th International
Conference on Natural Language Processing (ICON-08), CDAC Pune, India.
A. Bharati, S. Husain, D. M. Sharma, and R. Sangal. 2008b. A Two-Stage Constraint Based
Dependency Parser for Free Word Order Languages. In Proceedings of the COLIPS
International Conference on Asian Language Processing 2008 (IALP). Chiang Mai, Thailand.
A. Bharati, D. M. Sharma, L. Bai and R. Sangal. 2006. AnnCorra: Annotating Corpora
Guidelines for POS and Chunk Annotation for Indian Languages. LTRC-TR31.
A. Bharati, R. Sangal and T. P. Reddy. 2002. A Constraint Based Parser Using Integer
Programming, In Proc. of ICON-2002.
A. Bharati, V. Chaitanya, R. Sangal. 1995a. Natural Language Processing: A Paninian
Perspective. Prentice-Hall of India, New Delhi.
A. Bharati, A. Gupta and R. Sangal. 1995b. Parsing with Nesting Constraints. Proc of 3rd NLP
Pacific Rim Symposium, Seoul, S. Korea.
A. Bharati and R. Sangal. 1993. Parsing Free Word Order Languages in the Paninian Framework.
In Proc. of ACL:93.
E. Black, F. Jelinek, J. D. Lafferty, D.M.Magerman, R. L.Mercer, and S. Roukos. 1992. Towards
history-based grammars: Using richer models for probabilistic parsing. In Proc. of the 5th
DARPA Speech and Natural Language Workshop, pages 31–37.
M. Butt. 1995. The Structure of Complex Predicates in Urdu. CSLI Publications.
J. Carroll. 2000. Statistical parsing. In R. Dale, H. Moisl, and H. Somers, (eds), Handbook of
Natural Language Processing, Marcel Dekker, pp. 525–543.
Y.J. Chu and T.H. Liu. 1965. On the shortest arborescence of a directed graph. Scientia Sinica,
14:1396–1400.
M. Collins. 2000. Discriminative reranking for natural language parsing. In Proc. of 7th ICML.
M. Collins and T. Koo. 2005. Discriminative reranking for natural language parsing. In CL p.25
70 March05.
B. Comrie. 1989. Language Universals and Linguistic Typology: Syntax and morphology.
University of Chicago Press.
R. Debusmann, D. Duchier and G. Kruijff. 2004. Extensible dependency grammar: A new
methodology. Proceedings of the Workshop on Recent Advances in Dependency Grammar, pp.
78–85.
D. Duchier. 1999. Axiomatizing dependency parsing using set constraints. Proceedings of the 6th
Meeting on Mathematics of Language, Orlando, FL, pp. 115-126.
D. Duchier and R. Debusmann. 2001. Topological dependency trees: A constraint-based account
of linear precedence. Proc of 39th ACL and 10th EACL. Toulouse, France, pp. 180–187.
J. Edmonds. 1967. Optimum branchings. Journal of Research of the National Bureau of
Standards, 71B:233–240.
J. Eisner. 1996. Three new probabilistic models for dependency parsing: An exploration,
Proceedings of the 16th COLING, Copenhagen, Denmark, pp. 340-345.
M.B. Emeneau. 1956. India as a linguistic area. Linguistics 32, 3–16
G. Eryigit, J. Nivre and K. Oflazer. 2008. Dependency Parsing of Turkish. Computational
Linguistics 34(3), 357-389.
K. A. Foth and W. Menzel. 2006. Hybrid parsing: Using probabilistic models as predictors for a
symbolic parser. In Proc. of COLING-ACL06.
P. Gadde, K. Jindal, S. Husain, D. M Sharma, and R. Sangal. 2010. Improving Data Driven
Dependency Parsing using Clausal Information. In Proceedings of NAACL-HLT 2010, Los
Angeles, CA. 2010.
Y. Goldberg and M. Elhadad. 2009. Hebrew Dependency Parsing: Initial Results. In Proceedings
of the 11th IWPT09. Paris. 2009.
J. Gorla, A. K. Singh, R. Sangal, K. Gali, S. Husain and S. Venkatapathy. 2008. A Graph Based
Method for Building Multilingual Weakly Supervised Dependency Parsers. In Proceedings of
the 6th International Conference on Natural Language Processing (GoTAL). Gothenburg,
Sweden. 2008.
M. Gupta, V. Yadav, S. Husain and D. M. Sharma. 2008. A Rule Based Approach for Automatic
Annotation of a Hindi TreeBank. In Proceedings of the 6th International Conference on
Natural Language Processing (ICON-08), CDAC Pune, India.
M. Gupta, V. Yadav, S. Husain and D. M. Sharma. 2010. Partial Parsing as a Method to Expedite
Dependency Annotation of a Hindi Treebank. In Proceedings of The 7th International
Conference on Language Resources and Evaluation (LREC). Valleta. Malta
J. Hajič, A. Böhmová, E. Hajičová and B. V. Hladká. 2000. The Prague Dependency Treebank: A
Three-Level Annotation Scenario. In A. Abeillé (ed.) Treebanks: Building and Using Parsed
Corpora, Amsterdam. Kluwer, 2000, pp. 103-127
E. Hajičová. 2002. Theoretical description of language as a basis of corpus annotation: The case
of the Prague Dependency Treebank. In E. Hajičová, P. Sgall, J. Hana, T. Hoskovec (eds.):
Prague Linguistic Circle Papers, (4), Amsterdam/Philadelphia: John Benjamins, pp. 111–127.
E. Hajicova. 1998. Prague Dependency Treebank: From Analytic to Tectogrammatical
Annotation. In Proc. TSD’98.
J. Hall, J. Nilsson, J. Nivre, G. Eryigit, B. Megyesi, M. Nilsson and M. Saers. 2007. Single Malt
or Blended? A Study in Multilingual Parser Optimization. In Proceedings of the CoNLL
Shared Task Session of EMNLP-CoNLL 2007, pp. 933–939.
M. P. Harper and R. A. Helzermann. 1995. Extensions to constraint dependency parsing for
spoken language processing. Computer Speech and Language 9: 187–234.
M. P. Harper, R. A. Helzermann, C. B. Zoltowski, B. Yeo, Y. Chan, T. Steward and P. L. Pellom.
1995. Implementation issues in the development of the PARSEC parser, Software: Practice
and Experience 25: 831-862.
Z. Harris. 1962. String analysis of sentence structure. Mouton.
P. Hellwig. 1986. Dependency Unification Grammar. Proc. of 11th COLING, Bonn Germany, pp.
195-198.
P. Hellwig. 2003. Dependency Unification Grammar. In V. Agel, L.M. Eichinger, H. W. Eroms,
P. Hellwig, H. J. Heringer and H. Lobin (eds), Dependency and Valency, Walter de Gruyter,
pp. 593-635.
R. Hudson. 1984. Word Grammar, Basil Blackwell, 108 Cowley Rd, Oxford, OX4 1JF, England.
R. Hudson. 1990. English Word Grammar, Basil Blackwell, 108 Cowley Rd, Oxford, England.
R. Hudson. 2007. Language Networks: The New Word Grammar. Oxford University Press.
S. Husain, P. Mannem, B. R. Ambati, and P. Gadde. 2010. The ICON-2010 Tools Contest on
Indian Language Dependency Parsing. In Proceedings of ICON-2010 Tools Contest on Indian
Language Dependency Parsing. Kharagpur, India.
S. Husain, P. Gadde, B. Ambati, D. M. Sharma and R. Sangal. 2009. A modular cascaded
approach to complete parsing. In Proceedings of the COLIPS International Conference on
Asian Language Processing 2009 (IALP). Singapore.
S. Husain. 2009. Dependency Parsers for Indian Languages. In Proceedings of ICON09 NLP
Tools Contest: Indian Language Dependency Parsing. Hyderabad, India. 2009.
T. Järvinen and P. Tapanainen. 1998. Towards an implementable dependency grammar.
Proceedings of the Workshop on Processing of Dependency-Based Grammars (ACL
COLING), Montreal, Canada, pp. 1-10.
A. Joshi and P. Hopely. 1999. A parser from antiquity: An early application of finite state
transducers to natural language parsing. In Kornai 1999.
F. Karlsson. 1990. Constraint grammar as a framework for parsing running text. Papers
Presented to the 13th International Conference on Computational Linguistics (COLING),
Helsinki, Finland, pp. 168–173.
F. Karlsson, A. Voutilainen, J. Heikkilä and A. Anttila, (eds). 1995. Constraint Grammar: A
language-independent system for parsing unrestricted text. Mouton de Gruyter.
P. Kiparsky and J. F. Staal. 1969. ‘Syntactic and Semantic Relations in Panini’, Foundations of
Language 5, 84–117.
P. Kolachina, S. Kolachina, A. K. Singh, V. Naidu, S. Husain, R. Sangal and A. Bharati. 2010a.
Grammar Extraction from Treebanks for Hindi and Telugu. In Proceedings of The 7th
International Conference on Language Resources and Evaluation (LREC). Valleta. Malta.
2010.
T. Koo and M. Collins. 2010. Efficient Third-order Dependency Parsers. In Proc of ACL2010.
P. Kosaraju, S. R. Kesidi, V. B. R. Ainavolu and P. Kukkadapu. 2010. Experiments on Indian
Language Dependency Parsing. In Proc of ICON-2010 tools contest on Indian language
dependency parsing. Kharagpur, India.
BH. Krishnamurthi (ed). 1986. South Asian Languages: Structure, Convergence and Diglossia.
Motilal Banarasidass.
G. M. Kruijff. 2001. A Categorial Modal Architecture of Informativity: Dependency Grammar
Logic & Information Structure. Ph.D. thesis, Charles University, Prague, Czech Republic.
T. Kudo and Y. Matsumoto. 2002. Japanese dependency analysis using cascaded
chunking. In CoNLL-2002. pp. 63–69.
S. Kubler, R. McDonald and J. Nivre. 2009. Dependency parsing. Morgan and Claypool.
P. Mannem, H. Chaudhry and A. Bharati. 2009a. Insights into Non-projectivity in Hindi. In ACL
IJCNLP09 student paper workshop.
P. Mannem, A. Abhilash and A. Bharati. 2009b. LTAGspinal Treebank and Parser for Hindi.
Proceedings of International Conference on NLP, Hyderabad. 2009.
M. Marcus, B. Santorini, and M.A. Marcinkiewicz. 1993. Building a large annotated corpus of
English: The Penn Treebank, Computational Linguistics 1993.
A. Martins, N. Smith and E. Xing. 2009. Concise Integer Linear Programming Formulations for
Dependency Parsing. Proceedings of the ACL-IJCNLP09.
H. Maruyama. 1990. Structural disambiguation with constraint propagation. In Proceedings of
ACL:90.
C. P. Masica. 1993. The Indo-Aryan Languages. Cambridge University Press.
R. McDonald and J. Nivre. 2007. Characterizing the Errors of Data-Driven Dependency Parsing
Models. In Proc of Joint Conference on Empirical Methods in Natural Language Processing
and Computational Natural Language Learning
R. McDonald, K. Crammer, and F. Pereira. 2005a. Online large-margin training of dependency
parsers. In Proceedings of ACL 2005. pp. 91–98.
R. McDonald, F. Pereira, K. Ribarov, and J. Hajic. 2005b. Non-projective dependency parsing
using spanning tree algorithms. Proceedings of HLT/EMNLP, pp. 523–530.
I. A. Mel'čuk. 1988. Dependency Syntax: Theory and Practice. State University of New York
Press.
W. Menzel and I. Schröder. 1998. Decision Procedures for Dependency Parsing Using Graded
Constraints. In Proc of ACL. 1998.
J. Nilsson and J. Nivre. 2008. Malteval: An evaluation and visualization tool for dependency
parsing. In the Proc of Sixth International Language Resources and Evaluation, Marrakech,
Morocco.
J. Nivre. 2009. Non-Projective Dependency Parsing in Expected Linear Time. In Proc. of ACL
IJCNLP.
J. Nivre and R. McDonald. 2008. Integrating graph-based and transition-based dependency
parsers. In Proc. of ACL-HLT.
J. Nivre, J. Hall, J. Nilsson, A. Chanev, G. Eryigit, S. Kübler, S. Marinov and E Marsi. 2007a.
MaltParser: A language-independent system for data-driven dependency parsing. Natural
Language Engineering, 13(2), 95-135.
J. Nivre, J. Hall, S. Kübler, R. McDonald, J. Nilsson, S. Riedel and D. Yuret. 2007b. The
CoNLL 2007 Shared Task on Dependency Parsing. In Proceedings of the CoNLL Shared Task
Session of EMNLP-CoNLL 2007.
J. Nivre. 2006. Inductive Dependency Parsing. Springer.
J. Nivre and J. Nilsson. 2005a. Pseudo-projective dependency parsing. In Proc. of ACL-2005,
pages 99–106.
J. Nivre. 2005b. Dependency Grammar and Dependency Parsing. MSI report 05133. Växjö
University: School of Mathematics and Systems Engineering.
J. Nivre. 2003. An efficient algorithm for projective dependency parsing. In Proc. of the 8th
International Workshop on Parsing Technologies (IWPT).
A. Prince and P. Smolensky. 1993. Optimality Theory: Constraint Interaction in Generative
Grammar. Technical Report, Rutgers Center for Cognitive Science.
O. Rambow, B. Dorr, I. Kucerova and M. Palmer. 2003. Automatically Deriving
Tectogrammatical Labels from Other Resources: A Comparison of Semantic Labels across
Frameworks. The Prague Bulletin of Mathematical Linguistics, 79-80, 23–35.
S. Riedel and J. Clarke. 2006. Incremental integer linear programming for non-projective
dependency parsing. In Proc. EMNLP.
I. Schröder. 2002. Natural Language Parsing with Graded Constraints. PhD thesis, Hamburg
University.
D. Seddah, M. Candito and B. Crabbé. 2009. Cross parser evaluation: a French Treebanks study.
In Proceedings of the 11th IWPT, Paris.
P. Sgall, E. Hajicova and J. Panevova. 1986. The Meaning of the Sentence in Its Semantic and
Pragmatic Aspects. Reidel.
C. Shastri. 1973. Vyakarana Chandrodya (Vols. 1 to 5). Delhi: Motilal Banarsidass. (In Hindi)
L. Shen, A. Sarkar, A. K. Joshi. 2003. Using LTAG Based Features in Parse Reranking. In
Proc. of EMNLP 2003.
S. M. Shieber. 1985. Evidence against the context-freeness of natural language. Linguistics and
Philosophy, 8, 334–343.
P. Shiuan and C. Ting Hian Ann. 1996. A Divide-and-Conquer Strategy for Parsing. In Proc.
of IWPT.
S. Starosta. 1988. The Case for Lexicase: An Outline of Lexicase Grammatical Theory, Pinter
Publishers.
S. B. Steever. 1998. The Dravidian Languages. Routledge.
P. Tapanainen, and T. Järvinen. 1997. A non-projective dependency parser. Proceedings of the
5th Conference on Applied Natural Language Processing, pp. 64–71.
L. Tesnière. 1959. Eléments de Syntaxe Structurale. Klincksieck, Paris.
R. Tsarfaty, D. Seddah, Y. Goldberg, S. Kübler, Y. Versley, M. Candito, J. Foster, I. Rehbein
and L. Tounsi. 2010. Statistical Parsing of Morphologically Rich Languages (SPMRL): What,
How and Whither. In Proc. of the NAACL-HLT 2010 Workshop on Statistical Parsing of
Morphologically Rich Languages (SPMRL 2010), Los Angeles, CA.
R. Tsarfaty and K. Sima'an. 2008. Relational-Realizational Parsing. In Proceedings of the 22nd
COLING, Manchester, UK.
A. Vaidya, S. Husain, P. Mannem, D. M. Sharma. 2009. A karaka-based dependency annotation
scheme for English. In Proceedings of the CICLing-2009, Mexico City, Mexico.
C. Vempaty, V. Naidu, S. Husain, R. Kiran, L. Bai, D. M. Sharma and R. Sangal. 2010. Issues in
analyzing Telugu sentences towards building a Telugu Treebank. In Proceedings of CICLing
2010, Iași, Romania.
S. Venkatapathy, P. Agrawal and A. K. Joshi. 2005. Relative Compositionality of Noun+Verb
Multiword Expressions in Hindi. In Proceedings of ICON-2005, Kanpur, India.
M. K. Verma (ed.). 1993. Complex Predicates in South Asian Languages. Manohar Publications.
New Delhi.
H. Yamada and Y. Matsumoto. 2003. Statistical dependency analysis with support vector
machines. In Proc. of the 8th IWPT, Nancy, France, pp. 195-206.
Y. Zhang and S. Clark. 2008. A tale of two parsers: Investigating and combining graph-based and
transition-based dependency parsing. In Proceedings of the Conference on Empirical Methods
in Natural Language Processing (EMNLP), pages 562-571.