A Generalized Parsing Framework Based On
Computational Paninian Grammar
A thesis submitted to IIIT-Hyderabad
in partial fulfillment of the requirements
for the degree of
Doctor of Philosophy
in Computational Linguistics
Samar Husain
200522004
July, 2011
INTERNATIONAL INSTITUTE OF INFORMATION TECHNOLOGY
Hyderabad, India
Certificate
It is certified that the work contained in this thesis, titled ‘A Generalized Parsing Framework based on
Computational Paninian Grammar’ by SAMAR HUSAIN, has been carried out under my supervision and
is not submitted elsewhere for a degree.
Date:
Supervisors: Prof. Rajeev Sangal and Dr. Dipti Misra Sharma
Language Technologies Research Center,
IIIT, Hyderabad
Signed: _______________________
Signed: _______________________
Acknowledgment
I would like to thank my supervisors Dr. Rajeev Sangal and Dr. Dipti Misra Sharma for their constant
support and guidance throughout this work. The work has benefited from the suggestions/criticisms of
anonymous reviewers at various conferences. I would also like to thank past and present research students
(especially, Bharat Ram Ambati, Phani Gadde, Meher Vijay and Pujitha Gade) at LTRC, IIIT-Hyderabad,
who have collaborated with me on different aspects of this work and are co-authors in various papers. I
have also benefited from discussions with Sudheer Kolachina, Prashanth Mannem, Sriram Venkatapathy,
Joakim Nivre, Owen Rambow, and Rajesh Bhatt. Parts of this work have been presented at various
workshops such as ‘TCS NLP Winter School’ (Dec-Jan, 2007-08, IIIT-Hyderabad), ‘IASNLP08’ (May-
June 2008, IIIT-Hyderabad), ‘CGMIL’ (June 2008, IIIT-Hyderabad), ‘Dependency Parsing Workshop’
(June 2009, Univ. of Colorado, Boulder); discussions, comments and questions by participants at these
talks have also contributed to the improvement of the work.
Thanks are due to my parents for understanding the pressures of completing a PhD and for always being
extremely positive. Their prayers and love were instrumental in getting me through. Many thanks to
Arafat Ahsan, Monis Raja Khan, Ashwini Vaidya and Raina Khare for always being there for me.
Without their friendship and support it would not have been possible to persevere. Special thanks to Anil
Kumar Singh. He was my first mentor at LTRC and was instrumental in my decision to attempt a PhD
and later to stay the course. Finally, thanks to all the writers/poets who came to the rescue on those dark,
futile nights and taught me a thing or two about life.
Abstract
In this work, I present a generalized dependency parsing scheme using Computational Paninian
Grammar (CPG). This is done by incorporating the grammatical notions of CPG into both constraint based
parsing and data-driven parsing, where they are reflected in design decisions, constraint formulation,
feature selection, etc.
In the constraint based setup I extend an existing parsing paradigm for CPG to cover additional
language phenomena. A layered parsing approach is motivated; in particular, two stages based on the
notion of clause are introduced in parsing. I show how different grammatical constructs are parsed at
appropriate stages. Different types of constraints, namely licensing, eliminative and preferential, are
introduced to help negotiate language variability with generic grammatical notions. This setup is then
integrated with insights from graph-based dependency parsing and labeling for the task of
prioritization. This constraint based system (GH-CBP) is illustrated using Hindi, Telugu and Bangla, and
has been evaluated for Hindi and Telugu.
I then show how the insights gained from building GH-CBP can be applied to data driven approaches.
This is done by (a) incorporating targeted features during the training process, (b) introducing
‘linguistically constrained’ modularity during the parsing process, and (c) exploring ‘linguistically
rich’ graph-based parsing. I finally discuss the error analysis and make concluding observations.
Contents
Chapter Page
1. Introduction ……………………………………………………………………………….. 1
1.1. Approach ……………………………………………………………………….. 1
1.2. Brief Outline …………………………………………………………………… 2
2. Dependency Grammar Formalism ……………………………………………….. 5
2.1. Some Dependency Grammar Formalisms ……………………………………… 7
2.1.1. Extensible Dependency Grammar (XDG) ………………………............ 8
2.1.2. Constraint Grammar (CG) ………………………………………............. 8
2.1.3. Functional Generative Description (FGD) ……………………………… 9
2.2. Computational Paninian Grammar (CPG) ………………………………........... 9
3. Dependency Parsing ……………………………………………………….............. 16
3.1. Constraint Based Parsing ……………………………………………………..... 17
3.2. Data-Driven Parsing …………………………………………………………..... 18
3.3. Constraint Based Parsing for Indian languages (CBP) ………………………… 19
4. A Two Stage Generalized Hybrid Constraint Based Parser (GH-CBP) ……….. 22
4.1. Parsing in layers ………………………………………………………………... 22
4.1.1. Chunk as minimal parsing unit ………………………………………...... 23
4.1.2. Clause as minimal parsing unit ……………………………….………..... 24
4.1.2.1. Two Stages ………………………………………………………….. 25
4.1.2.1.1. Status of _ROOT_ ………………………………………….. 29
4.1.2.1.2. Partial Parse …………………………………………………. 29
4.1.3. A layered architecture …………………………………………………... 30
4.2. Constraints ……………………………………………………………………... 32
4.2.1. Licensing Constraints ………………………………………………........ 32
4.2.1.1. H-constraints ………………………………………………............... 33
4.2.1.2. Meta-constraints …………………………………………………….. 34
4.2.1.2.1. Feature Unification ………………………………………….. 35
4.2.1.2.2. Demand Status Transformation ……………………….......... 38
4.2.1.2.3. Revision ……………………………………………………... 41
4.2.1.2.4. Look-ahead …………………………………………………. 44
4.2.2. Eliminative Constraints ………………………………………………..... 46
4.2.3. Preferential Constraints ………………………………………………..... 48
4.3. GH-CBP framework …………………………………………………………… 50
4.3.1. Parsing as Constraint Satisfaction ………………………………………. 51
4.3.2. Prioritization …………………………………………………………….. 52
4.3.3. Fail-safe Parse …………………………………………………………… 55
4.3.4. Algorithm ……………………………………………………………….. 56
4.4. Results ………………………………………………………………………….. 57
5. Incorporating Insights from GH-CBP in Data Driven Dependency Parsing ….. 60
5.1. Parsers: Malt and MST ……………………………………………………….... 61
5.2. Data …………………………………………………………………………...... 61
5.3. Incorporating targeted features during training ………………………………… 62
5.3.1. Morphological Features …………………………………………………. 62
5.3.2. Local Morphosyntactic Features ………………………………………… 63
5.3.3. Clausal Features …………………………………………………………. 64
5.3.4. Minimal Semantics Features ……………………………………………. 65
5.3.5. Results …………………………………………………………………… 66
5.4. Linguistically Constrained Modularity ………………………………………… 67
5.4.1. Chunk Based Parsing ………………………………………………….... 67
5.4.1.1. Chunk as Hard Constraint …………………………………………… 67
5.4.1.2. Chunk as Soft Constraint …………………………………………… 68
5.4.1.3. Results ……………………………………………………………..... 69
5.4.2. Clausal Parsing ………………………………………………………….. 70
5.4.2.1. 2-stage parsing ……………………………………………………… 72
5.4.2.2. Two-Stage Parsing with Hard Constraints ………………………..... 73
5.4.2.2.1. Strategy 1 ………………………………………………….... 73
5.4.2.2.2. Strategy 2 ………………………………………………….... 76
5.4.2.2.3. Handling relative clause constructions in 2-Hard-S2 and 2-Hard-S1 ………………………………….. 79
5.4.2.3. Two-Stage Parsing with Soft Constraints …………………………… 79
5.4.2.4. Results ………………………………………………………………. 80
5.5. Linguistically Rich Graph-Based Parsing ……………………………………… 80
5.5.1. Constraint Graph ………………………………………………………… 81
5.5.2. Experimental Setup ……………………………………………………... 82
5.5.3. Experiments …………………………………………………………….. 82
5.5.4. Results …………………………………………………………………... 86
6. Rounding up ……………………………………………………………………….. 87
6.1. GH-CBP ……………………………………………………………………….. 87
6.1.1. Errors ……………………………………………………………………. 88
6.1.1.1. Error analysis of Prioritization …………………………………….... 90
6.2. Data driven parsing …………………………………………………………….. 91
6.2.1. Use of Targeted Features ……………………………………………….. 91
6.2.2. Chunk Based Parsing ………………………………………………….... 95
6.2.3. Clause Based Parsing ………………………………………………….... 96
6.2.4. Errors …………………………………………………………………… 100
6.2.5. Causes of Errors ………………………………………………………... 100
6.3. Linguistically Rich Graph-Based Parsing …………………………………….. 101
6.4. General Observations …………………………………………………………. 102
7. Conclusion ………………………………………………………………………… 105
APPENDIX I : Dependency Tagset ……………………………………………. 106
APPENDIX II : Chunk Tagset ………………………………………………….. 109
APPENDIX III : POS Tagset ……………………………………………………... 110
APPENDIX IV : MaltParser Features ………………………………………….... 111
APPENDIX V : MaltParser Features (2nd stage parser) ………………………. 113
APPENDIX VI : MSTParser Features ………………………………………….. 114
APPENDIX VII : MaxEnt labeler Features ……………………………………… 115
Bibliography ………………………………………………………………………….. 116
List of Figures
Figure Page
1.1 Parsing in layers ……………………………………………………………………. 3
2.1a Phrase Structure ……………………………………………………………………. 5
2.1b Dependency Structure ……………………………………………………………... 6
2.2 Levels of representation/analysis in the Computational Paninian Grammar ……….. 10
2.3 CPG dependency analysis of sentence 2.4 ………………………………………… 14
2.4 CPG dependency analysis of sentence 2.5 ………………………………………… 14
2.5 CPG dependency analysis of sentence 2.6 ………………………………………… 15
3.1 Constraint Graph for sentence 3.1 …………………………………………………. 20
3.2 Solution parses for sentence 3.1 ……………………………………………………. 21
4.1a Example 4.1 with POS tags ………………………………………………………... 24
4.1b Example 4.1 with Chunk boundaries ………………………………………………. 24
4.1c Example 4.1 with chunk head and vibhakti features ………………………………. 24
4.2 Chunk heads as dependency tree nodes …………………………………………… 24
4.3a 1st stage output for example 4.1 ……………………………………………………. 26
4.3b 2nd stage final parse for example 4.1 ……………………………………………….. 26
4.4 Parse outputs for sentence 4.3 ……………………………………………………... 26
4.5 Stage 1 and Stage 2 outputs for sentence 4.3 ……………………………………… 27
4.6 Some inter-clausal structures ………………………………………………………. 28
4.7 Partial parse for sentence 4.5 ..................................................................................... 30
4.8a POS tagged and chunked sentence ………………………………………………… 31
4.8b Partial parse tree after stage 1 of GH-CBP ………………………………………… 31
4.8c Parse tree after stage 2 of GH-CBP ………………………………………………... 31
4.8d Complete parse after intra-chunk dependencies identification ……………………. 31
4.9a Constraint Graph …………………………………………………………………... 32
4.9b Constraint Network ………………………………………………………………... 32
4.9c 2nd stage CG for sentence 4.7 ……………………………………………………… 33
4.10 CG for example 4.8 …………………………………………………………………37
4.11 CG for example 4.9 ……………………………………………………………….. 37
4.12 Dependency tree for example 4.15 ……………………………………………….. 41
4.13 CG for example 4.15 ……………………………………………………………… 41
4.14 Variable property of coordinating conjunction …………………………………….. 42
4.15 Revision of CG …………………………………………………………………….. 42
4.16 Revision of CG with labels ………………………………………………………… 43
4.17 Revision of CG for example 4.15 ………………………………………………….. 43
4.18 2nd stage CG for example 4.16 …………………………………………………….. 44
4.19 Look-ahead constraint applied to sentence 4.17 …………………………………… 45
4.20 Look-ahead constraint applied to sentence 4.19 …………………………………… 46
4.21 CG for example 4.19 ……………………………………………………………….. 47
4.22 Possible wrong trees for example 4.19 …………………………………………… 48
4.23 Solution parses for example 4.19 …………………………………………………... 48
4.24 Prioritizing of example 4.19 solution parses ……………………………………….. 49
4.25 Schematic design of GH-CBP ……………………………………………………... 51
4.26 Context over which S-constraints can be specified …………………………………54
4.27 Failsafe parse for example 4.20 ……………………………………………………. 56
5.1 Improvement of different features over MST baseline …………………………… 66
5.2 Chunk as Hard Constraint ………………………………………………………….. 68
5.3 Chunk as Soft Constraint ………………………………………………………….. 69
5.4 Dependency label distribution …………………………………………………….. 70
5.5 Arc length and relation type …………………………………………………………71
5.6 Depth and relation type …………………………………………………………….. 71
5.7a Original Gold input ………………………………………………………………… 73
5.7b 1st stage converted tree …………………………………………………………….. 73
5.8 Stage2 training input. Partial trees converted into a single node …………………... 74
5.9 Strategy I (2-Hard-S1) ………………………………………………………………75
5.10 Strategy II (2-Hard-S2). Input to 2nd stage is a partial parse ……………………… 76
5.11 2nd stage initialization using the 1st stage parse shown in Fig. 4a …………………. 77
5.12 Parse output for sentence 5.5 ………………………………………………………. 77
5.13 Parse output and 2nd stage initialization for sentence 5.6 ………………………….. 78
5.14 Constraint graph for sentence 5.7 ………………………………………………….. 82
5.15 Unlabeled attachment accuracies ….………………………………………………….. 86
6.1 Some intra-clausal non-projective structures ……………………………………… 88
6.2 Improvement of different features over MST baseline …………………………….. 92
6.3 Effect of clausal feature on arc length in MSTParser ……………………………… 94
6.4 LAS at arc-length (1-10) for Baseline, 2-Soft and 2-Hard ....................................... 98
6.5 LAS at depth (1-7) for Baseline, 2-Soft and 2-Hard ................................................ 99
List of Tables
Table Page
2.1 Some salient properties of CPG …………………………………………………… 11
3.1 Basic karaka frame for khaa ‘eat’ …………………………………………………. 19
4.1 Division of relations in the two stages ……………………………………………. 28
4.2 Basic demand frame for Hindi verb de ‘give’ ……………………………………. 33
4.3 Basic demand frame for Telugu verb tin ‘eat’ …………………………………….. 33
4.4 Hindi passive TAM transformation frame …………………………………………. 34
4.5 Transformation frame for kara ……………………………………………………………. 34
4.6 Agreement pattern: Simple declarative TAM …………………………………….... 35
4.7 Agreement pattern: Inabilitative TAM …………………………………………….. 35
4.8 Agreement pattern: Obligational TAM ……………………………………………. 36
4.9 Agreement pattern: Perfective TAM ……………………………………………… 36
4.10 Basic demand frame for Bangla verb khaa ‘eat’ ………………………………….. 36
4.11 Final frame for Telugu verb tin ‘eat’ with inablitative TAM ……………………... 37
4.12 Final demand frame for de ‘give’ after passive transformation ………………….. 39
4.13 Final demand frame for de ‘give’ after kara transformation ……………………… 39
4.14 Transformation frame for te_holo ………………………………………………………. 39
4.15 Final demand frame for Bangla verb khaa ‘eat’ after te_holo transformation ……. 40
4.16 Transformation frame for Telugu TAM tu ……………………………………………... 40
4.17 Final demand frame for Telugu verb tinta ‘eat’ after tu transformation ………….. 40
4.18 Demand frame for subordinating conjuncts ……………………………………… 44
4.19 Grammatical notions handled via Meta-constraints ……………………………… 46
4.20 Basic demand frame for khaa ‘eat’ ……………………………………………….. 47
4.21 Oracle scores with GH-CBP for unprioritized parses …………………………… 58
4.22 Intra-clausal and inter-clausal relation results …………………………………… 58
4.23 Results after various prioritization strategies …………………………..………… 59
5.1 Results for chunk modularity ……………………………………………………... 69
5.2 Overall parsing accuracy …………………………………………………………. 80
5.3 Experiment 1 valid arcs ……………………………………..……………………. 83
5.4 Experiment 2 and 3 valid arcs …………………………………………………… 84
5.5 Experiment 4 valid arcs …………….……………………………………………. 84
5.6 Experiment 5 and 6 valid arcs ……………………………………………………… 85
6.1 Unseen verbs and common argument structure errors ……………………………. 89
6.2 Intra-clausal performance in Hindi ………………………………….……………. 89
6.3 MaxEnt performance for attachment and label identification …….……………… 91
6.4 Most frequent confusions …………………………………………………………. 93
6.5 Effect of clausal features on parser performance …………………………………. 94
6.6 Effect of minimal semantics on some relations …………………………………… 95
6.7 Precision values for UAS with respect to arc length ……………………………….96
6.8 Accuracy for intra- and inter-clausal dependency relations ………………………. 97
6.9 Accuracy for relative clause construction …………………………………………. 97
6.10 Label identification comparison between Baseline and 2-Hard ………………….. 97
6.11 Advantages of different features/methods ………………………………………… 99
6.12 Comparison of baseline and Experiment 8 accuracies ............................................ 101
6.13 Accuracy distribution over POS ............................................................................. 102
Chapter 1
1. Introduction
Parsing of natural language text has been explored extensively since the 1990s. Most of the early
parsers were developed for English or other fixed word order languages. Over the past decade,
parsing of languages other than English has been taken up by the wider CL/NLP research
community. Parsing these morphologically rich, free word order (MoR-FWO) languages is a
challenging task. The challenges arise from the non-configurational nature of MoR-FWO languages,
which leads to complex and distributed syntactic cues that must be used to
identify various syntactic relations. There has been a recent surge in addressing parsing in such
languages (for example, Czech, Turkish, Hindi, etc.) (Nivre et al., 2007b; Hall et al., 2007; Nivre
and McDonald, 2008; Nivre, 2009; Tsarfaty and Sima'an, 2008; Seddah et al., 2009; Gadde et al.,
2010; Husain et al., 2009; Eryigit et al., 2008; Goldberg and Elhadad, 2009; Tsarfaty et al., 2010;
Mannem et al., 2009b). But in spite of this, parsing accuracies for these languages still lag behind
those for a fixed word order language like English.
Constraint based parsers are known to have the power to capture non-trivial language
generalizations and are well suited to handling the complex phenomena found in MoR-FWO
languages. On the downside, they suffer from robustness, ambiguity resolution and efficiency
issues (Nivre, 2005b). Data driven parsers, on the other hand, are efficient, deterministic and
quite robust. However, they are generally not good at capturing the complex linguistic
generalizations necessary to parse MoR-FWO languages. In this work, we explore both these
methods and investigate how insights from Computational Paninian Grammar (CPG) can inform
both constraint based and data-driven parsing to improve their accuracies, and also how these
parsing paradigms can benefit from one another.
1.1 Approach
Our approach to parsing can be stated as follows:
a) A constraint based approach that incorporates grammatical notions from CPG is used to
build a generalized parser. This parser extends the existing constraint based parsing paradigm
for CPG to cover additional language phenomena. In particular, it introduces two
stages and generalized constraints in parsing. Hindi, Telugu and Bangla have been used
to illustrate the generalized parser. The implemented parser has been tested for Hindi and
Telugu.
b) Insights gained from building the above constraint parser are used in data-driven
dependency parsing.
c) Insights from graph-based dependency parsing and labeling are used to rank the parses
obtained from the constraint parser mentioned in (a).
1.2 Brief outline
In this work we begin by motivating the use of Computational Paninian Grammar (CPG) for
Indian language (IL) dependency parsing. We then discuss important grammatical notions in
CPG. These notions have a direct bearing on the design of the proposed parsing framework. They
are:
a) Aakaankshaa ‘requirement of a head’
b) Karaka ‘participants in an action’
c) Abhihita ‘marked by the verb’
d) Vibhakti ‘nominal and verbal inflections’
e) Tiganta ‘finiteness’
The proposed generalized hybrid constraint based parser (GH-CBP) uses linguistically
motivated modularity, functional constraints, and insights from graph-based parsing and labeling.
Linguistic modularity comes from treating chunks and clauses as minimal parsing units. This leads
to a layered parsing scheme in which the parsing task is modularized. Figure 1.1 succinctly shows
these layers.
Figure 1.1. Parsing in layers
(a) POS tagged and chunked sentence,
(b) partial parse tree after stage 1 of GH-CBP (c) parse tree after stage 2 of GH-CBP,
(d) complete parse after intra-chunk dependencies identification
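The chunk modularity shown in layers (a)-(d) can be sketched in code. The representation below is a hypothetical simplification of my own (the chunk tags follow common Hindi chunking conventions, and the head-finding heuristic is an approximation, not the thesis procedure): a POS tagged sentence is grouped into chunks, and only the chunk heads become nodes for the inter-chunk parsing stages.

```python
# Hypothetical sketch: a POS tagged, chunked Hindi sentence; each chunk is
# (chunk tag, [(word, POS), ...]). Sentence: 'bacce ne seba khaayaa'
# ("the child ate an apple").
chunks = [
    ("NP",  [("bacce", "NN"), ("ne", "PSP")]),   # noun chunk + case marker
    ("NP",  [("seba", "NN")]),
    ("VGF", [("khaayaa", "VM")]),                # finite verb chunk
]

def chunk_heads(chunks):
    """Pick one head word per chunk; only these become nodes for the
    inter-chunk parsing stages (approximation: the last non-PSP word)."""
    heads = []
    for tag, words in chunks:
        content = [w for w, pos in words if pos != "PSP"]
        heads.append(content[-1])
    return heads
```

Intra-chunk dependencies (e.g. the attachment of ne inside its chunk) are then recovered in the final layer, as in (d) above.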
The role of constraints is central to GH-CBP. There are three types of constraints:
a) Licensing constraints (CL)
b) Eliminative constraints (CE)
c) Preferential constraints (CP)
These constraints have distinct functions and are incorporated into the parsing process in
different ways. CL is incorporated as a constraint graph (CG) for an input sentence. A CG
constrains the space of permissible dependency structures that the parser will eventually explore
in order to get a solution. CE are incorporated via integer programming and provide us with the
solution parses. CP, on the other hand, are used to prioritize the parses and select the best parse.
These constraints are reflected as features that are used in a labeling classification model.
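To make the role of CL concrete, here is a minimal sketch (the function, frame format and labels below are my own illustrative simplifications, not the GH-CBP implementation) of how licensing via demand frames yields a constraint graph that is much smaller than the complete labeled graph:

```python
# Illustrative sketch: a licensing constraint graph (CL) admits only arcs that
# some demand frame licenses, instead of all O(n^2) labeled head-dependent pairs.

def build_constraint_graph(words, demand_frames):
    """words: list of (form, POS); demand_frames: POS -> labels it licenses.
    Returns candidate arcs as (head index, label, dependent index)."""
    arcs = []
    for h, (_, h_pos) in enumerate(words):
        for d, _ in enumerate(words):
            if h == d:
                continue
            for label in demand_frames.get(h_pos, ()):
                arcs.append((h, label, d))
    return arcs

# Toy frame: only the verb licenses karta (k1) and karma (k2) dependents.
frames = {"VM": ("k1", "k2")}
sent = [("raama", "NN"), ("phala", "NN"), ("khaataa", "VM")]
cg = build_constraint_graph(sent, frames)  # candidate arcs, all verb-headed
```

Eliminative and preferential constraints would then operate over this candidate space, rather than over the full labeled graph.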
The task of labeling itself, along with some notions from graph based parsing, is used to prioritize
the multiple parses. GH-CBP is intended to allow generic, transparent and principled handling of
various syntactic constructions in Indian languages. We illustrate the framework using three
Indian languages (IL), namely, Hindi, Bangla and Telugu. It has been tested for Hindi and
Telugu.
After describing GH-CBP, we apply these insights to data driven approaches. This is
done by:
a) Incorporating targeted features during the training process: We systematically explore
which features are crucial for data-driven dependency parsing. Morphological, local
morphosyntactic, clausal and semantic features are tried. Their effectiveness is analyzed and
the best set of features is narrowed down.
b) Introducing ‘linguistically constrained’ modularity during the parsing process: The notions
of chunk and clause that were used in GH-CBP are now used to investigate whether they can help
improve parsing accuracies. We also investigate the optimal strategy for incorporating this
modularity during the parsing process.
c) Exploring ‘linguistically rich’ graph-based parsing: We integrate a graph based parsing
method with the licensing constraints used in GH-CBP. We investigate whether the parsing accuracy of
a graph based data driven parser can be improved by providing it a constraint graph instead of
a complete graph during the derivation step. Through a series of experiments we formulate
the constraint graph that gives us the best accuracy.
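Point (c) can be sketched as follows. This is a simplified stand-in, not the actual parser: real graph-based decoding uses the Chu-Liu/Edmonds algorithm over learned arc scores, while here a greedy per-word head choice over invented scores illustrates how restricting candidate arcs to a constraint graph changes the output:

```python
# Simplified stand-in for graph-based decoding (greedy per-word head choice,
# invented scores); 'allowed' plays the role of the constraint graph.

def decode(n, scores, allowed=None):
    """Choose the best-scoring head for each of words 1..n (0 is the root).
    scores: (head, dep) -> float; allowed: set of permitted (head, dep)."""
    heads = {}
    for dep in range(1, n + 1):
        candidates = [(s, h) for (h, d), s in scores.items()
                      if d == dep and (allowed is None or (h, d) in allowed)]
        heads[dep] = max(candidates)[1]
    return heads

scores = {(0, 1): 0.2, (2, 1): 0.9, (0, 2): 0.8, (1, 2): 0.3}
full = decode(2, scores)                              # search the complete graph
pruned = decode(2, scores, allowed={(0, 1), (0, 2)})  # search a constraint graph
```

With the complete graph the model attaches word 1 under word 2; with the pruned graph both words must attach to the root, showing how the constraint graph overrides the raw scores.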
Finally, we discuss the error analysis of all the approaches. The results show that we are, for the most
part, able to significantly improve on the baseline performance. The error analysis points to patterns that
exist in parsing languages such as Hindi and Telugu. The experiments flesh out the parsing
complexity of different language phenomena.
Chapter 2
2. Dependency Grammar Formalism
It has been suggested that MoR-FWO languages can be handled more efficiently using the
dependency based framework than the constituency based one (Hudson, 1984; Shieber, 1985;
Mel'Cuk, 1988; Bharati et al., 1995a). The use of the dependency formalism for various NLP/CL
tasks, especially parsing (Nivre et al., 2007b), has increased manifold over the last
decade. Consequently, most of the parsers for MoR-FWO languages are dependency based. The
basic difference between a constituency based representation and a dependency representation is
the lack of non-terminal nodes in the latter. Figures 2.1(a) and 2.1(b) show a simplified
phrase structure and a dependency structure, respectively, for example 2.1.
(2.1) Abhay ate a mango.
Figure 2.1(a). Phrase Structure
Figure 2.1(b). Dependency Structure
Formally, for an input S and a relation set R, a dependency tree is a well-formed labeled
digraph G = (V, A) that is a directed tree originating out of node w0 and has the spanning node set
V = VS,
where S = w0, w1, …, wn is the set of all words in a sentence;
R = {r1, …, rm} is a finite set of possible dependency relation types that can hold
between any two words in a sentence;
V ⊆ {w0, w1, …, wn};
VS = {w0, w1, …, wn} is the spanning node set that contains all the nodes in a sentence;
A ⊆ V × R × V.
If (wi, r, wj) ∈ A, then (wi, r′, wj) ∉ A for all r′ ≠ r. If this restriction is not followed, the
result is a multi-digraph. Multi-digraphs are the dependency representations
used in multi-stratal dependency theories.
Some of the properties of the dependency tree G are:¹
a) Root property: There does not exist wi ∈ V such that wi → w0.
b) G always satisfies the spanning property over the words of the sentence, which states that
V = VS.
c) G always satisfies the connectedness property, which states that for all wi, wj ∈ V, it is the
case that wi ↔* wj. That is, there exists a path between every word pair when the
direction of the arcs is ignored.
d) G satisfies the single head property, which states that for all wi, wj ∈ V, if wi → wj then
there does not exist wi′ ∈ V such that i′ ≠ i and wi′ → wj. That is, each word in a
dependency tree is a dependent of at most one head.
e) G satisfies the acyclicity property, which states that for all wi, wj ∈ V, if wi → wj then it is
not the case that wj →* wi. That is, the dependency tree does not contain any cycles.
f) G satisfies the arc size property, which states that |A| = |V| − 1.

¹ The notations in this chapter and in the subsequent chapters have been adapted from Kübler et al. (2009).
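The properties above can be checked mechanically. The following is a small sketch of my own (not from the thesis), encoding A as (head, label, dependent) triples with 0 standing for w0 and generic placeholder labels:

```python
# Sketch: A is a set of (head, label, dependent) triples over nodes 0..n,
# where 0 is the artificial root w0.

def is_well_formed(n, arcs):
    """Check the root, single-head, arc-size, acyclicity and connectedness
    properties for a candidate dependency tree over nodes 0..n."""
    deps = [d for (_, _, d) in arcs]
    if 0 in deps:                               # root property: no head for w0
        return False
    if sorted(deps) != list(range(1, n + 1)):   # single head + spanning
        return False
    if len(arcs) != n:                          # arc size: |A| = |V| - 1
        return False
    head = {d: h for (h, _, d) in arcs}
    for node in range(1, n + 1):                # acyclicity + connectedness:
        seen, cur = set(), node                 # every node must reach w0
        while cur != 0:
            if cur in seen:
                return False                    # cycle found
            seen.add(cur)
            cur = head[cur]
    return True

# Example 2.1, 'Abhay ate a mango': ate(2) hangs off the root; Abhay(1) and
# mango(4) depend on ate; a(3) depends on mango.
tree = {(0, "root", 2), (2, "subj", 1), (2, "obj", 4), (4, "det", 3)}
```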
The properties associated with a dependency formalism, such as representation, level of analysis,
status of word order, etc., are not sacrosanct. Other than the fact
that most dependency formalisms stipulate binary asymmetric relations between words in
a sentence, the formalisms can differ on myriad fronts. Some of the popular modern dependency
grammars known in the literature are the theory of structural syntax developed by Tesniere
(1959), Word Grammar (WG) (Hudson, 1984, 1990, 2007), Functional Generative Description
(FGD) (Sgall et al., 1986), Dependency Unification Grammar (DUG) (Hellwig, 1986, 2003),
Meaning-Text Theory (MTT) (Mel’cuk, 1988), and Lexicase (Starosta, 1988). In addition,
constraint-based theories of dependency grammar have been quite popular. Some of the popular
ones are Constraint Dependency Grammar (CDG) (Maruyama, 1990; Harper and Helzerman,
1995; Menzel and Schroder, 1998) and its descendant Weighted Constraint Dependency
Grammar (WCDG) (Schroder, 2002), Functional Dependency Grammar (FDG) (Tapanainen and
Jarvinen, 1997; Jarvinen and Tapanainen, 1998), which in turn is inspired by Constraint
Grammar (CG) (Karlsson, 1990; Karlsson et al., 1995), and most recently Extensible Dependency
Grammar (XDG) (Debusmann et al., 2004; Duchier and Debusmann, 2001). A synthesis of
dependency grammar and categorial grammar is found in the framework of Dependency
Grammar Logic (DGL) (Kruijff, 2001).
2.1 Some Dependency Grammar Formalisms
Below we briefly discuss some dependency grammar formalisms.
2.1.1 Extensible Dependency Grammar (XDG)
Extensible Dependency Grammar (XDG) is a general framework for dependency grammar, with
multiple levels of linguistic representations called dimensions, e.g. grammatical function, word
order, predicate-argument structure, scope structure, information structure and prosodic structure.
It is articulated around a graph description language for multi-dimensional attributed labeled
graphs. An XDG grammar is a constraint that describes the valid linguistic signs as n-
dimensional attributed labeled graphs, i.e. n-tuples of graphs sharing the same set of attributed
nodes, but having different sets of labeled edges. All aspects of these signs are stipulated
explicitly by principles: the class of models for each dimension, additional properties that they
must satisfy, and how one dimension must relate to another.
XDG models syntactic analysis as one of the many possible dimensions that are analyzed as a
lexicalized multi-dimensional configuration problem. It is inspired by Topological dependency
grammar (TDG) (Duchier and Debusmann, 2001) and is formulated as its generalization. Each of
these dimensions represents a different linguistic description. Each lexical entry simultaneously
constrains all dimensions. XDG describes the well-formedness conditions of an analysis by the
interaction of principles and the lexicon. The principles stipulate restrictions on one or more of
the dimensions, and are controlled by the feature structures assigned to the nodes from the lexicon
(Debusmann et al., 2004).
2.1.2 Constraint Grammar (CG)
Constraint Grammar (Karlsson, 1990, 1995) is a language independent formalism for surface-
oriented, morphology-based parsing of unrestricted text. All relevant structure in this grammar is
assigned from morphology to syntax. The constraints discard as many alternatives as possible, the
optimum being a fully disambiguated sentence with one label for each word, with the condition
that no genuine ambiguities are obliterated. The goal of the grammar is to yield a perfect
analysis. The other important aim of CG is to demonstrate that descriptively reasonable and
practically efficient parsing grammars can be designed that are based on pure surface
generalizations.
2.1.3 Functional Generative Description (FGD)
Functional Generative Description (FGD) (Sgall et al., 1986) is a dependency stratificational
grammar formalism that treats the sentence as a system of interlinked layers: phonological,
morphematical, morphonological, analytical (surface syntax) and tectogrammatical (deep syntax).
It not only specifies surface structures of the given sentences, but also translates them into their
underlying representations. These representations (called tectogrammatical representations,
denoted TRs) are intended as an appropriate input for a procedure of semantico-pragmatic
interpretation in the sense of intentional semantics (see Hajičová et al. 1998). Since TRs are, at
least in principle, disambiguated, it is possible to understand them as rendering linguistic (literal)
meaning as opposed to providing figurative meaning.
FGD forms the theoretical basis of the Prague Dependency Treebank (PDT) (Hajičová,
2002). PDT has a three-level structure: the morphological layer at the lowest level; the middle
level, called the analytical level, with surface syntactic annotation using dependency syntax;
and the highest level of annotation, called the tectogrammatical level, or the level of linguistic
meaning, which is based on FGD. The layers differ in the type of nodes, labels and the structure
stipulated for the final analysis at that layer (Hajičová, 2002).
2.2 Computational Paninian Grammar (CPG)
Computational Paninian Grammar (CPG) is a dependency grammatical model proposed by Bharati
et al. (1995a). CPG considers information as central to the study of language. When a writer (or a
speaker) uses language to convey some information to the reader (or the listener), he codes the
information in the language string. Similarly, when a reader (or a listener) receives a language
string, he extracts the information coded in it. CPG is primarily concerned with:
(a) how the information is coded and
(b) how it can be extracted.
Two levels of representation can be readily seen in language use: one, the actual language
string (or sentence); two, what the speaker has in his mind. The latter can also be called the
meaning. Computational Paninian Grammar has two other important levels: the karaka level and the
vibhakti level.
--- semantic level (what the speaker
| has in mind)
.
.
|
--- karaka level
|
|
--- vibhakti level
|
|
--- surface level (written sentence)
Figure 2.2. Levels of representation/analysis in the Computational Paninian Grammar
The surface level is the uttered or the written sentence. The vibhakti level is the level at which
there are local word groups together with case endings, preposition or postposition markers. The
vibhakti level abstracts away from many minor (including orthographic and idiosyncratic)
differences among languages. The topmost level relates to what the speaker has in his mind. This
may be considered to be the ultimate meaning level that the speaker wants to convey. Between
this level and vibhakti level is the karaka level. It includes karaka relations and a few additional
relations such as taadaarthya (or purpose). One can imagine several levels between the karaka
and the ultimate level (shown as a pair of dots between karaka and semantic level in Figure 2.2),
each containing more semantic information. Thus, the karaka level is one in a series of levels, but
one which has a relationship to semantics on the one hand and to syntax on the other.
At the karaka level, we have karaka relations and verb-verb relations, etc. Karaka relations are
syntactico-semantic (or semantico-syntactic) relations between the verbs and other related
constituents (typically nouns) in a sentence. This is the level of semantics that is important
syntactically and is reflected in the surface form of the sentence(s).
CPG treats a sentence as a set of modifier-modified relations. A sentence is supposed to have a
primary modified which is generally the main verb of the sentence. The elements modifying the
verb participate in the action specified by the verb. The participant relations with the verb are
called karaka. The notion of karaka will incorporate the ‘local’ semantics (Rambow et al., 2003)
of the verb in a sentence, while also taking cues from the surface-level morphosyntactic
information (Vaidya et al., 2009).
It is easy to see that this analysis is a dependency based analysis (Kiparsky and Staal, 1969;
Shastri, 1973), with the verb as the root of the tree and its arguments as its children.
The labels on the edges between a child-parent pair show the relationship between them. Some of
the salient properties of CPG are shown in Table 2.1.
Property               Value
Representation         Tree
Node                   Lexical/Phrasal
Dependency label       Syntactico-semantic
Non-projective trees   Allowed
Layers                 Surface: Morphological,
                       Vibhakti: Local Morphosyntax,
                       Karaka: Dependency
Table 2.1. Some salient properties of CPG
The following grammatical notions in CPG are used later to develop a generalized parsing
framework.
(1) aakaankshaa
The verb’s core requirements for it to be meaningful are its aakaankshaa. It can be roughly
translated as the ‘argument structure’ of the verb. For example, a verb like ‘put’ requires someone
to put something somewhere; similarly, a verb like ‘eat’ requires someone to eat something. As
expected, different verbs have different aakaankshaa.
(2) karaka
Karakas are the participants in the action specified by the verb. These relations, as mentioned
earlier, are syntactico-semantic in nature, in that they are syntactically grounded but also convey
some meaning. There are six basic karakas, namely:
k1: karta: the most independent participant in the action
k2: karma: the one most desired by the karta
k3: karana: instrument which is essential for the action to take place
k4: sampradaan: recipient of the action
k5: apaadaan: movement away from a source
k7: adhikarana: location of the action in time and space
In example 2.2, abhay is the k1 and kemeraa is the k2.
(2.2) abhay ne kemeraa rakhaa hai
‘Abhay’ ERG ‘camera’ ‘keep’ ‘is’
‘Abhay has kept the camera.’
In addition to the above relations, many others have been proposed as part of the overall CPG
framework (Begum et al., 2008a; Bharati et al., 2009d). The most important of these relations
are listed in APPENDIX I.
(3) Abhihita
The notion of abhihita signifies the karaka expressed by a verbal TAM (tense, aspect and
modality). This can be roughly translated as ‘agreement’. So for example, the main verb can point
to the karta karaka by agreeing with it. In example 2.3, the main verb rakhataa hai agrees with
abhay in gender, number and person.
(2.3) abhay t.v rakhataa hai
‘Abhay-M.Sg.3rd’ ‘T.V.’ ‘keeps-M.Sg.3rd’ ‘is-M.Sg.3rd’
‘Abhay keeps a T.V.’
(4) Vibhakti (Sup, Ting)
Vibhakti is an abstract concept used for signifying the case markings on nouns and the tense,
aspect and modality of verbs. The former is called sup and the latter is called ting. In example 2.2,
the ne postposition of abhay and the yaa_hai TAM of rakhaa hai are the sup and ting
respectively.
(5) Tiganta
Tiganta is the word in an utterance (sentence) that bears a ting. Tiganta conveys the notion of
finiteness of a verb. It is an important concept, as the analysis of a sentence and the search for the
different karakas start with a finite verb and can be thought of as being within its scope.
(6) Yogyataa
Yogyataa can be roughly translated as the semantic selectional restriction of the verb. For instance, a
verb like ‘eat’ necessitates the presence of an ‘animate being’ that has the capacity to eat something
that is ‘eatable’.
Many Indian languages share a common set of properties (Emeneau, 1956; Krishnamurthi,
1986; Masica, 1993; Steever, 1998; Comrie, 1989). CPG has been used to analyze some of these
languages successfully (Begum et al., 2008a; Vempty et al., 2010; Husain, 2009; Husain et al., 2010).
Some such properties are:
a) Free word order
b) Rich morphology / Case marking
c) Participles
d) Relative-correlatives
e) Correlation between verbal TAM and subject/object case-marking.
Examples 2.4(a-f) show an example sentence for Hindi, where (2.4a) shows the words in the canonical
order, and the remaining examples show some of the word order variants of (2.4a).
(2.4) a. malaya ne sameera ko kitaaba dii
‘Malay’ ERG ‘Sameer’ DAT ‘book’ ‘gave’
‘Malay gave the book to Sameer’ (S-IO-DO-V)2
b. malaya ne kitaaba sameera ko dii (S-DO-IO-V)
c. sameera ko malaya ne kitaaba dii (IO-S-DO-V)
d. sameera ko kitaaba malaya ne dii (IO-DO-S-V)
2 S=Subject; IO=Indirect Object; DO=Direct Object; V=Verb; ERG=Ergative; DAT=Dative
e. kitaaba malaya ne sameera ko dii (DO-S-IO-V)
f. kitaaba sameera ko malaya ne dii (DO-IO-S-V)
Hindi also has a rich case marking system, although case marking is not obligatory. For
example, in (2.4), while the subject and indirect object are marked explicitly for the ergative3
(ERG) and dative (DAT) cases respectively, the direct object is unmarked for the accusative.
Figure 2.3 gives the CPG dependency analysis of example 2.4(a). Figures 2.4 and 2.5 show the
CPG analyses of a Telugu example and a Bangla example respectively.
Figure 2.3. CPG dependency analysis of sentence 2.4
(2.5) ramadu aapela winnalaikapoiyadu [Telugu]
‘Ram-masculine’ ‘Apple’ ‘eat-could-not’
‘Ram could not eat an apple’
Figure 2.4. CPG dependency analysis of sentence 2.5
(2.6) aami aapela khaai [Bangla]
‘I-1st_person’ ‘apple’ ‘eat-1st_person-present’
‘I ate an apple’
3 Hindi is split-ergative. The ergative marker appears on the subject of a transitive verb with perfect morphology.
Figure 2.5. CPG dependency analysis of sentence 2.6
The parsers described in this work will for the most part use the CPG analysis (Begum et al., 2008a;
Bharati et al., 2009d). CPG is used because:
a) It has been used to analyze various Indian languages such as Hindi, Telugu, Bangla,
Marathi, etc. A parser built using the notions in CPG will automatically benefit from its
grammatical devices in accounting for the grammatical structures of these languages.
b) Dependency treebanks for various Indian languages such as Hindi, Urdu, Telugu and
Bangla are being built using CPG. Both the constraint-based parser and the data-driven parser
learn some parameters from a treebank.
Chapter 3
3. Dependency Parsing
Dependency parsing can be broadly divided into grammar-driven and data-driven parsing (Caroll,
2000). Most of the modern grammar-driven dependency parsers parse by eliminating the parses
which do not satisfy the given set of constraints. They view parsing as a constraint-satisfaction
problem. Data-driven parsers, on the other hand, use a corpus to induce a probabilistic model for
disambiguation (Nivre, 2005; and the references therein).
A dependency parsing model M comprises a set of constraints Γ that define the space of
permissible dependency structures, a set of parameters λ, and a parsing algorithm h.
M = (Γ, λ, h)
Γ maps an arbitrary sentence S and dependency type set R to a set of well-formed
dependency trees Gs. Additionally, it can encode more complex mechanisms that further
limit the space of permissible structures:
Γ = (Σ, R, C)
where Σ is the set of terminal symbols (here, words), R is the label set, and C is the set of
constraints. These constraints restrict the dependencies between words and the possible heads
of a word in well-defined ways.
For data-driven parsing, the learning phase tries to construct the parameters λ. The parameters are
generally learned from an annotated treebank that contains dependency trees. In grammar-driven
parsing, λ is either null or uniform; it is not learnt automatically from a treebank. After
defining the parsing model, one needs a parsing algorithm to solve the parsing problem. That is,
given a set of constraints Γ, parameters λ and a new sentence S, how does the system find the
most appropriate dependency tree G for that sentence?
G = h(Γ, λ, S)
Based on the type of parsing strategy chosen, h will take on different forms.
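The model schema above can be sketched as a small Python structure for illustration. The class and field names below are illustrative only; they are not part of CPG or of any parser described in this work.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

# A dependency tree as a map: dependent index -> (head index, label).
Tree = Dict[int, Tuple[int, str]]

@dataclass
class Grammar:
    """Gamma = (Sigma, R, C): vocabulary, label set, constraint set."""
    sigma: Set[str]
    labels: Set[str]
    constraints: List[Callable[[List[str], Tree], bool]] = field(default_factory=list)

    def permits(self, sentence: List[str], tree: Tree) -> bool:
        # A tree is well-formed iff every constraint in C holds for it.
        return all(c(sentence, tree) for c in self.constraints)

@dataclass
class Model:
    """M = (Gamma, lambda, h)."""
    grammar: Grammar
    params: Dict     # lambda: null/uniform for grammar-driven, learned for data-driven
    algorithm: Callable  # h(Gamma, lambda, S) -> tree

    def parse(self, sentence: List[str]) -> Tree:
        return self.algorithm(self.grammar, self.params, sentence)
```

For a grammar-driven parser, `params` would simply be empty and `algorithm` an eliminative search; a data-driven parser would carry learned weights in `params`.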
3.1 Constraint Based Parsing
Grammar-driven constraint-based dependency parsing is based on the notion of eliminative
parsing, where sentences are analyzed by successively eliminating representations that violate
constraints until only valid representations remain. One of the first parsing systems based on this
idea is the CG framework (Karlsson, 1990; Karlsson et al., 1995), which uses underspecified
dependency structures represented as syntactic tags and disambiguated by a set of constraints
intended to exclude ill-formed analyses. In CDG (Maruyama, 1990), this idea is extended to
complete dependency structures by generalizing the notion of tag to pairs consisting of a syntactic
label and an identifier of the head node. This kind of modeling is important for many different
approaches to dependency parsing, since it provides a way to reduce the parsing problem to a
tagging or classification problem. This line has been explored by the extended CDG framework
of Harper and Helzerman (1995) and the FDG system (Tapanainen and Jarvinen, 1997; Jarvinen
and Tapanainen, 1998), where the latter is a development of CG that combines eliminative
parsing with a non-projective dependency grammar inspired by Tesniere (1959).
In the eliminative approach, parsing is viewed as a constraint satisfaction problem, where any
analysis satisfying all the constraints of the grammar is a valid analysis. For a fully defined
constraint satisfaction problem, we need to specify the variables, their domains and the set of
constraints that need to be satisfied:
(1) Set of variables: S = w0, w1, w2 … wn represents the set of lexical items in a sentence.
(2) The domain of a variable wi is the set {wj | 0 ≤ j ≤ n and j ≠ i} (the possible heads of the word).
(3) A set of constraints that define the permissible values for the variables.
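The eliminative view of parsing can be illustrated with a brute-force constraint-satisfaction sketch over the variables and domains just defined. Real systems avoid this exponential enumeration through local consistency or constraint programming; the function and constraint names here are illustrative.

```python
from itertools import product
from typing import Callable, Dict, List

Assignment = Dict[int, int]  # word index -> head index (0 is the root)

def eliminative_parse(n: int,
                      constraints: List[Callable[[Assignment], bool]]) -> List[Assignment]:
    """Enumerate head assignments for words 1..n (node 0 is the root) and
    keep only those satisfying every constraint. Exponential in n; shown
    only to make the CSP formulation concrete."""
    analyses = []
    # Domain of word i: every other node, including the root, except itself.
    domains = [[j for j in range(n + 1) if j != i] for i in range(1, n + 1)]
    for heads in product(*domains):
        assignment = {i + 1: h for i, h in enumerate(heads)}
        if all(c(assignment) for c in constraints):
            analyses.append(assignment)
    return analyses

def single_root(a: Assignment) -> bool:
    # Exactly one word is attached directly to the root.
    return sum(1 for h in a.values() if h == 0) == 1

def acyclic(a: Assignment) -> bool:
    # Following head links from any word must reach the root.
    for start in a:
        seen, node = set(), start
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = a[node]
    return True
```

For a two-word sentence, only the two tree-shaped assignments survive the two constraints above.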
Constraint satisfaction in general is NP-complete, which means that to ensure reasonable
efficiency in practice one has to use controlled heuristics. Early versions of this approach used
local consistency (Maruyama, 1990; Harper et al., 1995), which attains polynomial worst-case
complexity by only considering local information in the application of constraints. In the more
recently developed XDG framework (Duchier, 1999; Debusmann et al., 2004), the problem is
addressed by using constraint programming to solve the satisfaction problem defined by the
grammar for a given input string. The XDG framework also introduces several levels of
representation, arguing that constraints can be simplified by isolating different aspects of the
grammar, such as Immediate Dominance (ID) and Linear Precedence (LP), and having constraints
that relate the different levels to each other (Duchier and Debusmann, 2001; Debusmann et al., 2004).
From the point of view of parsing unrestricted natural language text, parsing as constraint
satisfaction can be problematic in two ways. First, for a given input string, there may be no
analysis satisfying all constraints, which leads to a robustness problem. Secondly, there may be
more than one analysis, which leads to a problem of disambiguation. Menzel and Schroder (1998)
extend the CDG framework of Maruyama (1990) with graded, or weighted, constraints,
assigning a weight w (0.0 ≤ w ≤ 1.0) to each constraint to indicate how serious a violation of
that constraint is (where 0.0 is the most serious). In this extended framework, later developed into
WCDG (Schroder, 2002), the best analysis for a given input string is the analysis that minimizes
the total weight of violated constraints. The more recent versions of this approach use a
transformation-based approach, which successively tries to improve the analysis by transforming
one solution into another guided by the observed constraint violations in the current solution.
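The graded-constraint idea can be made concrete with a small scoring sketch. The penalty scheme below (a cost of 1 − w per violated constraint, so that w = 0.0 constraints are the most expensive to violate) is an illustrative simplification, not WCDG's actual scoring function.

```python
from typing import Callable, Dict, List, Tuple

Analysis = Dict[int, int]  # word index -> head index
# A graded constraint: (weight in [0.0, 1.0], predicate); lower weight = more serious.
Graded = Tuple[float, Callable[[Analysis], bool]]

def violation_cost(analysis: Analysis, constraints: List[Graded]) -> float:
    """Sum a penalty of (1 - w) for every violated constraint, so that
    violating a hard (w = 0.0) constraint costs the most."""
    return sum(1.0 - w for w, holds in constraints if not holds(analysis))

def best_analysis(candidates: List[Analysis],
                  constraints: List[Graded]) -> Analysis:
    # The best analysis is the one with the lowest total violation cost;
    # unlike strict CSP, an analysis violating soft constraints survives.
    return min(candidates, key=lambda a: violation_cost(a, constraints))
```

Because every candidate receives some score, the parser can always return its least-bad analysis, addressing the robustness problem noted above.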
3.2 Data-Driven Parsing
A data-driven approach to parsing primarily makes use of machine learning from annotated
data in order to parse new sentences. More precisely, such methods are called supervised data-
driven methods. There are two main problems: (a) the learning problem, which is the task of learning
a parsing model from a representative sample of sentence structures (training data), and (b) the
parsing problem (or inference/decoding problem), which is the task of applying the learned model
to the analysis of a new sentence. Consequently, data-driven methods differ in the type of parsing
model, the algorithm used to learn the model from data, and the algorithm used to parse a new
sentence. The two major classes of approaches are transition-based and graph-based
data-driven methods. Transition-based methods start by defining a transition system for mapping
a sentence to its dependency graph. The learning problem is to induce a model for predicting the
next state transition, given the transition history, and the parsing problem is to construct the optimal
transition sequence for the input sentence, given the induced model. Graph-based methods instead
define a space of candidate dependency graphs for a sentence. The learning problem is to induce
a model for assigning scores to the candidate dependency graphs, and the parsing problem is to
find the highest-scoring dependency graph for the input sentence, given the induced model
(Kubler et al., 2009).
Data-driven dependency parsing was first established by Eisner (1996) using graph-based
methods; the transition-based approach was first explored by Kudo and Matsumoto (2002) and
Yamada and Matsumoto (2003). The two parsing methods that we use in this work are MaltParser
(Nivre et al., 2007a), a transition-based system, and MSTParser (McDonald et al., 2005b), a graph-
based system.
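A minimal sketch of the transition-based idea, using greedy arc-standard transitions over an unlabeled tree. The `score` callable stands in for the learned model that predicts the next transition. This illustrates the general method only; it is not MaltParser's actual implementation.

```python
from typing import Callable, Dict, List

def arc_standard_parse(n: int, score: Callable) -> Dict[int, int]:
    """Greedy arc-standard parsing over nodes 0..n (0 is the artificial root).
    score(action, stack, buffer) plays the role of the induced model; the
    highest-scoring permissible transition is taken at each step."""
    stack: List[int] = [0]
    buffer: List[int] = list(range(1, n + 1))
    heads: Dict[int, int] = {}
    while buffer or len(stack) > 1:
        actions = []
        if buffer:
            actions.append("SHIFT")
        if len(stack) >= 2 and stack[-2] != 0:   # the root never gets a head
            actions.append("LEFT-ARC")
        if len(stack) >= 2:
            actions.append("RIGHT-ARC")
        act = max(actions, key=lambda a: score(a, stack, buffer))
        if act == "SHIFT":
            stack.append(buffer.pop(0))
        elif act == "LEFT-ARC":                  # attach stack[-2] under stack[-1]
            dep = stack.pop(-2)
            heads[dep] = stack[-1]
        else:                                    # RIGHT-ARC: attach stack[-1] under stack[-2]
            dep = stack.pop()
            heads[dep] = stack[-1]
    return heads
```

With a scorer that encodes the gold tree (an oracle), the parser reconstructs that tree; in practice the scorer is a classifier trained on treebank-derived transition sequences.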
3.3 Constraint Based Parsing for CPG
Constraint-based parsing using integer programming has been successfully tried for Indian
languages (Bharati et al., 1993, 1995a, 1995b, 2002). Under this scheme the parser exploits
the syntactic cues present in a sentence and forms a constraint graph based on the generalizations
present. It then translates the constraint graph into an integer programming problem. Bipartite
graph matching is then used to find the solutions. The solutions to the problem provide all
possible parses for the sentence.
As part of this framework, a mapping is specified between karaka relations and postpositions.
In CPG this mapping is given by a grammatical structure called the basic karaka frame4. It
specifies whether a karaka is mandatory or optional and what vibhakti (postposition/suffix) it
would take. Basic karaka frame given in Table 3.1 correctly derives sentence (3.1).
(3.1) baccaa haatha se kelaa khaataa hei
‘child’ ‘hand’ INST ‘banana’ ‘eats’ ‘is’
‘The child eats the banana with his hand.’
karaka        vibhakti        presence
karta (k1)    0               mandatory
karma (k2)    ko or 0         mandatory
karana (k3)   se or dvaara    optional
Table 3.1. Basic karaka frame for khaa ‘eat’
This mapping between karakas and vibhakti depends on the verb and its tense, aspect, and
modality (TAM) label. The mapping is represented by two structures: the basic karaka frame and
karaka frame transformations.
4 The original formulation of CPG uses ‘karaka frame’, the text that will follow will use the term ‘demand frame’
instead.
The basic karaka frame for a verb or a class of verbs gives the
mapping for the TAM label called basic. It specifies the vibhakti permitted for the applicable
karaka relations for a verb when the verb has the basic TAM label. Table 3.1 gives the frame for
the verb khaa ‘eat’ when it takes the default TAM label taa hei (which corresponds to the present
indefinite). For other TAM labels there are karaka frame transformation rules. Thus, for a given
verb with some TAM label, appropriate karaka frame can be obtained using its basic karaka
frame and the transformation rule depending on its TAM label (Bharati et al. 1995a). For
example, if the verb takes the yaa TAM, the transformation rule associated with this TAM will
modify the vibhakti of the karta to ne. With this rule we can account for sentences like (3.2):
(3.2) bacce ne haatha se kelaa khaayaa
‘child’ ERG ‘hand’ INST ‘banana’ ‘ate-PERF’
‘The child ate the banana with his hand.’
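The basic-frame-plus-transformation mechanism can be illustrated directly from Table 3.1 and the yaa rule just described. The dictionary encoding is an illustrative simplification of the frame format.

```python
# Basic demand frame for khaa 'eat' (from Table 3.1).
basic_frame = {
    "k1": {"vibhakti": ["0"], "presence": "mandatory"},
    "k2": {"vibhakti": ["ko", "0"], "presence": "mandatory"},
    "k3": {"vibhakti": ["se", "dvaara"], "presence": "optional"},
}

# Transformation rules keyed by TAM label: under the perfective 'yaa' TAM,
# the karta's vibhakti becomes the ergative 'ne'.
tam_rules = {
    "yaa": {"k1": {"vibhakti": ["ne"]}},
}

def transform_frame(frame, tam):
    """Derive the karaka frame for a verb under a given TAM label by
    applying that TAM's transformation rule to the basic frame."""
    derived = {k: dict(v) for k, v in frame.items()}  # copy each karaka entry
    for karaka, change in tam_rules.get(tam, {}).items():
        derived[karaka].update(change)
    return derived
```

Deriving the frame for khaa with the yaa TAM yields ne for the karta while leaving the other karakas untouched, which is exactly the difference between sentences (3.1) and (3.2).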
A demand group is an element which makes demands; for example, verbs make demands for
their karakas through demand frames. These demands are satisfied by source groups. A source
group becomes a potential candidate for a verb only after it satisfies the vibhakti specification
mentioned in the verb’s karaka frame. This can be shown in the form of a constraint graph. The nodes
of the graph are the word groups, and there is an arc labeled by the appropriate karaka relation
from a verb group to a selected source group. In Figure 3.1, all such source groups are nouns.
Figure 3.1. Constraint Graph for sentence 3.1.
The constraint graph for sentence (3.1) is shown in Figure 3.1. Note that each arc in a CG not
only has a dependency label associated with it but also has the necessity information of the
relation ([m]andatory or [o]ptional). A parse is a sub-graph of the constraint graph thus formed,
containing all the nodes of the constraint graph and satisfying some conditions. A constraint
graph is converted into an integer programming problem by introducing, for every arc from node
i to node j labeled by karaka k in the constraint graph, a variable x_{i,j,k}. The
variables take their values as 0 or 1. Figure 3.2 shows the solution sub-graphs for
sentence (3.1).
(a) Solution 1 sub-graph (b) Solution 2 sub-graph
Figure 3.2. Solution parses for sentence 3.1
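The conversion of a constraint graph into a 0/1 program can be sketched as follows. For illustration, the 0/1 assignments are enumerated by brute force rather than handed to an integer programming solver, and only two of the conditions on a valid parse (one head per node; each mandatory karaka filled exactly once) are checked; the function name and arc encoding are illustrative.

```python
from itertools import product

def solve_constraint_graph(arcs, nodes):
    """arcs: list of (demand, source, karaka, necessity) with necessity
    'm'andatory or 'o'ptional; nodes: the source groups needing a head.
    Each arc gets a 0/1 variable; a parse is a subset of arcs in which
    every source node has exactly one head and every mandatory karaka
    of each demand group is filled exactly once."""
    solutions = []
    for bits in product([0, 1], repeat=len(arcs)):
        chosen = [a for a, b in zip(arcs, bits) if b]
        heads = [j for (_, j, _, _) in chosen]
        if sorted(heads) != sorted(nodes):      # exactly one head per node
            continue
        ok = True
        for (i, j, k, nec) in arcs:
            if nec == "m":
                filled = sum(1 for (i2, _, k2, _) in chosen if i2 == i and k2 == k)
                if filled != 1:
                    ok = False
        if ok:
            solutions.append(chosen)
    return solutions
```

Encoding the constraint graph of sentence (3.1) this way yields exactly the two solution sub-graphs of Figure 3.2: k1 and k2 can each attach to either zero-marked noun, while haatha is only reachable through the k3 arc.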
Chapter 4
4. A Two Stage Generalized Hybrid Constraint Based Parser
(GH-CBP)
As mentioned in chapter 1, we incorporate grammatical notions from CPG to build a generalized
parsing framework. In this chapter we use the following notions from CPG and incorporate them
in a constraint-based parsing system to build the generalized hybrid constraint based parser
(GH-CBP).
(1) Aakaankshaa ‘requirement of a head’
(2) Karaka ‘participants in an action’
(3) Abhihita ‘marked by the verb’
(4) Vibhakti ‘nominal and verbal inflections’
(5) Tiganta ‘finiteness’
These notions have been explained earlier in chapter 2. Later in this chapter, we will also
incorporate insights from graph-based dependency parsing to rank the parses obtained from
GH-CBP.
4.1 Parsing in layers
Relations that exist between pairs of nodes of a dependency tree can signify various functions. A
verb-noun relation, for example, is different from a relation between a noun and its postposition.
Often, one class of relations is mutually exclusive of another. In this section we
introduce two linguistic domains with which we classify dependency labels into
different types. The first domain is the notion of a chunk, which will help us distinguish between
local and non-local dependency relations. On similar lines, the domain of a clause will help us
distinguish between intra-clausal and inter-clausal relations. Distinguishing the different types of
dependency relations allows us to cater to each type individually during the parsing process.
This modularity is similar in spirit to works such as Harris (1962) and Joshi and
Hopely (1999).
4.1.1 Chunk as Minimal Parsing Unit
Chunks in languages such as Hindi, Bangla, etc. capture two things:
a) Local Word Groups (LWG)
b) Local dependencies (for example, relations between an adjective and a noun)
The notion of vibhakti in these languages can be captured using the concept of LWG. In such
languages the case markers on nouns and the tense, aspect and modality (TAM) markers on the
verb are lexicalized and appear as separate words. These case and TAM markers play an
important role in identifying various dependency relations. By making them part of the verb or
the noun chunk we can easily localize such morphosyntactic information.
Such elements inside a chunk that are important for dependency parsing can be made
available through the chunk head’s feature structure. In Figure 4.1(c) we see that the case and TAM
markers (ne and yaa_thaa respectively) have been percolated to the chunk level. Head percolation
makes all the relevant information needed to parse the sentence available at the chunk
head’s feature structure. Morphological features (including agreement features) can be similarly
made available.
(4.1) raama ne khaanaa khaayaa thaa
‘Ram’ ERG ‘food’ ‘eat’ ‘was’
‘Ram ate the food.’
Figure 4.1. (a) Example 4.1 with POS tags, (b) With Chunk boundaries,
(c) With chunk head and vibhakti features
Many other local modifications, such as adjectives modifying a noun, have no effect on the global
dependency structure. Therefore, these elements, along with function elements, can be made part
of a chunk. In general, all the nominal inflections and nominal modifications (an adjective modifying a
noun, etc.) are treated as part of a noun chunk; similarly, verbal inflections and auxiliaries are treated
as part of the verb chunk (Bharati et al., 2006).
In GH-CBP, for languages such as Hindi, Bangla, etc., chunks are treated as the minimal parsing
units and relations in a dependency tree represent relations between chunk heads. The relations
inside the chunks are made explicit in a post-processing step. For agglutinative languages such as
Telugu, the chunk does not capture the notion of vibhakti (as the information is available via
suffixes); rather it is used simply to ignore local dependencies. Figure 4.2 shows the dependency
tree for example sentence (4.1). Notice the absence of ne and thaa, as they become part of their
respective chunks. A node in such a tree corresponds to a chunk head.
Figure 4.2. Chunk heads as dependency tree nodes.
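Head percolation at the chunk level can be sketched as below. The chunk encoding and the POS tags (NN, NNP, VM, PSP, VAUX) are illustrative simplifications; in particular, a real system would recover the full TAM sequence (e.g. yaa_thaa) through morphological analysis rather than from the auxiliary words alone.

```python
def percolate(chunk):
    """Build a chunk head's feature structure by percolating the vibhakti
    (case marker or auxiliary sequence) of the function words up to the head.
    A chunk is given as a list of (word, pos) pairs; the first content word
    (noun or main verb) is taken as the head, and postpositions/auxiliaries
    contribute the vibhakti. '0' marks an unmarked chunk."""
    content_pos = {"NN", "NNP", "VM"}
    head = next(w for w, p in chunk if p in content_pos)
    markers = [w for w, p in chunk if p in {"PSP", "VAUX"}]
    return {"head": head, "vibhakti": "_".join(markers) or "0"}
```

Applied to the chunks of example (4.1), the noun chunk raama ne yields head raama with vibhakti ne, while the bare chunk khaanaa yields the unmarked value 0, mirroring Figure 4.1(c).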
4.1.2 Clause as Minimal Parsing Unit
Tiganta in CPG conveys the notion of finiteness of a verb. It is an important concept, as the
analysis of a sentence and the search for the different karakas start with a finite verb. We posit the
notion of a clause to demarcate the scope of a finite verb. This is captured in GH-CBP by treating
a clause as a minimal parsing unit. Similar to what we saw in the previous section, once a
minimal parsing unit has been identified, we can again divide the dependency relations into two
classes, but this time on the basis of the clause.
4.1.2.1 Two Stages
Treating a clause as a minimal parsing unit in GH-CBP leads to a two-stage analysis of an input
sentence. In the 1st stage only intra-clausal dependency relations are extracted. The 2nd stage then
tries to handle more complex inter-clausal relations such as those involved in constructions of
coordination and subordination between clauses. To illustrate this, let us consider example (4.2).
(4.2) mai ghar gayaa kyomki mai bimaar thaa
’I’ ’home’ ’went’ ’because’ ’I’ ’sick’ ‘was’
‘I went home because I was sick’
Figure 4.3a shows the 1st stage analysis of sentence 4.2. Both the matrix clause mai ghar gayaa
and the subordinate clause are shown parsed, with their respective heads attached to _ROOT_.
The subordinating conjunction kyomki is also seen attached to the _ROOT_ and not to the matrix
clause. The dependency tree thus obtained in the 1st stage is partial. In the 2nd stage the
relationship between the two clauses is identified. The 2nd stage parse for (4.2) is shown in
Figure 4.3b. Under no condition does the 2nd stage modify the parse sub-trees obtained from the
1st stage. The 2nd stage only tries to establish relations between the clauses, thereby giving the
complete dependency analysis.
Figure 4.3. (a): 1st stage output, (b): 2nd stage final parse for example 4.2
As another instance of two stage parsing take example (4.3), a relative-correlative
construction.
(4.3) [ jo ladakaa vahaan baithaa hai ] vaha meraa bhaaii hai
‘which’ ‘boy’ ’there’ ’sitting’ ’is’ ’that’ ’my’ ’brother’ ‘is’
‘The boy who is sitting there is my brother’
(a): Output of Stage 1 (b): Output of Stage 2
Figure 4.4. Parse outputs for sentence 4.3
Trees corresponding to the outputs of the 1st stage and the 2nd stage are shown in Figure 4.4. During
the 1st stage the parser tries to find all the dependency relations of both the finite clauses vaha meraa
bhaai hai and jo ladakaa vahaan baithaa hai. Having done that, in the 2nd stage it tries to identify
the referent in the main clause with which the co-referent jo in the relative clause corefers. The root
of the relative clause gets attached to this element with an ‘nmod__relc’ (relative clause relation).
The two-stage analysis of sentences holds for languages other than Hindi as well. Figure 4.5 shows
the two-stage analysis for sentence 4.4, a Telugu complement clause construction.
(4.4) wulasi golilu mAnesiMxi ani ramA ceVppiMxi [Telugu]
‘Tulasi’ ‘tablets’ ‘stopped using’ ‘that’ ‘Rama’ ‘told’
‘Rama told that Tulasi stopped using tablets.’
(a) (b)
Figure 4.5. Stage 1 and Stage 2 outputs for sentence 4.4
Coordination of clauses also provides a situation where two-stage parsing is useful. Figure 4.6
shows some prototypical inter-clausal structures that are parsed in the 2nd stage.
Figure 4.6. Some inter-clausal structures. T1: Coordination structure, T2: Subordination structure,
T3: Relative clause structure. (CCP is a conjunct chunk, VG is a finite verb chunk, NP is a noun
chunk. ’ccof’ is the conjunct relation, ‘nmod__relc’ is relative clause relation)
Table 4.1 below shows the division of different relations in the two stages.
Stage Relations
Stage I
(Intra-clausal)
i. Argument structure of the finite verb
ii. Argument structure of the non-finite verb
iii. Adjuncts of finite and the non-finite verb
iv. Noun modifications
v. Adjectival modifications
vi. Non-clausal coordination
Stage II
(Inter-clausal)
i. Clausal coordination
ii. Clausal subordination
iii. Clausal complement
iv. Relative-Correlative construction
Table 4.1. Division of relations in the two stages
For all the experiments described in this work, the following definition of clause is used:
‘A clause is a group of words containing a single finite verb and its dependents’.
We note here that these dependents cannot themselves be finite verbs. Also, subordinating
conjunctions and finite-verb coordinating conjunctions are not treated as part of a clause.
Therefore, a sentence such as ‘John said that he will come late’ has 3 units: (1) John said, (2)
that, and (3) he will come late. Similarly, ‘John ate his food and he went shopping’ has 3 units:
(1) John ate his food, (2) and, and (3) he went shopping.
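The clause definition above suggests a simple segmentation into minimal units, sketched below for illustration. This toy splitter treats a fixed set of conjunctions as units of their own and the stretches between them as clauses; an actual clause identifier would of course work from finite verbs and their dependents.

```python
def clausal_units(tokens, conjunctions=frozenset({"that", "and", "because"})):
    """Split a token list into the minimal units used by the 2nd stage:
    each conjunction forms a unit of its own, and the stretches between
    conjunctions are the clausal units."""
    units, current = [], []
    for tok in tokens:
        if tok in conjunctions:
            if current:
                units.append(current)
            units.append([tok])       # the conjunction is its own unit
            current = []
        else:
            current.append(tok)
    if current:
        units.append(current)
    return units
```

On the two example sentences above, the splitter produces the same 3-unit segmentations given in the text.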
Let us now define the input to the 2nd stage more precisely. Let T be the complete tree that
should be output by the 2nd stage parser and let G be the subgraph of T that is input to the 2nd
stage. Then G should satisfy the following constraint: if the arc x → y is in G, then, for every z
such that y → z is in T, y → z is also in G. In other words, if an arc is included in the 1st stage
partial parse, the complete subtree under the dependent must also be included. This constraint
holds for all the 2nd stage constructions with the exception of relative clauses.
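The constraint on G can be checked mechanically. The sketch below represents both T and G as sets of (head, dependent) arcs; the encoding is illustrative.

```python
def valid_stage1_input(T, G):
    """Check the closure constraint on the 1st-stage output G relative to
    the full tree T (both sets of (head, dependent) arcs): if x -> y is in
    G, then every arc y -> z of T must also be in G, i.e. G contains the
    complete subtree below every dependent it attaches."""
    for (x, y) in G:
        for (h, z) in T:
            if h == y and (h, z) not in G:
                return False
    return True
```

For instance, with T containing 0 → 3, 3 → 1 and 1 → 2, a G containing 3 → 1 but not 1 → 2 violates the constraint, since the subtree under node 1 is incomplete.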
4.1.2.1.1 Status of ‘_ROOT_’
An artificial dummy node named _ROOT_ becomes the head of the dependency trees at both
stages. This is done so that all the categories, including verbs, are handled in the same way and
the eliminative constraints (explained shortly) act consistently across all the categories. The only
exception now is _ROOT_, for which we have the constraint that at the end of the 2nd stage it should
have only one outgoing arc (and of course no incoming arc). By introducing _ROOT_ we are able
to attach all unprocessed nodes to it. _ROOT_ ensures that the output we get after each stage is a
tree.
4.1.2.1.2 Partial Parse
The _ROOT_ node takes all the unattached nodes as its children. If for some reason the parser
is unable to analyze a clause, the parses for the other clauses are still produced and those sub-trees are
shown attached to the _ROOT_. Figure 4.7 shows the 1st stage parse for sentence 4.5, where the
first clause raama ghara gayaa is correctly analyzed, but the other clause usane so gayaa, being
ungrammatical, was not parsed successfully and attaches directly to the _ROOT_. Since this is a
1st stage parse, aura is also seen attached to the _ROOT_.
(4.5) raama ghara gayaa aura usane so gayaa
‘Ram’ ‘home’ ‘went’ ‘and’ ‘he-ERG’ ‘sleep’ ‘went’
‘Ram went home and he slept’
Figure 4.7. Partial parse for sentence 4.5.
4.1.3 A Layered Architecture
It is clear that by treating the chunk and the clause as minimal parsing units, GH-CBP divides the task
of dependency parsing into layers, wherein specific tasks are broken down into smaller linguistically
motivated sub-tasks.
i. The first sub-task is part-of-speech tagging and chunking, along with morphological
analysis, which is treated as a pre-processing step before the task of dependency parsing.
ii. Parse the POS tagged and chunked input in two stages. The parser first tries to extract
intra-clausal dependency relations and builds clausal sub-trees. In the 2nd stage the clausal
sub-trees are connected to form the complete dependency tree.
What is implied here is that the decisions taken at (ii), i.e. establishing the relations
between chunk heads, are more or less independent of the dependencies between words
inside a chunk. As discussed earlier, there are different kinds of dependencies between
elements inside a chunk. For example, a noun-adjective relation is different from a noun-
31
postposition/suffix relation. While establishing relations between chunk heads, the
properties of the head in the form of suffix/postposition/auxiliaries are available and other
chunk elements can be safely neglected in step (ii).
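The layered architecture above can be summarized as a simple pipeline, with each layer passed in as an independent module. This is only a structural sketch; the component interfaces are hypothetical.

```python
def parse_sentence(tokens, tagger, chunker, stage1, stage2, intra_chunk):
    """Wire the layers of GH-CBP together: POS tagging and chunking as
    pre-processing, two parsing stages over chunk heads, and intra-chunk
    expansion as post-processing. Each component is an independent callable,
    reflecting the claim that the layers are largely independent."""
    tagged = tagger(tokens)            # (i)   POS tags + morphology
    chunks = chunker(tagged)           # (i)   chunk boundaries and heads
    partial = stage1(chunks)           # (ii)  intra-clausal sub-trees
    tree = stage2(partial)             # (ii)  inter-clausal attachment
    return intra_chunk(tree, chunks)   # (iii) expand to a word-level tree
```

Because each layer only consumes the previous layer's output, any component (for instance the 2nd stage parser) can be replaced without touching the others.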
Figure 4.8. (a) POS tagged and chunked sentence,
(b) partial parse tree after stage 1 of GH-CBP (c) parse tree after stage 2 of GH-CBP,
(d) complete parse after intra-chunk dependencies identification
Figure 4.8(a-d) shows the output of each of the previously discussed layers for example (4.6). In
the dependency trees (b) and (c), each node is a chunk head. After removing the chunks in (d)
each node is a lexical item of the sentence.
(4.6) mohana ne tebala para apani kitaaba rakhii Ora vaha so gayaa
’Mohan’ ‘ERG’ ‘table’ ‘on’ ‘his’ ‘book’ ‘kept’ ‘and’ ‘he’ ‘sleep’ ‘PRFT’
‘Mohan placed his book on the table and slept’
4.2 Constraints
Constraints in the GH-CBP framework perform distinct functions. These constraints are:
a) Licensing constraints (CL)
b) Eliminative constraints (CE)
c) Preferential constraints (CP)
4.2.1 Licensing Constraints (CL)
Licensing constraints (CL) help in forming a constraint graph (CG) for a sentence. A CG
constrains the space of permissible dependency structures that the parser will eventually explore
in order to get a solution. An arc between two words is added to a CG only if it is licensed by CL.
An arc in a CG, in addition to providing the relation, also gives the necessity information
([m]andatory or [o]ptional) of the relation. Contrast this with constraint systems (Maruyama,
1990; Harper et al., 1995; Schroder, 2002) that employ constraint propagation and therefore start
with a complete graph, a constraint network, in which the nodes represent the variables (i.e.
words) and the arcs the constraints. Figures 4.9(a) and (b) respectively show the CG and the
constraint network for the earlier example (3.1), while Figure 4.9(c) shows the 2nd stage CG for
example sentence 4.7. Note that a 2nd stage CG will have far fewer nodes than a 1st stage CG.
(a) Constraint Graph (b) Constraint Network
Figure 4.9
(4.7) bacce ne kelaa khaayaa aura so gayaa
’child’ ERG ‘banana’ ‘eat’ ‘and’ ‘sleep’ ‘went’
‘The child ate the banana and slept’
Figure 4.9(c). 2nd stage CG for sentence 4.7
Licensing constraints discussed in the later parts of this section encode the grammatical notions of
CPG.
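A CG of this kind can be represented as a set of labeled candidate arcs, each carrying the necessity flag; the following is a minimal sketch in which the class and method names are assumptions for illustration:

```python
# Sketch: a constraint graph as a set of candidate arcs, each carrying
# its dependency relation and its necessity ([m]andatory / [o]ptional).
from collections import namedtuple

Arc = namedtuple("Arc", "head label dependent necessity")

class ConstraintGraph:
    def __init__(self, words):
        self.words = list(words)
        self.arcs = []

    def license(self, head, label, dependent, necessity):
        """Add an arc only when a licensing constraint (CL) permits it."""
        self.arcs.append(Arc(head, label, dependent, necessity))

    def mandatory(self, head, label):
        """All candidate arcs for a mandatory demand of `head`."""
        return [a for a in self.arcs
                if a.head == head and a.label == label and a.necessity == "m"]

# Example 4.7: the verb khaayaa licenses k1/k2 candidates.
cg = ConstraintGraph(["bacce", "ne", "kelaa", "khaayaa", "aura", "so", "gayaa"])
cg.license("khaayaa", "k1", "bacce", "m")
cg.license("khaayaa", "k2", "kelaa", "m")
```

The parser's eventual search is then confined to sub-graphs of such a CG, rather than to the complete graph over all word pairs.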
4.2.1.1 H-Constraints
Hard constraints (H-constraints) encode the notion of aakaankshaa in the form of a demand
frame. They also encode the TAM-postposition mapping in a structure called transformation
frame. H-constraints reflect the lexical aspect of the grammar; if they are violated, the sentence becomes ungrammatical.
Table 4.2 shows a demand frame for the Hindi verb de ‘give’. We say that de is a demand group that makes demands using its demand frame. A demand frame states the constraints associated with the arguments of the demand group: the vibhakti (postposition/suffix) and the category of each argument, and whether the argument is ‘mandatory’ or ‘optional’. The relation field shows that the relevant candidate is a child; alternatively, it could be a parent.
karaka | vibhakti | Category | Presence | Relation
karta (k1) | 0 | noun | mandatory | child
karma (k2) | ko or 0 | noun | mandatory | child
sampradaan (k4) | ko | noun | mandatory | child
Table 4.2. Basic demand frame for Hindi verb de ‘give’
karaka | vibhakti | Category | Presence | Relation
karta (k1) | 0 | noun | mandatory | child
karma (k2) | 0 or ni | noun | mandatory | child
Table 4.3. Basic demand frame for Telugu verb tin ‘eat’
A demand frame for a verb reflects its argument structure when it occurs with a specific TAM
(tense, aspect and modality) marker. All the demand frames for Hindi verbs have been formed
with the taa_hei (present indefinite) TAM. So, the frame shown in Table 4.2 shows the argument
structure of de when it is used as ‘detaa hei’.
In Hindi (as well as in other Indian languages such as Bangla and Telugu), a change in the TAM marker can affect the vibhakti (nominal suffix) of an argument. Such alternations are encoded in transformation frames. Table 4.4 shows the transformation frame for the passive alternation in Hindi. The frame shows the revised vibhakti that the arguments should take; it also shows that the ‘presence’ status of the arguments has been reversed. Transformation frames have an ‘operation’ field; in Table 4.4 it is ‘update’. For certain other TAMs such as kara (shown in Table 4.5), it can also be ‘delete’ or ‘insert’. These operations are used by the ‘Demand status transformation’ meta-constraint (described in Section 4.2.1.2) to handle various verbal alternations.
karaka | vibhakti | Category | Presence | Relation | Operation
karta (k1) | se or ke_dvaaraa | . | optional | . | update
karma (k2) | 0 | . | mandatory | . | update
Table 4.4. Hindi passive TAM transformation frame.
karaka | vibhakti | Category | Presence | Relation | Operation
vmod | FINITE | v_fin | mandatory | parent | insert
k1 | - | - | - | - | delete
k2 | . | . | optional | child | update
Table 4.5. Transformation frame for kara
4.2.1.2 Meta-Constraints
Meta-constraints allow principled handling of various syntactic constructions while forming a
CG. They are overarching language independent constraints that encode certain linguistic
generalizations. While forming the CG, meta-constraints also use the H-constraints described in
the previous section. Grammatical notions like agreement, control, passives, gapping and verbal alternations are accounted for via these constraints. There are four meta-constraints; they are illustrated below using Hindi, Telugu and Bangla. While the first two constraints are declarative, the last two are more procedural in nature.
a) Feature unification,
b) Demand status transformation,
c) Revision,
d) Look-ahead.
4.2.1.2.1 Feature Unification
Feature unification is a generic constraint wherein an arc between two words is added to a CG if their attribute-values unify. Feature unification can be instantiated in various forms; agreement feature unification is one such instantiation. This encodes the notion of abhihita from CPG. The actual manifestation of agreement of a finite verb with its karakas will vary from one language to another. The agreement also varies depending on the TAM of the verb, as shown in Tables 4.6 to 4.9.
Language | Agreement features
Hindi | Nominative k1 {gender, number, person}; Nominative k2 (if non-nominative k1) {gender, number, person}
Telugu | Nominative k1 {gender, number, person}
Bangla | Nominative k1 {person}
Table 4.6. Agreement pattern: Simple declarative TAM
Language | Agreement features
Hindi | Nominative k2 {gender, number, person}
Telugu | Nominative k1 {gender, number, person}
Bangla | Default {3rd person}
Table 4.7. Agreement pattern: Inabilitative TAM
Language | Agreement features
Hindi | Nominative k2 {gender, number, person}
Telugu | Nominative k2 {gender, number, person}
Bangla | Default {3rd person}
Table 4.8. Agreement pattern: Obligational TAM
Language | Agreement features
Hindi | Nominative k2 {gender, number, person}
Telugu | Nominative k1 {gender, number, person}
Bangla | Nominative k1 {person}
Table 4.9. Agreement pattern: Perfective TAM
Tables 4.6-4.9 show the agreement patterns in the context of the simple declarative, inabilitative, obligational and perfective TAMs in Telugu, Bangla and Hindi. Figure 4.10 shows the use of the agreement constraint for (4.8), a Bangla sentence with a simple declarative verb. Similarly, Figure 4.11 shows the use of the agreement constraint for (4.9), a Telugu sentence with a verb in the inabilitative TAM.
(4.8) aami aapela khaai [Bangla]
‘I-1st_person’ ‘apple’ ‘eat-1st_person_present’
‘I eat an apple.’
karaka | vibhakti | Category | Presence | Relation
karta (k1) | 0 | noun | mandatory | child
karma (k2) | 0 or ke | noun | mandatory | child
Table 4.10. Basic demand frame for Bangla verb khaa ‘eat’
Figure 4.10. CG for example 4.8
(a) CG without agreement constraint, (b) CG with the constraint.
(4.9) ramadu aapela winnalaikapoiyadu [Telugu]
‘Ram-masculine’ ‘Apple’ ‘eat-could-not’
‘Ram could not eat an apple’
karaka | vibhakti | Category | Presence | Relation
karta (k1) | 0 | noun | mandatory | child
karma (k2) | 0 or ni | noun | mandatory | child
Table 4.11. Final frame for Telugu verb tin ‘eat’ with inabilitative TAM.
(a) CG without agreement constraint, (b) CG with the constraint.
Figure 4.11. CG for example 4.9
We see in Figures 4.10 and 4.11 that the application of the agreement constraint can deem some of the arcs in a CG illegal, thereby removing such arcs. However, using agreement feature unification as a CL might not always be possible. Two cases when this happens are:
a) When a certain agreement pattern in a language is not precise enough to isolate a relation; for example, in Bangla when k1 is in the 3rd person, in Telugu when k1 is feminine, or in Hindi when the TAM is simple declarative.
b) Due to robustness issues; for example, when the performance of the morphological analyzer for a language is not very high, or when the text is known to have grammatical errors.
In such cases, agreement feature unification can instead be treated as CP (Preferential Constraints; explained in Section 4.2.3).
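The agreement-based licensing check amounts to plain attribute-value unification; the feature representation below is an assumption for illustration, not the parser's internal format:

```python
# Sketch: an arc is licensed only when the verb's agreement features
# unify with those of the candidate argument (cf. example 4.8).
def unify(verb_feats, cand_feats, checked):
    """True iff every checked feature either matches or is unspecified."""
    for f in checked:
        v, c = verb_feats.get(f), cand_feats.get(f)
        if v is not None and c is not None and v != c:
            return False
    return True

# Bangla simple declarative TAM checks only {person} (Table 4.6).
verb = {"person": "1"}      # khaai 'eat-1st_person_present'
candidates = {"aami": {"person": "1"},      # 'I'
              "aapela": {"person": "3"}}    # 'apple'
k1_candidates = [w for w, f in candidates.items()
                 if unify(verb, f, checked=("person",))]
```

As in Figure 4.10(b), only aami survives as a k1 candidate once the constraint is applied.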
4.2.1.2.2 Demand Status Transformation
Demand status transformation accounts for various changes that are required in the demand frames in order to handle phenomena like verbal alternations, control, gapping, passives, relative clauses, etc. In many Indian languages these phenomena have lexical manifestations (in the form of TAM markers, etc.). In such cases the basic frame is transformed using this constraint before the frame is used to construct the CG. The constraint uses a transformation frame and transforms the status of a relation stipulated in the basic demand frame based on the operation in the transformation frame.
Consider the basic frame of the Hindi verb de ‘give’, the passive transformation frame, and the kara participle transformation frame from Section 4.2.1.1. The basic de frame cannot account for the case marking on the nouns in examples 4.10 and 4.11, because these sentences use de with a TAM different from the present indefinite TAM with which the basic frame was built.
(4.10) khilaunaa bacce ke dvaaraa abhay ko diyaa gayaa
’toy’ ’child’ GEN ‘by’ ‘Abhay‘ DAT ‘given’ ‘was’
‘The toy was given to Abhay by the child.’
(4.11) bacca abhay ko khilaunaa dekara so gayaa
’child’ ’Abhay’ DAT ‘toy’ ‘having given’ ‘sleep’ ‘went’
‘Having given the toy to Abhay the child slept.’
For these sentences, demand status transformation transforms the basic frame using the appropriate transformation frames to obtain the frames that are then used to build the correct CG. Tables 4.12 and 4.13 show the de frames after the application of the demand status transformation constraint.
karaka | vibhakti | Category | Presence | Relation
karta (k1) | se or ke_dvaaraa | noun | optional | child
karma (k2) | 0 | noun | mandatory | child
sampradaan (k4) | ko | noun | mandatory | child
Table 4.12. Final demand frame for de ‘give’ after passive transformation.
The lightly shaded cells show the modifications.
karaka | vibhakti | Category | Presence | Relation
karma (k2) | ko or 0 | noun | optional | child
sampradaan (k4) | ko | noun | mandatory | child
vmod | FINITE | v_fin | mandatory | parent
Table 4.13. Final demand frame for de ‘give’ after kara transformation.
The lightly shaded cells show the modifications. The original k1 demand was deleted.
TAM-based transformations can similarly account for perfective and obligational constructions. Other TAMs signifying control, such as nA, ta_huA, etc., are handled similarly to the kara transformation. Table 4.10 earlier showed the basic frame for the Bangla verb khaa. Table 4.14 shows the transformation frame for the obligational TAM te_holo in Bangla, and the final frame that accounts for sentence 4.12 is shown in Table 4.15.
(4.12) aamaake aapela khete holo [Bangla]
‘I-Acc’ ‘apple’ ‘eat’ ‘had to’
‘I had to eat an apple.’
karaka | vibhakti | Category | Presence | Relation | Operation
karta (k1) | ke | . | . | . | update
Table 4.14. Transformation frame for te_holo
karaka | vibhakti | Category | Presence | Relation
karta (k1) | ke | noun | mandatory | child
karma (k2) | 0 or ke | noun | mandatory | child
Table 4.15. Final demand frame for Bangla verb khaa ‘eat’ after te_holo transformation
The lightly shaded cells show the modifications.
Similarly, Table 4.17 shows the final frame for the Telugu verb tin ‘eat’ needed to account for its participle usage tintu in sentence 4.13. The basic frame for this verb was shown earlier in Table 4.3. The constraint was applied using the participle transformation frame shown in Table 4.16.
(4.13) nenu tintu intiki vellanu [Telugu]
‘I’ ‘while eating’ ‘home-ACC’ ‘went’
karaka | vibhakti | Category | Presence | Relation | Operation
vmod | FINITE | v_fin | mandatory | parent | insert
k1 | - | - | - | - | delete
k2 | . | . | optional | child | update
Table 4.16. Transformation frame for Telugu TAM tu
karaka | vibhakti | Category | Presence | Relation
vmod | FINITE | v_fin | mandatory | parent
karma (k2) | 0 or ni | noun | optional | child
Table 4.17. Final demand frame for Telugu verb tin ‘eat’ after tu transformation
The lightly shaded cells show the modifications. The original k1 demand was deleted.
Relative clauses (in languages such as Hindi, Bangla, etc.) and gapping, on the other hand, are not analyzed via TAMs. In such cases the constraint is triggered not by a TAM but by some other lexical item. For instance, in the case of gapping shown in example (4.14), the constraint relaxes the requirements of the second verb so ‘sleep’ and foresees a potential gapping due to the presence of the lexical item signifying the coordinating conjunction aura ‘and’.
(4.14) bacce ne kelaa khaayaa aura so gayaa
’child’ ERG ‘banana’ ‘eat’ ‘and’ ‘sleep’ ‘went’
‘The child ate the banana and slept’
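The update/delete/insert operations on a basic frame can be sketched over demand frames stored as dictionaries; the frame encoding below mirrors the table columns but is an illustrative assumption, not the parser's data structure:

```python
# Sketch: applying a transformation frame to a basic demand frame.
# A '.' in a transformation cell means "leave the field unchanged".
def transform(basic, trans):
    frame = {rel: dict(fields) for rel, fields in basic.items()}  # copy
    for rel, row in trans.items():
        op = row["operation"]
        if op == "delete":
            frame.pop(rel, None)
        elif op == "insert":
            frame[rel] = {k: v for k, v in row.items() if k != "operation"}
        elif op == "update":
            for field, val in row.items():
                if field != "operation" and val != ".":
                    frame[rel][field] = val
    return frame

# Basic frame for Hindi de 'give' (Table 4.2), passive frame (Table 4.4).
de = {"k1": {"vibhakti": "0", "presence": "mandatory"},
      "k2": {"vibhakti": "ko or 0", "presence": "mandatory"},
      "k4": {"vibhakti": "ko", "presence": "mandatory"}}
passive = {"k1": {"vibhakti": "se or ke_dvaaraa", "presence": "optional",
                  "operation": "update"},
           "k2": {"vibhakti": "0", "presence": "mandatory",
                  "operation": "update"}}
de_passive = transform(de, passive)
```

The result reproduces the shape of Table 4.12: k1 becomes optional with se/ke_dvaaraa, k2 takes 0, and k4 is untouched.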
4.2.1.2.3 Revision
In the previous sections we saw how H-constraints along with demand status transformation
helped in forming CGs for different kinds of constructions. So for example, the frames shown in
Tables 4.12 and 4.13 could account for passives and participles respectively. Notice that in all the
H-constraints we saw earlier, the demands are constrained on vibhakti and category. Having
noted this, let us now try to analyze sentence 4.15.
(4.15) raama aura sitaa ne khaanaa khaayaa
’Ram’ ’and’ ’Sita’ ERG ’food’ ‘ate’
‘Ram and Sita ate food.’
The final analysis of example 4.15 is shown in Figure 4.12. We see that the coordinating conjunct aura is attached to the main verb with the relation ‘k1’. It is clear that we cannot get the correct CG that leads to the parse shown in Figure 4.12 using the frame for khaayaa, simply because the frame does not demand a ‘k1’ conjunct. Instead, it demands a ‘k1’ noun with the ne postposition (sitaa satisfies this constraint in this sentence). This CG is shown in Figure 4.13.
Figure 4.12. Dependency tree for example 4.15
Figure 4.13. CG for example 4.15
A coordinating conjunction can potentially take any lexical category as its child and bear that child's properties. This means that all heads, such as verbs, nouns, adjectives and even conjunctions, can in turn take a coordinating conjunct as their child. Figure 4.14 shows this clearly. The revision meta-constraint handles all the constructions where a coordinating conjunct is a potential child of a head in a CG.
Figure 4.14. Variable property of coordinating conjunction.
To accomplish this, the following general principle is followed:
For any node becoming a potential child of a coordinating conjunct, its existing parents can also become parents of the conjunct. For example, in Figure 4.15, after node 2 is identified as a potential child of a coordinating conjunct (node 3), its existing parent 0 also becomes a potential parent of 3 (shown as a dashed arc).
(a) CG before revision (b) CG after revision
Figure 4.15. Revision of CG. Node 3 is a coordinating conjunct.
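The general revision principle can be sketched over unlabeled candidate arcs (a simplification that ignores the label-selection variation for verbal heads discussed below):

```python
# Sketch of the revision principle: once a node is a potential child of
# a coordinating conjunct, the node's existing potential parents also
# become potential parents of the conjunct (cf. Figure 4.15).
def revise(arcs, conjuncts):
    """arcs: set of (parent, child) candidate pairs."""
    revised = set(arcs)
    for conj in conjuncts:
        for parent, child in arcs:
            if parent == conj:                   # child of the conjunct
                for p2, c2 in arcs:
                    if c2 == child and p2 != conj:
                        revised.add((p2, conj))  # parent inherited by conjunct
    return revised

# Figure 4.15: 0 -> 2 exists and 3 (the conjunct) -> 2;
# revision adds the dashed arc 0 -> 3.
arcs = {(0, 1), (0, 2), (3, 2), (3, 4)}
revised = revise(arcs, conjuncts={3})
```

Applying the function to the Figure 4.15 configuration adds exactly the dashed arc and leaves all original candidates in place.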
The above principle is used with slight variations for different heads. In the case of a verbal head, the label of the incoming arc into the conjunct is governed by the vibhakti of the right child of the conjunct. This can be seen in Figure 4.16.
(a) CG before revision (b) CG after revision
Figure 4.16. Revision of CG with labels. The head of the right child (node 2) is a verb (node 0).
Figure 4.17 shows the CG for example (4.15). For ease of exposition we do not show the necessity information on the arcs. Notice that the incoming arc into aura is labeled k1 due to the presence of the ne postposition on its right child sitaa. Notice also that the left child of the conjunct is not used to decide the incoming arc label.
(a) CG before revision (b) CG after revision
Figure 4.17. Revision of CG for example 4.15.
Example 4.16 shows a sentence where a subordinating conjunction takes a coordinating conjunct as its child. For subordinating conjunctions the original formulation of the revision meta-constraint is used.
(4.16) raama ghar gayaa kyomki use bhook lagi thi aura vaha bimaar
‘Ram’ ‘home’ ‘went’ ‘because’ ‘he’ ‘hunger’ ‘had’ PAST ‘and’ ‘he’ ‘sick’
bhi tha
‘also’ ‘was’
‘Ram went home because he was hungry and was sick as well.’
The subordinating conjunction kyomki ‘because’ in the 2nd stage CG should take the coordinating conjunct aura ‘and’ as its child. During 2nd stage CG formation, to begin with, ‘because’ will only take a single finite clause as its child (cf. Table 4.18), but the revision constraint ensures that the CG contains an incoming arc from ‘because’ into ‘and’. Figure 4.18(a) shows the CG when the revision principle is not applied, while 4.18(b) shows the one obtained after it is applied. For simplicity, the figure only shows the finite verb children and the conjuncts.
karaka | vibhakti | Category | Presence | Relation
ccof | FINITE | v_fin | mandatory | child
rh | FINITE | v_fin | mandatory | parent
Table 4.18. Demand frame for subordinating conjuncts.
Figure 4.18. 2nd stage CG for example 4.16
(a) Revision not applied, (b) Revision applied
In Figure 4.18(a), ‘and’ takes V2 and V3 as its potential children. Since ‘because’ also takes V2 as its child, the revision constraint ensures that ‘and’ also becomes a potential child of ‘because’ (Figure 4.18(b)).
4.2.1.2.4 Look-Ahead
Whenever the search for a potential child/parent while forming a CG is greedy, there is a chance of missing a valid candidate: in a greedy search the first potential candidate that satisfies some H-constraint is chosen. Greedy search is the default strategy for most demands during the 2nd stage. It is also used for non-verbal demand groups (e.g. nominal and adjectival modifications) in the 1st stage. This strategy, although effective and very efficient, does not work in two cases:
(a) The head of a paired connective occurring as a potential child
(b) A nominal with a genitive marker occurring as a potential parent/child
Example 4.17 shows a Hindi construction with the paired connective agara-to. A greedy search for the subordinating conjunction kyomki using the demand frame shown in Table 4.18 will never allow kyomki to take to as a potential child. In such a case, look-ahead is required to construct the correct CG. This is shown in Figure 4.19 using a dashed arc. The head of the paired connective (here to) triggers this constraint.
(4.17) kyomki agara tuma aaoge to mai bhi aaungaa
‘because’ ‘if’ ‘you’ ‘come’ ‘then’ ‘I’ ‘also’ ‘will come’
‘Because if you come, I’ll also come.’
Figure 4.19. Look-ahead constraint applied to sentence 4.17
A similar example for Bangla is given as 4.18.
(4.18) kaarona jodi tuni asho taahole aamio aashbo. [Bangla]
‘Because’ ‘if’ ‘you’ ‘come’ ‘then’ ‘I’ ‘will come’
The look-ahead constraint is also needed when a nominal with a genitive case marker becomes a potential parent/child. The search for potential candidates for all non-verbal demands in the 1st stage is also greedy, and therefore in example 4.19 only billi becomes a potential parent for the adjectival participle bhaagte hue. Without look-ahead, getting the correct relation between bhaagte hue and bacce is not possible. The constraint extends the search till the potential final head of billi. This is shown in Figure 4.20 (a), (b).
(4.19) abhay ne bhaagte hue billi ke bacce ko pakada liya
‘Abhay’ ERG ‘running’ ‘cat’ GEN ‘child’ ACC catch PAST
‘Abhay caught the running kitten’
Figure 4.20. Look-ahead constraint applied to sentence 4.19
(a) CG without look-ahead constraint, (b) CG after look-ahead constraint
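The difference between greedy search and look-ahead over a genitive chain can be sketched as follows; the tags and the simple chain-following rule are illustrative assumptions:

```python
# Sketch: greedy search picks the first nominal to the right as the
# participle's parent; look-ahead follows genitive (GEN) chains to
# their final head (cf. example 4.19: billi ke bacce).
def find_parent(words, start, look_ahead=False):
    """words: list of (form, case) pairs; start: index of the participle."""
    i = start + 1
    while i < len(words):
        form, case = words[i]
        if case == "GEN" and look_ahead:
            i += 1            # skip past the genitive to its head
            continue
        return form
    return None

# abhay ne [bhaagte_hue] billi-GEN bacce-ACC ...
sent = [("abhay", "ERG"), ("bhaagte_hue", "PART"),
        ("billi", "GEN"), ("bacce", "ACC")]
greedy = find_parent(sent, 1)                  # stops at billi
ahead = find_parent(sent, 1, look_ahead=True)  # reaches bacce
```

Greedy search stops at billi, as in Figure 4.20(a); look-ahead reaches bacce, the candidate needed for the correct CG of Figure 4.20(b).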
Meta-constraint | Grammatical notion
Feature unification | Agreement
Demand status transformation | Control, passives, relative clauses, verbal alternations, gapping
Revision | Coordinating conjuncts as children
Look-ahead | Paired connectives, genitive chains
Table 4.19. Grammatical notions handled via meta-constraints.
Table 4.19 summarizes the various grammatical generalizations (in Hindi, Urdu, Telugu and Bangla) accounted for by the four meta-constraints. Some constructions, such as the missing copula in Bangla and Telugu and missing genitive case markers in Telugu, are still not handled by the parser. For Telugu these issues have been discussed in Vempaty et al. (2010).
4.2.2 Eliminative Constraints (CE)
Unlike the licensing constraints (CL) discussed in Section 4.2.1, which are used to construct a CG, eliminative constraints (CE) are used by the parser to obtain the solution parse by eliminating the arcs in the CG that violate them. There are four such constraints:
i. For each of the mandatory karakas in a demand frame for each head, there should be
exactly one outgoing edge labeled by the karaka from the head,
ii. For each of the desirable or optional karakas in a demand frame for each head, there
should be at most one outgoing edge labeled by the karaka from the head,
iii. There should be exactly one incoming arc to a child.
iv. There should be no cycles between two nodes.
These constraints ensure that the final parse is a tree. More than this, they connect with the H-constraints (via the constraint graph); if that were not so, the solutions obtained would be spurious. Consider again sentence 4.1, repeated below as 4.19.
(4.19) baccaa haatha se kelaa khaataa hei
’child’ ‘hand’ INST ‘banana’ ‘eats’ ‘is’
‘The child eats the banana with his hand.’
Figure 4.21 shows the constraint graph (CG) for sentence 4.19, with all the potential candidate nodes that the verb (khaa) demands. The demand frame required to form the CG in Figure 4.21 is shown in Table 4.20. A parse is a sub-graph of the CG containing all the nodes of the CG and satisfying CE.
Figure 4.21. CG for example 4.19.
karaka | vibhakti | Presence
karta (k1) | ne | mandatory
karma (k2) | ko or 0 | mandatory
karana (k3) | se or dvaara | optional
Table 4.20. Basic demand frame for khaa ‘eat’
Although one can easily derive possible trees from the CG shown above, some of them, as given in Figure 4.22, are clearly wrong parses and show the disconnect between the lexical demands and the tree well-formedness assumption alone. In effect, CE ensures that the demand frame requirements align with the candidate elements during the derivation. The correct solutions for the above sentence are shown in Figure 4.23.
Figure 4.22. Possible wrong trees for example 4.19
(Notice the multiple presence of same labels that make the parses wrong)
Figure 4.23. Solution parses for example 4.19.
Figure 4.23 shows that there can be multiple parses satisfying CE. This indicates the ambiguity in the sentence when only this limited knowledge base is considered. Stated another way, the H-constraints and the meta-constraints that help in constructing the CG are insufficient to rule out multiple analyses of a given sentence; more knowledge (semantics, other preferences, etc.) is required to curtail the ambiguities. For all such constructions the core parser produces multiple parses.
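The way CE carves the solution parses out of a CG can be sketched by brute-force enumeration; the candidate arcs below are a toy encoding loosely based on example 4.19, with the candidates assumed for illustration rather than taken from the actual CG:

```python
# Sketch: enumerate sub-graphs of a small CG and keep those that
# satisfy CE -- exactly one arc per mandatory karaka, at most one per
# optional karaka, and exactly one incoming arc per child.
from itertools import combinations

def solutions(candidates, mandatory, optional):
    """candidates: (head, label, child) arcs licensed by CL."""
    all_children = {c for _, _, c in candidates}
    sols = []
    for n in range(1, len(candidates) + 1):
        for sub in combinations(candidates, n):
            labels = [(h, l) for h, l, _ in sub]
            children = [c for _, _, c in sub]
            ok = all(labels.count(d) == 1 for d in mandatory)       # (i)
            ok = ok and all(labels.count(d) <= 1 for d in optional) # (ii)
            ok = ok and len(children) == len(set(children))         # (iii)
            ok = ok and set(children) == all_children               # all nodes
            if ok:
                sols.append(sub)
    return sols

# Toy CG: both 0-vibhakti nominals compete for k1 and k2.
cands = [("khaa", "k1", "baccaa"), ("khaa", "k1", "kelaa"),
         ("khaa", "k2", "baccaa"), ("khaa", "k2", "kelaa"),
         ("khaa", "k3", "haatha")]
parses = solutions(cands,
                   mandatory={("khaa", "k1"), ("khaa", "k2")},
                   optional={("khaa", "k3")})
```

Under this toy encoding exactly two sub-graphs survive, mirroring the two solution parses of Figure 4.23, with baccaa and kelaa swapping the k1 and k2 roles.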
4.2.3 Preferential Constraints (CP)
Soft constraints (S-constraints) are constraints that are used in a language as preferences; they reflect the preferences a language has towards various linguistic phenomena. These preferential constraints are used to prioritize the parses and select the best parse. Some of the S-constraints that have been tried out are:
a) Order of the arguments,
b) Relative position of arguments with respect to the verb,
c) Agreement,
d) Distance from head,
e) General graph properties.
Figure 4.24 shows how the order of arguments can be used as a prioritization strategy. Here the two parses of sentence 4.19, shown earlier in Figure 4.23, are prioritized using the order of ‘k1’ and ‘k2’, where (k1, k2) is preferred over (k2, k1). Such S-constraints can be used for ranking by penalizing a parse for each constraint it violates and finally choosing the parse that is least penalized. This strategy is similar to the one used in weighted constraint grammar parsing (WCGP) (Schröder, 2002; Foth and Menzel, 2006) and Optimality Theory (Prince and Smolensky, 1993; Aissen, 1999).
Figure 4.24. Prioritizing of example 4.19 solution parses.
Prioritization based on the order of ‘k1’ and ‘k2’. (k1, k2) is preferred over (k2, k1)
The tree inside the rectangle is the correct parse.
The other way is to use these constraints as features, learn their associated weights from a
dependency treebank, use them to score the output parses and select the parse with the best score.
This is similar to the work on ranking in phrase structure parsing (Collins and Koo, 2005; Shen et
al., 2003) and graph-based parsing (McDonald et al., 2005a). We use this latter strategy. We note
here that the constraint weights used in WCGP are determined by a grammar writer.
4.3 GH-CBP Framework
Figure 4.25 shows the overall parsing scheme of GH-CBP. The GH-CBP model M_GH-CBP comprises a dependency grammar Γ, a set of parameters λ and a parsing algorithm h.
M_GH-CBP = (Γ, λ, h) (I)
Γ maps an arbitrary sentence S to a constraint graph CG. CG constrains the space of
permissible dependency structures for a given sentence.
Γ = (Σ, R, C) (II)
C = {CL, CE, CP} (III)
CL = {H-constraints, Meta-constraints} (IV)
CP = {S-constraints} (V)
Σ is the set of words, R is the label set, and CL is the set of licensing constraints used to construct a CG; we saw them earlier in the form of the H-constraints and the meta-constraints. CE is the set of eliminative constraints employed by the parsing algorithm, discussed in the next section, and CP is the set of preferential constraints used for prioritization.
As will be seen in Section 4.3.2, λ(i, r, j) is the probability of relation r on arc i → j given a set of preferential constraints CP. This probability is learnt automatically from a dependency treebank; we elaborate on this further in Section 4.3.2.
λ(i, r, j) = P(r_{i,j} | CP) (VI)
Figure 4.25. Schematic design of GH-CBP.
4.3.1 Parsing as Constraint Satisfaction
In a constraint dependency approach, parsing is defined as a constraint satisfaction problem. For a
fully defined constraint satisfaction problem, we need to specify the variables, their domains and
the set of constraints that need to be satisfied:
(1) Set of variables: X = {x0, x1, x2, …, xe}, where each variable represents an arc in the constraint graph (CG), e ≤ k(n² − n), n is the number of vertices and k is the number of dependency labels.
(2) Domain of each variable: {0, 1}.
(3) Set of eliminative constraints: CE
In GH-CBP, constraint satisfaction is done using bipartite graph matching and integer programming (IP) (Bharati et al., 1993, 1995a, 1995b). This is done by associating a variable with every edge in the CG (say, x_{i,k,j} for an edge from node i to node j labeled k). A parse is an assignment of 1 to those variables whose corresponding arcs are in the parse sub-graph, and 0 to those that are not. The cost function to be minimized is the sum of all the variables. The following IP equalities and inequalities ensure the four constraints stated above:
For each head i, for each of its mandatory karakas k:
M_{i,k} : Σ_j x_{i,k,j} = 1 (VII)
For each head i, for each of its optional or desirable karakas k:
O_{i,k} : Σ_j x_{i,k,j} ≤ 1 (VIII)
For each child j:
S_j : Σ_{i,k} x_{i,k,j} = 1 (IX)
For each head i and its potential child j:
C_{i*j} : x_{i*j} + x_{j*i} ≤ 1 (X)
Note that M_{i,k} stands for the equation formed given a head i (the verb khaa in example 4.19) and karaka k. Thus, there will be as many equations as there are combinations of i and k. Also, C_{i*j} is formed even if the relation between node i and node j is indirect.
4.3.2 Prioritization
It was clear from Sections 4.2.1 and 4.2.2 that the core parser, which uses CL and CE, can produce multiple parses. We noted earlier that we use CP (the S-constraints) as features, associate weights with them, use them to score the output parses, and select the parse with the best score. The score of a dependency parse tree t = (V, A) in most graph-based parsing systems (Kübler et al., 2009) is
Score(t) = Score(V, A) ∈ R (XI)
where V and A are the sets of vertices and arcs. This score signifies how likely it is that a particular tree is the correct analysis of a sentence S. Many systems assume the above score to factor through the scores of subgraphs of t. Thus, the above score becomes
Score(t) = Σ_{α ∈ α_t} λ_α (XII)
where α is a subgraph, α_t is the relevant set of subgraphs in t and λ_α is a real-valued parameter. If one follows the arc-factored model for scoring a dependency tree (Kübler et al., 2009), as we do, the above score becomes:
Score(t) = Σ_{(i,r,j) ∈ A} λ(i, r, j) (XIII)
In (XIII) the score is parameterized over the arcs of the dependency tree. Since we are interested in using this scoring function for ranking, our ranking function R should select the parse that has the maximum score amongst all the parses Φ produced by the core parser.
R(Φ, λ) = argmax_{t=(V,A) ∈ Φ} Score(t) = argmax_{t=(V,A) ∈ Φ} Σ_{(i,r,j) ∈ A} λ(i, r, j) (XIV)
One of the most basic methods that we tried for ranking is based on voting. In this method, which we call ‘Rank-Voting’, λ(i, r, j) corresponds to the count of relation r on arc i → j amongst all the parses in Φ. In all the other methods λ(i, r, j) represents a probability, and it is therefore more natural to multiply the arc parameters instead of summing them.
R(Φ, λ) = argmax_{t=(V,A) ∈ Φ} Score(t) = argmax_{t=(V,A) ∈ Φ} Π_{(i,r,j) ∈ A} λ(i, r, j) (XV)
Here, λ(i, r, j) is simply the probability of relation r on arc i → j given some preferential constraints (CP). This probability is currently obtained using a MaxEnt model (Ratnaparkhi, 1998). So,
λ(i, r, j) = P(r_{i,j} | CP) (XVI)
If A denotes the set of all dependency labels and B denotes the set of all S-constraints, then MaxEnt ensures that p maximizes the entropy
H(p) = − Σ_{x ∈ E} p(x) log p(x) (XVII)
where x = (a, b), a ∈ A, b ∈ B and E = A × B. Note that, since we are not parsing but prioritizing, our features can have a wider context than in the arc-factored model, where the feature function associated with an arc parameter consists only of the features associated with that specific arc. Figure 4.26 shows the context over which the various S-constraints can be specified (the S-constraints were described in Section 4.2.3). These S-constraints are reflected as features used in MaxEnt. The features for which the model gave the best performance are given below. Note that the actual feature pool was much larger, and some features, like that for agreement, did not get selected.
(1) Root, POS tag, Chunk tag, suffix of the current node and its parent
(2) Suffix of the grandparent, Conjoined suffix of current node and head
(3) Root, Chunk Tag, Suffix, Morph category of the 1st right sibling
(4) Suffix, Morph category of the 1st left sibling
(5) Dependency relations between the first two, right and left sibling and the head
(6) Dependency relation between the grandparent and head
(7) Dependency relation between the current node and its child
(8) A binary feature to signify if a k1 already exists for this head
(9) A binary feature to signify if a k2 already exists for this head
(10) Distance from a non-finite head
Figure 4.26. Context over which S-constraints can be specified. Node i is the parent of node j. l-s1 corresponds to the 1st left sibling, r-s1 to the 1st right sibling, gp is the grandparent of node j, ch is a child of node j. r1-r6 are dependency relations.
The ranking function can differ based on how one obtains the probability of the relation on arc i → j. Since we are ranking labeled dependency trees, the first way, shown in (XVI), is to use the probability of the label r found in the labeled dependency parse. But we can also use the probability of the label on arc i → j as predicted by the MaxEnt model. A third obvious way is to take the weighted average of the two. (XVIII) and (XIX) show these other two options.
λ(i, r, j) = P(rm_{i,j} | CP) (XVIII)
where rm is the relation on arc i → j predicted by the model.
λ(i, r, j) = ( P(r_{i,j} | CP) + P(rm_{i,j} | CP) ) / 2 (XIX)
When the ranker uses (XVI) we call it ‘Ranking with Parser Relation probability’ (Rank-PR); the other two are called ‘Ranking with Model Relation probability’ (Rank-MR) and ‘Ranking with Weighted Relation probability’ (Rank-WR).
In a similar vein, λ(i, r, j) in (XV) can correspond to the probability of the attachment i → j being a valid attachment. In this formulation the relation r becomes inconsequential. This probability can be obtained by using MaxEnt as a binary classifier. The ranking model based on this parameter is called ‘Ranking with Attachment’ (Rank-Attach).
Further, we can combine Rank-Attach with the models described based on (XVI), (XVIII)
and (XIX). In such a strategy we first select the k-best parses using Rank-Attach and then re-rank
them using Rank-PR, Rank-MR or Rank-WR. This order can also be reversed.
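The product-based ranking of (XV) can be sketched in a few lines; the arc probabilities below are made up for illustration, not model outputs:

```python
# Sketch: Rank-PR style ranking -- score each parse as the product of
# its arc probabilities λ(i, r, j) and return the argmax (eq. XV).
from math import prod

def rank(parses, arc_prob):
    """parses: list of arc lists; arc_prob: dict (head, rel, child) -> P."""
    def score(parse):
        # Unseen arcs get a tiny floor probability instead of zero.
        return prod(arc_prob.get(arc, 1e-9) for arc in parse)
    return max(parses, key=score)

# Two competing parses of the khaa example; probabilities assumed.
p1 = [("khaa", "k1", "baccaa"), ("khaa", "k2", "kelaa"),
      ("khaa", "k3", "haatha")]
p2 = [("khaa", "k1", "kelaa"), ("khaa", "k2", "baccaa"),
      ("khaa", "k3", "haatha")]
probs = {("khaa", "k1", "baccaa"): 0.8, ("khaa", "k2", "kelaa"): 0.7,
         ("khaa", "k1", "kelaa"): 0.2, ("khaa", "k2", "baccaa"): 0.3,
         ("khaa", "k3", "haatha"): 0.9}
best = rank([p1, p2], probs)
```

Replacing `arc_prob` with arc counts over Φ (and `prod` with `sum`) would give Rank-Voting instead.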
4.3.3 Fail-Safe Parse
It is also possible that the core parser is unable to produce any parse for a sentence. In such a case, some simple heuristics are used in an attempt to produce a reasonable parse. One such heuristic (shown in Figure 4.27) is to attach every nominal to the first finite verb occurring to its right with an underspecified ‘vmod’ (verb modifier) relation.
(4.20) raama ghara gayaa aura usane so gayaa
‘Ram’ ‘home’ ‘went’ ‘and’ ‘he-ERG’ ‘sleep’ ‘went’
‘Ram went home and he slept’
Figure 4.27. Failsafe parse for example 4.20
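The fail-safe heuristic can be sketched as follows (a minimal sketch; the token representation and the 'NN'/'VF' tags are simplified, hypothetical stand-ins for the actual chunk and morph information):

```python
# Fail-safe heuristic: attach every nominal to the first finite verb
# occurring to its right with the underspecified 'vmod' relation.
# Tokens are (index, pos) pairs; 'NN' marks nominals, 'VF' finite verbs.

def fail_safe_parse(tokens):
    arcs = []
    for i, (idx, pos) in enumerate(tokens):
        if pos != 'NN':
            continue
        # find the first finite verb to the right of this nominal
        for jdx, jpos in tokens[i + 1:]:
            if jpos == 'VF':
                arcs.append((jdx, 'vmod', idx))  # (head, label, dependent)
                break
    return arcs
```

For a sentence shaped like example 4.20 (nominal, nominal, finite verb, conjunction, nominal, finite verb), each nominal attaches to the nearest finite verb on its right.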
4.3.4 Algorithm
GH-CBP runs in two stages. Parsing in the two stages is identical; only the CG differs. CG construction in both stages is essentially the same, except that the CG in the second stage has very few nodes, and the heads in the second stage differ from those in the 1st stage. Based on their different linguistic demands, there are 4 types of nodes:
1. Nodes that look for their children, for example, verbs and coordinating conjunctions,
2. Nodes that look for their parent, for example, adverbs, adjectives, relative clause markers, etc.,
3. Nodes that look for both parent and children, for example, subordinating conjunctions and non-finite verbs, and
4. Nodes that look for neither parent nor children, for example, nouns.
Below we give the algorithm for GH-CBP and for the construction of a CG.
GH_CBP(S, Γ, λ)
    CG = Construct_CG(S, Γ)
    Φ = Constraint_Satisfaction(CG)
    If (Φ)
        P = Rank(Φ, λ)
    Else
        P = Fail_Safe(CG)

Construct_CG(S, Γ)
    CG = S
    foreach w ∈ S
        If (IsHead(w))
            Frames = H-constraint(w)
            foreach Frame ∈ Frames
                Frame = Demand_status_transformation(Frame)
                CG = Find_children(w, Frame, CG, S)
                CG = Look_ahead(w, Frame, CG, S)
    CG = Revision(CG)
Currently, only the final parses are ranked. On average the parser gives around 50 parses per sentence. We found that the oracle accuracy with a limit of 300 parses is almost equivalent to considering all the parse outputs. The efficiency of the parser depends primarily
on the following two factors:
a) Number of head and demand frame combinations
b) Search space for a head
It is easy to see that there is a tradeoff between efficiency and performance.
4.4 Results
We evaluated GH-CBP6 for Hindi and Telugu. We used the treebank data from the ICON2010 tools contest (Husain et al., 2010). For Hindi, the training set had 3000 sentences; the development and test sets had 500 and 321 sentences respectively. The coarse-grained tagset (see APPENDIX I) was used during evaluation. For Telugu, the training set had 1500 sentences; the development and test sets had 150 sentences each. Performance is shown in terms of unlabeled attachment (UAS), label
6 GH-CBP (version 1.5)
(LA) and labeled attachment (LAS) accuracy7. In Table 4.21, GH-CBP′′ gives the oracle score when the first 300 unprioritized parses are considered. The oracle parse for a sentence is the best available parse among all the parses of that sentence, obtained by selecting the parse closest to the gold parse. The oracle accuracy gives the upper bound of the parser accuracy and some idea about its coverage.
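Selecting the oracle parse can be sketched as follows (a minimal sketch; parses are represented as lists of (dependent, head, label) triples over the same words in the same order, a hypothetical simplification of the actual parse objects):

```python
# Pick the oracle parse: the candidate whose arcs agree most with gold.
# Each parse is a list of (dependent, head, label) triples, one per word,
# in the same order as the gold parse.

def las(parse, gold):
    """Labeled attachment score: fraction of words with correct head+label."""
    correct = sum(1 for a, g in zip(parse, gold) if a == g)
    return correct / len(gold)

def oracle(parses, gold, k=300):
    """Best available parse among the first k candidate parses."""
    return max(parses[:k], key=lambda p: las(p, gold))
```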
                          Hindi                   Telugu
                    UAS    LAS    LA        UAS    LAS    LA
GH-CBP′′
(Oracle parse)     88.50  79.12  80.96     84.14  65.33  66.60
Table 4.21. Oracle scores with GH-CBP for unprioritized parses (k=300)
                     Hindi                   Telugu
               UAS    LAS    LA        UAS    LAS    LA
Intra-clausal 88.68  77.27  79.54     85.11  57.92  59.87
Inter-clausal 87.78  86.23  86.2      82.09  79.63  79.63
Table 4.22. Intra-clausal and inter-clausal relation results
We see that the oracle UAS and LAS shown in Table 4.21 are very good, which shows that the coverage of GH-CBP is high. Table 4.22 shows that the intra-clausal LAS in both Hindi and Telugu is lower than the inter-clausal LAS. We will see in chapter 6 that this is mainly because of the unavailability of frames for various heads.
Table 4.23 shows the accuracies after prioritization. The best parse accuracies are lower than the oracle scores because of ranking errors. Rank-WR gives the best accuracy for Hindi. As discussed in section 4.3.2, in Rank-WR we score the parse using the average of the parser relation probability and the model relation probability. Using Rank-Attach alone, or as a reranking strategy over Rank-WR, did not improve the accuracy. The average score of Rank-WR and Rank-Attach, however, gave the best accuracy for Telugu. One reason why we think Rank-Attach fails to outperform Rank-WR is that the Rank-WR model is richer than Rank-Attach. Considering this, it is interesting to notice that for Telugu Rank-Attach does help; in fact, the best model for Telugu involves Rank-Attach. We think this is mainly because the Telugu treebank is smaller than the Hindi one. Additionally, owing to the lower average sentence length in Telugu, the total number of attachments and labels available for training MaxEnt is also smaller. In such a scenario an underspecified model such as Rank-Attach seems like a better option than a rich model like Rank-WR. We analyze the prioritization results further in chapter 6.
7 LAS/UAS/LA = percentage of words assigned correct head+label/head/label
                                           Hindi                   Telugu
                                     UAS    LAS    LA        UAS    LAS    LA
Rank-Voting                         83.96  66.12  69.45     81.18  55.60  57.72
Rank-PR                             84.07  71.40  74.87     82.24  59.20  61.10
Rank-MR                             83.54  67.01  70.51     81.40  56.03  58.56
Rank-WR                             84.25  71.43  74.97     82.03  59.20  60.89
Rank-Attach                         83.65  68.85  71.82     83.51  57.08  58.56
Rank-WR (k-best=100) +
  Rank-Attach                       83.79  69.24  72.25     83.51  57.08  58.56
Rank-WR (k-best=100) +
  Rank-Attach (if > 1 parse
  with best score)                  84.25  71.43  74.97     82.03  59.20  60.89
Rank-Attach (k-best=100) +
  Rank-WR                           84.21  71.40  74.94     82.03  59.20  60.89
Average (Rank-WR + Rank-Attach)     83.89  70.58  73.66     83.30  59.62  61.10
Table 4.23. Results after various prioritization strategies8
8 Other classification methods such as CRF and SVM were also tried out, but MaxEnt gave us the best accuracy. The scoring function with probability sum (XIV) gave us better results than the probability product (XV) on the development data, and therefore (XIV) is used to report the results on the test set.
Chapter 5
5. Incorporating Insights from GH-CBP in Data Driven Dependency
Parsing
Data driven parsing for MoR-FWO languages (such as Czech, Hindi, Turkish, etc.) has not
reached the performance obtained for English (Nivre et al., 2007a; Hall et al., 2007; Husain 2009;
Tsafarty et al., 2010). This low performance can be broadly attributed to
(a) Non-configurational nature of these languages
(b) Inherent limitations in the parsing algorithms
(c) Limited amounts of annotated data
There have been many attempts to tackle (a) and (b). Some such recent works are (Nivre and
McDonald, 2008; Zhang and Clark, 2008; Nivre, 2009; Tsarfaty and Sima'an, 2008; Gadde et al.,
2010; Husain et al., 2009; Eryigit et al., 2008, Goldberg and Elhedad, 2009, Martins et al., 2009;
Koo and Collins, 2010). The work described in this chapter also addresses similar issues. We introduce insights from building GH-CBP into data-driven parsing and investigate their effects on parser performance. This is done in the following ways:
1. Incorporating targeted features during training (Section 5.3): Constraints that have
proven to be crucial for identifying various dependency relations in GH-CBP are identified
and used as appropriate features. These features can be broadly divided into four classes:
a. Morphological
b. Local morphosyntactic
c. Clausal
d. Minimal semantics
2. Linguistically constrained modularity (Section 5.4): This is done by using the chunk and the clause as basic parsing units. Different ways are explored to incorporate chunks and clauses during the parsing process. Broadly, the notion of a chunk or a clause can be used during parsing as something that is fixed vs. as something that provides extra information. In other words, they can be treated either as a hard constraint or as a soft constraint.
3. Linguistically rich graph-based parsing (Section 5.5): In MSTParser (McDonald et al., 2005b), a graph-based data-driven parser, a complete graph is used to extract a spanning tree that forms the final parse. In GH-CBP, a constraint graph (CG) is a structure that shows all possible relations between heads and their children; the CG is used to get the output parse. In this work, we make MSTParser use the CG instead of a complete graph.
5.1 Parsers: Malt and MST
For the experiments reported in this chapter we use two data-driven parsers: MaltParser (Nivre et al., 2007a) and MSTParser (McDonald et al., 2005b).
Malt uses the arc-eager parsing algorithm (Nivre, 2006). History-based feature models are used for predicting the next parser action (Black et al., 1992). Support vector machines are used for mapping histories to parser actions (Kudo and Matsumoto, 2002). It uses graph transformations to handle non-projective trees (Nivre and Nilsson, 2005a).
MSTParser uses an implementation of the Chu-Liu-Edmonds algorithm (Chu and Liu, 1965; Edmonds, 1967) to find the maximum spanning tree. It uses online large-margin learning as the learning algorithm (McDonald et al., 2005a). Both Malt and MST allow for arbitrary combinations of features as part of the feature model.
Unless otherwise specified, in all the experiments in this section we use arc-eager parsing and SVM training for MaltParser9. For MSTParser10 we use the non-projective algorithm, order=2 and training k=5. The feature files given in APPENDIX IV and VI are used for Malt and MST respectively. The other available parser settings in MaltParser and MSTParser were also experimented with, but they fared worse than the above settings.
5.2 Data
All the experiments are conducted on Hindi. We use the dependency treebank released as part of the ICON2010 tools contest (Husain et al., 2010). The treebank data uses the CPG framework for annotation (Begum et al., 2008a). The analysis of a sentence is a dependency tree with syntactico-semantic labels. The annotation scheme allows for non-projective trees and, apart from dependency annotation, also has POS, chunk and morphological information. The training data had 2,973 sentences; the development and test sets had 543 and 321 sentences respectively.
9 MaltParser (version 1.3.1)
10 MSTParser (version 0.4b)
5.3 Incorporating Targeted Features During Training
In chapter 3 we saw constraints that were used to account for different types of relations in Hindi,
Telugu and Bangla. These constraints incorporated important grammatical notions from CPG in
GH-CBP. These constraints can be used as linguistic features in data-driven parsing. In this
section we will discuss all such features that help improve the parsing accuracy.
5.3.1 Morphological Features
Throughout the previous chapter we noted the importance of various morphological features in
the task of dependency parsing. Incorporating these features is the obvious first step. Morph
output has the following information:
a) Root: Root form of the word
b) Category: Coarse-grained POS
c) Gender: Masculine/Feminine/Neuter
d) Number: Singular/Plural
e) Person: First/Second/Third person
f) Case: Oblique/Direct case
g) Vibhakti: Vibhakti of the word
Take raama in example 5.1; its morph information comprises root = ‘raama’, category = ‘noun’, gender = ‘masculine’, number = ‘singular’, person = ‘third’, case = ‘direct’, vibhakti = ‘0’. Similarly, khaayaa ‘ate’ has the following morph information: root = ‘khaa’, category = ‘verb’, gender = ‘masculine’, number = ‘singular’, person = ‘third’, case = ‘direct’, vibhakti = ‘yaa’.
Through a series of experiments, the most crucial morph features were selected. Root, case and
vibhakti turn out to be the most important features. Note that the agreement features (such as
gender, number and person) were not selected in the best setting. This anomaly is discussed in the
next chapter.
(5.1) raama ne ek seba khaayaa
’Ram’ ERG ‘one’ ‘apple’ ‘ate’
‘Ram ate an apple’
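The morph information above can be represented, for instance, as simple feature dictionaries from which the selected features (root, case, vibhakti) are extracted (a minimal sketch; the field names mirror the list above but the representation itself is hypothetical):

```python
# Morph analyses for the words of example 5.1, as feature dictionaries.
MORPH = {
    'raama':   {'root': 'raama', 'category': 'noun', 'gender': 'masculine',
                'number': 'singular', 'person': 'third',
                'case': 'direct', 'vibhakti': '0'},
    'khaayaa': {'root': 'khaa', 'category': 'verb', 'gender': 'masculine',
                'number': 'singular', 'person': 'third',
                'case': 'direct', 'vibhakti': 'yaa'},
}

# Root, case and vibhakti turned out to be the most useful morph features.
SELECTED = ('root', 'case', 'vibhakti')

def morph_features(word):
    """Extract only the selected morph features for a word."""
    analysis = MORPH[word]
    return {f: analysis[f] for f in SELECTED}
```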
5.3.2 Local Morphosyntactic Features
Local morphosyntactic features correspond to all the parsing-relevant local linguistic features. As we saw earlier, these were captured via the notion of chunk, where the nominal and verbal vibhaktis were percolated to the head of the chunk. The features used to encode local morphosyntax are:
a) Type of the chunk
b) Head/non-head of the chunk
c) Chunk boundary information
d) Distance to the end of the chunk
e) Vibhakti computation for the head of the chunk
In example 5.1 there are two noun chunks and one verb chunk. raama and seba are the heads of the noun chunks, and khaayaa is the head of the verb chunk. We follow the standard IOB11 notation for chunk boundaries: raama, eka and khaayaa are at the beginning (B) of their respective chunks, while ne and seba are inside (I) them. raama is at distance 1 from the end of its chunk and ne is at distance 0. Given a chunk and a morph feature like vibhakti for the individual words inside that chunk, the relevant features for the head of the chunk can be computed automatically. The vibhakti feature of the head can, for example, represent the postposition/case-marking in the case of a noun chunk, or the tense, aspect and modality (TAM) information in the case of a verb chunk. Take (5.2) as a case in point:
(5.2) (raama/NNP ne/PREP)_NP (seba/NN)_NP (khaa/VFM liyaa/VAUX)_VGF
‘Ram’ ERG ‘apple’ ‘eat’ PRFT
‘Ram ate an apple’
The vibhakti computation for khaa, which is the head of the VGF chunk, will be ‘0_yaa’ and
is formed by concatenating the vibhakti of the main verb khaa with that of its auxiliary liyaa.
11 Inside, Outside, Beginning of the chunk
Similarly, the vibhakti computation for raama, which is the head of the NP chunk, will be ‘ne’. This feature turns out to be very important. This, as was discussed in chapter 2, is because in Hindi (and many other Indian languages) there is a direct linguistic correlation between the verb vibhakti and the noun vibhakti (case and postpositions) that appears on k1 or k2. This was also captured by one of the meta-constraints in section 4.2.1.2.2. In (5.2), for example, khaa and liyaa together provide the past perfective aspect for the verb khaanaa ‘to eat’. Since Hindi is split-ergative, the subject of a transitive verb takes an ergative case marker when the verb is past perfective. Similar correlations between case markers and TAM have been discussed earlier.
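The vibhakti computation for a chunk head described above can be sketched as follows (a minimal sketch; chunks are lists of (word, vibhakti) pairs, and the noun-chunk rule of taking the postposition's vibhakti is an assumption inferred from the examples):

```python
# Compute the vibhakti feature of a chunk head. For verb chunks the
# vibhaktis of the main verb and its auxiliaries are concatenated
# (e.g. '0' + 'yaa' -> '0_yaa'); for noun chunks the postposition's
# vibhakti is used if present (e.g. 'ne'), else '0'.

def head_vibhakti(chunk, chunk_type):
    """chunk: list of (word, vibhakti) pairs, head first."""
    if chunk_type.startswith('V'):
        return '_'.join(v for (w, v) in chunk)
    # noun chunk: take the last non-default vibhakti (the postposition)
    vibs = [v for (w, v) in chunk if v not in ('', '0')]
    return vibs[-1] if vibs else '0'
```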
5.3.3 Clausal Features
Clause was motivated as a linguistically meaningful minimal parsing unit in chapter 3. Clausal
features were used to incorporate the notion of tiganta. We posited the notion of a clause to
demarcate the scope of a finite verb. It is evident from our previous discussions that most of the
dependents of words in a clause appear inside the same clause; in other words the dependencies
of the words in a clause are mostly localized within the clause boundary (more on this in section
5.4.2).
In the dependency parsing task, a parser has to disambiguate between several words in the
sentence to find the parent/child of a particular word. Clausal features can help the parser to
reduce the search space when it is trying to do this. The search space of the parser can be reduced
by a large extent if we solve a relatively small problem of identifying the clauses. Interestingly, it
has been shown recently that most of the non-projective cases in Hindi are inter-clausal (Mannem
et al., 2009a). Identifying clausal boundaries, therefore, should prove helpful in parsing non-projective structures. The same holds true for many long-distance dependencies. For this set of experiments, two clausal features were used:
a) Clause boundary information,
b) Clausal head information
These clausal features are obtained using an intra-clausal (1st stage) parser. A 1st stage parser only parses individual clauses and attaches these clausal sub-trees to an artificial _ROOT_ (more about the 1st stage parser in section 5.4.2). A 1st stage parse is similar to the ones discussed in the previous chapter. Once such a parse is obtained, the subtree span provides the clause boundary information and the subtree head provides the head information.
5.3.4 Minimal Semantic Features
The three types of features discussed so far, namely morphological, morphosyntactic and clausal, capture different linguistic realities. For example, morph features along with chunk features help in identifying the verbal arguments, while clausal features constrain the search space, thereby helping to identify the right attachment site. But for some sentences none of these features might prove to be helpful; sentence (5.3) is a case in point.
(5.3) raama seba khaataa hai
‘Ram’ ‘apple’ ‘eat’ ‘is’
‘Ram eats apple’
In (5.3), both raama and seba have a ø post-position and therefore there are no explicit morphosyntactic cues that tell us the appropriate relation between raama and khaataa hai, or between seba and khaataa hai. Compare this with (5.4), where seba is followed by a postposition that helps us identify the object/theme of the event ‘eat’:
(5.4) raama seba ko khaataa hai
‘Ram’ ‘apple’ ‘ACC’ ‘eat’ ‘is’
‘Ram eats apple’
It should also be noted that the agreement features do not help in (5.3) either, since both the nominals raama and seba are masculine. Neither does word order help, as Hindi is a free-word-order language and a sentence like seba raama khaataa hai conveys the same meaning. In such cases, where surface-based information fails, semantic features assist in disambiguating the relations, thus aiding parsing. In (5.3), for example, the information that raama is a ‘human/animate being’ or that seba is ‘inanimate’ proves to be crucial; in fact, correct parsing of (5.3) is only possible if this semantic information is available. Such semantic features capture the notion of yogyata discussed in chapter 2.
Not all semantic features contribute to identifying the dependency relations; similarly, not all dependency relations benefit from semantic features. So an optimal set of semantic features should be used, based on their positive contribution to dependency parsing. As a first step in this direction, we use the following semantic features.
a) Human
b) Non-human
c) Inanimate
d) Time
e) Place
f) Abstract
g) Rest
These semantic features have been manually marked in the treebank. For this experiment we do not obtain these features automatically; rather, we use the gold features directly. In that sense, this experiment is more illustrative than practical.
5.3.5 Results
Figure 5.1 shows the relative importance of these features over the baseline12 LAS of MSTParser. It is clear that all the features discussed earlier improve the performance significantly. Similar improvements have also been obtained for MaltParser. We revisit these results in the next chapter and make some observations. Note that in Figure 5.1, the clausal and semantic features presume the presence of L-Morph features.
Figure 5.1. Improvement of different features over MST baseline.
Morph: Morphological, L-Morph: Local Morphosyntax, Sem: Minimal Semantics
12 The Baseline is obtained using only the Basic Unigram and Bigram features (APPENDIX VI)
5.4 Linguistically constrained modularity
In this section we will explore different methods to use chunk and clause as minimal parsing units
during data-driven dependency parsing.
5.4.1 Chunk based parsing
As mentioned in the previous chapter, the notion of chunk can help distinguish local dependency relations from global dependencies. We noticed that most of these local intra-chunk relations do not affect the overall dependency structure. This led us to modularize parsing into identifying inter-chunk relations first and then identifying intra-chunk relations. In this section we illustrate two methods of incorporating this modularity in data-driven dependency parsing.
5.4.1.1 Chunk as Hard Constraint (H-C)
In this method the dependency parser first marks the relations between elements inside a chunk. The inter-chunk relations are then identified, forming the dependency tree for the sentence. During inter-chunk parsing, the intra-chunk context is used as features but is not modified. Hence, in this setup the chunk is treated as a hard constraint. This is shown in Figure 5.2.
The intra-chunk dependency relations are easier to predict than the inter-chunk relations. In the case of Hindi these intra-chunk dependency labels can be predicted from POS tags using a small set of rules. The labeled attachment score of this system over the input data, using gold-standard POS and chunk tags, is 95.32%.
Figure 5.2. Chunk as Hard Constraint
5.4.1.2 Chunk as Soft Constraint (S-C)
In this method chunk information is used as features during parsing. This means that both local (intra-chunk) and global (inter-chunk) relations are identified together. Hence, in this setup the chunk is treated as a soft constraint. This is shown in Figure 5.3. We discussed this earlier in section 5.3.2. The following chunk features were used:
a) Type of the chunk
b) Head/non-head of the chunk
c) Chunk boundary information
d) Distance to the end of the chunk
e) Vibhakti computation for the head of the chunk
Figure 5.3. Chunk as Soft Constraint.
5.4.1.3 Results
Table 5.1 shows the results for both the methods and the baseline. The baseline setting does not use the notion of chunk; it only uses POS and morphological features. Note that, unlike the other results mentioned in this thesis, these results are for the complete parse. Experiments were conducted using 1165 sentences from the Hindi dependency treebank (Begum et al., 2008a). The average length of these sentences is 17.4 words/sentence and 8.6 chunks/sentence. We trained both the parsers on 1000 sentences and tested them on 165 sentences. These results are fleshed out in the next section.
                 Malt                  MST + MaxEnt
             UAS   LAS   LA         UAS   LAS   LA
Baseline     90.4  81.7  84.1       90.0  80.9  83.9
H-C          92.4  84.4  86.3       92.7  84.0  86.2
S-C          91.8  84.0  86.2       92.0  81.8  83.8
Table 5.1. Results for chunk modularity.
MST + MaxEnt: MST unlabelled trees with MaxEnt labeler
H-C: Chunks as Hard constraints, S-C: Chunks as Soft constraints
5.4.2 Clausal Parsing
Clause as a minimal parsing unit during parsing was motivated in chapter 3 using the notion of
tiganta (or verb vibhakti). In most cases, a clause demarcates the scope of a finite verb. Such a definition of clause in data-driven parsing brings out some interesting correlations of inter- and intra-clausal relations with relation type, depth, arc length and non-projectivity. Previous work on Malt and MST has shown that these properties have direct repercussions on parser accuracy. In this section we first correlate these properties with the notion of clause and then explore two methods of incorporating it in the parsing process.
We first note that certain dependency relations are more likely to occur between the elements inside a clause, while a different set of relations is more likely across clauses. We also note that the notion of clause can be correlated with short-distance and long-distance dependencies. Figure 5.4 shows the distribution of dependency labels with respect to clause type (intra-clausal vs. inter-clausal) in the Hyderabad dependency treebank (Begum et al., 2008a; Husain et al., 2010). For ease of exposition, Figure 5.4 only shows the labels with considerable coverage, together amounting to 93% of all dependency label occurrences. We can see clearly that many labels like k1, r6, etc. are overwhelmingly intra-clausal relations, while others like nmod-relc, ccof, etc. have an inter-clausal bias.
Figure 5.4. Dependency label distribution.
Figure 5.5 shows that short-distance dependencies are mostly intra-clausal, whereas long-distance dependencies tend to be inter-clausal. It is clear from Figures 5.4 and 5.5 that there is a clear correlation between labels and relation type on the one hand, and between arc length and relation type on the other. Further, there is a correlation between inter- vs. intra-clausal relations with respect to the depth of relations as well. Figure 5.6 shows that low-depth dependencies are both inter-clausal (in the case of complex sentences involving coordination, relative clauses, embeddings, etc.) and intra-clausal (simple sentences). It also shows that the percentage of inter-clausal relations decreases with increasing depth.
Figure 5.5. Arc length and relation type
Figure 5.6. Depth and relation type
Finally, there is a correlation between clause and non-projectivity: 70% of the non-projective
relations are inter-clausal (Mannem et al., 2009a).
Properties such as relation type, arc length, depth, and non-projectivity are known to have specific effects on errors in data-driven dependency parsing (McDonald and Nivre, 2007). Therefore, it is worth exploring the effect of the clause (when treated as a minimal unit) on dependency parsing accuracy. We will see in this section that this amounts to parsing individual clauses separately. As described in chapter 3, for all the experiments the following definition of clause is used:
‘A clause is a group of words containing a single finite verb and its dependents.’
More precisely, let T be the complete dependency tree of a sentence, and let G be a clausal subgraph of T. Then an arc x → y in G is a valid arc if (a) x is a finite verb; (b) y is not a finite verb; and (c) there is no z such that y → z, where z is a finite verb and y is a conjunct.
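The validity condition above can be written as a small predicate (a minimal sketch; the tree is a set of (head, dependent) arcs, and the finite-verb and conjunct tests are hypothetical set lookups):

```python
# Check whether an arc x -> y is a valid intra-clausal arc: x is a finite
# verb, y is not, and y is not a conjunct directly dominating a finite
# verb. 'tree' is a set of (head, dependent) arcs; 'finite' and 'conjunct'
# are sets of node ids.

def valid_clausal_arc(x, y, tree, finite, conjunct):
    if x not in finite or y in finite:
        return False
    if y in conjunct:
        # condition (c): y must not dominate a finite verb
        if any(h == y and d in finite for (h, d) in tree):
            return False
    return True
```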
5.4.2.1 2-stage parsing
In section 5.3.3 we saw that clausal information can be incorporated as features during parsing. In this section we explore another method of bringing in clausal information. The basic idea here is essentially the same as the constraint-based two-stage parsing discussed in chapter 4. We first parse intra-clausal relations and then parse inter-clausal relations. We explore two ways of doing this:
a) Use the 1st stage output as something that is fixed, i.e. as a hard constraint. This is exactly how the 1st stage is treated in GH-CBP.
b) Use the 1st stage output as a soft constraint. This means that the 1st stage relations can in principle be changed during 2nd stage parsing.
Since the 1st stage parser aims to parse only clauses, the original treebank needs to be modified to prepare the training data. We introduce a special dummy node named _ROOT_ which becomes the head of the sentence. All the clauses are connected to this dummy node with a dummy relation. In effect, we remove all the inter-clausal relations. The steps for this conversion are:
a) Add a dummy _ROOT_ node to the gold standard tree.
b) Find all sub-trees that have a ccof13 or nmod__relc14 relation with their parent.
c) Find all sub-trees where a relation exists between two VGFs (finite verb chunks).
d) Attach those sub-trees and the respective parents to the new node _ROOT_, with a dummy relation.
The input to the 2nd stage, for all the methods, can be defined more precisely as follows. Let T be the complete tree that should be output by the 2nd stage parser and let G be the subgraph of T that is input to the second stage. Then G should satisfy the following constraint: if the arc x → y is in G, then, for every z such that y → z is in T, y → z is also in G.
In other words, if an arc is included in the 1st stage partial parse, the complete subtree under the dependent must also be included. Unless this constraint is satisfied, there are trees that the second-stage parser (discussed in 5.4.2.2) cannot construct.
13 Conjunct relation (only finite clauses are considered here)
14 Noun modifier of the type relative clause
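The subtree-closure constraint on the 2nd stage input can be checked mechanically (a minimal sketch; trees are sets of (head, dependent) arcs, a hypothetical simplification):

```python
# Verify that a 1st-stage partial parse G is a valid 2nd-stage input with
# respect to the full tree T: whenever an arc x -> y is in G, every child
# arc of y in T must also be in G. Applied to every arc of G, this forces
# the complete subtree under each dependent to be present.

def valid_stage2_input(G, T):
    for (x, y) in G:
        for (h, d) in T:
            if h == y and (h, d) not in G:
                return False   # a child of y in T is missing from G
    return True
```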
At the end of the conversion, the parses in the treebank are converted to ‘1st stage parses’, i.e., the parse trees one would get at the end of the 1st stage. The settings used while training MST/Malt for stage 1 have already been discussed in Section 5.1.
Figure 5.7. (a) Original gold input; (b) 1st stage converted tree
5.4.2.2 2-stage parsing with hard constraints (2-Hard)
The partial parse obtained from the 1st stage becomes the input to the 2nd stage, in which these partial subtrees are related using appropriate relations. We can perform this two-stage parsing in two ways:
(a) By treating each partial subtree as a single node in the second stage, or
(b) By allowing the parser to accept partial trees as input.
5.4.2.2.1 Strategy I (2-Hard-S1)
For (a), the training data for the 2nd stage is obtained by converting the gold parse to suit the 2nd stage’s needs. Since we know that after the 1st stage all the parsed clauses are attached to _ROOT_ (Figure 5.7b), we replace those sub-trees with their respective heads and learn the relations between them in the 2nd stage. This is depicted in Figure 5.8. The head of each sub-tree is a placeholder for the whole clause parsed in the 1st stage. Though we take only the head of each subtree, we provide features that are characteristic of the sub-tree, thereby representing the whole sub-tree through that single node. This helps the parser learn only the inter-clausal relations.
Figure 5.8. Stage2 training input. Partial trees converted into a single node.
Note here that we make the assumption that inter-clausal relations exist only between the heads (in most cases, a finite verb) of various clauses or, in the case of multiple conjuncts, between the conjuncts themselves. There are exceptions to this (for example, the relative clause construction). This means that constructions like relative clauses will have to be handled separately.
Taking the head of a 1st stage parsed sub-tree as a single node representative of the entire sub-tree requires using extra features while training for the 2nd stage. To do this, one must judiciously isolate properties of the root as well as of subtree-internal nodes. The features can be broadly summarized as follows:
a) Structural features of the sub-tree, such as width, depth, branching, total number of nodes, etc., and
b) The characteristics of the nodes (including the root), such as morph features, POS/chunk tags, arc label, domination (parent, grandparent, great-grandparent) features, sibling features, valency, etc.
For the present set of experiments we decided to use the same features15 as those of integrated parsing. This has been done to make the comparison between the performances of 2-stage parsing and integrated parsing unbiased. Figure 5.9 shows this setup.
Figure 5.9. Strategy I (2-Hard-S1). Input to the 2nd stage is a partial tree converted into a single node.
15 This will include POS/chunk tags, morph features, suffix info., etc.
Figure 5.10. Strategy II (2-Hard-S2). Input to the 2nd stage is a partial parse
5.4.2.2.2 Strategy II (2-Hard-S2)
In this strategy, the 2nd stage MaltParser takes as input the partial 1st stage trees and establishes relationships between clauses (and conjunctions) (Figure 5.10). The 1st stage predictions are mutually exclusive of the 2nd stage predictions and cannot be overridden in the 2nd stage. However, they can be used as features in the 2nd stage predictions. This means that the 2nd stage MaltParser gets initialized with only those nodes that are attached to _ROOT_ in the first-stage parse (cf. Figure 5.7b). Figure 5.11 below shows the initial configuration of the 2nd stage Malt for sentence 5.5; the input is the 1st stage parse shown in Figure 5.12a.
(5.5) mai ghar gayaa kyomki mai bimaar thaa
’I’ ’home’ ’went’ ’because’ ’I’ ’sick’ ‘was’
‘I went home because I was sick’
Figure 5.11. 2nd stage initialization using the 1st stage parse shown in Figure 5.12(a)
(a) 1st stage output (b) 2nd stage final parse
Figure 5.12. Parse output for sentence 5.5
The 1st stage and 2nd stage parsers cater to different types of constructions. Recalling the constraint on the 2nd stage input, we note that, given such a constraint, a relative clause (though a subordinate clause) cannot be handled in the 2nd stage and will have to be handled separately. We explain the problem of handling the relative clause in the 2nd stage using sentence (5.6).
(5.6) vaha vahaan waba puhuchaa jaba sab jaa chuke the
’He’ ’there’ ’when-REL’ ’reached’ ’when-COREL’ ’everyone’ ‘go’ ‘had’
‘He reached there when everyone had left’
Figure 5.13. Parse output and 2nd stage initialization for sentence 5.6
Figure 5.13(a) shows the 1st stage output of a relative clause construction in a standard 2-stage setup. Both the relative clause and the matrix clause are attached to _ROOT_; the analysis of these clauses is complete. In the 2nd stage the relation between these two clauses is established (Figure 5.13b). Recall that we initialize the 2nd stage of 2-Hard-S2 with the children of _ROOT_, which in this case are the finite verbs of the two clauses (Figure 5.13c). Now recall the constraint on the input of the 2nd stage in 2-Hard-S2; given this constraint, the 2nd stage can only establish a relation between the two verbs and not, as is correct, between the relative clause verb and a noun dependent on the matrix verb. The noun ‘waba’ is not present in the input buffer and can never be considered as a head of ‘jaa’. For this reason, 2-Hard-S2 handles relative clauses through a separate classifier after the 1st stage. This parse is then fed into the 2nd stage.
5.4.2.2.3 Handling relative clause constructions in 2-Hard-S2 and 2-Hard-S1
As discussed in the previous sections, both 2-Hard-S1 and 2-Hard-S2 are unable to parse relative clause constructions. To handle such constructions we use the following steps:
(a) Identify relative clauses at the end of the 1st stage,
(b) Identify the head (noun) of the relative clause using a Maximum Entropy (MaxEnt) model (Ratnaparkhi, 1998),
(c) Attach the relative clause to its head,
(d) Use this updated parse as input to the 2nd stage.
The identification of relative clauses at the end of the 1st stage is rule based and depends on the presence of relative pronouns such as jo, jaba, jisa, etc. This system has an accuracy of around 94%.
The MaxEnt model used to identify the correct head uses the following features:
a) Lexical item of the NP's head,
b) POS tag of the NP's head,
c) Direction of the NP from the relative clause (1, -1),
d) Distance of the NP from the relative clause verb (normalized to 4, 8, 12, 16, 20, 24),
e) Specific cue: presence of an item from the list ["taba","tyoM","vEsa","vaha","vahAM","vahAz"] in either the lexical item or the lemma of the word and its chunk members.
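A minimal sketch of how such a feature vector might be assembled is given below. The function and field names are ours, and the actual MaxEnt implementation and feature encoding may differ.

```python
CUES = ["taba", "tyoM", "vEsa", "vaha", "vahAM", "vahAz"]

def bucket(distance, buckets=(4, 8, 12, 16, 20, 24)):
    """Normalize an absolute distance to the smallest covering bucket."""
    for b in buckets:
        if abs(distance) <= b:
            return b
    return buckets[-1]

def head_features(np_lex, np_pos, np_index, relcl_verb_index, chunk_words):
    """Features (a)-(e) for one candidate NP head of a relative clause."""
    return {
        "lex": np_lex,                                    # (a) lexical item
        "pos": np_pos,                                    # (b) POS tag
        "dir": 1 if np_index > relcl_verb_index else -1,  # (c) direction
        "dist": bucket(np_index - relcl_verb_index),      # (d) bucketed distance
        "cue": any(w in CUES for w in chunk_words),       # (e) specific cue
    }
```

For example, a candidate NP 'vaha' three words to the left of the relative clause verb would get direction -1, distance bucket 4, and a positive cue feature.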
5.4.2.3 2-stage parsing with soft constraints (2-Soft)
We can, instead of treating the output of the first-stage parser as hard constraints for the 2nd stage parser, treat it as soft constraints by simply defining features over the arcs produced in the 1st stage and making a complete parse in the 2nd stage. Technically, this is the same technique that Nivre and McDonald (2008) used to integrate Malt and MST, called guided parsing or parser stacking. In 2-Soft we ‘guide’ Malt with a 1st stage parse by Malt. The additional features added to the 2nd-stage parser during 2-Soft parsing encode the decisions by the 1st-stage parser concerning potential arcs and labels considered by the 2nd stage parser, in particular, arcs involving the word currently on top of the stack and the word currently at the head of the input buffer. For more details on the guide features for MaltParser, see Nivre and McDonald (2008).
Note again that, unlike the standard two-stage setup, the 1st stage relations can now be overridden during the 2nd stage (because we are guiding), and unlike the standard guided parsing setup, a parser guides with only 1st stage relations. Unlike two-stage parsing, guided parsing parses the complete sentence twice: the results from one parser are used to extract features that guide the second parser. In two-stage parsing, different components of a sentence are parsed in two stages.
5.4.2.4 Results
Table 5.2 shows the performance of the different parsers with 5-fold cross-validation on the Hindi data described in section 5.2. 2-Hard-S2 [16] and 2-Soft both perform better than the baseline. The UAS, LAS and LA for the baseline were 88.82, 75.02 and 77.80 respectively. The differences between the baseline and the two parsers were statistically significant. Significance is calculated using McNemar’s test (p <= 0.05). These tests were made with MaltEval (Nilsson and Nivre, 2008).
UAS LAS LA
Baseline 88.82 75.02 77.80
2-Hard-S2 89.13 75.65 78.73
2-Soft 88.92 75.24 78.00
Table 5.2. Overall parsing accuracy (5-fold cross-validation)
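As a rough sketch of how such a paired significance test works: McNemar's test looks only at the tokens where the two parsers disagree. The counts below are made up for illustration, and MaltEval's exact procedure may differ in detail.

```python
def mcnemar_chi2(only_a_correct, only_b_correct):
    """McNemar's chi-square statistic with continuity correction.
    Inputs are the counts of tokens that exactly one of the two
    parsers got right; tokens both got right or wrong cancel out."""
    b, c = only_a_correct, only_b_correct
    return (abs(b - c) - 1) ** 2 / (b + c)

def significant(b, c, critical=3.841):  # chi-square, 1 df, p = 0.05
    return mcnemar_chi2(b, c) > critical

# Made-up disagreement counts for illustration:
print(significant(60, 30))  # -> True
```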
Finer analysis of the cross-validation results shown in Table 5.2 brings out some interesting facts.
These are discussed in the next chapter.
5.5 Linguistically rich Graph-based parsing
In MSTParser (McDonald et al., 2005a, 2005b), a graph-based data-driven parser, a complete graph is used to extract a spanning tree during derivation. MSTParser’s learning model uses a large-margin algorithm, which optimizes the parameters of the model to maximize the score margin between the correct dependency graph and all incorrect dependency graphs for every sentence in a training set. The learning procedure is global. Unlike MaltParser (and other transition based systems, see Kubler et al., 2009), MSTParser considers only a limited history of parser decisions during training. McDonald and Nivre (2007) characterize in detail the specific error patterns in MSTParser. Recent works such as Sagae and Lavie (2006), Nivre and McDonald (2008), Zhang and Clark (2008), and Koo and Collins (2010) have tried to improve parsing accuracy either by integrating the two parsers (via stacking, etc.) or by introducing better learning models.

16. The feature file for the 2nd stage parser in 2-Hard-S2 is given in APPENDIX V.
In this section we investigate whether parsing accuracy using MSTParser can be improved by providing it a constraint graph instead of a complete graph during the derivation step. While we do not change the learning phase, it will be interesting to see what effect certain linguistic knowledge alone can have on the overall accuracy. A constraint graph is formed using the linguistic knowledge of the constraint based parsing system discussed in section 4.3. Through a series of experiments we formulate the constraint graph that gives us the best accuracy. These experiments show that some of the previous MSTParser errors can be corrected consistently. They also show the limitations of the proposed approach.
5.5.1 Constraint Graph
The constraint system discussed divides the task of parsing into intra-clausal and inter-clausal stages. At each stage, demand frames (mainly for verbs and conjunctions) for various heads are used to construct a constraint graph. The parser currently uses close to 536 demand frames. The constraint graph is then converted into an integer programming problem to get the parse at each stage. Consider example (5.7).
(5.7) bacce ne kelaa khaayaa aura so gayaa
’child’ ERG ‘banana’ ‘eat’ ‘and’ ‘sleep’ ‘went’
‘The child ate the banana and slept’
Figure 5.14 shows the 1st stage and the 2nd stage Constraint graph (CG) for (5.7). Note that the arcs in the 1st stage CG are localized to individual clauses. The _ROOT_ node is required in order to get the partial parse at the end of the 1st stage. Also note that in the 2nd stage only the inter-clausal relations are considered (here finite verbs and a conjunct).
The CG for each sentence provides the linguistic knowledge that we will use in the various experiments in this section. We can use this information in two ways:
a) Complete CG (or individual stage CG) can be used during the derivation,
b) CG (complete or stage specific) can be used selectively to prune out certain arcs in the complete graph while retaining others.
We note that although the CG also provides arc labels, for all our experiments we are only concerned with the attachment information. This is because the spanning tree extraction algorithm in MSTParser uses an unlabeled graph; MSTParser uses a separate classifier to label the trees.
Figure 5.14. Constraint graph for sentence 5.7.
5.5.2 Experimental Setup
All the experiments are conducted on Hindi. We use the dependency treebank described in section 5.2. The MSTParser described in section 5.1 was modified so that it can use a CG during derivation. Experiments were first conducted using training and development data; only after the experimental setup was frozen was the test data used.
5.5.3 Experiments
For an input S = w0, w1, …, wn, i.e. the set of all words in a sentence, let GS be the complete graph and CGS be the constraint graph provided by the constraint parser. Let N = {w0, w1, …, wn} be the set of vertices in GS. AG = N x N and ACG ⊆ N x N are the sets of arcs in the two graphs. An arc between wi and wj, written (wi,wj), signifies wi as the parent of wj. Let X be the set of all nodes which occur as a child in ACG. Also let C be the set of all vertices which are conjuncts, V the set of all vertices which are verbs, K the set of all vertices which are nouns, P the set of all vertices that have a case-marker/post-position, and J the set of adjectives.
The set of arcs which will be pruned from the complete graph in experiment 1 is shown in Table 5.3. This means that all the arcs in G will be pruned except the ones present in the CG.
For y in X:
    For x in S:
        If ∃! (x,y) in ACG:
            Remove (x,y) from AG
Table 5.3. Experiment 1 valid arcs
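Reading the guard '∃! (x,y) in ACG' as 'no arc (x,y) exists in ACG', which matches the prose that all arcs of G are pruned except the ones present in the CG, experiment 1 can be sketched in a few lines. The set-based representation is our own simplification.

```python
def prune_exp1(complete_arcs, cg_arcs):
    """Experiment 1: for every node that appears as a child in the
    constraint graph, keep only the CG-licensed incoming arcs; arcs
    into nodes the CG says nothing about are left untouched."""
    constrained_children = {y for (_, y) in cg_arcs}
    return {(x, y) for (x, y) in complete_arcs
            if y not in constrained_children or (x, y) in cg_arcs}

# Tiny illustration: 4 nodes, CG licenses 0->2 and 2->1 only.
nodes = range(4)
complete = {(x, y) for x in nodes for y in nodes if x != y}
pruned = prune_exp1(complete, {(0, 2), (2, 1)})
```

In this toy example, nodes 1 and 2 keep only their CG arcs, while nodes 0 and 3, about which the CG is silent, keep all their incoming arcs from the complete graph.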
The parser in experiment 1 (E1) outperformed the baseline accuracy (more details in section 5.5.4). Further analysis showed that the pruning based on E1, although useful, also had a negative effect: it pruned out many potentially valid arcs that would otherwise have been considered by MSTParser. Through experiments 2-8 we explore whether we can minimize such invalid pruning. We do this by systematically considering parts of the CG and using only those parts for pruning G.
Experiment 2 (Table 5.4, 1st row) begins by focusing on child nodes with post-positions. Also incorporated are the conjunction heads. Since a CG is formed based on explicit linguistic cues, it makes sense to base our decisions on cases where concrete information is available. Experiment 3 (Table 5.4, 2nd row) uses similar conditions, except that the constraint on nodes with post-positions applies only to noun children. By doing this we are trying to identify the most appropriate information in the CG.
Experiment 2:
For y in X:
    For x in S:
        If ∃! (x,y) in ACG:
            Remove (x,y) from AG
        If ∃ (x,y) in ACG and y∉P and x∉C:
            Remove (x,y) from AG

Experiment 3:
For y in X:
    For x in S:
        If ∃! (x,y) in ACG:
            Remove (x,y) from AG
        If ∃ (x,y) in ACG and y∈K and y∉P and x∉C:
            Remove (x,y) from AG

Table 5.4. Experiment 2 and 3 valid arcs
For y in X:
    For x in S:
        If ∃! (x,y) in ACG:
            Remove (x,y) from AG
        If ∃ (x,y) in ACG and y∈K and y∉P and x∈V:
            Remove (x,y) from AG
Table 5.5. Experiment 4 valid arcs
Experiment 4 is a constrained version of Experiment 3. It is interesting to note that in experiment 4 we are trying to prune out invalid arcs related to the argument structure of a verb (x∈V). Using the CG only for verbal arguments with a case-marker captures the various verbal alternations manifested via case-markings.
Experiment 5:
For y in X:
    For x in S:
        If ∃! (x,y) in ACG:
            Remove (x,y) from AG
        If ∃ (x,y) in ACG and y∈K and y∉P and x∈V:
            If ∃! (z,y) in ACG and (z∈C or z∈J):
                Remove (x,y) from AG

Experiment 6:
For y in X:
    For x in S:
        If ∃! (x,y) in ACG:
            Remove (x,y) from AG
        If ∃ (x,y) in ACG and y∈K and y∉P and x∉C:
            If ∃! (z,y) in ACG and (z∈C or z∈J):
                Remove (x,y) from AG

Table 5.6. Experiment 5 and 6 valid arcs
Experiments 5 and 6 extend experiments 4 and 3 respectively by introducing an exception whereby a noun child y with no case-marker is considered only if there exists some other potential conjunction/adjectival head for y. Owing to the free word order of Hindi, identifying the head of a noun with no case-marker is a rather difficult task. In spite of their availability, many robust generalizations (that help disambiguate relations with nouns with no case-markings), such as agreement, remain unexploited during training (Ambati et al., 2010). In these experiments, therefore, we are trying to ensure that the ambiguity of correct heads for nouns with no post-position is not resolved by the CG.
Experiments 2-6 only catered to verbal, conjunction or adjectival heads. Experiments 7 and 8 extend 5 and 6 to handle nominal predicate heads. We note here that this information is not obtained from the CG and is being treated as a heuristic rather than as having linguistic validity. The constraint parser has very limited coverage for nominal predicates and therefore we cannot rely on it for this kind of information. The heuristic considers a possible attachment between two consecutive nouns and does not remove such arcs from G.
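The consecutive-noun heuristic can be sketched as follows. This is our own formulation of the description above, and the variable names are illustrative.

```python
def protect_consecutive_nouns(arcs_to_prune, nouns):
    """Experiments 7-8 heuristic: an arc between two consecutive nouns
    may be a nominal-predicate attachment, so it is never pruned.
    nouns: set of noun positions; arcs are (head_pos, child_pos)."""
    return {(x, y) for (x, y) in arcs_to_prune
            if not (x in nouns and y in nouns and abs(x - y) == 1)}
```

Applied after the CG-based pruning conditions, this keeps every adjacent noun-noun arc available to MSTParser regardless of what the CG says about those nodes.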
5.5.4 Results
Figure 5.15 shows the results for all the experiments. The baseline UAS was 88.66 and the best result was obtained in experiment 8 with a UAS of 89.31, an increase of 0.65%. There was also an increase of 0.45% in LAS. All the results were statistically significant using McNemar’s test. We will discuss these results further in chapter 6.
Figure 5.15. Unlabeled attachment accuracies
Chapter 6
6. Rounding up
In this chapter we present an error analysis of all the approaches discussed in the previous chapters, and make relevant observations along the way.
6.1 GH-CBP
In chapter 4 we illustrated the GH-CBP framework and described how many notions from CPG were incorporated into its design. These design decisions have repercussions for different aspects of the parser. Below we briefly discuss some of them:
a) Generic Framework: Any grammar driven parser negotiates the need to balance generic
grammatical notions with language variability. This in turn reflects on its overall
performance. In GH-CBP the distinction between language specific constraints (H-
constraints) and other constraints in the form of meta-constraints, eliminative and preferential
constraints helps it cater to language variability on the one hand and generic grammatical
notions on the other hand, in a controlled way.
b) Efficiency: GH-CBP uses modularity and localization to achieve efficiency. The notion of chunks and clauses, leading to layered parsing, helps keep the search space from exploding. The use of licensing constraints to form a constraint graph (instead of a constraint network) constrains the space of permissible dependency structures for a given sentence.
c) Robustness: GH-CBP always gives some parse. The provision of producing a partial parse
along with a failsafe mechanism helps in handling unknown constructions.
d) Ambiguity resolution: GH-CBP uses a statistical prioritization technique to rank the output parses. The basic idea of scoring and obtaining the weights is derived from graph-based parsing and labeling techniques.
Figure 6.1. Some intra-clausal non-projective structures
[NP: Noun chunk, CCP: Conjunction chunk, VGF: Finite verb chunk,
NN[GEN]: Noun chunk with a genitive marker, VGNN: Verbal noun chunk]
e) Non-projective structures: The IP formulation allows for non-projective parsing (Riedel and Clarke, 2006). GH-CBP handles most of the non-projective structures in Hindi (Mannem et al., 2009a). Such constructions include: (a) Relative-Corelative constructions, (b) Extraposed relative clause constructions, (c) Paired connectives, (d) Shared arguments splitting the non-finite clause, etc. However, some non-projective sentences, such as (a) and (c) shown in Figure 6.1, can pose problems, as the relevant feature required to identify the correct attachment site in such cases is often semantic in nature.
f) Handling complex structures: Complex linguistic cues (local and global) can easily be encoded as constraints. Relevant constraints can disambiguate contextually similar entities by tapping into hard-to-learn features.
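The arc-factored scoring idea behind the prioritization in (d) above can be sketched as follows. This is a simplified illustration: the score tables stand in for the MaxEnt attachment and label classifiers, and the function names are ours.

```python
def parse_score(arcs, attach_score, label_score):
    """Score a candidate parse as the sum of per-arc attachment and
    label scores, as in arc-factored graph-based parsing.
    arcs: iterable of (head, child, label) triples."""
    return sum(attach_score[(h, c)] + label_score[(h, c, lab)]
               for h, c, lab in arcs)

def rank_parses(candidates, attach_score, label_score):
    """Rank candidate parses by score, best first. Python's sort is
    stable, so parses with equal scores keep their original order and
    'take the first best parse' falls out naturally."""
    return sorted(candidates, reverse=True,
                  key=lambda arcs: parse_score(arcs, attach_score, label_score))
```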
6.1.1 Errors
The performance of GH-CBP is affected by the following:
a) Small lexicon (linguistic demands of various heads): One of the main reasons for the low LAS is that the total number of linguistic demand frames the parser currently uses is very small. These demand frames cover verbs, conjuncts and predicative adjectives. GH-CBP currently uses only around 536 frames for Hindi (Begum et al., 2008b) and 460 frames for Telugu. Considering this, the comparatively low LAS is not surprising. Table 6.1 shows the total number of unseen verbs and the most frequent errors. For unseen verbs a default strategy of using a prototypical frame is employed; not surprisingly, this does not always work well. This is reflected in Table 6.1, where the most frequent errors are in the argument structure of a verb, such as k1, k2, and k7. Related to this is the problem of correctly identifying complex predicates (CP) (Noun-Verb, Adjective-Verb). Many generalizations like passivization, agreement, etc. work differently for CPs; therefore, identifying them becomes crucial. Automatic identification of such predicates is a challenging task, as most diagnostics present in the literature are behavioral (Butt, 1995; Verma, 1993). The parser currently handles some CPs that have honaa ‘be’ and karnaa ‘do’ as light verbs. There has been some initial work towards automatically inducing verb frames for Hindi and Telugu (Kolachina et al., 2010) and towards automatic identification of complex predicates in Hindi (Venkatapathy et al., 2005; Begum et al., 2011). These works can prove very useful in the future.
Total verbs: 650 (seen: 591, unseen: 59)

Error   Percentage
k1      9.09
k2      15.55
k7      9.86
Table 6.1. Unseen verbs and some common argument structure errors in the Hindi test data
Intra-clausal relations UAS LAS LA
Verb arguments 94.16 82.13 84.08
Non verb arguments 83.10 72.33 74.93
Table 6.2. Intra-clausal performance in Hindi
Table 6.2 shows the performance on intra-clausal relations. We see that, compared to verb arguments, the accuracies for non-verbal relations are lower. This is because the number of frames for predicative adjectives and nominals is small.
b) Morphological errors and ambiguous TAMs: A small portion of errors are caused when the morphological analyzer gets the root form of a verb wrong. In such a case, the parser picks an incorrect verb frame. Also, in Hindi, certain TAM (tense, aspect and modality) labels are ambiguous, which affects the correct application of the demand status transformation.
c) Unhandled constructions: The parser does not handle some constructions very well and gets them wrong frequently when they appear in a sentence. These constructions are:
a. Apposition
b. Long distance intra-clausal coordination
Some constructions, such as the missing copula in Bangla and Telugu and missing genitive case markers in Telugu, are still not handled by the parser.
6.1.1.1 Error analysis of Prioritization
The average number of output parses for each sentence is around 50. It was noticed that the differences between these parses were minimal, which makes ranking them a non-trivial task. The average attachment difference among the various parses with respect to the oracle parse of a sentence was 0.69; similarly, the average label difference was 1.65. The closeness between parses is expected from a constraint based parser whose output parses are only those that do not violate any Licensing and Eliminative constraints. In other words, most of the output parses are linguistically very sound. Of course, this linguistic soundness is restricted to morpho-syntax and does not extend to semantics, because the H-constraints do not as yet incorporate any semantics into the parser. Considering this, the error analysis does not throw up any big surprises. The main reasons why the LAS suffers during prioritization can be attributed to:
a) Lack of explicit post-positions or presence of ambiguous ones: Errors because of this manifest themselves at different places and can lead to attachment errors. A few common cases are finite and non-finite argument sharing, confusion between finite and non-finite arguments, adjectival participles, etc. It was also noted that the most frequent errors are for those arguments of the verb that have no postposition. Consequently, relations such as ‘k1’, ‘k2’, ‘k7’ and ‘vmod’ show very high confusion.
b) Multiple best-score parses: It is possible that more than one parse finally gets the best score. Out of 321 Hindi test sentences, 58 had multiple best-score parses. This is partly caused by the above reason, but it also reflects the accuracy of the labeler. Table 6.3 shows the accuracy of the MaxEnt classifier for detecting labels and attachments in Hindi and Telugu. As the accuracy of the labeler increases, this problem will lessen. Currently, we select only the first parse amongst all the parses with equal score.
Accuracy
Hindi Telugu
Attachments 80.13 80.80
Labels 87.19 76.78
Table 6.3. MaxEnt performance for attachment and label identification
6.2 Data-driven Parsing
The advantages and disadvantages of the two data-driven parsers that were used for the various experiments have been extensively explored (McDonald and Nivre, 2007). Briefly, Malt is better at identifying short distance dependencies, while MST is better for long distance dependencies and the root. This pattern, as it turns out, is a direct repercussion of the algorithms used by these parsers: the greedy, deterministic Malt makes better use of local features, while the graph based MST makes better use of global features during training.
In sections 6.2.1 - 6.2.3 we discuss the results of the experiments described in the previous
chapter.
6.2.1 Use of targeted features in data-driven parsing
The four types of features that we discussed in chapter 5 benefit both Malt and MST. These features provide the necessary information for establishing different relations. Figure 6.2 shows the relative improvement of these features over the MSTParser baseline.
We noted in chapter 5 that root, case and vibhakti get selected to give the best performance. The selection of these morph features is quite intuitive: the use of root helps in reducing sparsity, whereas case and vibhakti provide important structural cues for identifying relations. However, gender, number and person led to a decrease in accuracy in both Malt and MST. Agreement patterns in ILs are not always straightforward, and such non-local patterns present in the language are not being picked up by the parsers during learning. This has also been noted for other languages, such as Hebrew (Goldberg and Elhadad, 2009). Some recent parsing models, such as Relational-realizational parsing (Tsarfaty and Sima'an, 2008), have tried to overcome this problem in the context of data driven parsing.
For all the experiments, the conjoined feature of parent and child vibhakti proved very beneficial. Recall that there is a TAM-vibhakti mapping (more precisely, a TAM-sup mapping) for many dependency relations. The conjoined feature captures this mapping and therefore helps in improving the overall performance. Incorporating this feature in Malt is straightforward, but for MST we modified the original code to get this feature working (APPENDIX VI). Table 6.4 shows the confusion matrix for some important labels. The confusion occurs mainly because of absent or ambiguous postpositions or TAM. Other than the conjoined features, the features capturing the properties of a partially built tree in the case of Malt proved to be immensely useful.
Figure 6.2. Improvement of different features over the MST baseline.
Morph: Morphological, L-Morph: Local Morphosyntax, Sem: Minimal Semantics.
Note that clausal and semantic features presume the presence of L-Morph features.
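What conjoining amounts to can be sketched minimally as below. The '|' encoding is illustrative; MaltParser expresses conjoined features through its feature specification, and the MST modification in APPENDIX VI presumably does something similar internally.

```python
def conjoined_vibhakti_feature(parent_vib, child_vib):
    """Conjoin the parent's vibhakti (a TAM marker for verbs) with the
    child's vibhakti into one atomic feature value, so the learner can
    weight a specific TAM-vibhakti pair rather than each part alone."""
    return parent_vib + "|" + child_vib
```

For instance, the perfective TAM on a verb and the ergative ‘ne’ on its agent co-occur systematically in Hindi; a conjoined value such as "yA|ne" lets the learner reward exactly that pairing.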
Label   Confusion
k1      k2 (270), nmod (232), pof (158), k1s (104)
k2      pof (219), k1 (216), k4 (125), k2s (120)
pof     k2 (297), k1 (197), k1s (135)
k7      k7p (363)
k7p     k7 (209)
main    ccof (178), nmod__relc (150), vmod (94), k2 (86)
r6      r6-k1 (156), r6-k2 (117)
Table 6.4. Most frequent confusions (label and the labels it is most confused with, with counts).
We first experimented with giving only the clause inclusion (boundary) information to each node (Table 6.5). This feature helps the parser reduce its search space during parsing decisions. Then, we provided only the head and non-head information (whether or not a node is the head of the clause). The head/non-head information helps in handling complex sentences that have more than one clause, where each verb in the sentence has its own argument structure. Surprisingly, we did not achieve the best performance by using both boundary and head information. We suspect that this is not the final word and that further optimization might help here. The use of clause boundary features in MSTParser helps more with getting the attachments right than the labels. Figure 6.3 shows the effect of clausal features with respect to arc length. We can see that as the arc length increases, the effect of this feature becomes more pronounced.
MSTParser
                                 UAS    LAS
Clause Boundary                  89.97  76.55
Clausal Head                     89.27  75.35
Clause Boundary + Clausal Head   89.47  75.92
Table 6.5. Effect of clausal features on parser performance.
Figure 6.3. Effect of clausal feature on arc length in MSTParser.
Minimal semantics, as discussed earlier, helps in capturing certain semantic selectional restrictions of a verb. We found that it helped in disambiguating mostly those relations which occur with null postpositions. This is interesting because in such cases all the other features discussed earlier fail to work. Some such relations are shown in Table 6.6. Notice that these relations were also seen in Table 6.4, showing that they are error prone.
MSTParser MaltParser
Baseline Min. Sem Baseline Min. Sem
k1 69.2 74.6 77.5 79.9
k2 52.2 59.6 62.9 64.1
pof 54.6 69.2 55.8 61.4
k7t 56.2 67.8 59.2 69.0
Table 6.6. Effect of minimal semantics on some relations.
6.2.2 Chunk Based Parsing
Chunk based parsing, with chunks either as hard or as soft constraints, makes the notion of vibhakti visible. For languages such as Hindi and Bangla this becomes crucial, as the surface syntactic cues necessary to identify certain relations are distributed. Note that it is the notion of vibhakti that makes the conjoined features (which gave us a big jump in accuracy) effective.
Using chunk information as hard or as soft constraints has different advantages. Marking relations only between chunk heads makes the parser ignore local details, thereby making many decisions easy. Some such patterns are:
1. (NN)_NP (PSP NN PSP)_NP
2. (DEM NN PSP PRT)_NP
3. (NN RDP)_NP (VM VAUX)_VGF
4. (VM VAUX)_VGF (CC)_CCP … (VM)_VGF
Identification of chunk boundaries for most of these patterns is not difficult. In Table 6.7 we see that for short distance dependencies, using chunks as hard constraints gives better accuracy. As the arc length increases, however, using chunks as soft constraints seems the better option. We note that the use of chunk information, either as hard or as soft constraints, improves performance over the baseline (cf. Table 5.1).
             Malt            MST + MaxEnt
             H-C     S-C     H-C     S-C
to_root      69.90   64.07   94.32   73.76
1            99.48   99.41   99.20   98.89
2            95.65   95.32   92.42   94.94
3 to 6       90.40   91.21   89.26   89.49
7 & above    81.72   83.73   84.55   85.47
Table 6.7. Precision values for UAS with respect to arc length.
H-C: Chunk as hard constraint, S-C: Chunk as soft constraint
6.2.3 Clause Based Parsing
Results in Table 5.2 show that the increase in LAS and LA (of 2-Hard-S2 and 2-Soft) is higher than their increase in UAS with respect to the baseline. In Table 6.8 we see that the improvement of 2-Hard-S2 over the Baseline is consistent across the board for both inter-clausal and intra-clausal LA and LAS. These experiments show a clear pattern in the cases where parsing benefits from 2-Hard-S2. These benefits are spread over both the 1st stage and the 2nd stage. The cases are:
a) Better identification of some relations with deviant/ambiguous postpositions in the 1st stage. For example, when ‘se’ appears for beneficiary/cause instead of its default usage for instrument. Table 6.10 shows the label identification for some frequent postpositions. Notice that these postpositions are either null (0) or ambiguous.
b) Improved handling of non-finite verbs in the 1st stage.
c) Better handling of NULL nodes in the 2nd stage. Most NULL nodes are cases of ellipsis where a syntactic head such as a finite verb or a conjunct is missing. Most of these cases fall into the 2nd stage and are better handled there.
d) Better handling of some 2nd stage specific constructions, e.g. clausal complements.
e) Improved handling of relative clauses (cf. Table 6.9).
                Intra-clausal UAS   Intra-clausal LAS   Inter-clausal UAS   Inter-clausal LAS
Baseline Malt   89.05               72.18               87.98               85.43
2-Hard-S2       88.87               72.36               90.10               87.71
Table 6.8. Accuracy for intra- and inter-clausal dependency relations.
LAS LA
Precision Recall Precision Recall
Baseline Malt 66.67 21.29 67.86 21.67
2-Hard-S2 38.79 34.22 83.19 73.38
Table 6.9. Accuracy for relative clause construction.
Closely related to the above points is the performance of 2-Hard-S2 and 2-Soft with respect to arc length, depth and non-projectivity. It is known that Malt suffers on dependencies closer to the root (at low depth) due to error propagation, and on long distance dependencies because of local feature optimization (McDonald and Nivre, 2007). In other words, for Malt, depth and errors are negatively correlated while arc-distance and errors are positively correlated.
Postposition   Baseline   2-Hard-S2
0
me
para
GEN
ko
se
Table 6.10. Label identification comparison between Baseline and 2-Hard-S2 for ambiguous postpositions; the mark signifies better performance. 0 and GEN signify null postposition and genitive postpositions respectively.
Figure 6.4 shows the LAS of relations at various arc lengths for 2-Hard-S2, 2-Soft and Baseline. Note that as the arc length increases, the advantage of 2-Hard-S2 becomes more pronounced. Figure 6.5 shows the performance of relations at different depths. By distinguishing intra-clausal structures from inter-clausal structures, the 2-Hard-S2 setup works with shallower trees. This effect is clearly seen in Figure 6.5, where at low depth 2-Hard-S2 outperforms the Baseline.
Cases (c), (d) and (e) above reflect this. Cases (a) and (b), on the other hand, show that 2-Hard-S2 also affects 1st stage performance by learning verbal arguments (both complements and adjuncts) better. It is known that MaltParser has a rich feature representation, but with increasing sentence length its performance suffers due to error propagation. By treating a clause as a parsing unit we reduce this error propagation, as the features are exploited properly.
It was found that 2-Hard-S2 did not help in reducing the non-projective relations. As the Baseline, 2-Hard-S2 and 2-Soft all use the Arc-Eager parsing algorithm, they fare equally badly in handling non-projectivity. There were some sentences where the non-projectivity was removed in the 1st stage but the non-projective arc reappeared in the 2nd stage; this happened in the case of paired connective constructions (cf. Mannem et al., 2009a). We are yet to investigate whether non-projective parsing in the 2nd stage might prove beneficial in such cases.
Figure 6.4. LAS at arc-length for Baseline, 2-Soft and 2-Hard-S2. The numbers above the bars
represent the % of relations at respective arc lengths.
Figure 6.5. LAS at depth for Baseline, 2-Soft and 2-Hard-S2. The numbers above the bars
represent the % of relations at respective depths.
In this section we have pointed out some of the consistent trends noticed in our experiments. Table 6.11 below summarizes the main advantages of the different data-driven methods discussed in this section.
Method: Benefits
Morphological features: a) Captures the inflectional cues necessary for identifying different relations
Clausal Features: a) Helps in identifying the root of the tree; b) Helps in better handling of long distance relations
Minimal Semantics: a) Captures semantic selectional restrictions (needed precisely when surface cues fail)
Chunk Parsing and Local morphosyntax: a) Captures the notion of vibhakti; b) Helps in capturing the postposition-TAM mapping; c) Helps in reducing attachment mistakes
Clausal Parsing: a) Better learning of intra-clausal deviant relations; b) Better handling of participles; c) Better handling of long distance relations
Table 6.11. Advantages of different features/methods
6.2.4 Errors
Some of the main error sources during data-driven parsing were:
(a) Argument structure in simple sentences,
(b) Embedded clause constructions (participles and relative clauses),
(c) Coordination,
(d) Complex predicates.
6.2.5 Causes of Errors
The main causes of the errors discussed in the previous section are:
(a) Errors due to improper learning:
i. Label bias: This happens when the same feature is associated with two independent
classes (labels), leading the parser to choose the more frequent label. In some such
cases, use of minimal semantic information helped the parser make the correct
decision. Clausal parsing also reduces some such errors through better learning
(cf. Table 6.10).
ii. Distributed features: An important instance of this is the agreement pattern. It was
noted in Section 6.2.1 that of all the morph features only root, case, and suffix
proved helpful. Other features such as gender, number and person, which are useful
for agreement, did not get selected during feature selection.
iii. Difficulty in making linguistic generalizations: That a verb will have a single mandatory
argument such as k1 or k2 is not learnt properly; there are instances where a single
verb is assigned more than one k1.
(b) Long-distance dependencies: Like non-projectivity, long-distance dependencies can be a
major source of errors. Clausal features and clausal parsing helped us reduce some such
errors.
(c) Non-projectivity: Around 10% of all the arcs in the Hindi treebank are non-projective
(Mannem et al., 2009a). Most of these are inter-clausal, and the use of clausal features was
helpful in identifying some. But non-projectivity still remains a source of error.
(d) Genuine ambiguities: The correct decision is difficult in certain sentences because there are
no precise cues that can be exploited. Some such instances are:
i. Lack of post-positions,
ii. Ambiguous post-positions,
iii. Attachment of adjectival participles,
iv. Arguments of participles,
v. Appositions.
(e) Small corpus size: One potential reason for the low performance could be the small training
data. However, Hall et al. (2007) have shown that for many languages small data size is not a
crucial factor in determining parsing performance. The training data of 3,000 sentences
is still small, and it is likely that many problems will diminish as the data grows.
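The notion of non-projectivity discussed above can be made concrete with a small check: an arc is non-projective if some token lying strictly between the head and the dependent attaches outside that span. A minimal illustrative sketch over a toy head-array representation (not the treebank tooling used in our experiments; the artificial root is position 0):

```python
def nonprojective_arcs(heads):
    """heads[i-1] = head of token i (1-based tokens; 0 is the artificial root).
    Return the dependents whose incoming arc is non-projective: some token
    strictly between head and dependent attaches outside the [head, dep] span."""
    bad = []
    for dep, head in enumerate(heads, start=1):
        lo, hi = min(dep, head), max(dep, head)
        for k in range(lo + 1, hi):          # tokens strictly inside the span
            if heads[k - 1] < lo or heads[k - 1] > hi:
                bad.append(dep)
                break
    return bad
```

For a projective tree such as `[2, 0, 2]` the list is empty; for crossing arcs such as `[3, 4, 0, 3]` the crossed dependents are reported.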
6.3 Linguistically rich graph based parsing
Table 6.12 shows that the improvement in accuracy is spread across different kinds of
relations. Both inter-clausal and intra-clausal relations benefit. Within a clause, both the
argument structure of a verb and other relations are better identified in Experiment 8 (described
in 5.3.3) when compared to the baseline. There was a consistent improvement in the analysis of
certain phenomena:
a) Intra-clausal coordinating conjunctions as dependents. These may appear either as arguments
of the verb or as children of non-verbal heads.
b) Better handling of arguments of non-finite verbs.
c) Better handling of clausal arguments and relative clauses.
Relations                        Baseline           Experiment 8
                                 UAS      LAS       UAS      LAS
Intra-clausal                    87.86    73.63     88.30    74.00
Verbal args* (intra-clausal)     89.46    72.28     90.02    72.68
Non-args (intra-clausal)         86.25    75.00     86.57    75.32
Inter-clausal                    91.85    91.53     93.29    92.33
Table 6.12. Comparison of baseline and Experiment 8 accuracies. (*args: arguments)
A similar pattern is seen in Table 6.13: head attachment accuracy increases for almost all POS
tags, except for adjectival attachments.
As mentioned earlier, the constraint graph is originally formed using the linguistic knowledge
of the constraint-based system, so for our experiments the coverage of this knowledge is
crucial. Our experiments show that while the coverage of verbal and conjunction heads is
good, knowledge of other heads such as predicative nouns and adjectives is lacking. As
mentioned earlier, the constraint parser currently uses close to 536 demand frames. It would be
interesting to see how grammar extraction methods for Hindi (Kolachina et al., 2010) can be
combined with our approach to enlarge the knowledge base currently in use.
POS tag            %Instances    Baseline    Experiment 8
Noun               64            89          90
Finite verb        15            94          95
Non-finite verb    3             83          83
Gerund             5             86          88
Conjunction        7             75          77
Adjective          4             98          95
Table 6.13. Accuracy distribution over POS tags
6.4 General observation
MaltParser with clausal modularity outperforms the variant that uses only the local
morphosyntactic features. The linguistically rich MSTParser performs better than the one with
clausal features, while in UAS the MSTParser with clausal features scores highest. We note that
all these variants of MST and Malt outperform their baseline counterparts.
We find that both MSTParser and MaltParser outperform GH-CBP17. It is clear that there is
still a lot of scope for improvement in GH-CBP: its oracle score (GH-CBP'') for LAS is
considerably higher than both MST and Malt, and better ranking methods can reduce the gap
between them. These results show that, in spite of the positive effects of important features,
linguistic modularity, etc., the overall performance of all the parsers is still low. This is
particularly true for LAS. The error analysis discussed in the previous sections clearly shows that
17 We note that it is not possible to directly compare the performance of GH-CBP with Malt and MST
currently because of the difference in the granularities of the dependency tagsets: GH-CBP was tested on a
coarse-grained tagset, whereas the data-driven parsers were tested on a fine-grained tagset. Nevertheless,
the relative comparison still holds.
(a) ambiguous post-positions, and
(b) lack of post-positions
are amongst the main reasons for this. Table 6.4 shows this clearly: labels such as k1, k2,
pof, k7t, etc. take the greatest hit in accuracy. In the previous chapter we mentioned that use of
minimal semantics can help cases such as (a) and (b), and Table 6.6 illustrates that this is in
fact the case: these high-confusion relations can benefit significantly from minimal semantics.
However, automatic identification of such semantic tags is not a trivial task, and recent
attempts to do so for Hindi have not been very promising (Kosaraju et al., 2010). Knowledge of
the verb class or verb frame can also affect the overall accuracy dramatically; its effect on the
(linguistically rich) MSTParser is quite apparent, since the constraint graphs used as input to
MSTParser are, as we know, formed using such knowledge18. Trying to use such automatically
induced grammatical knowledge in the future should be a productive enterprise. There have
been some recent attempts at, for example, automatic identification of complex predicates,
which has a positive effect on parsing accuracy (Begum et al., 2011). Such knowledge is
applicable not only to verbal heads but to other non-verbal heads as well. We have seen
previously that GH-CBP currently has very little information about predicative nominal and
adjectival heads, so use of such frames should help.
Closely related to the task of grammar induction is automatic identification of a verb's class
or frame. Currently GH-CBP uses all the available frames of a verb during derivation. If one
could select the correct frame before derivation, the total number of output parses would be
consistently reduced, which would indirectly benefit prioritization. This task of frame/class
selection can be simplified if one ascertains only the valency of a head; in this formulation, the
task becomes similar to supertagging (Bangalore and Joshi, 1999). Such information should also
help data-driven parsing.
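The frame-selection idea can be pictured as a simple filter over a verb's demand frames: if a supertagger-style classifier predicts only the valency of a head, every frame of a different arity can be discarded before derivation. A hypothetical sketch (the frame entries and the verb are invented for illustration; GH-CBP's actual demand frames are far richer than argument-label tuples):

```python
# Hypothetical demand frames: each frame lists the mandatory karaka relations.
FRAMES = {
    "de": [("k1", "k2", "k4"),   # ditransitive reading of 'give'
           ("k1", "k2")],        # transitive reading
}

def select_frames(verb, predicted_valency):
    """Keep only the frames whose number of mandatory arguments matches
    the valency predicted by a supertagger-style classifier."""
    return [f for f in FRAMES.get(verb, []) if len(f) == predicted_valency]
```

With a correct valency prediction, only one of the two frames survives, so the derivation explores fewer parses and prioritization has less work to do.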
Overall, the UAS for all the parsers is high, showing that most of the language's structures
are identified successfully. For the data-driven parsers, non-projective structures still remain a
challenge. Other major error sources are the ambiguous constructions mentioned in Section 6.2.5;
perhaps an increase in data will help resolve such ambiguities. The data-driven parsers currently
do not learn many linguistic generalizations, such as agreement, the single-subject constraint, etc. Such
18 Note that the constraint graph, along with capturing the knowledge of demand frames, also incorporates
clausal scope.
generalizations are frequent in some complex patterns that exist between verbal heads and their
children. Some recent work has been able to incorporate these successfully (Ambati, 2010).
Chapter 7
7. Conclusion
In this work we successfully incorporated various grammatical notions from Computational
Paninian Grammar (CPG) to build a generalized parsing framework. This was first done using a
constraint-based paradigm in which elements from CPG led to a layered parsing architecture. The
proposed generalized hybrid constraint-based parsing system (GH-CBP) uses different types of
constraints to account for language variability while maintaining its generic nature.
In particular, we described how grammatical notions such as control, passives, gapping, verbal
alternations, agreement, subordination, coordination, etc. can be handled in GH-CBP for Indian
languages such as Hindi, Telugu and Bangla. We then integrated this setup with a ranking
strategy inspired by graph-based parsing/labeling that prioritized the parses of the core constraint
parser. GH-CBP was evaluated for Hindi and Telugu, and the results show good coverage of the parser.
In Chapter 5 we described ways in which the insights from GH-CBP can be used in data-
driven parsing. This was successfully done using (a) linguistically motivated features, (b)
linguistically constrained modularity, and (c) linguistically rich graph-based parsing. These
experiments point to various crucial factors that help in improving the parsing accuracy. In
particular, they show the importance of four types of linguistic features, namely morphological,
local morphosyntactic, clausal and semantic. The second set of experiments showed
that using clausal and chunk information to modularize the parsing process helps. Using
linguistically rich graph-based parsing, we successfully used the knowledge of the constraint
parser in data-driven parsing. All these experiments led to statistically significant improvements
over the baseline systems. We finally discussed in detail the results of both the constraint-based
and data-driven parsing experiments and made explicit certain trends that flesh out the positives
and negatives in our work.
APPENDIX I: Dependency Tagset
No.  Tag        Name and description
1.   k1         karta (doer/agent/subject): Karta is defined as the 'most independent' of all
                the karakas (participants).
2.   k1s        vidheya karta (karta samanadhikarana): Noun complement of karta.
3.   k2         karma (object/patient): Karma is the locus of the result implied by the verb root.
4.   k2p        Goal, destination: The destination or goal is also taken as a karma; k2p is a
                subtype of karma (k2). The goal or destination where an action of motion ends
                is a k2p.
5.   k2s        karma samanadhikarana (object complement): The object complement.
6.   k3         karana (instrument): karana karaka denotes the instrument of an action expressed
                by a verb root. The activity of the karana helps in achieving the main action.
7.   k4         sampradaana (recipient): Sampradana karaka is the recipient/beneficiary of an
                action; the person/object for whom the karma is intended.
8.   k4a        anubhava karta (experiencer): The experiencer/perceiver in perception verbs such
                as 'seem', 'appear', etc.
9.   k5         apaadaana (source): apadana karaka indicates the source of the activity, i.e. the
                point of departure. A noun denoting the point of separation for a verb expressing
                an activity that involves movement away is apadana.
10.  k7         vishayaadhikarana (location): Location in time, place or an abstract domain.
11.  r6         shashthi (possessive): The genitive/possessive relation which holds between two
                nouns.
12.  r6-k1,     karta or karma of a conjunct verb (complex predicate).
     r6-k2
13.  rh         hetu (cause-effect): The reason or cause of an activity.
14.  rt         taadarthya (purpose): The purpose of an action.
15.  nmod__relc, jjmod__relc, rbmod__relc
                Relative clauses, jo-vo constructions.
16.  nmod       Noun modifier (including participles): An underspecified relation employed to
                show general noun modification without going into a finer type.
17.  vmod       Verb modifier: Another underspecified tag; for some relations a finer subtype is
                not yet possible, so they are annotated with this slightly underspecified tag.
18.  jjmod      Modifier of an adjective.
19.  pof        'Part of' relation: Part of units such as conjunct verbs.
21.  ccof       'Conjunct of' relation: Coordination and subordination.
The above coarse-grained tagset was used in the ICON10 tools contest on IL dependency parsing
(Husain et al., 2010). For the complete tagset, see:
http://ltrc.iiit.ac.in/MachineTrans/research/tb/DS-guidelines/DS-guidelines-ver2-28-05-09.pdf
APPENDIX II: Chunk Tagset
No. Chunk Type Tag Name
1 Noun Chunk NP
2 Finite Verb Chunk VGF
3 Non-finite Verb Chunk VGNF
4 Infinitival Verb Chunk VGINF
5 Verb Chunk (Gerund) VGNN
6 Adjectival Chunk JJP
7 Adverb Chunk RBP
8 Chunk for Negatives NEGP
9 Conjuncts CCP
10 Chunk Fragments FRAGP
11 Miscellaneous BLK
For complete description, see the guidelines:
http://ltrc.iiit.ac.in/MachineTrans/publications/technicalReports/tr031/posguidelines.pdf
APPENDIX III: POS Tagset
No. Category Tag Name
1 Noun NN
2 NLoc NST
3 Proper Noun NNP
4 Pronoun PRP
5 Demonstrative DEM
6 Verb-finite VM
7 Verb Aux VAUX
8 Adjective JJ
9 Adverb RB
10 Post position PSP
11 Particles RP
12 Conjuncts CC
13 Question Words WQ
14 Quantifiers QF
15 Cardinal QC
16 Ordinal QO
17 Classifier CL
18 Intensifier INTF
19 Interjection INJ
20 Negation NEG
21 Quotative UT
22 Sym SYM
23 Compounds *C
24 Reduplicative RDP
25 Echo ECH
26 Unknown UNK
For foreign/unknown words, the POS tagger may give the tag “UNK”.
For complete description, see the guidelines:
http://ltrc.iiit.ac.in/MachineTrans/publications/technicalReports/tr031/posguidelines.pdf
APPENDIX IV: MaltParser Features
<featuremodels>
<featuremodel name="nivreeager">
<feature>InputColumn(FORM, Stack[0])</feature>
<feature>InputColumn(FORM, Input[0])</feature>
<feature>InputColumn(POSTAG, Stack[0])</feature>
<feature>InputColumn(POSTAG, Input[0])</feature>
<feature>InputColumn(POSTAG, Input[1])</feature>
<feature>InputColumn(POSTAG, Input[2])</feature>
<feature>InputColumn(POSTAG, Input[3])</feature>
<feature>InputColumn(POSTAG, Stack[1])</feature>
<feature>InputColumn(POSTAG, pred(Stack[0]))</feature>
<feature>InputColumn(POSTAG, head(Stack[0]))</feature>
<feature>InputColumn(POSTAG, ldep(Input[0]))</feature>
<feature>InputColumn(CPOSTAG, Stack[0])</feature>
<feature>InputColumn(CPOSTAG, Input[0])</feature>
<feature>InputColumn(CPOSTAG, Input[1])</feature>
<feature>InputColumn(CPOSTAG, ldep(Input[0]))</feature>
<feature>InputColumn(FORM, ldep(Input[0]))</feature>
<feature>InputColumn(FORM, Input[1])</feature>
<feature>InputColumn(LEMMA, Stack[0])</feature>
<feature>InputColumn(LEMMA, Input[0])</feature>
<feature>InputColumn(LEMMA, Input[1])</feature>
<feature>OutputColumn(DEPREL, rdep(Stack[0]))</feature>
<feature>OutputColumn(DEPREL, lsib(rdep(Stack[0])))</feature>
<feature>Split(InputColumn(FEATS, Stack[0]),\|)</feature>
<feature>Split(InputColumn(FEATS, Input[0]),\|)</feature>
<feature>Merge(InputColumn(POSTAG, Stack[0]), InputColumn(POSTAG, Input[0]))</feature>
<feature>Merge(InputColumn(FEATS, Stack[0]), OutputColumn(DEPREL, Stack[0]))</feature>
</featuremodel>
</featuremodels>
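For readers unfamiliar with the feature specification language above: Split breaks the pipe-separated FEATS column into atomic feature values, while Merge conjoins two values into a single feature. A rough Python re-implementation of the idea (an illustrative sketch, not MaltParser's actual code; the `~` join character is an arbitrary choice):

```python
def split_feats(feats_column, sep="|"):
    """Mimic Split(InputColumn(FEATS, ...), \\|): break the pipe-separated
    FEATS string into individual atomic feature values."""
    return feats_column.split(sep)

def merge(value_a, value_b):
    """Mimic Merge(f1, f2): conjoin two feature values into one, so the
    learner can exploit their combination directly."""
    return f"{value_a}~{value_b}"
```

For example, splitting a FEATS value like `cat-n|gend-m|num-sg` yields three atomic features, and merging a POSTAG with a DEPREL yields one conjoined feature.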
APPENDIX V: MaltParser Features (2nd stage of 2-Hard-S2)
<?xml version="1.0" encoding="UTF-8"?>
<featuremodels>
<featuremodel name="nivreeager">
<feature>InputColumn(FORM, Stack[0])</feature>
<feature>InputColumn(FORM, Input[0])</feature>
<feature>InputColumn(POSTAG, Stack[0])</feature>
<feature>InputColumn(POSTAG, Input[0])</feature>
<feature>InputColumn(POSTAG, Input[1])</feature>
<feature>InputColumn(POSTAG, Input[3])</feature>
<feature>InputColumn(POSTAG, Stack[1])</feature>
<feature>InputColumn(POSTAG, pred(Stack[0]))</feature>
<feature>InputColumn(CPOSTAG, Stack[0])</feature>
<feature>InputColumn(FORM, Input[1])</feature>
<feature>InputColumn(LEMMA, Stack[0])</feature>
<feature>InputColumn(LEMMA, Input[0])</feature>
<feature>InputColumn(LEMMA, Input[1])</feature>
<feature>OutputColumn(DEPREL, Stack[0])</feature>
<feature>OutputColumn(DEPREL, lsib(rdep(Stack[0])))</feature>
<feature>Merge(InputColumn(POSTAG, Stack[0]), InputColumn(FORM, Stack[0]))</feature>
<feature>OutputColumn(DEPREL, ldep(Input[1]))</feature>
</featuremodel>
</featuremodels>
APPENDIX VI: MSTParser Features19
Basic Unigram Features
p-word, p-pos
p-word
p-pos
c-word, c-pos
c-word
c-pos
Basic Bigram Features
p-word, p-pos,c-word, c-pos
p-word, c-word, c-pos
p-pos, c-word, c-pos
p-word, p-pos, c-pos
p-word, p-pos, c-word
p-pos, c-pos
Basic Unigram Features + label
Basic Unigram Features + FEATS
Basic Unigram Features + p-FEATS
Basic Unigram Features + c-FEATS
Conjoined (Incorporated after modifying MSTParser)
Basic Unigram Features + FEATS + label
Basic Unigram Features + p-FEATS + label
Basic Unigram Features + c-FEATS + label
19 p-*: parent features
c-*: child features
FEATS: features in the FEATS column in the CoNLL format.
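The conjoined templates listed above pair the basic unigram features with FEATS and the arc label in single feature strings. The following sketch shows how such conjoined features might be instantiated as flat strings (the template names and separators are placeholders for illustration, not MSTParser's internal representation):

```python
def conjoined_features(p_word, p_pos, c_word, c_pos, feats, label):
    """Instantiate the 'Basic Unigram Features + FEATS + label' templates
    as flat feature strings, in the spirit of MSTParser's templates."""
    unigrams = [
        ("pw|pp", f"{p_word}|{p_pos}"),  # parent word + parent POS
        ("pw", p_word),
        ("pp", p_pos),
        ("cw|cp", f"{c_word}|{c_pos}"),  # child word + child POS
        ("cw", c_word),
        ("cp", c_pos),
    ]
    # Conjoin each unigram template with the FEATS value and the arc label.
    return [f"{name}|feats|lab={val}|{feats}|{label}" for name, val in unigrams]
```

Each arc thus contributes six conjoined features, letting the learner weight, say, a child POS together with its case marking and dependency label as one unit.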
APPENDIX VII: MaxEnt Labeler Features
Dependency Tree Nodes
Current node
Parent node
Right-most left sibling
Left-most right sibling
Children
Features
Lexical item
Root form of the word
Part-of-speech tag
Coarse POS tag
Vibhakti markers
Direction of the dependency arc
Number of siblings
Number of children
Difference in positions of node and its parent
POS list from dependent to tree’s root through the dependency path
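The node and feature inventory above can be read as a feature-extraction routine over a dependency tree. A simplified sketch covering a few of the listed features (the dictionary-based node representation is invented for illustration; the actual labeler operates on the treebank's CoNLL fields):

```python
def labeler_features(node, parent):
    """Extract some of the listed MaxEnt labeler features for one node.
    Nodes are dicts with 'idx' (linear position), 'form', 'pos', 'children'."""
    d = node["idx"] - parent["idx"]
    return {
        "form": node["form"],                       # lexical item
        "pos": node["pos"],                         # part-of-speech tag
        "parent_pos": parent["pos"],                # parent node's POS
        "arc_direction": "right" if d > 0 else "left",
        "num_children": len(node["children"]),
        "dist_to_parent": abs(d),                   # position difference
    }
```

For a dependent at position 3 whose parent is at position 5, the arc direction is "left" and the distance is 2.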
Bibliography
S. Abney. 1996. Part-of-Speech Tagging and Partial Parsing. In K. Church, S. Young, and G.
Bloothooft, editors, Corpus-Based Methods in Language and Speech. Kluwer Academic
Publishers.
S. Abney. 1997. Partial Parsing via Finite-State Cascades. Natural Language Engineering,
2(4):337–344.
J. Aissen. 1999. Markedness and subject choice in Optimality Theory. Natural Language and
Linguistic Theory 17:673–711.
B. R. Ambati. 2010. Importance of linguistic constraints in statistical dependency parsing. In
Proceedings of ACL 2010 Student Research Workshop (SRW), Uppsala, Sweden.
B. Ambati, S. Husain, J. Nivre and R. Sangal. 2010a. On the Role of Morphosyntactic Features in
Hindi Dependency Parsing. In Proceedings of NAACL-HLT 2010 workshop on Statistical
Parsing of Morphologically Rich Languages (SPMRL 2010), Los Angeles, CA.
B. Ambati, S. Husain, S. Jain, D. M. Sharma and R. Sangal. 2010b. Two methods to incorporate
'local morphosyntactic' features in Hindi dependency parsing. In Proceedings of NAACL-HLT
2010 workshop on Statistical Parsing of Morphologically Rich Languages (SPMRL 2010) Los
Angeles, CA.
B. Ambati, P. Gade, G.S.K. Chaitanya and S. Husain. 2009. Effect of Minimal Semantics on
Dependency Parsing. In RANLP09 student paper workshop.
G. Attardi and F. Dell’Orletta. 2008. Chunking and Dependency Parsing. LREC Workshop on
Partial Parsing: Between Chunking and Deep Parsing. Marrakech, Morocco.
S. Bangalore and A. K. Joshi. 1999. Supertagging: an approach to almost parsing. Computational
Linguistics.
R. Begum, K. Jindal, A. Jain, S. Husain and D. M. Sharma. 2011. Identification of Conjunct
Verbs in Hindi and Its Effect on Parsing Accuracy In Proceedings of the 12th CICLing, Tokyo,
Japan.
R. Begum, S. Husain, A. Dhwaj, D. M. Sharma, L. Bai and R. Sangal. 2008a. Dependency
Annotation Scheme for Indian Languages. In Proceedings of The Third International Joint
Conference on Natural Language Processing (IJCNLP). Hyderabad, India.
R. Begum, S. Husain, D. M. Sharma and L. Bai. 2008b. Developing Verb Frames in Hindi. In
Proceedings of The Sixth International Conference on Language Resources and Evaluation
(LREC). Marrakech, Morocco.
A. Bharati, S. Husain, D. M. Sharma and R. Sangal. 2009a. Two stage constraint based hybrid
approach to free word order language dependency parsing. In Proceedings of the 11th
International Conference on Parsing Technologies (IWPT). Paris.
A. Bharati, S. Husain, M. Vijay, K. Deepak, D. M. Sharma and R. Sangal. 2009b. Constraint
Based Hybrid Approach to Parsing Indian Languages. In Proceedings of the 23rd Pacific Asia
Conference on Language, Information and Computation (PACLIC 23). Hong Kong
A. Bharati, M. Gupta, V. Yadav, K. Gali, D.M. Sharma. 2009c. Simple Parser for Indian
Languages in a Dependency Framework. In Proc. of the Third Linguistic Annotation Workshop
at 47th ACL and 4th IJCNLP.
A. Bharati, D. M. Sharma, S. Husain, L. Bai, R. Begam and R. Sangal. 2009d. AnnCorra:
TreeBanks for Indian Languages, Guidelines for Annotating Hindi TreeBank.
http://ltrc.iiit.ac.in/MachineTrans/research/tb/DS-guidelines/DS-guidelines-ver2-28-05-09.pdf
A. Bharati, S. Husain, B. Ambati, S. Jain, D. M. Sharma and R. Sangal. 2008a. Two semantic
features make all the difference in Parsing accuracy. In Proceedings of the 6th International
Conference on Natural Language Processing (ICON-08), CDAC Pune, India.
A. Bharati, S. Husain, D. M. Sharma, and R. Sangal. 2008b. A Two-Stage Constraint Based
Dependency Parser for Free Word Order Languages. In Proceedings of the COLIPS
International Conference on Asian Language Processing 2008 (IALP). Chiang Mai, Thailand.
A. Bharati, D. M. Sharma, L. Bai and R. Sangal. 2006. AnnCorra: Annotating Corpora
Guidelines for POS and Chunk Annotation for Indian Languages. LTRC-TR31.
A. Bharati, R. Sangal and T. P. Reddy. 2002. A Constraint Based Parser Using Integer
Programming, In Proc. of ICON-2002.
A. Bharati, V. Chaitanya, R. Sangal. 1995a. Natural Language Processing: A Paninian
Perspective. Prentice-Hall of India, New Delhi.
A. Bharati, A. Gupta and R. Sangal. 1995b. Parsing with Nesting Constraints. Proc of 3rd NLP
Pacific Rim Symposium, Seoul, S. Korea.
A. Bharati and R. Sangal. 1993. Parsing Free Word Order Languages in the Paninian Framework.
In Proc. of ACL:93.
E. Black, F. Jelinek, J. D. Lafferty, D.M.Magerman, R. L.Mercer, and S. Roukos. 1992. Towards
history-based grammars: Using richer models for probabilistic parsing. In Proc. of the 5th
DARPA Speech and Natural Language Workshop, pages 31–37.
M. Butt. 1995. The Structure of Complex Predicates in Urdu. CSLI Publications.
J. Carroll. 2000. Statistical parsing. In R. Dale, H. Moisl, and H. Somers, (eds), Handbook of
Natural Language Processing, Marcel Dekker, pp. 525–543.
Y.J. Chu and T.H. Liu. 1965. On the shortest arborescence of a directed graph. Scientia Sinica,
14:1396–1400.
M. Collins. 2000. Discriminative reranking for natural language parsing. In Proc. of 7th ICML.
M. Collins and T. Koo. 2005. Discriminative reranking for natural language parsing. In CL p.25
70 March05.
B. Comrie. 1989. Language Universals and Linguistic Typology: Syntax and morphology.
University of Chicago Press.
R. Debusmann, D. Duchier and G. Kruijff. 2004. Extensible dependency grammar: A new
methodology. Proceedings of the Workshop on Recent Advances in Dependency Grammar, pp.
78–85.
D. Duchier. 1999. Axiomatizing dependency parsing using set constraints. Proceedings of the 6th
Meeting on Mathematics of Language, Orlando, FL, pp. 115-126.
D. Duchier and R. Debusmann. 2001. Topological dependency trees: A constraint-based account
of linear precedence. Proc of 39th ACL and 10th EACL. Toulouse, France, pp. 180–187.
J. Edmonds. 1967. Optimum branchings. Journal of Research of the National Bureau of
Standards, 71B:233–240.
J. Eisner. 1996. Three new probabilistic models for dependency parsing: An exploration,
Proceedings of the 16th COLING, Copenhagen, Denmark, pp. 340-345.
M.B. Emeneau. 1956. India as a linguistic area. Linguistics 32, 3–16
G. Eryigit, J. Nivre and K. Oflazer. 2008. Dependency Parsing of Turkish. Computational
Linguistics 34(3), 357-389.
K. A. Foth and W. Menzel. 2006. Hybrid parsing: Using probabilistic models as predictors for a
symbolic parser. In Proc. of COLING-ACL06.
P. Gadde, K. Jindal, S. Husain, D. M Sharma, and R. Sangal. 2010. Improving Data Driven
Dependency Parsing using Clausal Information. In Proceedings of NAACL-HLT 2010, Los
Angeles, CA. 2010.
Y. Goldberg and M. Elhadad. 2009. Hebrew Dependency Parsing: Initial Results. In Proceedings
of the 11th IWPT09. Paris. 2009.
J. Gorla, A. K. Singh, R. Sangal, K. Gali, S. Husain and S. Venkatapathy. 2008. A Graph Based
Method for Building Multilingual Weakly Supervised Dependency Parsers. In Proceedings of
the 6th International Conference on Natural Language Processing (GoTAL). Gothenburg,
Sweden. 2008.
M. Gupta, V. Yadav, S. Husain and D. M. Sharma. 2008. A Rule Based Approach for Automatic
Annotation of a Hindi TreeBank. In Proceedings of the 6th International Conference on
Natural Language Processing (ICON-08), CDAC Pune, India.
M. Gupta, V. Yadav, S. Husain and D. M. Sharma. 2010. Partial Parsing as a Method to Expedite
Dependency Annotation of a Hindi Treebank. In Proceedings of The 7th International
Conference on Language Resources and Evaluation (LREC). Valleta. Malta
J. Hajič, A. Böhmová, E. Hajičová and B. V. Hladká. 2000. The Prague Dependency Treebank: A
Three-Level Annotation Scenario. In A. Abeillé (ed.) Treebanks: Building and Using Parsed
Corpora, Amsterdam. Kluwer, 2000, pp. 103-127
E. Hajičová. 2002. Theoretical description of language as a basis of corpus annotation: The case
of the Prague Dependency Treebank. In E. Hajičová, P. Sgall, J. Hana, T. Hoskovec (eds.):
Prague Linguistic Circle Papers, (4), Amsterdam/Philadelphia: John Benjamins, pp. 111–127.
E. Hajicova. 1998. Prague Dependency Treebank: From Analytic to Tectogrammatical
Annotation. In Proc. TSD’98.
J. Hall, J. Nilsson, J. Nivre, G. Eryigit, B. Megyesi, M. Nilsson and M. Saers. 2007. Single Malt
or Blended? A Study in Multilingual Parser Optimization. In Proceedings of the CoNLL
Shared Task Session of EMNLP-CoNLL 2007, pp. 933–939.
M. P. Harper and R. A. Helzermann. 1995. Extensions to constraint dependency parsing for
spoken language processing. Computer Speech and Language 9: 187–234.
M. P. Harper, R. A. Helzermann, C. B. Zoltowski, B. Yeo, Y. Chan, T. Steward and P. L. Pellom.
1995. Implementation issues in the development of the PARSEC parser, Software: Practice
and Experience 25: 831-862.
Z. Harris. 1962. String analysis of sentence structure. Mouton.
P. Hellwig. 1986. Dependency Unification Grammar. Proc. of 11th COLING, Bonn Germany, pp.
195-198.
P. Hellwig. 2003. Dependency Unification Grammar. In V. Agel, L.M. Eichinger, H. W. Eroms,
P. Hellwig, H. J. Heringer and H. Lobin (eds), Dependency and Valency, Walter de Gruyter,
pp. 593-635.
R. Hudson. 1984. Word Grammar, Basil Blackwell, 108 Cowley Rd, Oxford, OX4 1JF, England.
R. Hudson. 1990. English Word Grammar, Basil Blackwell, 108 Cowley Rd, Oxford, England.
R. Hudson. 2007. Language Networks: The New Word Grammar. Oxford University Press.
S. Husain, P. Mannem, B. R. Ambati, and P. Gadde. 2010. The ICON-2010 Tools Contest on
Indian Language Dependency Parsing. In Proceedings of ICON-2010 Tools Contest on Indian
Language Dependency Parsing. Kharagpur, India.
S. Husain, P. Gadde, B. Ambati, D. M. Sharma and R. Sangal. 2009. A modular cascaded
approach to complete parsing. In Proceedings of the COLIPS International Conference on
Asian Language Processing 2009 (IALP). Singapore.
S. Husain. 2009. Dependency Parsers for Indian Languages. In Proceedings of ICON09 NLP
Tools Contest: Indian Language Dependency Parsing. Hyderabad, India. 2009.
T. Järvinen and P. Tapanainen. 1998. Towards an implementable dependency grammar.
Proceedings of the Workshop on Processing of Dependency-Based Grammars (ACL
COLING), Montreal, Canada, pp. 1-10.
A. Joshi and P. Hopely. 1999. A parser from antiquity: An early application of finite state
transducers to natural language parsing. In Kornai 1999.
F. Karlsson. 1990. Constraint grammar as a framework for parsing running text. Papers
Presented to the 13th International Conference on Computational Linguistics (COLING),
Helsinki, Finland, pp. 168–173.
F. Karlsson, A. Voutilainen, J. Heikkilä and A. Anttila, (eds). 1995. Constraint Grammar: A
language-independent system for parsing unrestricted text. Mouton de Gruyter.
P. Kiparsky and J. F. Staal. 1969. ‘Syntactic and Semantic Relations in Panini’, Foundations of
Language 5, 84–117.
P. Kolachina, S. Kolachina, A. K. Singh, V. Naidu, S. Husain, R. Sangal and A. Bharati. 2010a.
Grammar Extraction from Treebanks for Hindi and Telugu. In Proceedings of The 7th
International Conference on Language Resources and Evaluation (LREC). Valleta. Malta.
2010.
T. Koo and M. Collins. 2010. Efficient Third-order Dependency Parsers. In Proc of ACL2010.
P. Kosaraju, S. R. Kesidi, V. B. R. Ainavolu and P. Kukkadapu. 2010. Experiments on Indian
Language Dependency Parsing. In Proc of ICON-2010 tools contest on Indian language
dependency parsing. Kharagpur, India.
BH. Krishnamurthi (ed). 1986. South Asian Languages: Structure, Convergence and Diglossia.
Motilal Banarasidass.
G. M. Kruijff. 2001. A Categorial Modal Architecture of Informativity: Dependency Grammar
Logic & Information Structure. Ph.D. thesis, Charles University, Prague, Czech Republic.
T. Kudo and Y. Matsumoto. 2002. Japanese dependency analysis using cascaded
chunking. In CoNLL-2002. pp. 63–69.
S. Kubler, R. McDonald and J. Nivre. 2009. Dependency parsing. Morgan and Claypool.
P. Mannem, H. Chaudhry and A. Bharati. 2009a. Insights into Non-projectivity in Hindi. In ACL
IJCNLP09 student paper workshop.
P. Mannem, A. Abhilash and A. Bharati. 2009b. LTAGspinal Treebank and Parser for Hindi.
Proceedings of International Conference on NLP, Hyderabad. 2009.
M. Marcus, B. Santorini, and M.A. Marcinkiewicz. 1993. Building a large annotated corpus of
English: The Penn Treebank, Computational Linguistics 1993.
A. Martins, N. Smith and E. Xing. 2009. Concise Integer Linear Programming Formulations for
Dependency Parsing. Proceedings of the ACL-IJCNLP09.
H. Maruyama. 1990. Structural disambiguation with constraint propagation. In Proceedings of
ACL:90.
C. P. Masica. 1993. The Indo-Aryan Languages. Cambridge University Press.
R. McDonald and J. Nivre. 2007. Characterizing the Errors of Data-Driven Dependency Parsing
Models. In Proc of Joint Conference on Empirical Methods in Natural Language Processing
and Computational Natural Language Learning
R. McDonald, K. Crammer, and F. Pereira. 2005a. Online large-margin training of dependency
parsers. In Proceedings of ACL 2005. pp. 91–98.
R. McDonald, F. Pereira, K. Ribarov, and J. Hajic. 2005b. Non-projective dependency parsing
using spanning tree algorithms. Proceedings of HLT/EMNLP, pp. 523–530.
I. A. Mel'čuk. 1988. Dependency Syntax: Theory and Practice. State University of New York
Press.
W. Menzel and I. Schröder. 1998. Decision Procedures for Dependency Parsing Using Graded
Constraints. In Proc of ACL. 1998.
J. Nilsson and J. Nivre. 2008. Malteval: An evaluation and visualization tool for dependency
parsing. In the Proc of Sixth International Language Resources and Evaluation, Marrakech,
Morocco.
J. Nivre. 2009. Non-Projective Dependency Parsing in Expected Linear Time. In Proc. of ACL
IJCNLP.
J. Nivre and R. McDonald. 2008. Integrating graph-based and transition-based dependency
parsers. In Proc. of ACL-HLT.
J. Nivre, J. Hall, J. Nilsson, A. Chanev, G. Eryigit, S. Kübler, S. Marinov and E Marsi. 2007a.
MaltParser: A language-independent system for data-driven dependency parsing. Natural
Language Engineering, 13(2), 95-135.
J. Nivre, J. Hall, S. Kübler, R. McDonald, J. Nilsson, S. Riedel and D. Yuret. 2007b. The
CoNLL 2007 Shared Task on Dependency Parsing. In Proceedings of the CoNLL Shared Task
Session of EMNLP-CoNLL 2007.
J. Nivre. 2006. Inductive Dependency Parsing. Springer.
J. Nivre and J. Nilsson. 2005a. Pseudo-projective dependency parsing. In Proc. of ACL-2005,
pages 99–106.
J. Nivre. 2005b. Dependency Grammar and Dependency Parsing. MSI report 05133. Växjö
University: School of Mathematics and Systems Engineering.
J. Nivre. 2003. An efficient algorithm for projective dependency parsing. In Proc. of the 8th
International Workshop on Parsing Technologies (IWPT).
A. Prince and P. Smolensky. 1993. Optimality Theory: Constraint Interaction in Generative
Grammar. Technical Report, Rutgers Center for Cognitive Science.
O. Rambow, B. Dorr, I. Kucerova and M. Palmer. 2003. Automatically Deriving
Tectogrammatical Labels from Other Resources: A Comparison of Semantic Labels across
Frameworks. The Prague Bulletin of Mathematical Linguistics, 79-80, 23–35.
S. Riedel and J. Clarke. 2006. Incremental integer linear programming for non-projective
dependency parsing. In Proc. EMNLP.
I. Schröder. 2002. Natural Language Parsing with Graded Constraints. PhD thesis, Hamburg
University.
D. Seddah, M. Candito and B. Crabbé. 2009. Cross parser evaluation: a French Treebanks study.
In Proceedings of the 11th IWPT, Paris.
P. Sgall, E. Hajicova and J. Panevova. 1986. The Meaning of the Sentence in Its Semantic and
Pragmatic Aspects. Reidel.
C. Shastri. 1973. Vyakarana Chandrodya (Vols. 1 to 5). Delhi: Motilal Banarsidass. (In Hindi)
L. Shen, A. Sarkar, A. K. Joshi. 2003. Using LTAG Based Features in Parse Reranking. In
Proc. of EMNLP 2003.
S. M. Shieber. 1985. Evidence against the context-freeness of natural language. Linguistics and
Philosophy, 8, 334–343.
P. Shiuan and C. Ting Hian Ann. 1996. A Divide-and-Conquer Strategy for Parsing. In Proc.
of IWPT.
S. Starosta. 1988. The Case for Lexicase: An Outline of Lexicase Grammatical Theory, Pinter
Publishers.
S. B. Steever. 1998. The Dravidian Languages. Routledge.
P. Tapanainen, and T. Järvinen. 1997. A non-projective dependency parser. Proceedings of the
5th Conference on Applied Natural Language Processing, pp. 64–71.
L. Tesnière. 1959. Eléments de Syntaxe Structurale. Klincksieck, Paris.
R. Tsarfaty, D. Seddah, Y. Goldberg, S. Kübler, Y. Versley, M. Candito, J. Foster, I. Rehbein
and L. Tounsi. 2010. Statistical Parsing of Morphologically Rich Languages (SPMRL): What,
How and Whither. In Proc. of the NAACL-HLT 2010 Workshop on Statistical Parsing of
Morphologically Rich Languages (SPMRL 2010), Los Angeles, CA.
R. Tsarfaty and K. Sima'an. 2008. Relational-Realizational Parsing. In Proceedings of the 22nd
COLING, Manchester, UK.
A. Vaidya, S. Husain, P. Mannem, D. M. Sharma. 2009. A karaka-based dependency annotation
scheme for English. In Proceedings of the CICLing-2009, Mexico City, Mexico.
C. Vempaty, V. Naidu, S. Husain, R. Kiran, L. Bai, D. M. Sharma and R. Sangal. 2010. Issues in
analyzing Telugu sentences towards building a Telugu Treebank. In Proceedings of CICLing
2010, Iași, Romania.
S. Venkatapathy, P. Agrawal and A. K. Joshi. 2005. Relative Compositionality of Noun+Verb
Multiword Expressions in Hindi. In Proceedings of ICON-2005, Kanpur, India.
M. K. Verma (ed.). 1993. Complex Predicates in South Asian Languages. Manohar Publications.
New Delhi.
H. Yamada and Y. Matsumoto. 2003. Statistical dependency analysis with support vector
machines. In Proc. of the 8th IWPT, Nancy, France, pp. 195-206.
Y. Zhang and S. Clark. 2008. A tale of two parsers: Investigating and combining graph-based and
transition-based dependency parsing. In Proceedings of the Conference on Empirical Methods
in Natural Language Processing (EMNLP), pages 562-571.