A Fast, Accurate Deterministic Parser for Chinese
-
Upload
roary-kirk -
Category
Documents
-
view
23 -
download
0
description
Transcript of A Fast, Accurate Deterministic Parser for Chinese
A Fast, Accurate Deterministic Parser for
ChineseMengqiuWang Kenji Sagae Teruko Mitamura
Language Technologies InstituteSchool of Computer ScienceCarnegie Mellon University
{mengqiu,sagae,teruko}@cs.cmu.edu
Advisor: Hsin-Hsi ChenSpeaker: Yong-Sheng Lo
Date: 2007/07/26
ACL - 2006
Introduction Deterministic parsing model
The parsing task the classification task The shift/reduce decision Four classifiers Feature selection
POS tagging Using gold-standard POS tags A simple POS tagger using an SVM classifier
Experiments Conclusion
Agenda
Word Segmentation
POS tagging
Parsing
Traditional statistical approaches To build models which assign probabilities to every possible
parse tree for a sentence Techniques
Such as dynamic programming, beam-search, and best-first-search are then employed to find the parse tree with the highest probability
Disadvantage Too slow for many practical applications
Introduction
Introduction Deterministic parsing model
The parsing task the classification task The shift/reduce decision Four classifiers Feature selection
POS tagging Using gold-standard POS tags A simple POS tagger using an SVM classifier
Experiments Conclusion
Deterministic Parsing Model Deterministic parsing model :
Input is a sentence has already been segmented and tagged with part-of-speech (POS) in
formation Data structure
Queue : To store the input word-POS tag pairs (ex.上海 -NR) Stack : To hold the partial trees that are built during parsing
At each parse state The classifier makes shift/reduce decision based on contextual featur
es Output is a full parsing tree
Introduction Deterministic parsing model
The parsing task the classification task The shift/reduce decision Four classifiers Feature selection
POS tagging Using gold-standard POS tags A simple POS tagger using an SVM classifier
Experiments Conclusion
The shift/reduce decision Four parsing actions : (Sagae and Lavie, 2005)
Shift To remove the first item on the queue and put it onto the stack
Reduce-Unary-X To remove one item from the stack X is the label of a new tree node that will be dominating the removed item
Reduce-Binary-X-Left To remove two item from the stack To take the head-node of the left sub-tree to be the head of the new tree
Reduce-Binary-X-Right To remove two item from the stack To take the head-node of the right sub-tree to be the head of the new tree
For example 1/5
Input : 布朗 (NR) 訪問 (VV) 上海 (NR)
The parse state : Initialization
Action : Shift
For example 2/5 The parse state : 2
Action : Reduce-Unary-NP
The parse state : 3
Action : Shift
For example 3/5 The parse state : 4
Action : Shift
The parse state : 5
Action : Reduce-Unary-NP
For example 4/5 The parse state : 6
Action : Reduce-Binary-VP-Left
The parse state : 7
Action : Reduce-Binary-IP-Right
For example 5/5 The parse state : final
Introduction Deterministic parsing model
The parsing task the classification task The shift/reduce decision Four classifiers Feature selection
POS tagging Using gold-standard POS tags A simple POS tagger using an SVM classifier
Experiments Conclusion
Four classifiers Support Vector Machine
The TinySVM toolkit (Kudo andMatsumoto,2000) Maximum-Entropy Classifier
The Le’s Maxent toolkit (Zhang, 2004) Decision Tree Classifier
The C4.5 decision tree classifier Memory-Based Learning
The TiMBL toolkit (Daelemans et al., 2004)
Introduction Deterministic parsing model
The parsing task the classification task The shift/reduce decision Four classifiers Feature selection
POS tagging Using gold-standard POS tags A simple POS tagger using an SVM classifier
Experiments Conclusion
Feature selection
Introduction Deterministic parsing model
The parsing task the classification task The shift/reduce decision Four classifiers Feature selection
POS tagging Using gold-standard POS tags A simple POS tagger using an SVM classifier
Experiments Conclusion
POS tagging1. Using gold-standard POS tags
2. A simple POS tagger using an SVM classifier Using gold-standard POS tags to train SVM Using a simple POS tagger Two passes
Pass 1 : To extract features from the two words and POS tags that came before the current word, the two words following the current word, and the current word itself
Then the tag is assigned to the word according to SVM classifier’s output
Pass 2 : Additional features such as the POS tags of the two words following the current word, and the POS tag of the current word (assigned in the first pass) are used
This tagger had a measured precision of 92.5% for sentences <= 40 words.
Word Segmentation
POS tagging
Parsing
Experiments Corpus
Penn Chinese Treebank Training : Sections 001-270 (3484 sentences, 84,873 words) Development : 271-300 (348 sentences, 7980 words) Testing : 271-300 (348 sentences, 7980 words) 99629 words
Evaluation Labeled recall (LR) Labeled precision (LP) F1 score (harmonic mean of LR and LP)
Experiments
Results of different classifiers On development set for sentence <= 40 words
Experiments Comparison with Related work
On the test set
Experiments Using gold-standard POS
Stacked classifier model
Using Maxent, DTree and TiNBL
outputs as features, in addition to the
original feature set, to train a new
SVM model on the original training set
Conclusion To present a novel classifier-based deterministic parser for Ch
inese constituency parsing The best model runs in linear time and has labeled precision a
nd recall above 88% using gold-standard part-of-speech tags The SVM parser is 2-13 times faster than state-of-the-art pars
ers, while producing more accurate results The Maxent and DTree parsers run at speeds 40-270 times fas
ter than state-of-the-art parsers, but with 5-6% losses in accuracy