A Lightweight and High Performance Monolingual Word Aligner
Xuchen Yao and Benjamin Van Durme (Johns Hopkins),
Chris Callison-Burch (UPenn) and Peter Clark (Vulcan)
2013-8-6 ACL 2013, Sofia
monolingual word alignment
• Aligning one sentence pair from RTE2
• Premise: Linda Johnson, who lives with her husband, Charles, and two cats in ... , said Katrina has ...
• Hypothesis: Linda Johnson is married to Charles
• alignment contributed by Brockett (2007)
monolingual vs. bilingual alignment
• less training data (labeled or unlabeled), but more lexical resources
• semantic relatedness: cued by distributional word similarities
• the same grammar shared by source/target sentences
a discriminative model
• first proposed by Blunsom and Cohn (2006):
• s, t: source (observation), target sentence
• a: target word indices (0 to target length); state 0 is the NULL state for deletion
• f(): feature functions
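The formula shown on this slide did not survive in the transcript. For orientation, the conditional random field alignment model of Blunsom and Cohn (2006) has the log-linear form below, written with the slide's symbols s, t, a and f. The weights \lambda_k, the indices i and k, and the normalizer Z(s, t) are notation assumed here, not taken from the slide.

```latex
p(\mathbf{a} \mid \mathbf{s}, \mathbf{t}) =
  \frac{\exp\left( \sum_{i} \sum_{k} \lambda_k \, f_k(a_{i-1}, a_i, \mathbf{s}, \mathbf{t}) \right)}
       {Z(\mathbf{s}, \mathbf{t})}
```

Here i ranges over source positions, a_i is the state (target index or NULL) chosen for source word i, and Z(s, t) sums the numerator over all possible state sequences.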
desired Viterbi decoding path
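The decoding figure itself is not in the transcript. As a rough illustration of how a best path through the states (NULL plus one state per target word) can be found, here is a minimal Viterbi sketch. The function name `viterbi_align`, the `emit` and `trans` scoring callbacks, and the toy scores are all hypothetical and are not taken from jacana-align.

```python
from typing import Callable, List


def viterbi_align(n_src: int, n_tgt: int,
                  emit: Callable[[int, int], float],
                  trans: Callable[[int, int], float]) -> List[int]:
    """Return the highest-scoring state sequence, one state per source word.

    States are 0..n_tgt. State 0 is the NULL state (source word deleted);
    state j > 0 means "aligned to the j-th target word" (1-based).
    emit(i, j):  score for putting source position i (0-based) in state j.
    trans(p, j): score for moving from state p to state j.
    """
    if n_src == 0:
        return []
    states = range(n_tgt + 1)
    best = [emit(0, j) for j in states]   # best score of a path ending in j
    back = []                             # back-pointers for positions 1..n_src-1
    for i in range(1, n_src):
        new_best, pointers = [], []
        for j in states:
            k = max(states, key=lambda p: best[p] + trans(p, j))
            new_best.append(best[k] + trans(k, j) + emit(i, j))
            pointers.append(k)
        best = new_best
        back.append(pointers)
    # Trace back from the best final state.
    j = max(states, key=lambda s: best[s])
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return path


# Toy example with made-up scores: exact match is rewarded, mismatch is
# penalized, NULL is neutral, and transitions are free.
src = ["Linda", "is", "happily", "married"]
tgt = ["Linda", "married"]


def toy_emit(i: int, j: int) -> float:
    if j == 0:
        return 0.0
    return 1.0 if src[i].lower() == tgt[j - 1].lower() else -1.0


path = viterbi_align(len(src), len(tgt), toy_emit, lambda p, j: 0.0)
print(path)
```

In this toy run the first and last source words align to target words 1 and 2, and the two middle words fall into the NULL state. In a trained model the scores would come from weighted feature functions rather than hand-set constants.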
features
• string similarity
 – Jaro-Winkler, Dice-Sørensen, Hamming, Jaccard, Levenshtein, n-gram overlap and common prefix matching
• POS tag matching
• WordNet
 – hypernym, hyponym, synonym, derived form, entailing, causing, members of, have member, substances of, have substances, parts of, have part
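A few of the string similarity measures named above can be sketched in plain Python. This is an illustrative sketch only: the slides do not say how jacana-align computes these values, so the choice of character bigrams for Jaccard and Dice, and all function names, are assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def char_ngrams(word: str, n: int = 2) -> set:
    """Set of distinct character n-grams of a word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def jaccard(a: str, b: str, n: int = 2) -> float:
    """Jaccard coefficient over character n-gram sets."""
    x, y = char_ngrams(a, n), char_ngrams(b, n)
    return len(x & y) / len(x | y) if (x | y) else 0.0


def dice(a: str, b: str, n: int = 2) -> float:
    """Dice-Sørensen coefficient over character n-gram sets."""
    x, y = char_ngrams(a, n), char_ngrams(b, n)
    return 2 * len(x & y) / (len(x) + len(y)) if (x or y) else 0.0


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters the two words share."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n


print(levenshtein("married", "marry"),
      jaccard("married", "marry"),
      dice("married", "marry"),
      common_prefix_len("married", "marry"))
```

Each function returns one number for a source/target word pair, which is the shape a feature function f() needs before it is weighted in the model.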
features
• positional
 – offset difference between src/tgt word
• context
 – whether neighboring words are similar
 – helps to align function words
• distortion (Markov feature)
 – how far apart are two aligned target words
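The three feature types above can be illustrated with small functions. This is a hedged sketch under assumptions: the slides do not give exact definitions, so the use of relative positions, the left/right neighbor check, and every name below are hypothetical rather than the jacana-align implementation.

```python
from typing import Callable, List, Optional, Tuple


def positional_feature(i: int, j: int, n_src: int, n_tgt: int) -> float:
    """Offset difference between source word i and target word j (0-based),
    measured on relative positions so sentence lengths may differ."""
    return abs(i / n_src - j / n_tgt)


def context_feature(src: List[str], tgt: List[str], i: int, j: int,
                    sim: Callable[[str, str], float]) -> Tuple[float, float]:
    """Similarity of the left and right neighbors of src[i] and tgt[j].
    A missing neighbor at a sentence boundary scores 0.0."""
    left = sim(src[i - 1], tgt[j - 1]) if i > 0 and j > 0 else 0.0
    right = (sim(src[i + 1], tgt[j + 1])
             if i + 1 < len(src) and j + 1 < len(tgt) else 0.0)
    return left, right


def distortion_feature(prev_state: int, state: int) -> Optional[int]:
    """Distance between two consecutively aligned target positions.
    States are 1-based target indices; 0 is NULL, which has no position."""
    if prev_state == 0 or state == 0:
        return None
    return abs(state - prev_state)


def exact(a: str, b: str) -> float:
    """Toy similarity: 1.0 for a case-insensitive exact match, else 0.0."""
    return 1.0 if a.lower() == b.lower() else 0.0


src = "Linda is married to Charles".split()
tgt = "Linda Johnson is married to Charles".split()

# The function word "to" (src[3], tgt[4]) has matching neighbors on both sides.
print(context_feature(src, tgt, 3, 4, exact))
print(positional_feature(3, 4, len(src), len(tgt)))
print(distortion_feature(4, 5))
```

The context feature shows why neighbors help with function words: "to" alone is ambiguous, but "married to Charles" on both sides is strong evidence for the alignment.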
Implementation: jacana-align
source code at http://code.google.com/p/jacana
• lightweight: only uses a POS tagger and WordNet
• written in Scala, optimized with L-BFGS
• platform independent: compiles to a .jar file, fully interoperable with Java
• high performance? -> evaluation
Baselines
• GIZA++
• Tree Edit Distance (with stem/WordNet matching)
• MANLI
 – MacCartney, B.; Galley, M. & Manning, C. D. A Phrase-Based Alignment Model for Natural Language Inference. EMNLP 2008
• MANLI-constraint (decoding with ILP)
 – Thadani, K. & McKeown, K. Optimal and syntactically-informed decoding for monolingual phrase-based alignment. ACL 2011
performance in F1
[figure: F1 comparison chart; callouts on the slides: 10.3%, 0.8%, 3.3%]
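For readers unfamiliar with the metric, alignment F1 is commonly computed by comparing predicted alignment links against gold links. The sketch below assumes links are (source index, target index) pairs; the slides do not state the exact evaluation protocol, and the function name and toy data are hypothetical.

```python
from typing import Set, Tuple

Link = Tuple[int, int]  # (source index, target index)


def alignment_prf(gold: Set[Link], predicted: Set[Link]) -> Tuple[float, float, float]:
    """Precision, recall and F1 of predicted alignment links against gold links."""
    tp = len(gold & predicted)                       # links found in both sets
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy data, not from any corpus: two of four predicted links are correct,
# and two of three gold links are recovered.
gold = {(0, 0), (1, 1), (2, 3)}
predicted = {(0, 0), (1, 1), (2, 2), (3, 3)}
p, r, f1 = alignment_prf(gold, predicted)
print(p, r, f1)
```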
performance in speed (seconds per sentence)
• when sentences are more balanced, jacana-align is about 20x faster
• the speed of jacana-align is not as sensitive to sentence length increase

corpus   sentence pair length   MANLI-approx.   MANLI-exact   jacana-align
RTE2     29/11                  1.67s           0.08s         0.025s
FUSION   27/27                  61.96s          2.45s         0.096s

callouts on the slides: 20x, 20x (first bullet); 30x, 30x, 4x (second bullet)
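The ratios behind the two bullets can be recomputed from the timings in the table. The sketch below only does that arithmetic; reading the 30x/30x/4x callouts as per-system slowdown from RTE2 to FUSION is an interpretation, not something the transcript states.

```python
# Seconds per sentence, copied from the table above.
timings = {
    "RTE2":   {"MANLI-approx.": 1.67,  "MANLI-exact": 0.08, "jacana-align": 0.025},
    "FUSION": {"MANLI-approx.": 61.96, "MANLI-exact": 2.45, "jacana-align": 0.096},
}

# How much slower each system gets when moving from RTE2 to FUSION.
slowdown = {
    system: timings["FUSION"][system] / timings["RTE2"][system]
    for system in timings["RTE2"]
}

# How much faster jacana-align is than MANLI-exact on each corpus.
speedup_vs_exact = {
    corpus: row["MANLI-exact"] / row["jacana-align"]
    for corpus, row in timings.items()
}

for system, factor in slowdown.items():
    print(f"{system}: {factor:.1f}x slower on FUSION than on RTE2")
for corpus, factor in speedup_vs_exact.items():
    print(f"{corpus}: jacana-align is {factor:.1f}x faster than MANLI-exact")
```

On these numbers the two MANLI variants slow down by a factor of 30 or more on FUSION while jacana-align slows down by roughly 4x, which is consistent with the claim that its speed is less sensitive to sentence length.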
Conclusion
• state-of-the-art monolingual word aligner
 – in accuracy
 – in speed
• open source, use it and hack it!
thank you
with a demo