Statistical NLP Lecture 18: Bayesian grammar induction & machine translation (Roger Levy, Department of Linguistics, UCSD)
Statistical NLP
Lecture 18: Bayesian grammar induction & machine translation
Roger Levy
Department of Linguistics, UCSD
Thanks to Percy Liang, Noah Smith, and Dan Klein for slides
Plan
1. Recent developments in Bayesian unsupervised grammar induction
• Nonparametric grammars
• Non-conjugate priors
2. A bit about machine translation
Nonparametric grammars
• Motivation: how many symbols should a grammar have?
• Really an open question
• “Let the data have a say”
Hierarchical Dirichlet Process PCFG
• Start with the standard Bayesian picture:
(Liang et al., 2007)
Grammar representation
• Liang et al. use Chomsky normal-form (CNF) grammars
• A CNF grammar has no ε-productions, and only has rules of the form:
  • X → Y Z [binary rewrite]
  • X → a [unary terminal production]
(slide contrasts an example CNF tree with a non-CNF tree)
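As a sketch, the CNF conditions can be checked over rules represented as (lhs, rhs) pairs; the uppercase-nonterminal convention below is just for this toy example:

```python
def is_cnf(rules):
    """Check that every rule is X -> Y Z (two nonterminals) or
    X -> a (one terminal). Nonterminals are all-uppercase strings
    purely by the convention of this sketch."""
    for lhs, rhs in rules:
        if len(rhs) == 2 and all(s.isupper() for s in rhs):
            continue  # binary rewrite X -> Y Z
        if len(rhs) == 1 and not rhs[0].isupper():
            continue  # unary terminal production X -> a
        return False  # epsilon, unary X -> Y, ternary rules, etc.
    return True

cnf = [("S", ("NP", "VP")), ("NP", ("the",)), ("VP", ("runs",))]
not_cnf = [("S", ("NP", "VP", "PP"))]
```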
HDP-PCFG defined
• Each grammar has a top-level distribution β over (non-terminal) symbols
• This distribution is a Dirichlet process (stick-breaking distribution; Sethuraman, 1994)
• So really there are infinitely many nonterminals
• Each nonterminal symbol has:
  • an emission distribution
  • a binary rule distribution
  • a distribution over which type of rule to use
The prior over symbols
• The Dirichlet Process controls expectations about symbol distributions
Binary rewrite rules
Inference
• Variational Bayes
• The tractable distribution is factored into data, top-level symbol, and rewrite components
Results
• Simple synthetic grammar (all rule probs equal):
• Successfully recovers sparse symbol structure (standard ML-PCFG fails)
Results on treebank parsing
• Binarize the Penn Treebank and erase category labels
• Try to recover the label structure, then parse sentences with the resulting grammar
• (plot compares against ML estimation)
Dependency grammar induction & other priors
• We’ll now cover work by Noah Smith and colleagues on unsupervised dependency grammar induction
• Highlight on: non-conjugate priors
• What types of priors are interesting to use?
Klein & Manning dependency recap
Klein and Manning’s DMV
• Probabilistic, unlexicalized dependency grammar over part-of-speech sequences, designed for unsupervised learning (Klein and Manning, 2004).
• Left and right arguments are independent; two states to handle valence.
(example: a dependency tree over the POS sequence $ Det Nsing Vpast Prep Adj Nsing .)
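The DMV generative story for one direction can be sketched as follows; the two valence states are "adjacent" (no dependent generated yet) versus not, and all probability tables below are invented for illustration, not taken from Klein and Manning:

```python
import random

# Toy DMV sketch: each head repeatedly decides whether to STOP in a
# direction (probability depends on the valence state), and if not,
# samples a dependent class. All numbers here are made up.
P_STOP = {("Vpast", True): 0.3, ("Vpast", False): 0.8,
          ("Nsing", True): 0.9, ("Nsing", False): 0.95}
P_CHILD = {"Vpast": [("Nsing", 0.7), ("Prep", 0.3)],
           "Nsing": [("Prep", 1.0)]}

def sample(pairs, rng):
    r = rng.random()
    for item, p in pairs:
        r -= p
        if r <= 0:
            return item
    return pairs[-1][0]  # guard against float rounding

def gen_right_dependents(head, rng, max_deps=5):
    deps, adjacent = [], True
    while len(deps) < max_deps:
        if rng.random() < P_STOP[(head, adjacent)]:
            break                      # STOP in this direction
        deps.append(sample(P_CHILD[head], rng))
        adjacent = False               # valence state flips after first dep
    return deps

deps = gen_right_dependents("Vpast", random.Random(1))
```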
Aside: Visual Notation
(plate-diagram legend: grammar G, sentences Xt and trees Yt for t = 1…T; node shading marks whether a variable is maximized over, integrated out, or observed)
EM for Maximum Likelihood Estimation
• E step: calculate exact posterior given current grammar
• M step: calculate best grammar, assuming current posterior
(plate diagram: grammar G maximized over, trees Yt integrated out, sentences Xt observed, t = 1…T)
Convenient Change of Variable
(change of variable: replace each tree Yt with its derivation-event counts Ft,e, one count per grammar event e ∈ E)
EM (Algorithmic View)
• E step: calculate derivation event posteriors given grammar
• M step: calculate best grammar using event posteriors
(plate diagram as before, with event counts Ft,e in place of trees Yt)
Maximum a Posteriori (MAP) Estimation
• The data are not the only source of information about the grammar.
• Robustness: the grammar should not have many zeroes. Smooth.
• This can be accomplished by putting a prior U on the grammar (Chen, 1995; Eisner, 2001, inter alia).
• The most computationally convenient prior is a Dirichlet, with α > 1.
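For a single multinomial, the Dirichlet-MAP M-step reduces to add-(α − 1) smoothing of the expected counts; a minimal sketch (function name and the toy counts are illustrative):

```python
def map_multinomial(expected_counts, alpha):
    """MAP estimate of a multinomial under a symmetric Dirichlet(alpha)
    prior: theta_k is proportional to (count_k + alpha - 1). With
    alpha > 1 every event keeps nonzero probability (smoothing);
    alpha = 1 recovers plain maximum likelihood."""
    smoothed = {k: c + alpha - 1.0 for k, c in expected_counts.items()}
    total = sum(smoothed.values())
    return {k: v / total for k, v in smoothed.items()}

# An unseen rewrite ("VP") keeps a small nonzero probability.
theta = map_multinomial({"NP VP": 8.0, "VP": 0.0}, alpha=1.1)
```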
MAP EM (Algorithmic View)
• E step: calculate derivation event posteriors given grammar
• M step: calculate best grammar using event posteriors
(plate diagram as before, with a prior U on the grammar G)
Experimental Results: EM and MAP EM
• Evaluation of learned grammar on a parsing task (unseen test data).
• Initialization and, for MAP, the smoothing hyperparameter “u” need to be chosen.
• Can do this with unlabeled dev data (modulo infinite cross-entropy), or labeled dev data (shown in blue on the slide).

Language | EM | EM | MAP | MAP
German | 40 | 20 | 54 |
English | 23 | 42 | 42 | 42
Bulgarian | 43 | 45 | 46 |
Mandarin | 40 | 49 | 37 | 50
Turkish | 32 | 42 | 41 | 48
Portuguese | 43 | 43 | 37 | 42
Smith (2006, ch. 8)
Structural Bias and Annealing
• Simple idea: use soft structural constraints to encourage structures that are more plausible.
• This affects the E step only; the final grammar takes the same form as usual.
• Here: “favor short dependencies.”
• Annealing: gradually shift this bias over time.
(plate diagram as before, with a structural-bias term B alongside the prior U)
Algorithmic Issues
• The structural-bias score for a tree needs to factor in such a way that dynamic-programming algorithms are still efficient.
• Equivalently, g and b, taken together, factor into local features.
• Idea explored here: the string distance between a word and its parent is penalized geometrically.
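A geometric distance penalty that factors over individual attachments might look like this; the decay constant and the minus-one offset are illustrative choices, not the exact parameterization in the slides:

```python
def biased_tree_score(grammar_score, head_child_pairs, delta=0.9):
    """Multiply the grammar's score g(tree) by a structural bias b(tree)
    that penalizes each dependency geometrically in its string distance:
    b = prod over attachments of delta ** (|i - j| - 1). Because the
    penalty factors over individual attachments, it drops into the same
    dynamic programs (inside/outside) used in the E step."""
    bias = 1.0
    for head_pos, child_pos in head_child_pairs:
        bias *= delta ** (abs(head_pos - child_pos) - 1)
    return grammar_score * bias

short = biased_tree_score(0.01, [(2, 1), (2, 3)])  # only adjacent deps
long_ = biased_tree_score(0.01, [(5, 1), (5, 9)])  # two long deps
```

With only adjacent dependencies the bias is 1, so the grammar score is unchanged; long dependencies shrink the score.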
Experimental Results: Structural Bias & Annealing
• Labeled dev data used to pick:
  • Initialization
  • Hyperparameter
  • Structural bias strength (for SB)
  • Annealing schedule (for SA)

Language | MAP | CE | SA
German | 54 | 63 | 72
English | 42 | 58 | 67
Bulgarian | 46 | 41 | 59
Mandarin | 50 | 41 | 58
Turkish | 48 | 59 | 62
Portuguese | 42 | 72 | 51
Smith (2006, ch. 8)
Correlating Grammar Events
• Observation by Blei and Lafferty (2006), regarding topic models: a multinomial over states that gives high probability to some states is likely to give high probability to other, correlated states.
• For us: a class that favors one type of dependent is likely to favor similar types of dependents.
  • If Vpast favors Nsing as a subject, it might also favor Nplural.
• In general, certain classes are likely to have correlated child distributions.
• Can we build a grammar-prior that encodes (and learns) these tendencies?
Logistic Normal Distribution over Multinomials
• Given: mean vector μ, covariance matrix Σ
• Draw a vector η from Normal(η; μ, Σ)
• Apply the softmax: p_k = exp(η_k) / Σ_j exp(η_j)
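A two-event draw can be sketched with plain library calls, building the 2×2 covariance from standard deviations and a correlation; all parameter values are illustrative, with the mean taken from the slide's example m = [0.4, 0.6]:

```python
import math
import random

def draw_logistic_normal(mu, sigma, rho, rng):
    """Draw a 2-event multinomial from a logistic normal: sample
    eta ~ Normal(mu, Sigma), where Sigma is built here from standard
    deviations sigma and correlation rho, then apply the softmax.
    Positive rho makes the two logits rise and fall together."""
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    eta1 = mu[0] + sigma[0] * z1
    eta2 = mu[1] + sigma[1] * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
    m = max(eta1, eta2)                  # stabilize the exponentials
    e1, e2 = math.exp(eta1 - m), math.exp(eta2 - m)
    return e1 / (e1 + e2), e2 / (e1 + e2)

p1, p2 = draw_logistic_normal(mu=(0.4, 0.6), sigma=(1.0, 1.0), rho=0.5,
                              rng=random.Random(0))
```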
Logistic Normal Distributions
(figures: densities over the two-event simplex, with axes p1 → 1 and p2 → 1; a draw η around the mean m = [0.4, 0.6] is pushed through the softmax, and varying μ and Σ reshapes the resulting distribution over (p1, p2))
Logistic Normal Grammar
(figure, built up over several slides: one Gaussian draw η1, η2, η3, …, ηn per multinomial in the grammar; each draw is passed through the softmax, and the resulting multinomials together constitute the grammar g)
Learning a Logistic Normal Grammar
• We use variational EM as before to achieve empirical Bayes; the result is a learned μ and Σ corresponding to each multinomial distribution in the grammar.
• The variational model for G also has a logistic normal form.
• Cohen et al. (2009) exploit tricks from Blei and Lafferty (2006), as well as the dynamic-programming trick for trees/derivation events used previously.
Experimental Results: EB
• Single initializer.
• MAP hyperparameter value is fixed at 1.1.
• LN covariance matrix is 1 on the diagonal and 0.5 for tag pairs within the same “family” (thirteen families, designed to be language-independent).

Language | EM | MAP | EB (D) | EB (LN)
English | 46 | 46 | 46 | 59
Mandarin | 38 | 38 | 38 | 47

Cohen, Gimpel, and Smith (NIPS 2008); Cohen and Smith (NAACL-HLT 2009)
Shared Logistic Normals
• Logistic normal softly ties grammar event probabilities within the same distribution.
• What about across distributions?
  • If Vpast is likely to have a noun argument, so is Vpresent.
  • In general, certain classes are likely to have correlated parent distributions.
• We can capture this by combining draws from logistic normal distributions.
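One way to combine the draws, assuming an equal-weight average between a distribution's own logits and a component shared with related distributions (the weighting is an illustrative choice, and all numbers below are made up):

```python
import math

def softmax(eta):
    m = max(eta)
    exps = [math.exp(x - m) for x in eta]
    z = sum(exps)
    return [e / z for e in exps]

def shared_ln_multinomial(own_eta, shared_eta):
    """Shared logistic normal sketch: a distribution's logits are the
    average of its own Gaussian draw and a draw from a component shared
    across related distributions (e.g. all verb tags), then softmaxed."""
    combined = [(a + b) / 2.0 for a, b in zip(own_eta, shared_eta)]
    return softmax(combined)

# Vpast and Vpres each mix their own logits with one shared "verb" draw,
# so both end up favoring the same child class (index 0, e.g. Nsing).
shared_verb = [2.0, 0.0, -1.0]
p_vpast = shared_ln_multinomial([0.5, 0.2, 0.1], shared_verb)
p_vpres = shared_ln_multinomial([0.1, 0.4, 0.0], shared_verb)
```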
Shared Logistic Normal Distributions
(figure, built up over several slides: Gaussian draws η1, η2, η3, …, ηn as before, but draws for related multinomials are averaged with a shared component before the softmax; the averaged-and-softmaxed vectors together constitute the grammar g)
What to Tie?
• All verb tags share components for all six distributions (left children, right children, and stopping in each direction in each state).
• All noun tags likewise share components for all six distributions.
• (Clearly, many more ideas to try!)
Experimental Results: EB
• Single initializer.
• MAP hyperparameter value is fixed at 1.1.
• Tag families used for the logistic normal and shared logistic normal models.
• Verb-as-parent and noun-as-parent distributions each tied in the shared logistic normal models.

Language | EM | MAP | EB (LN) | EB (SLN)
English | 46 | 46 | 59 | 61
Mandarin | 38 | 38 | 47 | 49
Cohen and Smith (NAACL-HLT 2009)
Bayesian grammar induction summary
• This is an exciting (though technical and computationally complex) area!
• Nonparametric models’ ability to scale model complexity with data complexity is attractive
• Since likelihood clearly won’t guide us to the right grammars, exploring a wider variety of priors is also attractive
• Open issue: nonparametric models constrain what types of priors can be used
Machine translation
• Shifting gears…
Machine Translation: Examples
Machine Translation
Source: Madame la présidente, votre présidence de cette institution a été marquante.
Reference: Mrs Fontaine, your presidency of this institution has been outstanding.
MT output: Madam President, president of this house has been discoveries.
MT output: Madam President, your presidency of this institution has been impressive.

Source: Je vais maintenant m'exprimer brièvement en irlandais.
Reference: I shall now speak briefly in Irish.
MT output: I will now speak briefly in Ireland.
MT output: I will now speak briefly in Irish.

Source: Nous trouvons en vous un président tel que nous le souhaitions.
Reference: We think that you are the type of president that we want.
MT output: We are in you a president as the wanted.
MT output: We are in you a president as we the wanted.
History
• 1950s: Intensive research activity in MT
• 1960s: Direct word-for-word replacement
• 1966 (ALPAC): NRC report on MT; conclusion: MT no longer worthy of serious scientific investigation
• 1966-1975: “Recovery period”
• 1975-1985: Resurgence (Europe, Japan)
• 1985-present: Gradual resurgence (US)
http://ourworld.compuserve.com/homepages/WJHutchins/MTS-93.htm
Levels of Transfer
(Vauquois triangle: Source Text → Target Text)
• Direct: word structure → word structure (morphological analysis / morphological generation)
• Syntactic transfer: syntactic analysis → syntactic structure → syntactic structure → syntactic generation
• Semantic transfer: semantic analysis → semantic structure → semantic structure → semantic generation
• Interlingua: semantic composition → interlingua → semantic decomposition
General Approaches
• Rule-based approaches
  • Expert-system-like rewrite systems
  • Interlingua methods (analyze and generate)
  • Lexicons come from humans
  • Can be very fast, and can accumulate a lot of knowledge over time (e.g. Systran)
• Statistical approaches
  • Word-to-word translation
  • Phrase-based translation
  • Syntax-based translation (tree-to-tree, tree-to-string)
  • Trained on parallel corpora
  • Usually noisy-channel (at least in spirit)
The Coding View
• “One naturally wonders if the problem of translation could conceivably be treated as a problem in cryptography. When I look at an article in Russian, I say: ‘This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.’ ”
• Warren Weaver (1955:18, quoting a letter he wrote in 1947)
MT System Components
• Source model (language model): P(e)
• Channel (translation model): P(f|e)
• The decoder observes f and finds the best English translation:
  e_best = argmax_e P(e|f) = argmax_e P(f|e) P(e)
• Finds an English translation which is both fluent and semantically faithful to the French source
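The argmax above can be sketched over a toy candidate list, with stand-in probability tables in place of real language and translation models:

```python
def decode(candidates, lm_prob, tm_prob, source):
    """Noisy-channel decoding sketch: choose the English candidate e
    maximizing P(f|e) * P(e)."""
    return max(candidates, key=lambda e: tm_prob(source, e) * lm_prob(e))

# Toy stand-ins: the channel likes both word orders equally, so the
# language model's fluency preference breaks the tie.
lm = lambda e: {"the cat": 0.4, "cat the": 0.01}.get(e, 0.0)
tm = lambda f, e: {("le chat", "the cat"): 0.5,
                   ("le chat", "cat the"): 0.5}.get((f, e), 0.0)
best = decode(["the cat", "cat the"], lm, tm, "le chat")
```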
Overview: Extracting Phrases
• Pipeline: sentence-aligned corpus → directional word alignments → intersected and grown word alignments → phrase table (the translation model)
• Example phrase-table entries:
  cat ||| chat ||| 0.9
  the cat ||| le chat ||| 0.8
  dog ||| chien ||| 0.8
  house ||| maison ||| 0.6
  my house ||| ma maison ||| 0.9
  language ||| langue ||| 0.9
  …
Phrase-Based Decoding
这 7 人 中包括 来自 法国 和 俄罗斯 的 宇航 员 .
(the 7 people include astronauts coming from France and Russia)
Why Syntactic Translation?
Kare ha ongaku wo kiku no ga daisuki desu
He adores listening to music.
(from Yamada and Knight, 2001)
Two Places for Syntax?
• Language model
  • Can use with any translation model
  • Syntactic language models seem to be better for MT than for ASR (why?)
  • Not thoroughly investigated [Charniak et al. 03]
• Translation model
  • Can use any language model
  • A linear LM can complement a tree-based TM (why?)
  • Also not thoroughly explored, but much more work recently
Parse Tree (E) → Sentence (J)
(figure: the English parse tree for “he adores listening to music” passes through three channel operations:
1. Reorder the children of each node
2. Insert Japanese function words (ha, ga, no, desu) at tree nodes
3. Translate the English leaves (he → kare, music → ongaku, to → wo, listening → kiku, adores → daisuki)
yielding the Japanese sentence “kare ha ongaku wo kiku no ga daisuki desu”)
1. Reorder
(figure: reordering transforms the “he adores listening to music” word order into the “he music to listening adores” order)
P(PRP VB1 VB2 → PRP VB2 VB1) = 0.723
P(VB TO → TO VB) = 0.749
P(TO NN → NN TO) = 0.893
Parameter Table: Reorder
Original Order | Reordering | P(reorder|original)
PRP VB1 VB2 | PRP VB1 VB2 | 0.074
PRP VB1 VB2 | PRP VB2 VB1 | 0.723
PRP VB1 VB2 | VB1 PRP VB2 | 0.061
PRP VB1 VB2 | VB1 VB2 PRP | 0.037
PRP VB1 VB2 | VB2 PRP VB1 | 0.083
PRP VB1 VB2 | VB2 VB1 PRP | 0.021
VB TO | VB TO | 0.251
VB TO | TO VB | 0.749
TO NN | TO NN | 0.107
TO NN | NN TO | 0.893
2. Insert
(figure: the reordered tree with the Japanese function words ha, ga, no, desu inserted at tree nodes)
P(none | TOP-VB) = 0.735
P(right | VB-PRP) × P(ha) = 0.652 × 0.219
P(right | VB-VB) × P(ga) = 0.252 × 0.062
P(none | TO-TO) = 0.900
Conditioning features: parent label & node label (for position); none (for word selection)
Parameter Table: Insert
Parent label | Node label | P(none) | P(left) | P(right)
TOP | VB | 0.735 | 0.004 | 0.260
VB | VB | 0.687 | 0.061 | 0.252
VB | TO | 0.344 | 0.004 | 0.652
TO | TO | 0.700 | 0.030 | 0.261
TO | NN | 0.900 | 0.003 | 0.097
TO | NN | 0.800 | 0.096 | 0.104

w | P(insert-w)
ha | 0.219
ta | 0.131
wo | 0.099
no | 0.094
ni | 0.080
te | 0.078
ga | 0.062
desu | 0.0007
3. Translate
(figure: each English leaf is translated in place, yielding kare ha ongaku wo kiku no ga daisuki desu)
P(he → kare) = 0.952
P(music → ongaku) = 0.900
P(to → wo) = 0.038
P(listening → kiku) = 0.333
P(adores → daisuki) = 1.000
Conditioning feature: English word identity
Parameter Table: Translate
adores: daisuki 1.000
he: kare 0.952, NULL 0.016, nani 0.005, da 0.003, shi 0.003
listening: kiku 0.333, kii 0.333, mi 0.333
music: ongaku 0.900, naru 0.100
to: ni 0.216, NULL 0.204, to 0.133, no 0.046, wo 0.038
Note: Translation to NULL = deletion
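Putting the three tables together, the channel probability of one derivation is just the product of the table entries it uses; a sketch with values copied from the example tables (insertion steps would multiply in the same way):

```python
# Yamada & Knight channel sketch: each chosen reorder/translate step
# contributes one table entry to the derivation probability.
reorder = {("PRP VB1 VB2", "PRP VB2 VB1"): 0.723,
           ("VB TO", "TO VB"): 0.749,
           ("TO NN", "NN TO"): 0.893}
translate = {("he", "kare"): 0.952, ("music", "ongaku"): 0.900,
             ("to", "wo"): 0.038, ("listening", "kiku"): 0.333,
             ("adores", "daisuki"): 1.000}

def derivation_prob(reorder_steps, translate_steps):
    p = 1.0
    for step in reorder_steps:
        p *= reorder[step]
    for step in translate_steps:
        p *= translate[step]
    return p  # insertion probabilities would multiply in here too

p = derivation_prob([("PRP VB1 VB2", "PRP VB2 VB1"), ("VB TO", "TO VB")],
                    [("he", "kare"), ("adores", "daisuki")])
```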
Synchronous Grammars
• Multi-dimensional PCFGs (Wu 1995; Melamed 2004)
• Both texts share the same parse tree:
• Formally: have paired expansions:

  S → NP VP   ↔   S → NP VP
  VP → V NP   ↔   VP → NP V

• … with probabilities, of course!
• Distribution over tree pairs
• Strong assumption: constituents in one language are constituents in the other
• Is this a good assumption? Why / why not?
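To make the paired-expansion idea concrete, here is a minimal synchronous-CFG generation sketch. The grammar and lexicon are an invented toy echoing the lecture's example (they are not from the slides), and since the two NPs pick their lexical rules independently, the toy can produce implausible pairings; it only illustrates the mechanics of shared structure with per-rule reordering.

```python
import random

# Each rule pairs a source expansion with a target expansion over the same
# child symbols, possibly reordered. Assumes each nonterminal appears at
# most once per right-hand side (enough for this toy).
SCFG = {
    "S":  [(["NP", "VP"], ["NP", "VP"])],
    "VP": [(["V", "NP"], ["NP", "V"])],   # verb-object flips to object-verb
    "NP": [(["he"], ["kare"]), (["music"], ["ongaku"])],
    "V":  [(["adores"], ["daisuki"])],
}

def generate(sym, rng):
    """Synchronously expand `sym`, returning (source words, target words)."""
    src_rhs, tgt_rhs = rng.choice(SCFG[sym])
    # expand each nonterminal child once, then lay the children out in each
    # side's own order -- this is where the two trees share structure
    kids = {c: generate(c, rng) for c in src_rhs if c in SCFG}
    src = [w for c in src_rhs for w in (kids[c][0] if c in SCFG else [c])]
    tgt = [w for c in tgt_rhs for w in (kids[c][1] if c in SCFG else [c])]
    return src, tgt
```

Every sampled pair has the verb in second position on the source side but sentence-finally on the target side, reflecting the single shared tree with a reordered VP rule.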
Synchronous Derivations

Synchronous Derivations (II)

Hiero Phrases

(the figures for these three slides are not preserved in this transcript)
Top-Down Tree Transducers

[Next slides from Kevin Knight]

Original input — a parse of “he enjoys listening to music” (bracketing reconstructed from the slide):

  (S (NP (PRO he))
     (VP (VBZ enjoys)
         (NP (SBAR (VBG listening)
                   (VP (P to) (NP music))))))

Transformation: initially just the input tree; the transducer rewrites it top-down into a Japanese string over the next slides.
Top-Down Tree Transducers

After the first rule applies at the root, the S node is replaced by a sequence of subtrees and inserted particles (reconstructed from the slide):

  NP(PRO he) , wa , NP(VBG listening … music) , ga , VBZ(enjoys)

The remaining subtrees are rewritten by further rules.
Top-Down Tree Transducers

Next the subject subtree is rewritten, NP(PRO he) → kare, giving (reconstructed):

  kare , wa , NP(VBG listening … music) , ga , VBZ(enjoys)
Top-Down Tree Transducers

Original input: the parse of “he enjoys listening to music”. Final output:

  kare , wa , ongaku , o , kiku , no , ga , daisuki , desu
Top-Down Tree Transducers

The same input and output, now as a general rule shape: variables x0, x1, x2 match whole subtrees, which the rule may reorder and interleave with new output symbols:

  A(x0:B, C(x1:D, x2:E)) → x0 , F , x2 , G , x1
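The rule schema can be sketched in code. This is a hypothetical mini-implementation, not Knight's: trees are nested tuples, and the concrete root rule below (including where "wa", "ga", and "desu" are emitted — on the slides "desu" may instead come from translating the verb) is reconstructed from the example.

```python
# Trees are nested tuples: (label, child1, child2, ...); strings are leaves.
# One rule in the schema A(x0:B, C(x1:D, x2:E)) -> x0, F, x2, G, x1:

def apply_root_rule(tree):
    """S(x0:NP, VP(x1:VBZ, x2:NP)) -> x0 'wa' x2 'ga' x1 'desu'."""
    label, x0, vp = tree
    assert label == "S" and vp[0] == "VP"
    _, x1, x2 = vp
    # reorder the matched subtrees and interleave new output symbols
    return [x0, "wa", x2, "ga", x1, "desu"]

tree = ("S",
        ("NP", ("PRO", "he")),
        ("VP",
         ("VBZ", "enjoys"),
         ("NP", ("VBG", "listening"),
                ("VP", ("P", "to"), ("NP", "music")))))

out = apply_root_rule(tree)
# `out` mixes untranslated subtrees with inserted particles; further rules
# would keep rewriting the remaining subtrees top-down.
```

Note the key property of top-down tree transducers on display: a single rule can inspect a nested pattern (S over VP), reorder the matched variables, and emit new terminals, which plain synchronous CFG rules over sister nonterminals cannot do.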
这 7 人 中包括 来自 法国 和 俄罗斯 的 宇航 员 .
RULE 1:  DT(these) → 这
RULE 2:  VBP(include) → 中包括
RULE 4:  NNP(France) → 法国
RULE 5:  CC(and) → 和
RULE 6:  NNP(Russia) → 俄罗斯
RULE 8:  NP(NNS(astronauts)) → 宇航 , 员
RULE 9:  PUNC(.) → .
RULE 10: NP(x0:DT, CD(7), NNS(people)) → x0 , 7 人
RULE 11: VP(VBG(coming), PP(IN(from), x0:NP)) → 来自 , x0
RULE 13: NP(x0:NNP, x1:CC, x2:NNP) → x0 , x1 , x2
RULE 14: VP(x0:VBP, x1:NP) → x0 , x1
RULE 15: S(x0:NP, x1:VP, x2:PUNC) → x0 , x1 , x2
RULE 16: NP(x0:NP, x1:VP) → x1 , 的 , x0
Derivation Tree

“These 7 people include astronauts coming from France and Russia”

Spans covered as the derivation is built up:

  “these” “Russia” “astronauts” “.” “include” “France” “and”
  “France and Russia”
  “coming from France and Russia”
  “astronauts coming from France and Russia”
  “these 7 people”
  “include astronauts coming from France and Russia”
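The derivation above can be checked mechanically. Below, each rule from the slide is written as a function from its children's translations (x0, x1, …) to a Chinese token list; composing them in the order implied by the derivation tree reproduces the Chinese sentence. The function and variable names are illustrative.

```python
# Non-lexical rules from the slide, as functions over child translations
def rule10(x0): return x0 + ["7", "人"]       # NP(x0:DT, CD(7), NNS(people))
def rule11(x0): return ["来自"] + x0          # VP(VBG(coming), PP(IN(from), x0:NP))
def rule13(x0, x1, x2): return x0 + x1 + x2   # NP(x0:NNP, x1:CC, x2:NNP)
def rule14(x0, x1): return x0 + x1            # VP(x0:VBP, x1:NP)
def rule15(x0, x1, x2): return x0 + x1 + x2   # S(x0:NP, x1:VP, x2:PUNC)
def rule16(x0, x1): return x1 + ["的"] + x0   # NP(x0:NP, x1:VP)

# Lexical rules (1, 2, 4, 5, 6, 8, 9)
these, include, france = ["这"], ["中包括"], ["法国"]
and_, russia, astronauts, period = ["和"], ["俄罗斯"], ["宇航", "员"], ["."]

# Compose bottom-up along the derivation tree
np_subject = rule10(these)                    # "these 7 people"
np_countries = rule13(france, and_, russia)   # "France and Russia"
vp_coming = rule11(np_countries)              # "coming from France and Russia"
np_object = rule16(astronauts, vp_coming)     # "astronauts coming from ..."
vp = rule14(include, np_object)               # "include astronauts ..."
sentence = rule15(np_subject, vp, period)
# " ".join(sentence) reproduces the Chinese sentence at the top of the slide
```

Note how RULE 16 does the real reordering work: the English head noun ("astronauts") moves after its modifier clause, with 的 inserted between them.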
Examples

(the two “Examples” slides were figures and are not preserved in this transcript)