Speech, NLP and the Web
Pushpak Bhattacharyya, CSE Dept., IIT Bombay
Lectures 7, 9, 10: Theoretical Underpinnings: Maximum Likelihood and Maximum Entropy Principles (lecture 8 was on NLTK, by Abhijit)
4 Aug, 2014. Pushpak Bhattacharyya: ML and ME
Fundamental principles of machine learning
Learning in a vacuum is impossible: prior knowledge is essential.
Inductive bias: what to learn, and in what form to learn it, are decided in advance.
Structure learning and parameter learning
Structure: parts and their relationships
Parameters: probabilities
Example (1/2): transition table
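Transition probabilities P(column tag | row tag):

|   | ^ | N | V | O | . |
|---|---|---|---|---|---|
| ^ | 0 | 0.6 | 0.2 | 0.2 | 0 |
| N | 0 | 0.1 | 0.4 | 0.3 | 0.2 |
| V | 0 | 0.3 | 0.1 | 0.3 | 0.3 |
| O | 0 | 0.3 | 0.2 | 0.3 | 0.2 |
| . | 1 | 0 | 0 | 0 | 0 |

This transition table will change from language to language due to language divergences.

[Figure: partial sequence graph over the tags ^, N, V and .]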
Example (2/2): Lexical Probability Table
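Size of this table = (# POS tags in tagset) × (vocabulary size), where vocabulary size = # unique words in corpus.

|   | ε | people | laugh | ... | … |
|---|---|---|---|---|---|
| ^ | 1 | 0 | 0 | ... | 0 |
| N | 0 | $1 \times 10^{-3}$ | $1 \times 10^{-5}$ | ... | ... |
| V | 0 | $1 \times 10^{-6}$ | $1 \times 10^{-3}$ | ... | ... |
| O | 0 | 0 | $1 \times 10^{-9}$ | ... | ... |
| . | 1 | 0 | 0 | 0 | 0 |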
Structure and parameter
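Each edge of the sequence graph carries (lexical probability) × (transition probability):
• N, people, N: $(1 \times 10^{-3}) \times 0.1$
• N, laugh, N: $(1 \times 10^{-5}) \times 0.1$
• N, people, V: $(1 \times 10^{-3}) \times 0.4$
• N, laugh, V: $(1 \times 10^{-5}) \times 0.4$
• …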
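The edge weights above can be sketched in a few lines of Python. This is only an illustration: the dictionary names and the helpers `edge_weight` and `rows_are_normalized` are assumptions made here, and the lexical table holds only the entries shown on the slides.

```python
# Transition probabilities P(next_tag | tag), copied from the transition table.
TRANSITION = {
    "^": {"^": 0.0, "N": 0.6, "V": 0.2, "O": 0.2, ".": 0.0},
    "N": {"^": 0.0, "N": 0.1, "V": 0.4, "O": 0.3, ".": 0.2},
    "V": {"^": 0.0, "N": 0.3, "V": 0.1, "O": 0.3, ".": 0.3},
    "O": {"^": 0.0, "N": 0.3, "V": 0.2, "O": 0.3, ".": 0.2},
    ".": {"^": 1.0, "N": 0.0, "V": 0.0, "O": 0.0, ".": 0.0},
}

# Lexical probabilities P(word | tag); only the entries visible on the slide.
LEXICAL = {
    "N": {"people": 1e-3, "laugh": 1e-5},
    "V": {"people": 1e-6, "laugh": 1e-3},
    "O": {"people": 0.0, "laugh": 1e-9},
}


def edge_weight(tag, word, next_tag):
    """Weight of the edge tag --word--> next_tag: lexical times transition."""
    return LEXICAL[tag][word] * TRANSITION[tag][next_tag]


def rows_are_normalized(table, tol=1e-9):
    """Check that every row of a probability table sums to 1."""
    return all(abs(sum(row.values()) - 1.0) < tol for row in table.values())


print(edge_weight("N", "people", "V"))  # (1e-3) * 0.4
print(rows_are_normalized(TRANSITION))
```

The structure (which tags can follow which) lives in the dictionary keys; the parameters are the numbers stored in them.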
PCFG rules (structure + parameter)
• S → NP VP 1.0
• NP → DT NN 0.5
• NP → NNS 0.3
• NP → NP PP 0.2
• PP → P NP 1.0
• VP → VP PP 0.6
• VP → VBD NP 0.4
• DT → the 1.0
• NN → gunman 0.5
• NN → building 0.5
• VBD → sprayed 1.0
• NNS → bullets 1.0
• P → with 1.0
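A small Python sketch can make the "structure + parameter" reading of these rules concrete. The rule probabilities are the ones on the slide; the helper names and the example derivation (for the string "the gunman sprayed bullets") are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from math import prod

# PCFG rules as (lhs, rhs) -> probability, copied from the slide.
RULES = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("DT", "NN")): 0.5,
    ("NP", ("NNS",)): 0.3,
    ("NP", ("NP", "PP")): 0.2,
    ("PP", ("P", "NP")): 1.0,
    ("VP", ("VP", "PP")): 0.6,
    ("VP", ("VBD", "NP")): 0.4,
    ("DT", ("the",)): 1.0,
    ("NN", ("gunman",)): 0.5,
    ("NN", ("building",)): 0.5,
    ("VBD", ("sprayed",)): 1.0,
    ("NNS", ("bullets",)): 1.0,
    ("P", ("with",)): 1.0,
}


def lhs_totals(rules):
    """Sum the probabilities of all rules sharing a left-hand side."""
    totals = defaultdict(float)
    for (lhs, _rhs), p in rules.items():
        totals[lhs] += p
    return dict(totals)


def derivation_probability(derivation, rules):
    """Probability of a derivation = product of the probabilities of its rules."""
    return prod(rules[rule] for rule in derivation)


# Hypothetical derivation of "the gunman sprayed bullets".
DERIVATION = [
    ("S", ("NP", "VP")),
    ("NP", ("DT", "NN")),
    ("DT", ("the",)),
    ("NN", ("gunman",)),
    ("VP", ("VBD", "NP")),
    ("VBD", ("sprayed",)),
    ("NP", ("NNS",)),
    ("NNS", ("bullets",)),
]

print(lhs_totals(RULES))
print(derivation_probability(DERIVATION, RULES))
```

For a proper PCFG, the totals per left-hand side should all be 1, which holds for the rules listed here.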
29 July, 2014 Pushpak Bhattacharyya: Parsing 7
Expectation Maximization
One of the key ideas of Statistical AI, ML, NLP, CV
Iterative procedure: find the parameters, find the hidden variables, maximize the data likelihood
The coin tossing problem
Case of 1 coin: Suppose there are $N$ tosses of a coin, and $N_H$ is the number of heads. What is the probability of a head, i.e., $P_H = ?$
Observed variable $X: \langle x_1, x_2, x_3, \ldots, x_N \rangle$
#Observations = $N$
where $x_i = 1$ when the toss produces a head, and $x_i = 0$ otherwise.
Therefore,
$$P_H = \frac{\sum_{i=1}^{N} x_i}{N}$$
Each observation is a Bernoulli trial, where
$P_H$ is the probability of success, i.e., getting a head, and
$1 - P_H$ is the probability of failure, i.e., getting a tail.
Likelihood of X
• Likelihood of $X$, i.e., the probability of the observation sequence $X$, is:
$$L(X; P_H) = \prod_{i=1}^{N} P_H^{x_i} (1 - P_H)^{1 - x_i}$$
Each trial is identical and independent. Maximizing the likelihood of the data requires us to set
$$\frac{dL}{dP_H} = 0$$
and thus get the expression for $P_H$.
Mathematical Convenience
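Take the log of the likelihood:
$$LL(X; P_H) = \sum_{i=1}^{N} \left[ x_i \log P_H + (1 - x_i) \log(1 - P_H) \right]$$
Differentiating w.r.t. $P_H$:
$$\frac{dLL}{dP_H} = \sum_{i=1}^{N} \left[ \frac{x_i}{P_H} - \frac{1 - x_i}{1 - P_H} \right]$$
To get the expression for $P_H$, set
$$\frac{dLL}{dP_H} = 0$$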
Equating to 0, expression for $P_H$
$$\frac{\sum_{i=1}^{N} x_i}{P_H} = \frac{N - \sum_{i=1}^{N} x_i}{1 - P_H}$$
$$P_H = \frac{\sum_{i=1}^{N} x_i}{N}$$
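As a quick numerical illustration of the closed form $P_H = \sum x_i / N$, the sketch below compares it against a grid search over the log-likelihood. The toss sequence is made-up data and the function names are assumptions introduced here.

```python
import math


def log_likelihood(xs, p_h):
    """LL(X; P_H) = sum_i [x_i log P_H + (1 - x_i) log(1 - P_H)]."""
    return sum(x * math.log(p_h) + (1 - x) * math.log(1 - p_h) for x in xs)


def mle_head_probability(xs):
    """Closed-form maximum likelihood estimate: fraction of heads."""
    return sum(xs) / len(xs)


tosses = [1, 0, 1, 1, 0, 1, 0, 1]  # hypothetical data: 1 = head, 0 = tail
p_hat = mle_head_probability(tosses)

# No grid point should beat the closed-form estimate.
grid = [g / 100 for g in range(1, 100)]
best_on_grid = max(grid, key=lambda p: log_likelihood(tosses, p))
print(p_hat, best_on_grid)
```

The grid maximizer should land next to `p_hat`, up to the grid resolution of 0.01.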
Maximum Entropy
Suppose we do not know how to get the MLE, or the likelihood expression is impossible to obtain; then we use Maximum Entropy. Example: problems like co-reference resolution.
Entropy (to be elaborated later):
$$-P_H \log P_H - (1 - P_H) \log(1 - P_H)$$
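The entropy expression for a single coin is easy to evaluate numerically. The sketch below assumes natural logarithms and the usual convention that $0 \log 0 = 0$; neither choice is stated on the slide, and the function name is illustrative.

```python
import math


def bernoulli_entropy(p_h):
    """Entropy -P_H log P_H - (1 - P_H) log(1 - P_H), in nats."""
    if p_h in (0.0, 1.0):
        return 0.0  # convention: 0 * log 0 = 0
    return -p_h * math.log(p_h) - (1 - p_h) * math.log(1 - p_h)


for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, bernoulli_entropy(p))
```

The value is largest at $P_H = 0.5$, the distribution that commits to the least about the coin.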
Case for Expectation Maximization
Instead of one coin, we toss two coins.
Parameters $\langle P, P_1, P_2 \rangle$:
• $P$ = probability of choosing the first coin
• $P_1$ = probability of a head from the first coin
• $P_2$ = probability of a head from the second coin
We do not know which coin each observation came from.
$$X: \langle x_1, x_2, x_3, \ldots, x_N \rangle$$
EM continued..
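$Z_1, Z_2, Z_3, \ldots, Z_N$ is the hidden sequence running alongside $X_1, X_2, X_3, \ldots, X_N$,
where $Z_i = 1$ if the $i$th observation came from coin 1, and $Z_i = 0$ otherwise.
$$\Pr(X; \theta) = \sum_{Z} \Pr(X, Z; \theta)$$
$$Z: \langle z_1, z_2, z_3, \ldots, z_N \rangle, \qquad \theta = \langle P, P_1, P_2 \rangle$$
$$Y: \langle x_1, z_1 \rangle, \langle x_2, z_2 \rangle, \langle x_3, z_3 \rangle, \ldots, \langle x_N, z_N \rangle$$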
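Contd.
We want to work with
$$LL(X; \theta) = \log\Big(\sum_{Z} P(X, Z; \theta)\Big)$$
where
$$\Pr(Y; \theta) = P(X, Z; \theta) = \prod_{i=1}^{N} \big(P \cdot P_1^{x_i} (1 - P_1)^{1 - x_i}\big)^{z_i} \cdot \big((1 - P) \cdot P_2^{x_i} (1 - P_2)^{1 - x_i}\big)^{1 - z_i}$$
Invoke convexity/concavity and the expectation of $Z_i$, and work with $\log(\Pr(Y; \theta))$.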
Log Likelihood of the Data
$$LL(X; \theta) = \sum_{i=1}^{N} \Big[ E(z_i)\big(\log P + x_i \log P_1 + (1 - x_i)\log(1 - P_1)\big) + (1 - E(z_i))\big(\log(1 - P) + x_i \log P_2 + (1 - x_i)\log(1 - P_2)\big) \Big]$$
IMPORTANT POINTS TO NOTE
• Log moves inside the product term.
• $\sum_Z$ disappears, giving rise to $E(z_i)$ in place of $z_i$.
• Differentiate w.r.t. $p$, $p_1$, $p_2$, equate to 0 and get the results.
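P, P1, P2
$$p_1 = \frac{\sum_{i=1}^{N} E(z_i)\, x_i}{\sum_{i=1}^{N} E(z_i)}$$
$$p_2 = \frac{M - \sum_{i=1}^{N} E(z_i)\, x_i}{N - \sum_{i=1}^{N} E(z_i)}$$
$$p = \frac{\sum_{i=1}^{N} E(z_i)}{N}$$
$M$ = observed no. of heads
$$E(z_i) = P(z_i = 1 \mid x = x_i) = \frac{P(z_i = 1)\, P(x = x_i \mid z_i = 1)}{P(x = x_i)} = \frac{P\, P_1^{x_i}(1 - P_1)^{1 - x_i}}{P\, P_1^{x_i}(1 - P_1)^{1 - x_i} + (1 - P)\, P_2^{x_i}(1 - P_2)^{1 - x_i}}$$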
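The update formulas for $p$, $p_1$, $p_2$ and $E(z_i)$ translate directly into a short loop. The following is a minimal sketch, not the lecture's own code: the function names, the starting values and the toss data are all assumptions made for illustration.

```python
def e_step(xs, p, p1, p2):
    """E(z_i): posterior probability that toss i came from coin 1."""
    ez = []
    for x in xs:
        from_coin1 = p * (p1 ** x) * ((1 - p1) ** (1 - x))
        from_coin2 = (1 - p) * (p2 ** x) * ((1 - p2) ** (1 - x))
        ez.append(from_coin1 / (from_coin1 + from_coin2))
    return ez


def m_step(xs, ez):
    """Re-estimate (p, p1, p2) from the expected memberships E(z_i)."""
    n = len(xs)                                  # N
    m = sum(xs)                                  # M, observed number of heads
    s = sum(ez)                                  # sum_i E(z_i)
    sx = sum(e * x for e, x in zip(ez, xs))      # sum_i E(z_i) x_i
    return s / n, sx / s, (m - sx) / (n - s)


def em(xs, p, p1, p2, iterations=50):
    """Alternate E and M steps for a fixed number of iterations."""
    for _ in range(iterations):
        ez = e_step(xs, p, p1, p2)
        p, p1, p2 = m_step(xs, ez)
    return p, p1, p2


tosses = [1, 0, 1, 1, 0, 1, 0, 1]  # hypothetical data
p, p1, p2 = em(tosses, p=0.4, p1=0.7, p2=0.3)
print(p, p1, p2)
```

One sanity check follows from the M-step formulas themselves: after every M-step, $p\,p_1 + (1 - p)\,p_2$ equals $M/N$, the observed fraction of heads. With single tosses as the only evidence, the data constrain just this mixture probability, so the individual values depend on the starting point.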
Another application of EM
WSD
Mitesh Khapra, Salil Joshi and Pushpak Bhattacharyya, It takes two to Tango: A Bilingual Unsupervised Approach for estimating Sense Distributions using Expectation Maximization, 5th International Joint Conference on Natural Language Processing (IJCNLP 2011), Chiang Mai, Thailand, November 2011.
Definition: WSD
Given a context, get the “meaning” of:
• a set of words (targeted WSD), or
• all words (all-words WSD)
The “meaning” is usually given by the id of a sense in a sense repository, usually the wordnet.
Example: “operation” (from Princeton Wordnet)
• Operation, surgery, surgical operation, surgical procedure, surgical process -- (a medical procedure involving an incision with instruments; performed to repair damage or arrest disease in a living body; "they will schedule the operation as soon as an operating room is available"; "he died while undergoing surgery") TOPIC->(noun) surgery#1
• Operation, military operation -- (activity by a military or naval force (as a maneuver or campaign); "it was a joint operation of the navy and air force") TOPIC->(noun) military#1, armed forces#1, armed services#1, military machine#1, war machine#1
• mathematical process, mathematical operation, operation -- ((mathematics) calculation by mathematical methods; "the problems at the end of the chapter demonstrated the mathematical processes involved in the derivation"; "they were learning the basic operations of arithmetic") TOPIC->(noun) mathematics#1, math#1, maths#1
WSD for ALL Indian languages: Critical resource: INDOWORDNET
[Figure: wordnets in IndoWordnet: Hindi, English, Marathi, Sanskrit, Bengali, Punjabi, Konkani, Urdu, Gujarati, Oriya, Kashmiri, Dravidian language and North East language wordnets]
Synset Based Multilingual Dictionary
• Expansion approach for creating wordnets [Mohanty et al., 2008]
• Instead of creating from scratch, link to the synsets of an existing wordnet
• Relations get borrowed from the existing wordnet
[Figure: three copies of a synset graph over S1 to S7, and a sample entry from the MultiDict (Hindi, Marathi)]
Hypothesis
Sense distributions are invariant across languages: the proportion of times a sense appears is the same in every language.
E.g., the proportion of times the sense of “sun” appears in any language, through “sun” and its synonyms, remains the same.
ESTIMATING SENSE DISTRIBUTIONS
If sense tagged Marathi corpus were available, we could have estimated
But such a corpus is not available
EM for estimating sense distributions
Problem: ‘galaa’ itself is ambiguous, so its raw count cannot be used as it is.
Solution: Its count should be weighted by
Word correspondences

| Sense in English | $S^{mar}$ (Marathi sense number) | $words^{mar}$ (partial list) | $S^{hin} = \pi(S^{mar})$ (projected Hindi sense number) | $words^{hin}$ (partial list of words in projected Hindi sense) |
|---|---|---|---|---|
| Neck | 1 | maan, greeva | 1 | gardan, galaa |
| Respect | 2 | maan, satkaar, sanmaan | 3 | izzat, aadar |
| Voice | 3 | awaaz, swar | 2 | galaa |
EM for estimating sense distributions
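E-Step
$$P(S_1^{hin} \mid \text{galaa}) = \frac{P(S_1^{mar} \mid \text{maan}) \cdot \#(\text{maan}) + P(S_1^{mar} \mid \text{greeva}) \cdot \#(\text{greeva})}{P(S_1^{mar} \mid \text{maan}) \cdot \#(\text{maan}) + P(S_1^{mar} \mid \text{greeva}) \cdot \#(\text{greeva}) + P(S_1^{mar} \mid \text{awaaz}) \cdot \#(\text{awaaz}) + P(S_1^{mar} \mid \text{swar}) \cdot \#(\text{swar})}$$
M-Step
$$P(S_1^{mar} \mid \text{maan}) = \frac{P(S_1^{hin} \mid \text{gardan}) \cdot \#(\text{gardan}) + P(S_1^{hin} \mid \text{galaa}) \cdot \#(\text{galaa})}{P(S_1^{hin} \mid \text{gardan}) \cdot \#(\text{gardan}) + P(S_1^{hin} \mid \text{galaa}) \cdot \#(\text{galaa}) + P(S_2^{hin} \mid \text{aadar}) \cdot \#(\text{aadar}) + P(S_2^{hin} \mid \text{izzat}) \cdot \#(\text{izzat})}$$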
General Algo
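E-step:
$$P(S_i^{L_1} \mid u) = \frac{\sum_{v} P(\pi_{L_2}(S_i^{L_1}) \mid v) \cdot \#(v)}{\sum_{S_j^{L_1}} \sum_{x} P(\pi_{L_2}(S_j^{L_1}) \mid x) \cdot \#(x)} \qquad (2)$$
where $S_j^{L_1}$ ranges over the senses of $u$, $v$ over the $L_2$ words of the projected sense $\pi_{L_2}(S_i^{L_1})$, and $x$ over the $L_2$ words of $\pi_{L_2}(S_j^{L_1})$.
M-step:
$$P(S_k^{L_2} \mid v) = \frac{\sum_{w} P(\pi_{L_1}(S_k^{L_2}) \mid w) \cdot \#(w)}{\sum_{S_m^{L_2}} \sum_{y} P(\pi_{L_1}(S_m^{L_2}) \mid y) \cdot \#(y)} \qquad (3)$$
where $S_k^{L_2} = \pi_{L_2}(S_i^{L_1})$, $S_m^{L_2}$ ranges over the senses of $v$, $w$ over the $L_1$ words of the projected sense $\pi_{L_1}(S_k^{L_2})$, and $y$ over the $L_1$ words of $\pi_{L_1}(S_m^{L_2})$.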
Results (Marathi)

| Algorithm | P % | R % | F % |
|---|---|---|---|
| IWSD (training on self corpora; no parameter projection) | 81.29 | 80.42 | 80.85 |
| IWSD (training on Hindi and projecting parameters for Marathi) | 73.45 | 70.33 | 71.86 |
| EM (no sense corpora in either Hindi or Marathi) | 68.57 | 67.93 | 68.25 |
| Wordnet Baseline | 58.07 | 58.07 | 58.07 |
Results & Discussions
Performance of projection using manual cross linkages is within 7% of Self-Training
Performance of projection using probabilistic cross linkages is within 10-12% of Self-Training – remarkable since no additional cost incurred in target language
Both MCL and PCL give 10-14% improvement over Wordnet First Sense Baseline
Not prudent to stick to knowledge based and unsupervised approaches –they come nowhere close to MCL or PCL
[Figure legend: Manual Cross Linkages (MCL); Probabilistic Cross Linkages (PCL); Skyline (self-training data is available); Wordnet first sense baseline; S-O-T-A Knowledge Based Approach; S-O-T-A Unsupervised Approach; Our values]
Convexity
Motivation: argmax computation
• Statistical Spell Checking
• Automatic Speech Recognition
• Part of Speech Tagging
• Probabilistic Parsing
• Statistical Machine Translation
Some general observations
$$A^* = \arg\max_A \, [P(A \mid B)] = \arg\max_A \, [P(A) \cdot P(B \mid A)]$$
Computing and using $P(A)$ and $P(B \mid A)$ both need:
(i) looking at the internal structures of A and B
(ii) making independence assumptions
(iii) putting together a computation from smaller parts
Problem 1: Spell checker: apply Bayes Rule
$$W^* = \arg\max_W \, [P(W \mid T)] = \arg\max_W \, [P(W) \cdot P(T \mid W)]$$
W = correct word, T = misspelt word
Why apply Bayes rule? Finding p(w|t) vs. p(t|w)?
Assumptions:
• t is obtained from w by a single error.
• The words consist only of letters of the alphabet.
(Jurafsky and Martin, Speech and NLP, 2000)
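The argmax over candidate words can be sketched as below. All numbers are invented for illustration: `PRIOR` stands in for $P(W)$, `CHANNEL` for $P(T \mid W)$ with the misspelt word fixed to "acress", and the candidate list is hypothetical.

```python
# Hypothetical toy values, for illustration only.
PRIOR = {"actress": 1e-5, "across": 5e-5, "acres": 2e-5}     # P(W)
CHANNEL = {"actress": 1e-4, "across": 1e-6, "acres": 3e-6}   # P(T = "acress" | W)


def best_correction(prior, channel):
    """Return W* = argmax_W P(W) * P(T | W) over the candidate words."""
    return max(prior, key=lambda w: prior[w] * channel[w])


print(best_correction(PRIOR, CHANNEL))
```

The point of the decomposition is that $P(W)$ can be estimated from word frequencies and $P(T \mid W)$ from single-error statistics, whereas $P(W \mid T)$ would have to be estimated directly for every possible misspelling.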
Problem-2: Isolated word recognition
Problem definition: given a sequence of speech signals, identify the words.
2 steps:
• Segmentation (word boundary detection)
• Identify the word
Isolated word recognition: identify W given SS (speech signal)
$$\hat{W} = \arg\max_W P(W \mid SS)$$
Problem-3: Statistical MT
“Find the English translation e corresponding to a given foreign sentence f”
Thus, we seek $e_{best}$ such that
$$e_{best} = \arg\max_e P(e \mid f) = \arg\max_e \, [P(e) \cdot P(f \mid e)]$$
Language Model: $P(e)$
Translation Model: $P(f \mid e)$
Translations are produced on the basis of statistical model
Parameters are estimated using bilingual parallel corpora
Convexity: utility
Jensen’s inequality
Kullback–Leibler distance/divergence
EM formulation
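[Figure: a convex curve $f$ with the points $f(x_1)$ and $f(x_2)$ marked; at $z = \lambda x_1 + (1 - \lambda) x_2$ the chord value $\lambda f(x_1) + (1 - \lambda) f(x_2)$ is compared with the curve value $f(\lambda x_1 + (1 - \lambda) x_2)$]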
Criteria for convexity
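A function $f(x)$ is said to be convex in the interval $[a, b]$ iff
$$f(\lambda x_1 + (1 - \lambda) x_2) \le \lambda f(x_1) + (1 - \lambda) f(x_2)$$
for all $x_1, x_2 \in [a, b]$ and $0 \le \lambda \le 1$.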
Jensen’s inequality
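For any convex function $f(x)$:
$$f\Big(\sum_{i=1}^{n} \lambda_i x_i\Big) \le \sum_{i=1}^{n} \lambda_i f(x_i)$$
where $\sum_{i=1}^{n} \lambda_i = 1$ and $0 \le \lambda_i \le 1$.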
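Jensen's inequality is easy to probe numerically before proving it. The sketch below computes the gap between the two sides for a given function; the helper name `jensen_gap`, the sample points and the weights are assumptions chosen for illustration.

```python
import math


def jensen_gap(f, xs, weights):
    """Return sum_i w_i f(x_i) - f(sum_i w_i x_i); non-negative when f is convex."""
    if abs(sum(weights) - 1.0) > 1e-9 or any(w < 0 or w > 1 for w in weights):
        raise ValueError("weights must lie in [0, 1] and sum to 1")
    mean_point = sum(w * x for w, x in zip(weights, xs))
    return sum(w * f(x) for w, x in zip(weights, xs)) - f(mean_point)


points = [0.5, 1.0, 2.0, 4.0]     # hypothetical sample points
lambdas = [0.1, 0.2, 0.3, 0.4]    # hypothetical weights summing to 1

print(jensen_gap(lambda x: -math.log(x), points, lambdas))  # convex: gap >= 0
print(jensen_gap(math.log, points, lambdas))                # concave: gap <= 0
```

A numerical check like this is not a proof, but it catches a flipped inequality sign quickly.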
Proof of Jensen´s inequality
Method: by induction on $N$.
Base case ($N = 1$):
$$f(\lambda_1 x_1) \le \lambda_1 f(x_1), \quad \text{where } \lambda_1 = 1$$
i.e., $f(x) \le f(x)$, which is trivially true.
Another base case
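$N = 2$:
$$f(\lambda_1 x_1 + \lambda_2 x_2) = f(\lambda_1 x_1 + (1 - \lambda_1) x_2) \quad \text{since } \lambda_1 + \lambda_2 = 1$$
$$\le \lambda_1 f(x_1) + (1 - \lambda_1) f(x_2) \quad \text{since } f(x) \text{ is convex}$$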
Hypothesis
Suppose true for $n = k$, i.e.,
$$f\Big(\sum_{i=1}^{k} \lambda_i x_i\Big) \le \sum_{i=1}^{k} \lambda_i f(x_i)$$
Induction Step
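Show that
$$f\Big(\sum_{i=1}^{k+1} \lambda_i x_i\Big) \le \sum_{i=1}^{k+1} \lambda_i f(x_i)$$
given $\lambda_1 + \lambda_2 + \lambda_3 + \cdots + \lambda_{k+1} = 1$.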
Proof
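$$f(\lambda_1 x_1 + \lambda_2 x_2 + \lambda_3 x_3 + \cdots + \lambda_{k+1} x_{k+1})$$
$$= f\Big((1 - \lambda_{k+1}) \sum_{i=1}^{k} \frac{\lambda_i x_i}{1 - \lambda_{k+1}} + \lambda_{k+1} x_{k+1}\Big)$$
$$\le (1 - \lambda_{k+1})\, f\Big(\sum_{i=1}^{k} \frac{\lambda_i x_i}{1 - \lambda_{k+1}}\Big) + \lambda_{k+1} f(x_{k+1}) \quad \text{by convexity}$$
$$= (1 - \lambda_{k+1})\, f\Big(\sum_{i=1}^{k} \mu_i x_i\Big) + \lambda_{k+1} f(x_{k+1}), \quad \text{where } \mu_i = \frac{\lambda_i}{1 - \lambda_{k+1}}$$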
Continued...
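Examine the $\mu_i$s:
$$\sum_{i=1}^{k} \mu_i = \frac{\lambda_1}{1 - \lambda_{k+1}} + \frac{\lambda_2}{1 - \lambda_{k+1}} + \frac{\lambda_3}{1 - \lambda_{k+1}} + \cdots + \frac{\lambda_k}{1 - \lambda_{k+1}} = \frac{\lambda_1 + \lambda_2 + \lambda_3 + \cdots + \lambda_k}{1 - \lambda_{k+1}} = \frac{1 - \lambda_{k+1}}{1 - \lambda_{k+1}} = 1$$
because $\lambda_1 + \lambda_2 + \lambda_3 + \cdots + \lambda_{k+1} = 1$.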
Continued...
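Therefore, by the induction hypothesis (the $\mu_i$ sum to 1),
$$f\Big(\sum_{i=1}^{k} \mu_i x_i\Big) \le \sum_{i=1}^{k} \mu_i f(x_i)$$
Finally, at the induction step:
$$f\Big(\sum_{i=1}^{k+1} \lambda_i x_i\Big) \le (1 - \lambda_{k+1})\, f\Big(\sum_{i=1}^{k} \mu_i x_i\Big) + \lambda_{k+1} f(x_{k+1})$$
$$\le (1 - \lambda_{k+1}) \sum_{i=1}^{k} \mu_i f(x_i) + \lambda_{k+1} f(x_{k+1})$$
$$= \sum_{i=1}^{k} \lambda_i f(x_i) + \lambda_{k+1} f(x_{k+1}) = \sum_{i=1}^{k+1} \lambda_i f(x_i)$$
Thus Jensen's inequality is proved.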
KL -divergence
We work with the discrete form of the probability distribution.
Given two probability distributions $P$, $Q$ on the random variable
$X: x_1, x_2, x_3, \ldots, x_N$
$P: p_1 = p(x_1),\ p_2 = p(x_2),\ \ldots,\ p_N = p(x_N)$
$Q: q_1 = q(x_1),\ q_2 = q(x_2),\ \ldots,\ q_N = q(x_N)$
KLD definition
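$$D_{KL}(P, Q) = \sum_{i=1}^{N} p_i \log \frac{p_i}{q_i}$$
$D_{KL}$ is asymmetric, i.e., $D_{KL}(P, Q) \ne D_{KL}(Q, P)$ in general, and $D_{KL} \ge 0$. It is also written as
$$D_{KL}(P, Q) = E_P(\log p) - E_P(\log q)$$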
Proof: KLD>=0
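To prove:
$$KL(P, Q) = \sum_{i=1}^{N} p_i \log \frac{p_i}{q_i} \ge 0$$
Proof:
$$\sum_{i=1}^{N} p_i \log \frac{p_i}{q_i} = \sum_{i=1}^{N} p_i \Big(-\log \frac{q_i}{p_i}\Big)$$
$-\log x$ is convex in $(0, \infty)$. So
$$\sum_{i=1}^{N} p_i (-\log x_i) \ge -\log\Big(\sum_{i=1}^{N} p_i x_i\Big)$$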
Proof cntd.
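Apply Jensen’s inequality with $x_i = q_i / p_i$:
$$\sum_{i=1}^{N} p_i \Big(-\log \frac{q_i}{p_i}\Big) \ge -\log\Big(\sum_{i=1}^{N} p_i \frac{q_i}{p_i}\Big) = -\log\Big(\sum_{i=1}^{N} q_i\Big) = -\log 1 = 0$$
So
$$\sum_{i=1}^{N} p_i \log \frac{p_i}{q_i} \ge 0, \quad \text{since } \sum_{i=1}^{N} q_i = 1$$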
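The properties of the KL divergence stated above (non-negative, zero when the distributions coincide, asymmetric) can be checked on a small example. This sketch assumes natural logarithms, the convention $0 \log 0 = 0$, and $q_i > 0$ wherever $p_i > 0$; the two distributions are made up.

```python
import math


def kl_divergence(p, q):
    """D_KL(P, Q) = sum_i p_i log(p_i / q_i), skipping terms with p_i = 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


P = [0.5, 0.3, 0.2]  # hypothetical distribution
Q = [0.2, 0.5, 0.3]  # hypothetical distribution

print(kl_divergence(P, Q))
print(kl_divergence(Q, P))
print(kl_divergence(P, P))
```

The first two values differ, which is why the quantity is called a divergence rather than a distance.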
Convexity of –log x
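To show, for $0 \le \lambda \le 1$:
$$-\log(\lambda x_1 + (1 - \lambda) x_2) \le \lambda(-\log x_1) + (1 - \lambda)(-\log x_2)$$
i.e.,
$$\log(\lambda x_1 + (1 - \lambda) x_2) \ge \lambda \log x_1 + (1 - \lambda) \log x_2$$
$$\lambda x_1 + (1 - \lambda) x_2 \ge x_1^{\lambda} x_2^{1 - \lambda}$$
$$\lambda \Big(\frac{x_1}{x_2}\Big)^{1 - \lambda} + (1 - \lambda) \Big(\frac{x_2}{x_1}\Big)^{\lambda} \ge 1$$
With $y = x_1 / x_2$:
$$\lambda y^{1 - \lambda} + (1 - \lambda) y^{-\lambda} \ge 1$$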
Interesting problem
Try to prove:
$$\frac{w_1 x_1 + w_2 x_2}{w_1 + w_2} \ge \big(x_1^{w_1} x_2^{w_2}\big)^{\frac{1}{w_1 + w_2}}$$
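Before attempting the proof, the inequality can be spot-checked numerically. The sketch below assumes positive $x_1, x_2$ and positive weights $w_1, w_2$; the function name and the test values are illustrative.

```python
def weighted_means(x1, x2, w1, w2):
    """Return the weighted arithmetic mean and weighted geometric mean of x1, x2."""
    total = w1 + w2
    arithmetic = (w1 * x1 + w2 * x2) / total
    geometric = (x1 ** w1 * x2 ** w2) ** (1 / total)
    return arithmetic, geometric


for args in [(1.0, 4.0, 1.0, 1.0), (2.0, 8.0, 3.0, 1.0), (0.5, 0.5, 2.0, 5.0)]:
    am, gm = weighted_means(*args)
    print(args, am, gm)
```

In each case the arithmetic mean should be at least the geometric mean, with equality when $x_1 = x_2$.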
2nd definition of convexity
Theorem:
If $f(x)$ is twice differentiable in $[a,b]$ and $f''(x) \ge 0 \;\forall x \in [a,b]$, then $f(x)$ is convex in $[a,b]$.
So $-\log x$ is convex.
![Page 59: Speech, NLP and the Webpb/cs626-2014/cs626-lect7to10-ml-me-4aug14.pdf · Inductive Bias: What too learn, in what form to learn are pre-decided 4 Aug, 2014 ... Oriya Wordnet Kashmiri](https://reader034.fdocuments.net/reader034/viewer/2022042106/5e85a18c6c31c20bdb6fd54a/html5/thumbnails/59.jpg)
Lemma 1
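If $f''(x) \ge 0$ in $[a,b]$, then $f'(t) \ge f'(s)$ $\forall\, t, s$ with $t \ge s$ and $t, s \in [a,b]$
[Figure: number line with the points a, s, z, t, b marked in that order]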
![Page 60: Speech, NLP and the Webpb/cs626-2014/cs626-lect7to10-ml-me-4aug14.pdf · Inductive Bias: What too learn, in what form to learn are pre-decided 4 Aug, 2014 ... Oriya Wordnet Kashmiri](https://reader034.fdocuments.net/reader034/viewer/2022042106/5e85a18c6c31c20bdb6fd54a/html5/thumbnails/60.jpg)
Mean Value Theorem
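For any function $f(x)$ that is differentiable between $m$ and $n$:
$$f(n) - f(m) = (n - m)\, f'(p), \quad \text{where } m < p < n$$
$$f(z) - f(a) = (z - a)\, f'(s), \quad s \in (a, z)$$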
![Page 61: Speech, NLP and the Webpb/cs626-2014/cs626-lect7to10-ml-me-4aug14.pdf · Inductive Bias: What too learn, in what form to learn are pre-decided 4 Aug, 2014 ... Oriya Wordnet Kashmiri](https://reader034.fdocuments.net/reader034/viewer/2022042106/5e85a18c6c31c20bdb6fd54a/html5/thumbnails/61.jpg)
Alternative form of z
$$z = \lambda x_1 + (1-\lambda)x_2$$
Add $-\lambda z$ to both sides
$$(1-\lambda)z = \lambda(x_1 - z) + (1-\lambda)x_2$$
$$(1-\lambda)(x_2 - z) = \lambda(z - x_1)$$
![Page 62: Speech, NLP and the Webpb/cs626-2014/cs626-lect7to10-ml-me-4aug14.pdf · Inductive Bias: What too learn, in what form to learn are pre-decided 4 Aug, 2014 ... Oriya Wordnet Kashmiri](https://reader034.fdocuments.net/reader034/viewer/2022042106/5e85a18c6c31c20bdb6fd54a/html5/thumbnails/62.jpg)
Alternative form of convexity
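$$f(\lambda x_1 + (1-\lambda)x_2) \le \lambda f(x_1) + (1-\lambda) f(x_2)$$
Add $-\lambda f(z)$ to both sides, where $z = \lambda x_1 + (1-\lambda)x_2$
$$f(z) - \lambda f(z) \le \lambda f(x_1) + (1-\lambda) f(x_2) - \lambda f(z)$$
$$(1-\lambda) f(z) \le \lambda\,(f(x_1) - f(z)) + (1-\lambda) f(x_2)$$
$$(1-\lambda)(f(x_2) - f(z)) \ge \lambda\,(f(z) - f(x_1))$$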
![Page 63: Speech, NLP and the Webpb/cs626-2014/cs626-lect7to10-ml-me-4aug14.pdf · Inductive Bias: What too learn, in what form to learn are pre-decided 4 Aug, 2014 ... Oriya Wordnet Kashmiri](https://reader034.fdocuments.net/reader034/viewer/2022042106/5e85a18c6c31c20bdb6fd54a/html5/thumbnails/63.jpg)
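Proof: second derivative >=0 implies convexity (1/2)
We have that,
$$z = \lambda x_1 + (1-\lambda)x_2$$
$$f(z) \le \lambda f(x_1) + (1-\lambda) f(x_2)$$
$$(1-\lambda)[f(x_2) - f(z)] \ge \lambda[f(z) - f(x_1)] \quad (1)$$
$$(1-\lambda)[x_2 - z] = \lambda[z - x_1] \quad (2)$$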
![Page 64: Speech, NLP and the Webpb/cs626-2014/cs626-lect7to10-ml-me-4aug14.pdf · Inductive Bias: What too learn, in what form to learn are pre-decided 4 Aug, 2014 ... Oriya Wordnet Kashmiri](https://reader034.fdocuments.net/reader034/viewer/2022042106/5e85a18c6c31c20bdb6fd54a/html5/thumbnails/64.jpg)
Second derivative >=0 implies convexity (2/2)
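By the Mean Value Theorem, the inequality $(1-\lambda)[f(x_2) - f(z)] \ge \lambda[f(z) - f(x_1)]$ is equivalent to
$$(1-\lambda)\, f'(t)\,(x_2 - z) \ge \lambda\, f'(s)\,(z - x_1)$$
for some s and t, where $x_1 < s < z < t < x_2$
Now since f''(x) >= 0,
$$f'(t) \ge f'(s)$$
Combining this with the identity $(1-\lambda)(x_2 - z) = \lambda(z - x_1)$, the result is proved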
![Page 65: Speech, NLP and the Webpb/cs626-2014/cs626-lect7to10-ml-me-4aug14.pdf · Inductive Bias: What too learn, in what form to learn are pre-decided 4 Aug, 2014 ... Oriya Wordnet Kashmiri](https://reader034.fdocuments.net/reader034/viewer/2022042106/5e85a18c6c31c20bdb6fd54a/html5/thumbnails/65.jpg)
Why all this
In EM, we maximize the expectation of log likelihood of the data
Log is a concave function
We have to take iterative steps to get to the maximum
There are two unknown values: Z (unobserved data) and θ (parameters)
From θ, get new value of Z (E-step)
From Z, get new value of θ (M-step)
![Page 66: Speech, NLP and the Webpb/cs626-2014/cs626-lect7to10-ml-me-4aug14.pdf · Inductive Bias: What too learn, in what form to learn are pre-decided 4 Aug, 2014 ... Oriya Wordnet Kashmiri](https://reader034.fdocuments.net/reader034/viewer/2022042106/5e85a18c6c31c20bdb6fd54a/html5/thumbnails/66.jpg)
Recap: a simple EM situation
Toss of two coins:
Parameters <P, P1, P2>
P = Probability of choosing first coin
P1 = Probability of head from first coin
P2 = Probability of head from second coin
We do not know which coin the observation came from
$$X: x_1, x_2, x_3, \ldots, x_N$$
![Page 67: Speech, NLP and the Webpb/cs626-2014/cs626-lect7to10-ml-me-4aug14.pdf · Inductive Bias: What too learn, in what form to learn are pre-decided 4 Aug, 2014 ... Oriya Wordnet Kashmiri](https://reader034.fdocuments.net/reader034/viewer/2022042106/5e85a18c6c31c20bdb6fd54a/html5/thumbnails/67.jpg)
EM continued..
Z1, Z2, Z3,…, ZN is the hidden sequence running alongside X1, X2, X3,…, XN
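Where, Zi = 1 if the ith observation came from coin 1, and Zi = 0 otherwise
$$\Pr(X;\theta) = \sum_{Z} \Pr(X, Z;\theta)$$
$$Z = \langle z_1, z_2, z_3, \ldots, z_N \rangle, \quad \theta = \langle P, P_1, P_2 \rangle$$
$$Y: \langle x_1, z_1\rangle, \langle x_2, z_2\rangle, \langle x_3, z_3\rangle, \ldots, \langle x_N, z_N\rangle$$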
![Page 68: Speech, NLP and the Webpb/cs626-2014/cs626-lect7to10-ml-me-4aug14.pdf · Inductive Bias: What too learn, in what form to learn are pre-decided 4 Aug, 2014 ... Oriya Wordnet Kashmiri](https://reader034.fdocuments.net/reader034/viewer/2022042106/5e85a18c6c31c20bdb6fd54a/html5/thumbnails/68.jpg)
Cntd.
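We want to work with
$$LL(X;\theta) = \log\left(\sum_{Z} P(X, Z;\theta)\right)$$
Invoke convexity/concavity and expectation of Zi and work with log(Pr(Y;θ))
$$\Pr(Y;\theta) = P(X, Z;\theta) = \prod_{i=1}^{N} \left(P \cdot P_1^{x_i} (1-P_1)^{1-x_i}\right)^{z_i} \left((1-P) \cdot P_2^{x_i} (1-P_2)^{1-x_i}\right)^{1-z_i}$$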
![Page 69: Speech, NLP and the Webpb/cs626-2014/cs626-lect7to10-ml-me-4aug14.pdf · Inductive Bias: What too learn, in what form to learn are pre-decided 4 Aug, 2014 ... Oriya Wordnet Kashmiri](https://reader034.fdocuments.net/reader034/viewer/2022042106/5e85a18c6c31c20bdb6fd54a/html5/thumbnails/69.jpg)
Log Likelihood of the Data
$$LL(X;\theta) = \sum_{i=1}^{N} \Big[ E(z_i)\big(\log P + x_i \log P_1 + (1-x_i)\log(1-P_1)\big) + (1 - E(z_i))\big(\log(1-P) + x_i \log P_2 + (1-x_i)\log(1-P_2)\big) \Big]$$
![Page 70: Speech, NLP and the Webpb/cs626-2014/cs626-lect7to10-ml-me-4aug14.pdf · Inductive Bias: What too learn, in what form to learn are pre-decided 4 Aug, 2014 ... Oriya Wordnet Kashmiri](https://reader034.fdocuments.net/reader034/viewer/2022042106/5e85a18c6c31c20bdb6fd54a/html5/thumbnails/70.jpg)
IMPORTANT POINTS TO NOTE
Log moves inside the product term.
Σ disappears giving rise to E(Zi) in place of Zi
Differentiate wrt p, p1, p2, equate to 0 and get the results
![Page 71: Speech, NLP and the Webpb/cs626-2014/cs626-lect7to10-ml-me-4aug14.pdf · Inductive Bias: What too learn, in what form to learn are pre-decided 4 Aug, 2014 ... Oriya Wordnet Kashmiri](https://reader034.fdocuments.net/reader034/viewer/2022042106/5e85a18c6c31c20bdb6fd54a/html5/thumbnails/71.jpg)
P, P1, P2
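$$p_1 = \frac{\sum_{i=1}^{N} E(z_i)\, x_i}{\sum_{i=1}^{N} E(z_i)}$$
$$p_2 = \frac{M - \sum_{i=1}^{N} E(z_i)\, x_i}{N - \sum_{i=1}^{N} E(z_i)}$$
M = observed no. of heads
$$p = \frac{\sum_{i=1}^{N} E(z_i)}{N}$$
$$E(z_i) = P(z_i = 1 \mid x = x_i) = \frac{P(z_i = 1) \cdot P(x = x_i \mid z_i = 1)}{P(x = x_i)} = \frac{P\, P_1^{x_i} (1-P_1)^{1-x_i}}{P\, P_1^{x_i} (1-P_1)^{1-x_i} + (1-P)\, P_2^{x_i} (1-P_2)^{1-x_i}}$$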
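The update equations for P, P1, P2 and E(z_i) can be turned into a short loop. The following is a minimal sketch under stated assumptions: each observation is a single toss (1 = head, 0 = tail), the function names `e_step`, `m_step`, `em`, the toy data and the starting values are all invented for illustration, and the parameters are assumed to stay strictly between 0 and 1.

```python
def e_step(xs, p, p1, p2):
    """E(z_i) = P(z_i = 1 | x_i): posterior that toss i came from the first coin."""
    posteriors = []
    for x in xs:
        from_coin1 = p * (p1 ** x) * ((1 - p1) ** (1 - x))
        from_coin2 = (1 - p) * (p2 ** x) * ((1 - p2) ** (1 - x))
        posteriors.append(from_coin1 / (from_coin1 + from_coin2))
    return posteriors


def m_step(xs, ez):
    """Re-estimate (p, p1, p2) from the expected assignments E(z_i)."""
    n = len(xs)
    m = sum(xs)  # M = observed number of heads
    sum_ez = sum(ez)
    sum_ez_x = sum(e * x for e, x in zip(ez, xs))
    p = sum_ez / n
    p1 = sum_ez_x / sum_ez
    p2 = (m - sum_ez_x) / (n - sum_ez)
    return p, p1, p2


def em(xs, p, p1, p2, iterations=20):
    """Alternate E-step and M-step for a fixed number of iterations."""
    for _ in range(iterations):
        ez = e_step(xs, p, p1, p2)
        p, p1, p2 = m_step(xs, ez)
    return p, p1, p2


# Toy data and starting point (both assumed, not from the lecture).
xs = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0]
p, p1, p2 = em(xs, p=0.6, p1=0.7, p2=0.4)
print(p, p1, p2)
```

One property worth noting: after every M-step, p·p1 + (1−p)·p2 equals M/N, the observed fraction of heads, which follows directly from adding the numerators of the p1 and p2 updates.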
![Page 72: Speech, NLP and the Webpb/cs626-2014/cs626-lect7to10-ml-me-4aug14.pdf · Inductive Bias: What too learn, in what form to learn are pre-decided 4 Aug, 2014 ... Oriya Wordnet Kashmiri](https://reader034.fdocuments.net/reader034/viewer/2022042106/5e85a18c6c31c20bdb6fd54a/html5/thumbnails/72.jpg)
How to find θ
How to choose the next θ? Take
$$\arg\max_{\theta}\big(LL(X,Z:\theta) - LL(X,Z:\theta_n)\big)$$
Where,
X: observed data
Z: unobserved data
θ: parameter
LL(X,Z:θn): log likelihood of complete data with parameter value at θn
This is in lieu of, for example, gradient ascent
[Figure: LL(.) plotted over θ, with the current value θn marked on the θ axis]
At every step LL(.) will increase, ultimately reaching local/global maximum
![Page 73: Speech, NLP and the Webpb/cs626-2014/cs626-lect7to10-ml-me-4aug14.pdf · Inductive Bias: What too learn, in what form to learn are pre-decided 4 Aug, 2014 ... Oriya Wordnet Kashmiri](https://reader034.fdocuments.net/reader034/viewer/2022042106/5e85a18c6c31c20bdb6fd54a/html5/thumbnails/73.jpg)
Why expectation of log likelihood? (1/3)
P(X:θ) is the observation likelihood
Deal with P(X,Z:θ), marginalized over Z
Log(ΣZ P(X,Z:θ)) is processed mathematically by multiplying and dividing by P(Z|X:θn), which for each Z is between 0 and 1 and which sums to 1 over Z
![Page 74: Speech, NLP and the Webpb/cs626-2014/cs626-lect7to10-ml-me-4aug14.pdf · Inductive Bias: What too learn, in what form to learn are pre-decided 4 Aug, 2014 ... Oriya Wordnet Kashmiri](https://reader034.fdocuments.net/reader034/viewer/2022042106/5e85a18c6c31c20bdb6fd54a/html5/thumbnails/74.jpg)
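Why expectation of log likelihood? (2/3)
Multiply and divide by $P(Z|X;\theta_n)$, where $P(Z|X;\theta_n)$ is the probability at $\theta_n$:
$$\log\left(\sum_{Z} P(X,Z;\theta)\right) = \log\left(\sum_{Z} P(Z|X;\theta_n)\, \frac{P(X,Z;\theta)}{P(Z|X;\theta_n)}\right)$$
Then Jensen inequality will give
$$\log\left(\sum_{Z} P(Z|X;\theta_n)\, \frac{P(X,Z;\theta)}{P(Z|X;\theta_n)}\right) \ge \sum_{Z} P(Z|X;\theta_n) \log\left(\frac{P(X,Z;\theta)}{P(Z|X;\theta_n)}\right)$$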
![Page 75: Speech, NLP and the Webpb/cs626-2014/cs626-lect7to10-ml-me-4aug14.pdf · Inductive Bias: What too learn, in what form to learn are pre-decided 4 Aug, 2014 ... Oriya Wordnet Kashmiri](https://reader034.fdocuments.net/reader034/viewer/2022042106/5e85a18c6c31c20bdb6fd54a/html5/thumbnails/75.jpg)
Why expectation of log likelihood? (3/3)
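$$LL(X;\theta) - LL(X;\theta_n) = \log\left(\sum_{Z} P(X,Z;\theta)\right) - \log(P(X;\theta_n))$$
$$= \log\left(\sum_{Z} P(Z|X;\theta_n)\, \frac{P(X,Z;\theta)}{P(Z|X;\theta_n)}\right) - \log(P(X;\theta_n))$$
$$\ge \sum_{Z} P(Z|X;\theta_n) \log\left(\frac{P(X,Z;\theta)}{P(Z|X;\theta_n) \cdot P(X;\theta_n)}\right), \quad \text{since } \sum_{Z} P(Z|X;\theta_n) = 1$$
$$= \sum_{Z} P(Z|X;\theta_n) \log\left(\frac{P(X,Z;\theta)}{P(X,Z;\theta_n)}\right)$$
So, in place of $\arg\max_{\theta}\,(LL(X;\theta) - LL(X;\theta_n))$, this lower bound is maximized:
$$\arg\max_{\theta} \sum_{Z} P(Z|X;\theta_n) \log(P(X,Z;\theta)) = \arg\max_{\theta} E_Z\big(\log(P(X,Z;\theta))\big)$$
where $E_Z(.)$ is the expectation of log likelihood of complete data w.r.t. Z.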
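The lower bound can be checked numerically on the two-coin model. The sketch below is an illustration under assumptions: observations are single tosses, the tuple layout `theta = (P, P1, P2)`, the function names, the data and both parameter settings are invented here. Because the observations are independent, the sum over the whole hidden sequence Z factorizes into a sum over each z_i in {0, 1}.

```python
import math


def joint(x, z, theta):
    """P(x, z; theta) for one toss x (1 = head) and coin indicator z (1 = first coin)."""
    p, p1, p2 = theta
    if z == 1:
        return p * p1 ** x * (1 - p1) ** (1 - x)
    return (1 - p) * p2 ** x * (1 - p2) ** (1 - x)


def log_likelihood(xs, theta):
    """LL(X; theta) = sum_i log(sum_z P(x_i, z; theta))."""
    return sum(math.log(joint(x, 1, theta) + joint(x, 0, theta)) for x in xs)


def lower_bound(xs, theta, theta_n):
    """sum_Z P(Z | X; theta_n) * log(P(X, Z; theta) / P(X, Z; theta_n))."""
    total = 0.0
    for x in xs:
        marginal_n = joint(x, 1, theta_n) + joint(x, 0, theta_n)
        for z in (0, 1):
            posterior = joint(x, z, theta_n) / marginal_n
            total += posterior * math.log(joint(x, z, theta) / joint(x, z, theta_n))
    return total


xs = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0]  # toy tosses (assumed)
theta_n = (0.6, 0.7, 0.4)            # current parameters (assumed)
theta = (0.5, 0.8, 0.3)              # candidate parameters (assumed)

gain = log_likelihood(xs, theta) - log_likelihood(xs, theta_n)
bound = lower_bound(xs, theta, theta_n)
print(gain, bound)  # gain is never smaller than bound
```

At θ = θn both the gain and the bound are zero, so any θ that makes the bound positive is guaranteed to increase the log likelihood.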
![Page 76: Speech, NLP and the Webpb/cs626-2014/cs626-lect7to10-ml-me-4aug14.pdf · Inductive Bias: What too learn, in what form to learn are pre-decided 4 Aug, 2014 ... Oriya Wordnet Kashmiri](https://reader034.fdocuments.net/reader034/viewer/2022042106/5e85a18c6c31c20bdb6fd54a/html5/thumbnails/76.jpg)
Why expectation of Z?
If the log likelihood is a linear function of Z, then the expectation can be carried inside the log likelihood, and only E(Z) needs to be computed
The above is true when the hidden variables form a mixture of distributions (e.g., in tosses of two coins), and
Each distribution belongs to the exponential family, like multinomial/normal/Poisson
![Page 77: Speech, NLP and the Webpb/cs626-2014/cs626-lect7to10-ml-me-4aug14.pdf · Inductive Bias: What too learn, in what form to learn are pre-decided 4 Aug, 2014 ... Oriya Wordnet Kashmiri](https://reader034.fdocuments.net/reader034/viewer/2022042106/5e85a18c6c31c20bdb6fd54a/html5/thumbnails/77.jpg)
Application of EM: HMM Training
Baum Welch or Forward Backward Algorithm
![Page 78: Speech, NLP and the Webpb/cs626-2014/cs626-lect7to10-ml-me-4aug14.pdf · Inductive Bias: What too learn, in what form to learn are pre-decided 4 Aug, 2014 ... Oriya Wordnet Kashmiri](https://reader034.fdocuments.net/reader034/viewer/2022042106/5e85a18c6c31c20bdb6fd54a/html5/thumbnails/78.jpg)
A problem scenario
Unsupervised POS tagging
Convert the Brown corpus into a corpus with ONLY the following tags: N (noun), V (verb), J (adjective), R (adverb), F (function words like prepositions and conjunctions), A (articles ‘a’, ‘an’, ‘the’) and O (others)
Assume a raw corpus and then create a POS tagger
![Page 79: Speech, NLP and the Webpb/cs626-2014/cs626-lect7to10-ml-me-4aug14.pdf · Inductive Bias: What too learn, in what form to learn are pre-decided 4 Aug, 2014 ... Oriya Wordnet Kashmiri](https://reader034.fdocuments.net/reader034/viewer/2022042106/5e85a18c6c31c20bdb6fd54a/html5/thumbnails/79.jpg)
Key Intuition
Given: Training sequence
Initialization: Probability values
Compute: Pr (state seq | training seq), get expected count of transition, compute rule probabilities
Approach: Initialize the probabilities and recompute them… EM like approach
[Figure: two states q and r; each transition is labelled a or b]
![Page 80: Speech, NLP and the Webpb/cs626-2014/cs626-lect7to10-ml-me-4aug14.pdf · Inductive Bias: What too learn, in what form to learn are pre-decided 4 Aug, 2014 ... Oriya Wordnet Kashmiri](https://reader034.fdocuments.net/reader034/viewer/2022042106/5e85a18c6c31c20bdb6fd54a/html5/thumbnails/80.jpg)
Baum-Welch algorithm: counts
String = abb aaa bbb aaa
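Sequence of states with respect to input symbols
[Figure: two states q and r; transitions labelled with the symbols a, b]
State seq: q r q q r q r q q q r q r
o/p seq: a b b a a a b b b a a a (one symbol per transition)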
![Page 81: Speech, NLP and the Webpb/cs626-2014/cs626-lect7to10-ml-me-4aug14.pdf · Inductive Bias: What too learn, in what form to learn are pre-decided 4 Aug, 2014 ... Oriya Wordnet Kashmiri](https://reader034.fdocuments.net/reader034/viewer/2022042106/5e85a18c6c31c20bdb6fd54a/html5/thumbnails/81.jpg)
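Calculating probabilities from table
Table of counts

| Src | Dest | O/P | Count |
|-----|------|-----|-------|
| q | r | a | 5 |
| q | q | b | 3 |
| r | q | a | 2 |
| r | q | b | 2 |

$$P(q \xrightarrow{a} r) = 5/8 \qquad P(q \xrightarrow{b} q) = 3/8$$
$$P(s_i \xrightarrow{w_k} s_j) = \frac{c(s_i \xrightarrow{w_k} s_j)}{\sum_{l=1}^{T} \sum_{m=1}^{A} c(s_i \xrightarrow{w_m} s_l)}$$
T=#states
A=#alphabet symbols
Now if we have non-deterministic transitions, then multiple state sequences are possible for the given o/p seq (ref. to previous slide’s feature). Our aim is to find the expected count through this.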
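When the state sequence is known, the counts and probabilities are obtained by simple tallying. The sketch below assumes the state sequence and output string from the previous slide (q r q q r q r q q q r q r emitting abb aaa bbb aaa); the variable names are illustrative, and each arc is stored as a hypothetical `(source, symbol, destination)` tuple.

```python
from collections import Counter

# State sequence and output symbols, one symbol per transition (taken from the previous slide).
states = "qrqqrqrqqqrqr"
outputs = "abbaaabbbaaa"

# c(s_i --w_k--> s_j): count each (source, symbol, destination) arc.
counts = Counter()
for src, sym, dest in zip(states, outputs, states[1:]):
    counts[(src, sym, dest)] += 1

# Denominator: total count of all arcs leaving each source state.
totals = Counter()
for (src, sym, dest), c in counts.items():
    totals[src] += c

# P(s_i --w_k--> s_j) = count of the arc / total count out of s_i.
probs = {arc: c / totals[arc[0]] for arc, c in counts.items()}

for arc in sorted(counts):
    print(arc, counts[arc], probs[arc])
```

With hidden states this direct tally is unavailable, which is why the later slides replace the counts by expected counts.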
![Page 82: Speech, NLP and the Webpb/cs626-2014/cs626-lect7to10-ml-me-4aug14.pdf · Inductive Bias: What too learn, in what form to learn are pre-decided 4 Aug, 2014 ... Oriya Wordnet Kashmiri](https://reader034.fdocuments.net/reader034/viewer/2022042106/5e85a18c6c31c20bdb6fd54a/html5/thumbnails/82.jpg)
Interplay Between Two Equations
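$$P(s_i \xrightarrow{w_k} s_j) = \frac{c(s_i \xrightarrow{w_k} s_j)}{\sum_{l=1}^{T} \sum_{m=1}^{A} c(s_i \xrightarrow{w_m} s_l)}$$
$$C(s_i \xrightarrow{w_k} s_j) = \sum_{S_{0,n+1}} P(S_{0,n+1} \mid W_{0,n}) \times n(s_i \xrightarrow{w_k} s_j, S_{0,n+1}, W_{0,n})$$
$n(s_i \xrightarrow{w_k} s_j, S_{0,n+1}, W_{0,n})$: No. of times the transition si→sj occurs in the string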
![Page 83: Speech, NLP and the Webpb/cs626-2014/cs626-lect7to10-ml-me-4aug14.pdf · Inductive Bias: What too learn, in what form to learn are pre-decided 4 Aug, 2014 ... Oriya Wordnet Kashmiri](https://reader034.fdocuments.net/reader034/viewer/2022042106/5e85a18c6c31c20bdb6fd54a/html5/thumbnails/83.jpg)
Illustration
[Figure: Actual (Desired) HMM with states q and r; arc labels a:0.67, b:1.0, b:0.17, a:0.16]
[Figure: Initial guess with states q and r; arc labels a:0.04, b:1.0, b:0.48, a:0.48]
![Page 84: Speech, NLP and the Webpb/cs626-2014/cs626-lect7to10-ml-me-4aug14.pdf · Inductive Bias: What too learn, in what form to learn are pre-decided 4 Aug, 2014 ... Oriya Wordnet Kashmiri](https://reader034.fdocuments.net/reader034/viewer/2022042106/5e85a18c6c31c20bdb6fd54a/html5/thumbnails/84.jpg)
One run of Baum-Welch algorithm: string ababb
| State sequences | P(path) | $q \xrightarrow{a} r$ | $r \xrightarrow{b} q$ | $q \xrightarrow{a} q$ | $q \xrightarrow{b} q$ |
|---|---|---|---|---|---|
| q r q r q q | 0.00077 | 0.00154 | 0.00154 | 0 | 0.00077 |
| q r q q q q | 0.00442 | 0.00442 | 0.00442 | 0.00442 | 0.00884 |
| q q q r q q | 0.00442 | 0.00442 | 0.00442 | 0.00442 | 0.00884 |
| q q q q q q | 0.02548 | 0.0 | 0.000 | 0.05096 | 0.07644 |
| Rounded Total | 0.035 | 0.01 | 0.01 | 0.06 | 0.095 |
| New Probabilities (P) | | 0.06 = (0.01/(0.01+0.06+0.095)) | 1.0 | 0.36 | 0.581 |

* ε is considered as starting and ending symbol of the input sequence string.
Through multiple iterations the probability values will converge.
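One iteration of this computation can be reproduced by enumerating every state path. The sketch below rests on assumptions: the initial guess is taken to assign 0.04 to q →a r, 1.0 to r →b q and 0.48 each to q →a q and q →b q (this assignment reproduces the path probabilities in the table), the machine is assumed to start in q, and the dictionary layout and names are illustrative. Brute-force enumeration is exponential in the string length, so it only suits tiny examples.

```python
from collections import defaultdict
from itertools import product

# Assumed initial guess, keyed by (source, symbol, destination).
probs = {
    ("q", "a", "r"): 0.04,
    ("r", "b", "q"): 1.0,
    ("q", "a", "q"): 0.48,
    ("q", "b", "q"): 0.48,
}
string = "ababb"
states = ("q", "r")

expected = defaultdict(float)  # sum over paths of P(path) * (times the arc is used)
total_prob = 0.0               # sum over paths of P(path), i.e. P(string)

for tail in product(states, repeat=len(string)):
    path = ("q",) + tail  # assumed start state q
    arcs = list(zip(path, string, path[1:]))
    path_prob = 1.0
    for arc in arcs:
        path_prob *= probs.get(arc, 0.0)
    if path_prob == 0.0:
        continue  # path uses an arc the model does not allow
    total_prob += path_prob
    for arc in arcs:
        expected[arc] += path_prob

# Normalize the expected counts over all arcs leaving the same source state.
out_mass = defaultdict(float)
for (src, _sym, _dest), c in expected.items():
    out_mass[src] += c
new_probs = {arc: c / out_mass[arc[0]] for arc, c in expected.items()}

print(total_prob)
for arc in sorted(new_probs):
    print(arc, expected[arc], new_probs[arc])
```

The re-estimated values come out close to the rounded entries in the table; small differences are expected because the table rounds the totals before dividing.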
![Page 85: Speech, NLP and the Webpb/cs626-2014/cs626-lect7to10-ml-me-4aug14.pdf · Inductive Bias: What too learn, in what form to learn are pre-decided 4 Aug, 2014 ... Oriya Wordnet Kashmiri](https://reader034.fdocuments.net/reader034/viewer/2022042106/5e85a18c6c31c20bdb6fd54a/html5/thumbnails/85.jpg)
Computational part (1/2)
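$$C(s_i \xrightarrow{w_k} s_j) = \sum_{S_{0,n+1}} \left[P(S_{0,n+1} \mid W_{0,n}) \times n(s_i \xrightarrow{w_k} s_j, S_{0,n+1}, W_{0,n})\right]$$
$$= \frac{1}{P(W_{0,n})} \sum_{S_{0,n+1}} \left[P(S_{0,n+1}, W_{0,n}) \times n(s_i \xrightarrow{w_k} s_j, S_{0,n+1}, W_{0,n})\right]$$
$$= \frac{1}{P(W_{0,n})} \sum_{t=0}^{n} \sum_{S_{0,n+1}} \left[P(S_t = s_i, W_t = w_k, S_{t+1} = s_j, S_{0,n+1}, W_{0,n})\right]$$
$$= \frac{1}{P(W_{0,n})} \sum_{t=0}^{n} \left[P(S_t = s_i, W_t = w_k, S_{t+1} = s_j, W_{0,n})\right]$$
[Figure: output symbols w0 w1 w2 … wk … wn-1 wn aligned with the state sequence S0 S1 S2 … Si Sj … Sn-1 Sn Sn+1]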
![Page 86: Speech, NLP and the Webpb/cs626-2014/cs626-lect7to10-ml-me-4aug14.pdf · Inductive Bias: What too learn, in what form to learn are pre-decided 4 Aug, 2014 ... Oriya Wordnet Kashmiri](https://reader034.fdocuments.net/reader034/viewer/2022042106/5e85a18c6c31c20bdb6fd54a/html5/thumbnails/86.jpg)
Computational part (2/2)
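$$\sum_{t=0}^{n} P(S_t = s_i, S_{t+1} = s_j, W_t = w_k, W_{0,n})$$
$$= \sum_{t=0}^{n} P(W_{0,t-1}, S_t = s_i, S_{t+1} = s_j, W_t = w_k, W_{t+1,n})$$
$$= \sum_{t=0}^{n} P(W_{0,t-1}, S_t = s_i)\; P(S_{t+1} = s_j, W_t = w_k \mid W_{0,t-1}, S_t = s_i)\; P(W_{t+1,n} \mid S_{t+1} = s_j)$$
$$= \sum_{t=0}^{n} F(t-1, i)\; P(S_{t+1} = s_j, W_t = w_k \mid S_t = s_i)\; B(t+1, j)$$
$$= \sum_{t=0}^{n} F(t-1, i)\; P(s_i \xrightarrow{w_k} s_j)\; B(t+1, j)$$
[Figure: output symbols w0 w1 w2 … wk … wn-1 wn aligned with the state sequence S0 S1 S2 … Si Sj … Sn-1 Sn Sn+1]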
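The factorization into a forward term, an arc probability and a backward term can be sketched in code. This is a minimal illustration under assumptions, not the lecture's exact formulation: the indexing differs slightly from the slide (here `alpha[t][s]` is the probability of emitting the first t symbols and being in state s, playing the role of F(t−1, i), and `beta[t][s]` is the probability of emitting the remaining symbols from state s), arcs are stored as hypothetical `(source, symbol, destination)` keys, and the example model is the assumed initial guess for the string ababb starting in q.

```python
def forward(string, states, probs, start):
    """alpha[t][s] = P(first t symbols emitted, state after them is s)."""
    alpha = [{s: 0.0 for s in states} for _ in range(len(string) + 1)]
    alpha[0][start] = 1.0
    for t, sym in enumerate(string):
        for i in states:
            for j in states:
                alpha[t + 1][j] += alpha[t][i] * probs.get((i, sym, j), 0.0)
    return alpha


def backward(string, states, probs):
    """beta[t][s] = P(symbols t..end emitted | state before them is s)."""
    n = len(string)
    beta = [{s: 0.0 for s in states} for _ in range(n + 1)]
    for s in states:
        beta[n][s] = 1.0
    for t in range(n - 1, -1, -1):
        for i in states:
            for j in states:
                beta[t][i] += probs.get((i, string[t], j), 0.0) * beta[t + 1][j]
    return beta


def expected_counts(string, states, probs, start):
    """Expected number of uses of each arc, given the observed string."""
    alpha = forward(string, states, probs, start)
    beta = backward(string, states, probs)
    prob_w = sum(alpha[len(string)].values())  # P(W)
    counts = {}
    for t, sym in enumerate(string):
        for i in states:
            for j in states:
                arc_prob = probs.get((i, sym, j), 0.0)
                if arc_prob > 0.0:
                    term = alpha[t][i] * arc_prob * beta[t + 1][j] / prob_w
                    counts[(i, sym, j)] = counts.get((i, sym, j), 0.0) + term
    return counts, prob_w


# Assumed example model and input (illustrative values).
probs = {("q", "a", "r"): 0.04, ("r", "b", "q"): 1.0,
         ("q", "a", "q"): 0.48, ("q", "b", "q"): 0.48}
counts, prob_w = expected_counts("ababb", ("q", "r"), probs, start="q")
print(prob_w)
for arc in sorted(counts):
    print(arc, counts[arc])
```

Unlike path enumeration, this computation grows linearly with the string length and quadratically with the number of states, which is the practical reason for the forward-backward decomposition.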
![Page 87: Speech, NLP and the Webpb/cs626-2014/cs626-lect7to10-ml-me-4aug14.pdf · Inductive Bias: What too learn, in what form to learn are pre-decided 4 Aug, 2014 ... Oriya Wordnet Kashmiri](https://reader034.fdocuments.net/reader034/viewer/2022042106/5e85a18c6c31c20bdb6fd54a/html5/thumbnails/87.jpg)
Discussions
1. Symmetry breaking: Example: a symmetric initialization leads to no change in the initial values
2. Stuck in local maxima
3. Label bias problem: Probabilities have to sum to 1. Values can rise at the cost of fall of values for others.
[Figure: Desired HMM with arc labels b:1.0, b:0.5, a:0.5, a:1.0]
[Figure: Initialized HMM with arc labels a:0.5, b:0.5, a:0.25, a:0.5, b:0.5, a:0.25, b:0.25, b:0.5]