Discriminative, Unsupervised,
Convex Learning
Dale Schuurmans
Department of Computing Science
University of Alberta
MITACS Workshop, August 26, 2005
2
Current Research Group
PhD Tao Wang reinforcement learning
PhD Ali Ghodsi dimensionality reduction
PhD Dana Wilkinson action-based embedding
PhD Yuhong Guo ensemble learning
PhD Feng Jiao bioinformatics
PhD Jiayuan Huang transduction on graphs
PhD Qin Wang statistical natural language
PhD Adam Milstein robotics, particle filtering
PhD Dan Lizotte optimization, everything
PhD Linli Xu unsupervised SVMs
PDF Li Cheng computer vision
3
Current Research Group
PhD Tao Wang reinforcement learning
PhD Dana Wilkinson action-based embedding
PhD Feng Jiao bioinformatics
PhD Qin Wang statistical natural language
PhD Dan Lizotte optimization, everything
PDF Li Cheng computer vision
4
Today I will talk about: One Current Research Direction
Learning Sequence Classifiers (HMMs)
Discriminative Unsupervised Convex
EM?
8
Main Idea
Unsupervised SVMs (and semi-supervised SVMs)
Harder computational problem than SVMs
Convex relaxation: semidefinite program (polynomial time)
9
Background: Two-class SVM
Supervised classification learning
Labeled data → linear discriminant $w^\top x + b = 0$
Classification rule: $y = \operatorname{sgn}(w^\top x + b)$
Some better than others?
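A minimal sketch of the classification rule, assuming NumPy; the function name `classify`, the example plane, and the tie-breaking convention for points exactly on the plane are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def classify(X, w, b):
    """Apply y = sgn(w.x + b) to each row of X, returning +1 or -1."""
    scores = X @ w + b
    # Assumption: points exactly on the plane (score 0) are assigned +1.
    return np.where(scores >= 0, 1, -1)

# Hypothetical 2-D example: the plane x1 + x2 - 1 = 0.
w = np.array([1.0, 1.0])
b = -1.0
X = np.array([[2.0, 2.0], [0.0, 0.0], [1.0, 0.5]])
labels = classify(X, w, b)
```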
10
Maximum Margin Linear Discriminant
Choose a linear discriminant $w^\top x + b = 0$ to maximize
$$\min_{x_i, y_i}\ \operatorname{dist}\big(x_i, y_i, \text{Plane } w^\top x + b = 0\big)$$
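To make the quantity being maximized concrete, here is a small sketch that computes the margin of a given discriminant as the smallest signed distance from a labeled point to the plane. The function name `margin` and the four-point dataset are assumptions made for illustration.

```python
import numpy as np

def margin(X, y, w, b):
    """Smallest signed distance y_i * (w.x_i + b) / ||w|| over the labeled data.
    The result is positive only if every point is on the correct side."""
    return np.min(y * (X @ w + b)) / np.linalg.norm(w)

# Hypothetical data: two points near x1 = 0 (class -1), two near x1 = 3 (class +1).
X = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0]])
y = np.array([-1, -1, 1, 1])

vertical = margin(X, y, np.array([1.0, 0.0]), -1.5)  # plane x1 = 1.5
tilted = margin(X, y, np.array([1.0, 1.0]), -2.0)    # plane x1 + x2 = 2
```

Both planes separate the data, but the vertical one leaves a wider margin, which is the sense in which some discriminants are better than others.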
11
Unsupervised Learning
Given unlabeled data, how to infer classifications?
Organize objects into groups — clustering
12
Idea: Maximum Margin Clustering
Given unlabeled data, find the maximum margin separating hyperplane
Clusters the data
Constraint: class balance (bound the difference in sizes between classes)
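In one dimension the idea can be worked out by hand, which may help build intuition: a separating hyperplane is a threshold, and its margin is half the gap between the adjacent points it splits. The sketch below is an illustrative special case (not the talk's algorithm); the function name and the `max_imbalance` parameter are assumptions.

```python
import numpy as np

def max_margin_split_1d(x, max_imbalance):
    """Find the widest gap between adjacent sorted points such that the two
    sides differ in size by at most max_imbalance.
    Returns (threshold, margin, labels)."""
    x = np.asarray(x, dtype=float)
    order = np.sort(x)
    n = len(order)
    best_gap, best_threshold = -1.0, None
    for k in range(1, n):                     # k points fall in the left cluster
        if abs(k - (n - k)) > max_imbalance:  # class-balance constraint
            continue
        gap = order[k] - order[k - 1]
        if gap > best_gap:
            best_gap = gap
            best_threshold = 0.5 * (order[k] + order[k - 1])
    if best_threshold is None:
        raise ValueError("no split satisfies the balance constraint")
    labels = np.where(x > best_threshold, 1, -1)
    return best_threshold, best_gap / 2.0, labels

# Two tight groups plus one outlier at 20.
x = np.array([0.0, 0.2, 0.4, 5.0, 5.1, 5.3, 20.0])
thr_balanced, m_balanced, y_balanced = max_margin_split_1d(x, max_imbalance=1)
thr_free, m_free, y_free = max_margin_split_1d(x, max_imbalance=len(x))
```

Without the balance constraint the widest gap simply isolates the outlier; with it, the split falls between the two groups.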
13
Challenge
Find label assignment that results in a large margin
Hard
Convex relaxation – based on semidefinite programming
14
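How to Derive Unsupervised SVM?
Two-class case:
1. Start with Supervised Algorithm
Given vector of assignments, y, solve
$$\gamma^{*-2} = \max_{\lambda}\ \lambda^\top e - \tfrac{1}{2} \langle K \circ \lambda\lambda^\top,\ yy^\top \rangle \quad \text{subject to } 0 \le \lambda \le 1$$
Inv. sq. margin: $\gamma^{*-2}$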
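As a rough illustration of this supervised step, the sketch below solves the box-constrained dual $\max_\lambda \lambda^\top e - \tfrac{1}{2}\langle K \circ \lambda\lambda^\top, yy^\top\rangle$ for a fixed labeling y. Projected gradient ascent is a stand-in solver chosen for brevity, not the method used in the talk, and the dual form (no bias term, upper bound 1) is an assumption read off the slide.

```python
import numpy as np

def svm_dual_value(K, y, iters=5000):
    """Maximize lam.e - 0.5 * lam^T (K o y y^T) lam over the box 0 <= lam <= 1
    by projected gradient ascent. Returns (optimal value, lam)."""
    Q = K * np.outer(y, y)                     # elementwise product K o yy^T
    n = len(y)
    # Largest eigenvalue of Q bounds the gradient's Lipschitz constant.
    L = max(np.linalg.eigvalsh(Q)[-1], 1e-12)
    lam = np.zeros(n)
    for _ in range(iters):
        grad = np.ones(n) - Q @ lam
        lam = np.clip(lam + grad / L, 0.0, 1.0)  # ascent step, then project
    value = lam.sum() - 0.5 * lam @ Q @ lam
    return value, lam

# Hypothetical toy problem: two points at x = -1 and x = +1, linear kernel.
X = np.array([[-1.0], [1.0]])
K = X @ X.T
value, lam = svm_dual_value(K, np.array([-1.0, 1.0]))
```

On this toy problem the labeling that separates the two points yields a smaller optimal value than the labeling that puts both in one class, which is the behaviour the unsupervised method exploits when it searches over y.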
15
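How to Derive Unsupervised SVM?
2. Think of $\gamma^{*-2}$ as a function of y
If given y, would then solve
$$\gamma^{*-2}(y) = \max_{\lambda}\ \lambda^\top e - \tfrac{1}{2} \langle K \circ \lambda\lambda^\top,\ yy^\top \rangle \quad \text{subject to } 0 \le \lambda \le 1$$
Inv. sq. margin: $\gamma^{*-2}(y)$
Goal: Choose y to minimize inverse squared margin
Problem: not a convex function of y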
16
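How to Derive Unsupervised SVM?
3. Re-express problem with indicators comparing y labels
If given y, would then solve
$$\gamma^{*-2}(y) = \max_{\lambda}\ \lambda^\top e - \tfrac{1}{2} \langle K \circ \lambda\lambda^\top,\ yy^\top \rangle \quad \text{subject to } 0 \le \lambda \le 1$$
Inv. sq. margin: $\gamma^{*-2}(y)$
New variables: An equivalence relation matrix
$$M = yy^\top, \qquad M_{ij} = \begin{cases} 1 & \text{if } y_i = y_j \\ -1 & \text{if } y_i \ne y_j \end{cases}$$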
17
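How to Derive Unsupervised SVM?
3. Re-express problem with indicators comparing y labels
If given M, would then solve
$$\gamma^{*-2}(M) = \max_{\lambda}\ \lambda^\top e - \tfrac{1}{2} \langle K \circ \lambda\lambda^\top,\ M \rangle \quad \text{subject to } 0 \le \lambda \le 1$$
Inv. sq. margin: $\gamma^{*-2}(M)$
New variables: An equivalence relation matrix
$$M = yy^\top, \qquad M_{ij} = \begin{cases} 1 & \text{if } y_i = y_j \\ -1 & \text{if } y_i \ne y_j \end{cases}$$
Maximum of linear functions is convex
Note: convex function of M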
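The convexity claim can be spelled out in a few lines. The notation $g(\lambda, M)$ for the inner objective is introduced here for convenience and is an assumption, as is the exact form of the objective read off the slide.

```latex
% For fixed \lambda the inner objective is linear (affine) in M:
g(\lambda, M) = \lambda^\top e - \tfrac{1}{2} \langle K \circ \lambda\lambda^\top,\ M \rangle ,
\qquad
\gamma^{*-2}(M) = \max_{0 \le \lambda \le 1} g(\lambda, M).

% For any M_1, M_2 and \theta \in [0, 1]:
\gamma^{*-2}\big(\theta M_1 + (1-\theta) M_2\big)
  = \max_{\lambda} \big[ \theta\, g(\lambda, M_1) + (1-\theta)\, g(\lambda, M_2) \big]
  \le \theta \max_{\lambda} g(\lambda, M_1) + (1-\theta) \max_{\lambda} g(\lambda, M_2)
  = \theta\, \gamma^{*-2}(M_1) + (1-\theta)\, \gamma^{*-2}(M_2).
```

The inequality holds because a single $\lambda$ must serve both terms on the left, while each term on the right gets its own maximizer.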
18
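How to Derive Unsupervised SVM?
4. Get constrained optimization problem
Solve for M
$$\min_{M}\ \gamma^{*-2}(M) \quad \text{subject to } M \in \{-1,+1\}^{n \times n},\ M = yy^\top,\ -\epsilon e \le Me \le \epsilon e$$
where $\gamma^{*-2}(M)$ is the inner maximum over $0 \le \lambda \le 1$
Class balance: $-\epsilon e \le Me \le \epsilon e$
$M = yy^\top$: not convex!
$M \in \{-1,+1\}^{n \times n}$ encodes an equivalence relation iff $M \succeq 0$, $\operatorname{diag}(M) = e$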
19
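How to Derive Unsupervised SVM?
4. Get constrained optimization problem
Solve for M
$$\min_{M}\ \gamma^{*-2}(M) \quad \text{subject to } M \in \{-1,+1\}^{n \times n},\ M \succeq 0,\ \operatorname{diag}(M) = e,\ -\epsilon e \le Me \le \epsilon e$$
where $\gamma^{*-2}(M)$ is the inner maximum over $0 \le \lambda \le 1$
$M \in \{-1,+1\}^{n \times n}$ encodes an equivalence relation iff $M \succeq 0$, $\operatorname{diag}(M) = e$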
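The characterization of equivalence relation matrices can be checked numerically on small examples. The sketch below builds $M = yy^\top$ from a label vector and tests the three conditions (entries in {-1, +1}, unit diagonal, positive semidefinite); the function names and the intransitive counterexample are illustrative assumptions.

```python
import numpy as np

def relation_matrix(y):
    """M = y y^T for labels in {-1,+1}: M_ij = +1 if y_i == y_j, else -1."""
    y = np.asarray(y, dtype=float)
    return np.outer(y, y)

def is_equivalence_relation(M, tol=1e-9):
    """Check entries in {-1,+1}, diag(M) = e, and M symmetric PSD."""
    M = np.asarray(M, dtype=float)
    entries_ok = np.all(np.isclose(np.abs(M), 1.0))
    diag_ok = np.allclose(np.diag(M), 1.0)
    symmetric = np.allclose(M, M.T)
    psd = symmetric and np.linalg.eigvalsh((M + M.T) / 2).min() >= -tol
    return bool(entries_ok and diag_ok and psd)

M_good = relation_matrix([1, 1, -1, -1])
# Intransitive pattern: 1~2 and 2~3 but not 1~3, so no labeling produces it.
M_bad = np.array([[1.0, 1.0, -1.0],
                  [1.0, 1.0, 1.0],
                  [-1.0, 1.0, 1.0]])
balance = M_good @ np.ones(4)  # row sums M e used in the class-balance constraint
```

For the perfectly balanced labeling the row sums $Me$ are all zero, so the class-balance constraint holds for any $\epsilon \ge 0$.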
20
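How to Derive Unsupervised SVM?
5. Relax indicator variables to obtain a convex optimization problem
Solve for M
$$\min_{M}\ \gamma^{*-2}(M) \quad \text{subject to } M \in \{-1,+1\}^{n \times n},\ M \succeq 0,\ \operatorname{diag}(M) = e,\ -\epsilon e \le Me \le \epsilon e$$
where $\gamma^{*-2}(M)$ is the inner maximum over $0 \le \lambda \le 1$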
21
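How to Derive Unsupervised SVM?
5. Relax indicator variables to obtain a convex optimization problem
Solve for M
$$\min_{M}\ \gamma^{*-2}(M) \quad \text{subject to } M \in [-1,+1]^{n \times n},\ M \succeq 0,\ \operatorname{diag}(M) = e,\ -\epsilon e \le Me \le \epsilon e$$
where $\gamma^{*-2}(M)$ is the inner maximum over $0 \le \lambda \le 1$
Semidefinite program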
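The relaxed problem returns a real-valued M rather than a labeling, so some post-processing is needed to read off clusters. The sketch below checks the relaxed constraints and rounds M to labels using the sign of its principal eigenvector. This rounding rule is an assumption made here for illustration; the slides do not say how labels are recovered. It does not solve the SDP itself, which would need a dedicated solver.

```python
import numpy as np

def is_feasible(M, eps, tol=1e-9):
    """Check the relaxed constraints: M PSD, diag(M) = e, -eps*e <= M e <= eps*e."""
    e = np.ones(M.shape[0])
    psd = np.linalg.eigvalsh((M + M.T) / 2).min() >= -tol
    diag_ok = np.allclose(np.diag(M), 1.0)
    balanced = np.all(np.abs(M @ e) <= eps + tol)
    return bool(psd and diag_ok and balanced)

def round_to_labels(M):
    """Hypothetical rounding step: sign of the principal eigenvector of M."""
    _, vecs = np.linalg.eigh((M + M.T) / 2)
    v = vecs[:, -1]                 # eigenvector of the largest eigenvalue
    y = np.where(v >= 0, 1, -1)
    return y * y[0]                 # fix the global sign so that y[0] = +1

# A made-up relaxed solution: a shrunken labeling matrix plus a diagonal term.
y_true = np.array([1, -1, 1, -1])
M_relaxed = 0.9 * np.outer(y_true, y_true) + 0.1 * np.eye(4)
feasible = is_feasible(M_relaxed, eps=0.5)
y_hat = round_to_labels(M_relaxed)
```

Because M is invariant to flipping all labels, the recovered labeling is only defined up to a global sign, which the last line of `round_to_labels` fixes by convention.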
22
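Multi-class Unsupervised SVM?
1. Start with Supervised Algorithm
Given vector of assignments, y, solve
$$\max_{\Lambda}\ n - \sum_{i,r} \lambda_{ir} 1_{(y_i = r)} - \frac{1}{2\beta} \sum_{i,j} K_{ij} 1_{(y_i = y_j)} + \frac{1}{\beta} \sum_{i,j} K_{ij} \lambda_{i y_j} - \frac{1}{2\beta} \sum_{i,j} K_{ij} \sum_{r} \lambda_{ir} \lambda_{jr}$$
$$\text{subject to } \lambda_i \ge 0,\ \lambda_i^\top e = 1\ \ \forall i$$
(Crammer & Singer 01)
Margin loss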
23
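Multi-class Unsupervised SVM?
2. Think of $\omega$ as a function of y
If given y, would then solve
$$\omega(y) = \max_{\Lambda}\ n - \sum_{i,r} \lambda_{ir} 1_{(y_i = r)} - \frac{1}{2\beta} \sum_{i,j} K_{ij} 1_{(y_i = y_j)} + \frac{1}{\beta} \sum_{i,j} K_{ij} \lambda_{i y_j} - \frac{1}{2\beta} \sum_{i,j} K_{ij} \sum_{r} \lambda_{ir} \lambda_{jr}$$
$$\text{subject to } \lambda_i \ge 0,\ \lambda_i^\top e = 1\ \ \forall i$$
(Crammer & Singer 01)
Margin loss: $\omega(y)$
Goal: Choose y to minimize margin loss
Problem: not a convex function of y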
24
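Multi-class Unsupervised SVM?
3. Re-express problem with indicators comparing y labels
If given y, would then solve
$$\omega(y) = \max_{\Lambda}\ n - \sum_{i,r} \lambda_{ir} 1_{(y_i = r)} - \frac{1}{2\beta} \sum_{i,j} K_{ij} 1_{(y_i = y_j)} + \frac{1}{\beta} \sum_{i,j} K_{ij} \lambda_{i y_j} - \frac{1}{2\beta} \sum_{i,j} K_{ij} \sum_{r} \lambda_{ir} \lambda_{jr}$$
$$\text{subject to } \lambda_i \ge 0,\ \lambda_i^\top e = 1\ \ \forall i$$
(Crammer & Singer 01)
Margin loss: $\omega(y)$
New variables: M & D
$$M_{ij} = 1_{(y_i = y_j)}, \qquad D_{ir} = 1_{(y_i = r)}, \qquad M = DD^\top$$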
25
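Multi-class Unsupervised SVM?
3. Re-express problem with indicators comparing y labels
If given M and D, would then solve
$$\omega(M, D) = \max_{\Lambda}\ Q(\Lambda, M, D) \quad \text{subject to } \Lambda \ge 0,\ \Lambda e = e$$
$$\text{where } Q(\Lambda, M, D) = n - \langle D, \Lambda \rangle - \frac{1}{2\beta} \langle K, M \rangle + \frac{1}{\beta} \langle KD, \Lambda \rangle - \frac{1}{2\beta} \langle \Lambda\Lambda^\top, K \rangle$$
New variables: M & D
$$M_{ij} = 1_{(y_i = y_j)}, \qquad D_{ir} = 1_{(y_i = r)}, \qquad M = DD^\top$$
Margin loss: convex function of M & D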
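The indicator matrices are easy to build from a label vector, and doing so gives a quick sanity check on the objective. The sketch below assumes labels are coded 0..k-1 and that the objective has the form $Q(\Lambda, M, D) = n - \langle D, \Lambda\rangle - \tfrac{1}{2\beta}\langle K, M\rangle + \tfrac{1}{\beta}\langle KD, \Lambda\rangle - \tfrac{1}{2\beta}\langle \Lambda\Lambda^\top, K\rangle$; the data, the value of `beta`, and the function names are made up for illustration.

```python
import numpy as np

def indicator_matrices(y, k):
    """D[i, r] = 1 if y_i == r (one-hot rows); M[i, j] = 1 if y_i == y_j."""
    D = np.eye(k)[np.asarray(y)]   # n x k one-hot encoding of labels 0..k-1
    M = D @ D.T                    # n x n same-class indicator
    return M, D

def Q(Lam, M, D, K, beta):
    """Dual objective as a function of Lam, linear in M and D for fixed Lam."""
    n = K.shape[0]
    return (n - np.sum(D * Lam)
            - np.sum(K * M) / (2 * beta)
            + np.sum((K @ D) * Lam) / beta
            - np.sum((Lam @ Lam.T) * K) / (2 * beta))

y = [0, 2, 1, 2]
M, D = indicator_matrices(y, k=3)

# Hypothetical data and linear kernel.
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
K = X @ X.T
beta = 1.0
q_at_D = Q(D, M, D, K, beta)       # Lam = D is feasible: rows are one-hot
```

With $M = DD^\top$ the kernel terms combine into $-\tfrac{1}{2\beta}\operatorname{tr}\big((D - \Lambda)^\top K (D - \Lambda)\big)$, so the objective vanishes at $\Lambda = D$.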
26
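Multi-class Unsupervised SVM?
4. Get constrained optimization problem
Solve for M and D
$$\min_{M, D}\ \omega(M, D) \quad \text{subject to } M = DD^\top,\ \operatorname{diag}(M) = e,\ M \in \{0,1\}^{n \times n},\ D \in \{0,1\}^{n \times k}$$
$$\left(\tfrac{1}{k} - \epsilon\right) n e \le Me \le \left(\tfrac{1}{k} + \epsilon\right) n e$$
Class balance: $\left(\tfrac{1}{k} - \epsilon\right) n e \le Me \le \left(\tfrac{1}{k} + \epsilon\right) n e$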
27
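Multi-class Unsupervised SVM?
5. Relax indicator variables to obtain a convex optimization problem
Solve for M and D
$$\min_{M, D}\ \omega(M, D) \quad \text{subject to } M = DD^\top,\ \operatorname{diag}(M) = e,\ M \in \{0,1\}^{n \times n},\ D \in \{0,1\}^{n \times k}$$
$$\left(\tfrac{1}{k} - \epsilon\right) n e \le Me \le \left(\tfrac{1}{k} + \epsilon\right) n e$$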
28
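Multi-class Unsupervised SVM?
5. Relax indicator variables to obtain a convex optimization problem
Solve for M and D
$$\min_{M, D}\ \omega(M, D) \quad \text{subject to } M \succeq DD^\top,\ \operatorname{diag}(M) = e,\ 0 \le M \le 1,\ 0 \le D \le 1$$
$$\left(\tfrac{1}{k} - \epsilon\right) n e \le Me \le \left(\tfrac{1}{k} + \epsilon\right) n e$$
Semidefinite program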
32
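Extension to Semi-Supervised Algorithm
Matrix M:
[Figure: M partitioned into a labeled (clamped) block for examples $1, \ldots, t$, where $M_{ij} = y_i y_j$, and an unlabeled block for examples $i \in \{t+1, \ldots, n\}$, with a constraint on $\sum_{j=1}^{t} M_{ij}$ for the unlabeled rows]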
Discriminative, Unsupervised, Convex HMMs
Joint work with Linli Xu
With help from Li Cheng and Tao Wang
37
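Hidden Markov Model
Joint probability model: $P(\mathbf{x}, \mathbf{y})$
Viterbi classifier: $\arg\max_{\mathbf{y}} P(\mathbf{y} \mid \mathbf{x})$
[Figure: chain of “hidden” states $y_1, y_2, y_3$, each emitting an observation $x_1, x_2, x_3$]
Must coordinate local classifiers $y_i = f(x_i)$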
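For reference, the Viterbi classifier can be sketched in a few lines of NumPy. This is the standard dynamic program, using the fact that $\arg\max_{\mathbf{y}} P(\mathbf{y} \mid \mathbf{x}) = \arg\max_{\mathbf{y}} P(\mathbf{x}, \mathbf{y})$ since $P(\mathbf{x})$ does not depend on $\mathbf{y}$. The two-state model and its probabilities are made-up illustration values.

```python
import numpy as np

def viterbi(log_init, log_trans, log_emit, obs):
    """Most probable hidden state sequence for an HMM.
    log_init[s], log_trans[s, s'], log_emit[s, o] are log-probabilities."""
    T, S = len(obs), len(log_init)
    score = np.empty((T, S))            # best log-prob of a path ending in s at time t
    back = np.zeros((T, S), dtype=int)  # best predecessor, for backtracking
    score[0] = log_init + log_emit[:, obs[0]]
    for t in range(1, T):
        cand = score[t - 1][:, None] + log_trans   # cand[prev, cur]
        back[t] = np.argmax(cand, axis=0)
        score[t] = cand[back[t], np.arange(S)] + log_emit[:, obs[t]]
    path = [int(np.argmax(score[-1]))]
    for t in range(T - 1, 0, -1):       # follow back-pointers from the end
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Hypothetical 2-state, 3-symbol model.
init = np.array([0.6, 0.4])
trans = np.array([[0.7, 0.3], [0.4, 0.6]])
emit = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
path = viterbi(np.log(init), np.log(trans), np.log(emit), [0, 1, 2])
```

The back-pointers are what coordinate the local decisions: each $y_i$ is chosen in light of its neighbours rather than from $x_i$ alone.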
38
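HMM Training: Supervised
Given $(\mathbf{x}_1, \mathbf{y}_1), (\mathbf{x}_2, \mathbf{y}_2), \ldots, (\mathbf{x}_n, \mathbf{y}_n)$
Maximum likelihood: $\max \prod_{i=1}^{n} P(\mathbf{x}_i, \mathbf{y}_i)$
$= \max \prod_{i=1}^{n} P(\mathbf{y}_i \mid \mathbf{x}_i)\, P(\mathbf{x}_i)$ (models input distribution)
Conditional likelihood: $\max \prod_{i=1}^{n} P(\mathbf{y}_i \mid \mathbf{x}_i)$ (discriminative, CRFs)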
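Taking logs makes the relationship between the two objectives explicit. The parameter symbol $\theta$ is introduced here for illustration and does not appear on the slide.

```latex
\sum_{i=1}^{n} \log P_\theta(\mathbf{x}_i, \mathbf{y}_i)
  = \underbrace{\sum_{i=1}^{n} \log P_\theta(\mathbf{y}_i \mid \mathbf{x}_i)}_{\text{conditional likelihood}}
  + \underbrace{\sum_{i=1}^{n} \log P_\theta(\mathbf{x}_i)}_{\text{input distribution}}
```

Maximum likelihood spends modelling effort on both terms, while conditional likelihood keeps only the first, which is the one that determines the classifier.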
39
HMM Training: Unsupervised
Given only $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$
Now what?
EM!
Marginal likelihood: $\max \prod_{i=1}^{n} P(\mathbf{x}_i)$
Exactly the part we don’t care about
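The marginal likelihood that EM climbs can be computed with the standard forward recursion, which sums $P(\mathbf{x}, \mathbf{y})$ over all hidden paths. The sketch below evaluates it for one sequence; the two-state model is a made-up example, and the code only evaluates the objective, it does not perform EM.

```python
import numpy as np

def marginal_likelihood(init, trans, emit, obs):
    """P(x) = sum over all hidden paths y of P(x, y), via the forward recursion."""
    alpha = init * emit[:, obs[0]]            # alpha[s] = P(x_1, y_1 = s)
    for o in obs[1:]:
        alpha = (alpha @ trans) * emit[:, o]  # advance one step, then emit
    return float(alpha.sum())

# Hypothetical 2-state, 3-symbol model.
init = np.array([0.6, 0.4])
trans = np.array([[0.7, 0.3], [0.4, 0.6]])
emit = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
p_x = marginal_likelihood(init, trans, emit, [0, 1, 2])
```

Note that the hidden states are summed out entirely, so nothing in this objective rewards a model for making the states useful as class labels.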
40
HMM Training: Unsupervised
Given only $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$
The problem with EM:
Not convex
Wrong objective
Too popular
Doesn’t work
41
HMM Training: Unsupervised
Given only $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$
The dream:
Convex training
Discriminative training of $P(\mathbf{y} \mid \mathbf{x})$
When will someone invent unsupervised CRFs?
42
HMM Training: Unsupervised
Given only $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$
The question: How to learn $P(\mathbf{y} \mid \mathbf{x})$ effectively without seeing any y’s?
43
HMM Training: Unsupervised
Given only $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$
The question: How to learn $P(\mathbf{y} \mid \mathbf{x})$ effectively without seeing any y’s?
The answer: That’s what we already did! Unsupervised SVMs
44
HMM Training: Unsupervised
Given only $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$
The plan:
                single y     sequence y
supervised      SVM          M3N
unsupervised    unsup SVM    ?
45
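M3N: Max Margin Markov Nets
Relational SVMs
Supervised training: Given $(\mathbf{x}_1, \mathbf{y}_1), (\mathbf{x}_2, \mathbf{y}_2), \ldots, (\mathbf{x}_n, \mathbf{y}_n)$, solve factored QP
[Figure: chain of states $y_1, y_2, y_3$ over observations $x_1, x_2, x_3$, with local function $f(\mathbf{x}, y_1 y_2)$]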
46
Unsupervised M3Ns
Strategy:
Start with supervised M3N QP
Re-express y-labels in local M, D equivalence relations
Impose class-balance
Relax non-convex constraints
Then solve a really big SDP
But still polynomial size
50
Current Research Group
PhD Tao Wang reinforcement learning
PhD Dana Wilkinson action-based embedding
PhD Feng Jiao bioinformatics
PhD Qin Wang statistical natural language
PhD Dan Lizotte optimization, everything
PDF Li Cheng computer vision