Transcript of: Graph-Based Semi-Supervised Learning (delallea/pub/delalleau_semisup_ciar.pdf)
Semi-Supervised Setting Graph Regularization and Label Prop. Transduction / Induction The Curse
Graph-Based Semi-Supervised Learning
Olivier Delalleau, Yoshua Bengio and Nicolas Le Roux
Université de Montréal
CIAR Workshop - April 26th, 2005
Outline
1 Semi-Supervised Setting
2 Graph Regularization and Label Propagation
3 Transduction vs. Induction
4 Curse of Dimensionality
Semi-Supervised Learning for Dummies
Task: binary classification with labels y_i ∈ {−1, 1}.
Semi-supervised = learn something about the labels using both labeled and unlabeled data:

X = (x_1, x_2, …, x_n)
Y_l = (y_1, y_2, …, y_l)
n = l + u

Transduction ⇒ Ŷ_u = (ŷ_{l+1}, ŷ_{l+2}, …, ŷ_n)
Induction ⇒ ŷ : x → ŷ(x)
The Classical Two-Moon Problem
Where are Manifolds and Kernels?
What is a good labeling Ŷ = (Ŷ_l, Ŷ_u)?

1 one that is consistent with the given labels: Ŷ_l ≈ Y_l
2 one that is smooth on the manifold where the data lie (manifold / cluster assumption): ŷ_i ≈ ŷ_j when x_i is close to x_j

⇒ Cost function

C(Ŷ) = Σ_{i=1}^{l} (ŷ_i − y_i)² + (µ/2) Σ_{i,j=1}^{n} W_ij (ŷ_i − ŷ_j)²

with W_ij = W_X(x_i, x_j) a positive weighting function (e.g. a kernel).
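This cost is easy to evaluate numerically. A minimal NumPy sketch follows; the random weights and labels are illustrative assumptions, not from the talk. It also checks the standard identity that the pairwise smoothness term equals µ ŶᵀLŶ for the graph Laplacian L = D − W:

```python
import numpy as np

rng = np.random.default_rng(0)
n, l, mu = 8, 3, 1.0

# Symmetric positive weights (hypothetical random kernel values, for illustration)
A = rng.random((n, n))
W = (A + A.T) / 2
np.fill_diagonal(W, 0.0)

y_given = rng.choice([-1.0, 1.0], size=l)   # given labels for the first l points
y_hat = rng.standard_normal(n)              # candidate labeling of all n points

# Cost = label fit on the l labeled points + (mu/2) * pairwise smoothness penalty
fit = np.sum((y_hat[:l] - y_given) ** 2)
smooth = 0.5 * mu * np.sum(W * (y_hat[:, None] - y_hat[None, :]) ** 2)
C = fit + smooth

# Smoothness term equals mu * y^T L y, with L = diag(row sums of W) - W
L = np.diag(W.sum(axis=1)) - W
assert np.isclose(smooth, mu * y_hat @ L @ y_hat)
```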
Graphs Would be Cool too!
Nodes = data points; an edge (i, j) is present ⇔ W_ij > 0.
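One common way to build such a W is a Gaussian kernel with tiny weights thresholded to zero, which makes the graph sparse. A sketch; the function name, σ, and threshold values are illustrative assumptions, not from the talk:

```python
import numpy as np

def gaussian_weights(X, sigma=1.0, threshold=1e-6):
    """Dense Gaussian-kernel weight matrix; negligible weights are cut
    so that the graph is sparse (edge (i, j) present iff W_ij > 0)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # squared distances
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)        # no self-edges
    W[W < threshold] = 0.0          # sparsify: drop negligible edges
    return W

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
W = gaussian_weights(X, sigma=0.5)
# Nearby points share a strong edge; the far-away point ends up disconnected
assert W[0, 1] > 0.9 and W[0, 2] == 0.0
```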
Regularization on Graph
Graph Laplacian: L_ii = Σ_{j≠i} W_ij and L_ij = −W_ij for i ≠ j.

C(Ŷ) = Σ_{i=1}^{l} (ŷ_i − y_i)² + (µ/2) Σ_{i,j=1}^{n} W_ij (ŷ_i − ŷ_j)²
     = ||Ŷ_l − Y_l||² + µ ŶᵀLŶ

C(Ŷ) is minimized when

(S + µL) Ŷ = SY

with S the diagonal matrix whose entry S_ii is 1 for labeled points and 0 otherwise
⇒ a linear system with n unknowns and n equations.
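That linear system can be solved directly. A NumPy sketch; the helper name and the toy chain graph are illustrative assumptions, not from the talk:

```python
import numpy as np

def graph_regularization(W, y_given, l, mu=1.0):
    """Minimize ||Y_l - y_given||^2 + mu * Y^T L Y by solving the
    linear system (S + mu*L) Y = S Y_bar (n equations, n unknowns)."""
    n = W.shape[0]
    Lap = np.diag(W.sum(axis=1)) - W          # graph Laplacian
    S = np.zeros((n, n))
    S[np.arange(l), np.arange(l)] = 1.0       # diagonal "is labeled" matrix
    y_bar = np.zeros(n)
    y_bar[:l] = y_given                       # given labels, zero-padded
    return np.linalg.solve(S + mu * Lap, S @ y_bar)

# Toy example: a 4-node chain graph with a single labeled endpoint
W = np.zeros((4, 4))
for i in range(3):
    W[i, i + 1] = W[i + 1, i] = 1.0
Y = graph_regularization(W, np.array([1.0]), l=1)
assert np.allclose(Y, 1.0)   # the single label propagates to the whole chain
```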
From Matrix Inversion to Label Propagation
The linear system can be rewritten, for a labeled point, as

ŷ_i^{(t+1)} = ( Σ_j W_ij ŷ_j^{(t)} + (1/µ) y_i ) / ( Σ_j W_ij + 1/µ )

and for an unlabeled point, as

ŷ_i^{(t+1)} = Σ_j W_ij ŷ_j^{(t)} / Σ_j W_ij .

This is a Jacobi or Gauss-Seidel iteration algorithm.
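The update above can be sketched as a Jacobi-style iteration in NumPy; the function name, iteration count, and toy chain graph are illustrative assumptions, not from the talk:

```python
import numpy as np

def label_propagation(W, y_given, l, mu=1.0, n_iter=500):
    """Jacobi-style label propagation: repeatedly replace each estimate by
    a weighted average of its neighbors' estimates; labeled points get an
    extra (1/mu)-weighted pull toward their given label."""
    n = W.shape[0]
    y = np.zeros(n)
    y[:l] = y_given                   # initialize with the known labels
    deg = W.sum(axis=1)               # node degrees sum_j W_ij
    for _ in range(n_iter):
        num = W @ y                   # sum_j W_ij * y_j^(t)
        num[:l] += y_given / mu       # labeled points: + (1/mu) y_i
        den = deg.copy()
        den[:l] += 1.0 / mu           # labeled points: + 1/mu
        y = num / den
    return y

# Toy example: a 4-node chain with a single labeled endpoint
W = np.zeros((4, 4))
for i in range(3):
    W[i, i + 1] = W[i + 1, i] = 1.0
y = label_propagation(W, np.array([1.0]), l=1)
assert np.allclose(y, 1.0, atol=1e-4)   # converges to the exact minimizer
```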
No, I didn't come up with that last weekend

X. Zhu, Z. Ghahramani and J. Lafferty (2003): Semi-supervised learning using Gaussian fields and harmonic functions
D. Zhou, O. Bousquet, T. Navin Lal, J. Weston and B. Schölkopf (2004): Learning with local and global consistency
M. Belkin, I. Matveeva and P. Niyogi (2004): Regularization and semi-supervised learning on large graphs
Remember your Physics class?

Electric network analogy (Doyle and Snell, 1984; Zhu, Ghahramani and Lafferty, 2003). Graph ⇔ electric network with a resistor between connected nodes:

R_ij = 1 / W_ij

Ohm's law (potential = label):

ŷ_j − ŷ_i = R_ij I_ij

Kirchhoff's law on an unlabeled node i:

Σ_j I_ij = 0

⇒ the same linear system as minimizing C(Ŷ) over Ŷ_u only.
From Transduction to Induction

Solving the linear system ⇒ Ŷ (transduction). For a new point x and the already computed Ŷ:

C(ŷ(x)) = C(Ŷ) + (µ/2) Σ_{i=1}^{n} W_X(x_i, x) (ŷ_i − ŷ(x))²

⇒ ŷ(x) = Σ_{i=1}^{n} W_X(x_i, x) ŷ_i / Σ_{i=1}^{n} W_X(x_i, x)

(Induction like Parzen windows, but using the estimated labels Ŷ).
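A sketch of this induction formula with a Gaussian W_X; the function name, σ, and toy data are illustrative assumptions, not from the talk:

```python
import numpy as np

def induce(x, X_train, y_hat, sigma=1.0):
    """Parzen-window-style induction: label a new point x by the
    W_X-weighted average of the estimated labels y_hat."""
    w = np.exp(-((X_train - x) ** 2).sum(axis=1) / (2 * sigma ** 2))
    return w @ y_hat / w.sum()

X_train = np.array([[0.0], [1.0]])
y_hat = np.array([-1.0, 1.0])
# Prediction leans toward the label of the nearer training point
assert induce(np.array([0.0]), X_train, y_hat) < 0
assert induce(np.array([1.0]), X_train, y_hat) > 0
```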
Faster Training from Subset

The previous algorithms are at least quadratic in n.
The induction formula ⇒ we could train on a subset S only.
Better: minimize the full cost over Ŷ_S only.
For x_i ∈ R = X \ S:

ŷ_i = Σ_{j∈S} W_ij ŷ_j / Σ_{j∈S} W_ij

i.e. Ŷ_R = W_RS Ŷ_S (with W_RS the row-normalized weights): the cost C(Ŷ) now only depends on Ŷ_S.
Nice Cost, isn't it?

C(Ŷ_S) = (µ/2) Σ_{i,j∈R} W_ij (ŷ_i − ŷ_j)²          [C_RR]
       + 2 · (µ/2) Σ_{i∈R, j∈S} W_ij (ŷ_i − ŷ_j)²   [C_RS]
       + (µ/2) Σ_{i,j∈S} W_ij (ŷ_i − ŷ_j)²          [C_SS]
       + Σ_{i∈L} (ŷ_i − y_i)²                        [C_L]
![Page 38: Graph-Based Semi-Supervised Learningdelallea/pub/delalleau_semisup_ciar.pdf · Graph-Based Semi-Supervised Learning Olivier Delalleau, Yoshua Bengio and Nicolas Le Roux Université](https://reader036.fdocuments.net/reader036/viewer/2022063000/5f0f4e357e708231d44380e8/html5/thumbnails/38.jpg)
Semi-Supervised Setting Graph Regularization and Label Prop. Transduction / Induction The Curse
Let's Make it Simpler

Computing C_RR is quadratic in n ⇒ just get rid of it.
A linear system with |S| = m ≪ n unknowns ⇒ much faster.
Still need to do matrix multiplications ⇒ scales as O(m²n).
This slide looked pretty empty with only three points.
It is important to have reading material when nobody understands your accent.
Subset Selection

1 Random: fast, easy, crappy. Main problem = it does not "fill the space" well enough ⇒ a bad approximation by the induction formula (some points have no near neighbors in the subset).
2 Heuristic: greedy construction of the subset, starting with the labeled points and iteratively adding the point x_i minimizing

Σ_{j∈S} W_ij

i.e. the point x_i farthest from the current subset. (Additional tricks eliminate outliers and sample more points near the decision surface.)
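The greedy heuristic can be sketched as follows; the function name and toy data are illustrative assumptions, and the outlier / decision-surface tricks are omitted:

```python
import numpy as np

def greedy_subset(W, labeled_idx, m):
    """Greedy subset construction (sketch): start from the labeled points,
    then repeatedly add the point whose total weight sum_{j in S} W_ij to
    the current subset is smallest, i.e. the point farthest from it."""
    S = list(labeled_idx)
    candidates = set(range(W.shape[0])) - set(S)
    while len(S) < m and candidates:
        # Connection strength of each remaining point to the current subset
        scores = {i: W[i, S].sum() for i in candidates}
        far = min(scores, key=scores.get)   # weakest connection = farthest
        S.append(far)
        candidates.remove(far)
    return S

# Six points on a line with Gaussian weights: the farthest point is chosen
X = np.arange(6.0)[:, None]
W = np.exp(-((X - X.T) ** 2))
np.fill_diagonal(W, 0.0)
assert greedy_subset(W, labeled_idx=[0], m=2) == [0, 5]
```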
Experimental Results

This is not last-minute research, so we do have results!

Table: Comparative Classification Error (Induction)

| % labeled | Method          | LETTERS | MNIST | COVTYPE |
|-----------|-----------------|---------|-------|---------|
| 1%        | NoSub           | 56.0    | 35.8  | 47.3    |
| 1%        | RandSub_subOnly | 59.8    | 29.6  | 44.8    |
| 1%        | RandSub         | 57.4    | 27.7  | 75.7    |
| 1%        | SmartSub        | 55.8    | 24.4  | 45.0    |
| 5%        | NoSub           | 27.1    | 12.8  | 37.1    |
| 5%        | RandSub_subOnly | 32.1    | 14.9  | 35.4    |
| 5%        | RandSub         | 29.1    | 12.6  | 70.6    |
| 5%        | SmartSub        | 28.5    | 12.3  | 35.8    |
| 10%       | NoSub           | 18.8    | 9.5   | 34.7    |
| 10%       | RandSub_subOnly | 22.5    | 11.4  | 32.4    |
| 10%       | RandSub         | 20.3    | 9.7   | 64.7    |
| 10%       | SmartSub        | 19.8    | 9.5   | 33.4    |

More comparisons between RandSub and SmartSub on 8 more UCI datasets ⇒ SmartSub always performs better.
Curse of Dimensionality

Labeling function: ŷ(x) = Σ_i ŷ_i W_X(x_i, x)

Locality: the prediction for a far-away point is that of its nearest neighbor; also, the normal vector ∂ŷ/∂x (x) is approximately in the span of the nearest neighbors of x.

Smoothness: ŷ does not vary much within a ball of radius small w.r.t. σ (obtained from the second derivative); also, the form of the cost ⇒ the learned function varies smoothly in regions with no labeled examples.

Curse: one needs lots of unlabeled examples to "fill" the region near the decision surface, and lots of labeled examples to account for all "clusters".
“Almost Real-Life” Example
Need unlabeled examples along the sinusoidal decision surface, and labeled examples in each class region.
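As a toy illustration of this point, one can generate such data synthetically and count how few random samples actually land near the boundary. The sampling ranges and the 0.2 band width below are arbitrary choices, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample 2-D points uniformly; the class is determined by the
# sinusoidal decision boundary x2 = sin(x1).
X = rng.uniform(low=[-4.0, -2.0], high=[4.0, 2.0], size=(1000, 2))
y = np.sign(X[:, 1] - np.sin(X[:, 0]))

# Only a small fraction of random samples falls close to the boundary,
# which is why many unlabeled points are needed to trace it out.
near_boundary = np.abs(X[:, 1] - np.sin(X[:, 0])) < 0.2
print(f"{near_boundary.sum()} of {len(X)} points lie near the decision surface")
```

With these ranges, roughly a tenth of the points end up in the band around the boundary, so labeling only a handful of points leaves the decision surface largely unconstrained unless unlabeled data fills it in.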
Conclusion
Simple non-parametric setting ⇒ powerful non-parametric semi-supervised algorithm

Can scale to large datasets thanks to sparsity / subset selection

Interesting links with electric networks / heat diffusion

Limitations of local weights: curse of dimensionality

Makes it possible to be on time for lunch!!!
References
Belkin, M., Matveeva, I., and Niyogi, P. (2004). Regularization and semi-supervised learning on large graphs. In Shawe-Taylor, J. and Singer, Y., editors, COLT 2004. Springer.

Delalleau, O., Bengio, Y., and Le Roux, N. (2005). Efficient non-parametric function induction in semi-supervised learning. In Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics (AISTATS).

Doyle, P. G. and Snell, J. L. (1984). Random Walks and Electric Networks. Mathematical Association of America.

Zhou, D., Bousquet, O., Navin Lal, T., Weston, J., and Schölkopf, B. (2004). Learning with local and global consistency. In Thrun, S., Saul, L., and Schölkopf, B., editors, Advances in Neural Information Processing Systems 16, Cambridge, MA. MIT Press.

Zhu, X., Ghahramani, Z., and Lafferty, J. (2003). Semi-supervised learning using Gaussian fields and harmonic functions. In ICML 2003.