1/15
Agnostically learning halfspaces
FOCS 2005
2/15
Agnostic learning [Kearns, Schapire & Sellie]
Set X, F a class of functions f: X → {0,1}.
Arbitrary distribution over (x,y) ∈ X × {0,1}; f* = argmin_{f∈F} P[f(x) ≠ y], opt = P[f*(x) ≠ y].
Efficient Agnostic Learner: given poly(1/ε) samples, outputs w.h.p. h: X → {0,1} with
P[h(x) ≠ y] ≤ opt + ε.
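To make the definition concrete, here is a minimal sketch for the toy case of a finite class F: empirical risk minimization over roughly log|F|/ε² samples returns a hypothesis within ε of opt w.h.p., by uniform convergence. All names and the toy threshold class below are illustrative, not from the talk.

```python
import random

def erm(F, samples):
    """Empirical risk minimization over a finite class F: return the f in F
    with the fewest disagreements on the sample. With O(log(|F|)/eps^2)
    samples, empirical errors are within eps/2 of true errors w.h.p., so
    the returned f has true error <= opt + eps."""
    return min(F, key=lambda f: sum(f(x) != y for x, y in samples))

# Toy run: thresholds on [0,1] as the class, labels = I(x >= 0.5) with 10% flipped.
rng = random.Random(0)
F = [lambda x, t=t: int(x >= t) for t in (0.1, 0.3, 0.5, 0.7, 0.9)]
samples = []
for _ in range(500):
    x = rng.random()
    y = int(x >= 0.5)
    if rng.random() < 0.1:   # agnostic-style label noise
        y = 1 - y
    samples.append((x, y))
best = erm(F, samples)
```

Even with 10% of the labels corrupted, the empirically best threshold is close to the optimum in the class, which is exactly the opt + ε guarantee in miniature.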
3/15
Agnostic learning [Kearns, Schapire & Sellie]
Set X_n ⊆ R^n, F_n a class of functions f: X_n → {0,1}.
Arbitrary distribution over (x,y) ∈ X_n × {0,1}; f* = argmin_{f∈F_n} P[f(x) ≠ y], opt = P[f*(x) ≠ y].
Efficient Agnostic Learner: given poly(n, 1/ε) samples, outputs w.h.p. h: X_n → {0,1} with
P[h(x) ≠ y] ≤ opt + ε.
4/15
arbitrary arbitrary dist. over (x,y) 2 X £ {0,1}f* = argminf2F P [f(x)y]
Set Xnn µ Rnn, Fnn class of functions f: Xnn!{0,1}.
EfficientAgnosticLearner
EfficientAgnosticLearner
w.h.p. h: Xnn!{0,1}
poly(n,n,1/)
samples
Agnostic learning
nn
P[f*(x)y]
P [h(x) y] · opt +
in PAC model, P [f*(x)y] = 0
L. Sellie
5/15
Agnostic learning of halfspaces
F_n = { f(x) = I(w·x ≥ θ) | w ∈ R^n, θ ∈ R }.
f* = argmin_{f∈F_n} P[f(x) ≠ y], opt = P[f*(x) ≠ y].
Goal: output h: R^n → {0,1} with P[h(x) ≠ y] ≤ opt + ε.
6/15
Agnostic learning of halfspaces
F_n = { f(x) = I(w·x ≥ θ) | w ∈ R^n, θ ∈ R }; goal: h with P[h(x) ≠ y] ≤ opt + ε.
Special case: disjunctions, e.g., f(x) = x1 ∨ x3 = I(x1 + x3 ≥ 1).
Efficiently agnostically learning disjunctions ⇒ PAC-learning DNF.
NP-hard to agnostically learn properly.
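The disjunction-to-halfspace encoding above is easy to verify mechanically. A quick sketch (0/1 inputs; the slide's 1-indexed x1, x3 become x[0], x[2] here):

```python
from itertools import product

def disj(x):
    """x1 OR x3 written in halfspace form: I(x1 + x3 >= 1)."""
    return int(x[0] + x[2] >= 1)

# The halfspace form agrees with the boolean OR on every 0/1 input.
table = {x: disj(x) for x in product((0, 1), repeat=3)}
```

Any disjunction of k variables encodes the same way, as I(sum of those variables ≥ 1), which is why disjunctions sit inside the halfspace class.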
7/15
Agnostic learning of halfspaces
F_n = { f(x) = I(w·x ≥ θ) | w ∈ R^n, θ ∈ R }; goal: h with P[h(x) ≠ y] ≤ opt + ε.
PAC learning halfspaces (opt = 0): solved by linear programming.
8/15
Agnostic learning of halfspaces
F_n = { f(x) = I(w·x ≥ θ) | w ∈ R^n, θ ∈ R }; goal: h with P[h(x) ≠ y] ≤ opt + ε.
PAC learning halfspaces with independent/random noise: solved by […].
9/15
Agnostic learning of halfspaces
F_n = { f(x) = I(w·x ≥ θ) | w ∈ R^n, θ ∈ R }.
opt = min_{f∈F_n} P[f(x) ≠ y]; goal: h: R^n → {0,1} with P[h(x) ≠ y] ≤ opt + ε.
Equivalently, f* = "truth" corrupted by adversarial noise.
10/15
Theorem 1: Our algorithm outputs (w.h.p.) h: R^n → {0,1} with P[h(x) ≠ y] ≤ opt + ε, in time poly(n) for every constant ε, as long as the distribution draws x ∈ R^n from:
- a log-concave distribution, e.g. uniform over a convex set, exponential e^{-|x|}, or normal;
- uniform over {-1,1}^n or S^{n-1} = { x ∈ R^n | |x| = 1 }.
Running time: n^{O(1/ε^4)}.
11/15
1. L1 polynomial regression algorithm. Given d > 0 and (x_1,y_1),…,(x_m,y_m) ∈ R^n × {0,1}:
find a multivariate degree-d polynomial p(x) minimizing (1/m) Σ_i |p(x_i) − y_i| ≈ min_{deg(p)≤d} E[|p(x) − y|];
pick θ ∈ [0,1] at random and output h(x) = I(p(x) ≥ θ). Time n^{O(d)}.
2. Low-degree Fourier algorithm of [Linial, Mansour & Nisan]: choose p(x) = Σ_{|S|≤d} c_S χ_S(x), where c_S is a sample estimate of E[y·χ_S(x)], so that p ≈ argmin_{deg(p)≤d} E[(p(x) − y)^2] (requires x uniform on {-1,1}^n). Output h(x) = I(p(x) ≥ ½). Time n^{O(d)}.
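A runnable sketch of step 1 in one dimension (n = 1), assuming `numpy` and `scipy` are available. The L1 fit is cast as a linear program with slack variables t_i ≥ |p(x_i) − y_i|; the function names and the toy data are illustrative, not the authors' code.

```python
import numpy as np
from scipy.optimize import linprog

def l1_poly_fit(xs, ys, d):
    """Degree-d univariate L1 regression: minimize sum_i |p(x_i) - y_i|.
    LP variables are the d+1 coefficients c plus slacks t_i >= |p(x_i) - y_i|."""
    m, k = len(xs), d + 1
    A = np.vander(xs, k, increasing=True)          # row i: 1, x_i, ..., x_i^d
    cost = np.concatenate([np.zeros(k), np.ones(m)])
    # encode  A c - y <= t  and  -(A c - y) <= t  as A_ub z <= b_ub
    A_ub = np.block([[A, -np.eye(m)], [-A, -np.eye(m)]])
    b_ub = np.concatenate([ys, -ys])
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * k + [(0, None)] * m)
    return res.x[:k]                               # coefficients of p

def randomized_threshold(coeffs, rng):
    """Pick theta uniform in [0,1]; h(x) = I(p(x) >= theta)."""
    theta = rng.uniform(0.0, 1.0)
    return lambda x: int(np.polyval(coeffs[::-1], x) >= theta)

# Fit a cubic to labels from the threshold I(x >= 0.2) on [-1, 1].
xs = np.linspace(-1.0, 1.0, 50)
ys = (xs >= 0.2).astype(float)
coeffs = l1_poly_fit(xs, ys, d=3)
h = randomized_threshold(coeffs, np.random.default_rng(0))
```

The full algorithm does the same over all n-variable monomials of degree at most d, which is where the n^{O(d)} running time comes from.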
12/15
(The two algorithms restated as on the previous slide, with guarantees:)
Lemma (L1 regression): alg's error ≤ opt + min_{deg(q)≤d} E[|f*(x) − q(x)|].
Lemma (low-degree): alg's error ≤ 8·(opt + min_{deg(q)≤d} E[(f*(x) − q(x))^2]).
Lemma of [Kearns, Schapire & Sellie]: alg's error ≤ ½ − (½ − opt)^2 + ε.
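A sketch of algorithm 2 for x uniform on {-1,1}^n, with the Fourier coefficients estimated by sample means; the helper names are illustrative. On noiseless data from a degree-1 target it recovers the target exactly:

```python
import itertools
from math import prod

def chi(S, x):
    """Parity character chi_S(x) = prod_{i in S} x_i on x in {-1,1}^n."""
    return prod(x[i] for i in S)

def low_degree_algorithm(samples, n, d):
    """Low-degree (Fourier) algorithm sketch: p(x) = sum_{|S|<=d} c_S chi_S(x)
    with c_S the sample mean of y * chi_S(x); output h(x) = I(p(x) >= 1/2)."""
    subsets = [S for k in range(d + 1)
                 for S in itertools.combinations(range(n), k)]
    m = len(samples)
    c = {S: sum(y * chi(S, x) for x, y in samples) / m for S in subsets}
    def h(x):
        return int(sum(c[S] * chi(S, x) for S in subsets) >= 0.5)
    return h

# Toy run: n = 3, target y = I(x_1 = 1), one sample per point of the cube.
samples = [(x, 1 if x[0] == 1 else 0)
           for x in itertools.product((-1, 1), repeat=3)]
h = low_degree_algorithm(samples, 3, 1)
```

Here the estimated expansion is p(x) = ½ + ½·x_1, so thresholding at ½ reproduces the target on every point of the cube.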
13/15
Approx degree is dimension-free for halfspaces.
Useful properties of log-concave distributions: projection is log-concave, …
[Plots: degree d = 10 polynomial approximations q(x) ≈ I(x ≥ 0) in one dimension, and q(w·x) ≈ I(w·x ≥ 0) in two dimensions.]
14/15
Approximating I(x ≥ θ) (1 dimension)
Bound min_{deg(q)≤d} E[(q(x) − I(x ≥ θ))^2]. Continuous distributions: orthogonal polynomials for ⟨f,g⟩ = E[f(x)g(x)]:
- Normal: Hermite polynomials
- Log-concave (e^{-|x|}/2 suffices): new polynomials
- Uniform on sphere: Gegenbauer polynomials
- Uniform on hypercube: Fourier
"Hey, I've used Hermite (pronounced air-MEET) polynomials many times."
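For the normal case, the orthogonal-projection bound can be computed directly, assuming `numpy`: expand I(x ≥ θ) in probabilists' Hermite polynomials He_k (which satisfy E[He_j·He_k] = k!·δ_jk under N(0,1)) and measure the residual. A sketch with illustrative names, using Gauss-Hermite quadrature for the expectations:

```python
import numpy as np
from numpy.polynomial import hermite_e as He

def hermite_step_approx(theta, d, quad_points=200):
    """Degree-d L2 approximation of I(x >= theta) under N(0,1), via the
    Hermite expansion q = sum_k c_k He_k with c_k = E[f * He_k] / k!."""
    xs, ws = He.hermegauss(quad_points)       # nodes/weights for exp(-x^2/2)
    ws = ws / np.sqrt(2 * np.pi)              # normalize to the N(0,1) density
    f = (xs >= theta).astype(float)
    coeffs, fact = [], 1.0
    for k in range(d + 1):
        if k:
            fact *= k                         # fact = k!
        Hk = He.hermeval(xs, [0.0] * k + [1.0])   # He_k at the nodes
        coeffs.append(np.sum(ws * f * Hk) / fact)
    q = lambda x: He.hermeval(x, coeffs)
    mse = np.sum(ws * (f - He.hermeval(xs, coeffs)) ** 2)   # E[(q - f)^2]
    return q, mse

# Degree-10 approximation of the step at theta = 0.
q10, mse10 = hermite_step_approx(0.0, d=10)
```

The residual shrinks as d grows, which is exactly the quantity the slide's lemmas need to be small.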
15/15
Theorem 2: conjunctions (e.g., x1 ∧ x11 ∧ x17). For an arbitrary distribution over {0,1}^n × {0,1}, the polynomial regression algorithm with d = O(n^{1/2} log(1/ε)) (time ε^{-O*(n^{1/2})}) outputs h with P[h(x) ≠ y] ≤ opt + ε.
Follows from the previous lemmas + …
16/15
How far can we get in poly(n, 1/ε) time?
Assume the distribution draws x uniform from S^{n-1} = { x ∈ R^n | |x| = 1 }.
- Perceptron algorithm: error ≤ O(√n)·opt + ε.
- We show: a simple averaging algorithm of […] achieves error ≤ O(log(1/opt))·opt + ε.
Now assume (x,y) = (1−η)·(x, f*(x)) + η·(arbitrary (x,y)), x uniform on S^{n-1}:
- We get: error ≤ O(n^{1/4} log(n/η))·η + ε, using Rankin's second bound.
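The averaging idea can be sketched in a few lines, assuming `numpy`; this is an illustrative reconstruction of the basic estimator, not the authors' exact procedure. For x uniform on the sphere and an origin-centered target halfspace, E[(2y−1)·x] points along the true normal vector w*, so the sample mean recovers the direction even with some noisy labels:

```python
import numpy as np

def averaging_halfspace(X, y):
    """Average the examples with labels mapped to +/-1; the mean points
    (approximately) along the true normal w*. Predict I(w.x >= 0)."""
    w = ((2 * y - 1)[:, None] * X).mean(axis=0)
    return lambda x: int(np.dot(w, x) >= 0)

# Toy run: n = 5, target w* = e_1, 5% of the labels flipped at random.
rng = np.random.default_rng(0)
n, m = 5, 4000
X = rng.standard_normal((m, n))
X /= np.linalg.norm(X, axis=1, keepdims=True)     # uniform on S^{n-1}
y = (X[:, 0] >= 0).astype(float)
flip = rng.random(m) < 0.05
y[flip] = 1 - y[flip]
h = averaging_halfspace(X, y)
```

With a few thousand samples the recovered direction is close to e_1, so the predictor agrees with the clean target on almost all of the sphere.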
17/15
Halfspace conclusions & future work
L1 polynomial regression is a natural extension of Fourier learning:
- works for non-uniform/arbitrary distributions;
- tolerates agnostic noise;
- works on both continuous and discrete problems.
Future work:
- all distributions (not just log-concave / uniform {-1,1}^n);
- opt + ε with a poly(n, 1/ε) algorithm (we have poly(n) for fixed ε, and trivially poly(1/ε) for fixed n);
- other interesting classes of functions.