Agnostically Learning Decision Trees
Parikshit Gopalan (MSR Silicon Valley, IITB'00)
Adam Tauman Kalai (MSR New England)
Adam R. Klivans (UT Austin)
[Figure: a decision tree over X1, X2, X3 with 0/1 leaves]
Computational Learning
Learning: predict f from examples ⟨x, f(x)⟩, where f: {0,1}^n → {0,1}.
Valiant's Model
Examples ⟨x, f(x)⟩, where f: {0,1}^n → {0,1}.
Assumption: f comes from a nice concept class.
Halfspaces:
[Figure: + and − labeled points separated by a halfspace]
Valiant's Model
Examples ⟨x, f(x)⟩, where f: {0,1}^n → {0,1}.
Assumption: f comes from a nice concept class.
Decision Trees:
[Figure: a decision tree over X1, X2, X3 with 0/1 leaves]
The Agnostic Model [Kearns-Schapire-Sellie'94]
Examples ⟨x, f(x)⟩, where f: {0,1}^n → {0,1}.
No assumptions about f.
The learner should do as well as the best decision tree.
Decision Trees:
[Figure: a decision tree over X1, X2, X3 with 0/1 leaves]
Agnostic Model = Noisy Learning
f: {0,1}^n → {0,1}
Decision tree + noise = f.
Concept: the message. Truth table: the encoding. Function f: the received word.
Coding: recover the message. Learning: predict f.
[Figure: a decision tree over X1, X2, X3 with 0/1 leaves]
Uniform Distribution Learning for Decision Trees
Noiseless setting:
– No queries: n^{log n} [Ehrenfeucht-Haussler'89].
– With queries: poly(n) [Kushilevitz-Mansour'91].
Reconstruction for sparse real polynomials in the l1 norm.
Agnostic setting:
– Polynomial time, uses queries [G.-Kalai-Klivans'08].
The Fourier Transform Method
Powerful tool for uniform distribution learning.
Introduced by Linial-Mansour-Nisan.
– Small-depth circuits [Linial-Mansour-Nisan'89]
– DNFs [Jackson'95]
– Decision trees [Kushilevitz-Mansour'94, O'Donnell-Servedio'06, G.-Kalai-Klivans'08]
– Halfspaces, intersections [Klivans-O'Donnell-Servedio'03, Kalai-Klivans-Mansour-Servedio'05]
– Juntas [Mossel-O'Donnell-Servedio'03]
– Parities [Feldman-G.-Khot-Ponnuswami'06]
The Fourier Polynomial
Let f: {-1,1}^n → {-1,1}. Write f as a polynomial.
– AND: ½ + ½X1 + ½X2 − ½X1X2
– Parity: X1X2
Parity of α ⊆ [n]: χ_α(x) = ∏_{i∈α} Xi
Write f(x) = Σ_α c(α)·χ_α(x)
– Σ_α c(α)² = 1.
Standard basis: the function f. Fourier basis: the parities χ_α.
The Fourier Polynomial
Let f: {-1,1}^n → {-1,1}. Write f as a polynomial.
– AND: ½ + ½X1 + ½X2 − ½X1X2
– Parity: X1X2
Parity of α ⊆ [n]: χ_α(x) = ∏_{i∈α} Xi
Write f(x) = Σ_α c(α)·χ_α(x)
– Σ_α c(α)² = 1.
c(α)²: the weight of α.
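As an illustrative sketch (not part of the talk), the Fourier coefficients above can be checked by brute-force enumeration of {-1,1}^n; the helper names below are ours, and AND encodes True as -1 so that it matches the slide's polynomial:

```python
from itertools import product

def fourier_coefficient(f, n, alpha):
    """c(alpha) = E_x[f(x) * chi_alpha(x)], averaged over all of {-1,1}^n."""
    total = 0
    for x in product([-1, 1], repeat=n):
        chi = 1
        for i in alpha:            # chi_alpha(x) = prod_{i in alpha} x_i
            chi *= x[i]
        total += f(x) * chi
    return total / 2 ** n

# AND on two bits, with -1 encoding True: f(x) = -1 iff x1 = x2 = -1.
AND = lambda x: -1 if x == (-1, -1) else 1

coeffs = {a: fourier_coefficient(AND, 2, a) for a in [(), (0,), (1,), (0, 1)]}
# Matches the slide: ½ + ½X1 + ½X2 − ½X1X2
```

Since AND is ±1-valued, Parseval's identity Σ_α c(α)² = 1 also holds for these four coefficients.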
Low-Degree Functions
Most of the Fourier weight lies on small subsets: halfspaces, small-depth circuits.
Low-degree algorithm [Linial-Mansour-Nisan]: finds the low-degree Fourier coefficients.
Least-squares regression: find a low-degree P minimizing E_x[|P(x) − f(x)|²].
Sparse Functions
Most of the Fourier weight lies on a few subsets.
Decision trees: t leaves ⇒ O(t) subsets.
Sparse algorithm [Kushilevitz-Mansour'91].
Sparse l2 regression: find a t-sparse P minimizing E_x[|P(x) − f(x)|²].
Sparse l2 Regression
Most of the Fourier weight lies on a few subsets; decision trees with t leaves ⇒ O(t) subsets.
Sparse algorithm [Kushilevitz-Mansour'91].
Sparse l2 regression: find a t-sparse P minimizing E_x[|P(x) − f(x)|²].
Finding large coefficients: Hadamard decoding [Kushilevitz-Mansour'91, Goldreich-Levin'89].
Agnostic Learning via l2 Regression?
f: {-1,1}^n → {-1,1}
[Figure: f as a ±1-valued function]
Agnostic Learning via l2 Regression?
l2 regression: loss |P(x) − f(x)|². Pay 1 for indecision; pay 4 for a mistake.
l1 regression [KKMS'05]: loss |P(x) − f(x)|. Pay 1 for indecision; pay 2 for a mistake.
[Figure: the target f and the best tree as ±1-valued functions]
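The payoff arithmetic can be checked directly; a minimal sketch (our framing), with P(x) = 0 modeling indecision and P(x) = −f(x) a confident mistake on a point with true label +1:

```python
f_x = 1.0                           # true label, in the {-1,+1} convention

for name, p_x in [("indecision", 0.0), ("mistake", -1.0)]:
    l2_loss = abs(p_x - f_x) ** 2   # l2 regression loss
    l1_loss = abs(p_x - f_x)        # l1 regression loss
    print(f"{name}: l2 pays {l2_loss:g}, l1 pays {l1_loss:g}")
# indecision: l2 pays 1, l1 pays 1
# mistake:    l2 pays 4, l1 pays 2
```

The l1 loss penalizes a mistake only twice as heavily as indecision, which is what makes it the right surrogate in the agnostic setting.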
Agnostic Learning via l1 Regression?
Agnostic Learning via l1 Regression
Thm [KKMS'05]: l1 regression always gives a good predictor.
l1 regression for low-degree polynomials via linear programming.
[Figure: the target f and the best tree as ±1-valued functions]
Agnostically Learning Decision Trees
Sparse l1 regression: find a t-sparse polynomial P minimizing E_x[|P(x) − f(x)|].
Why this is harder:
– l2 is basis independent; l1 is not.
– We don't know the support of P.
[G.-Kalai-Klivans]: polynomial-time algorithm for sparse l1 regression.
The Gradient-Projection Method
Variables: the c(α)'s.
Constraint: Σ_α |c(α)| ≤ t
Minimize: E_x|P(x) − f(x)|
For P(x) = Σ_α c(α)·χ_α(x) and Q(x) = Σ_α d(α)·χ_α(x):
L1(P, Q) = Σ_α |c(α) − d(α)|
L2(P, Q) = [Σ_α (c(α) − d(α))²]^{1/2}
The Gradient-Projection Method
Variables: the c(α)'s. Constraint: Σ_α |c(α)| ≤ t. Minimize: E_x|P(x) − f(x)|.
[Figure: alternating gradient and projection steps within the L1 ball]
The Gradient
g(x) = sgn[f(x) − P(x)]
P(x) := P(x) + λ·g(x)
Increase P(x) where it is too low; decrease it where it is too high.
[Figure: f (±1-valued) and the current hypothesis P]
Projection onto the L1 Ball
Currently Σ_α |c(α)| > t; want Σ_α |c(α)| ≤ t.
[Figure: Fourier spectrum of P]
Projection onto the L1 Ball
Below the cutoff: set to 0. Above the cutoff: subtract the cutoff.
[Figure: Fourier spectrum of P with the cutoff marked]
Projection onto the L1 Ball
Below the cutoff: set to 0. Above the cutoff: subtract the cutoff.
[Figure: Fourier spectra of P and Proj(P)]
Analysis of Gradient-Projection [Zinkevich'03]
Progress measure: squared L2 distance from the optimum P*.
Key equation (step size λ):
|P_t − P*|² − |P_{t+1} − P*|² ≥ 2λ·(L(P_t) − L(P*)) − λ²
The left side is the progress made in this step; L(P_t) − L(P*) measures how suboptimal the current solution is.
Within ε of optimal in 1/ε² iterations.
A good L2 approximation to P_t suffices.
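Putting the pieces together, here is a toy end-to-end sketch of the gradient-projection loop (our naming throughout; this is the exact-gradient version, with the expectation taken over an explicit sample rather than through KM queries):

```python
from itertools import product

def chi(alpha, x):
    """Parity chi_alpha(x) = prod_{i in alpha} x_i."""
    p = 1
    for i in alpha:
        p *= x[i]
    return p

def project_l1(c, t):
    """Soft-threshold projection of the coefficient dict onto the L1 ball of radius t."""
    if sum(abs(v) for v in c.values()) <= t:
        return dict(c)
    mags = sorted((abs(v) for v in c.values()), reverse=True)
    cumsum, theta = 0.0, 0.0
    for k, m in enumerate(mags, start=1):
        cumsum += m
        theta = (cumsum - t) / k
        if k == len(mags) or mags[k] <= theta:
            break
    return {a: (abs(v) - theta) * (1 if v > 0 else -1)
            for a, v in c.items() if abs(v) > theta}

def sgn(z):
    return 1.0 if z > 0 else (-1.0 if z < 0 else 0.0)

def sparse_l1_regression(sample, parities, t, steps=300, lam=0.05):
    """Minimize E_x|P(x) - f(x)| subject to sum_alpha |c(alpha)| <= t
    by projected subgradient descent (the slides' g(x) = sgn[f(x) - P(x)] step)."""
    c = {a: 0.0 for a in parities}
    for _ in range(steps):
        grad = {a: 0.0 for a in parities}
        for x, fx in sample:
            s = sgn(sum(c[a] * chi(a, x) for a in parities) - fx)
            for a in parities:
                grad[a] += s * chi(a, x) / len(sample)
        c = project_l1({a: c[a] - lam * grad[a] for a in parities}, t)
        c = {a: c.get(a, 0.0) for a in parities}   # restore zeroed entries
    return c

# Recover f(x) = x0 (a parity) from its full truth table on {-1,1}^2.
sample = [(x, x[0]) for x in product([-1, 1], repeat=2)]
parities = [(), (0,), (1,), (0, 1)]
c = sparse_l1_regression(sample, parities, t=1.0)
```

Here c[(0,)] settles near 1 while the other coefficients stay at 0; with a noisy f, the L1 constraint keeps the hypothesis close to the best sparse polynomial.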
Gradient and Projection
Gradient: g(x) = sgn[f(x) − P(x)]. Projection: back onto the L1 ball.
[Figure: a gradient step on f and P, and the Fourier spectrum of P]
The Gradient
g(x) = sgn[f(x) − P(x)]
Compute a sparse approximation g′ = KM(g).
Is g′ a good L2 approximation to g? No: initially P = 0, so g = f, and L2(g, g′) can be as large as 1.
[Figure: f and P as ±1-valued functions]
Sparse l1 Regression
Variables: the c(α)'s. Constraint: Σ_α |c(α)| ≤ t. Minimize: E_x|P(x) − f(x)|.
Now using an approximate gradient.
Sparse l1 Regression
Variables: the c(α)'s. Constraint: Σ_α |c(α)| ≤ t. Minimize: E_x|P(x) − f(x)|.
The projection compensates for the approximate gradient.
KM as l2 Approximation
The KM algorithm:
Input: g: {-1,1}^n → {-1,1}, and t.
Output: a t-sparse polynomial g′ minimizing E_x[|g(x) − g′(x)|²].
Run time: poly(n, t).
KM as L1 Approximation
The KM algorithm:
Input: a Boolean function g = Σ_α c(α)·χ_α(x), and an error bound ε.
Output: an approximation g′ = Σ_α c′(α)·χ_α(x) such that |c(α) − c′(α)| ≤ ε for all α ⊆ [n].
Run time: poly(n, 1/ε).
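The "estimate via sampling" step can be sketched as follows (a plain Chernoff-based estimator under our own naming, not the full KM search, which locates the large coefficients by recursive Hadamard decoding):

```python
import random

def estimate_coefficient(g, n, alpha, eps=0.02, seed=0):
    """Estimate c(alpha) = E_x[g(x) * chi_alpha(x)] from uniform random samples.
    O(1/eps^2) samples give accuracy eps with high probability (Chernoff bound)."""
    rng = random.Random(seed)
    samples = int(10 / eps ** 2)
    total = 0
    for _ in range(samples):
        x = [rng.choice((-1, 1)) for _ in range(n)]
        chi = 1
        for i in alpha:              # chi_alpha(x) = prod_{i in alpha} x_i
            chi *= x[i]
        total += g(x) * chi
    return total / samples

# Majority of 3 bits: each singleton coefficient is exactly 1/2.
maj = lambda x: 1 if sum(x) > 0 else -1
```

For example, estimate_coefficient(maj, 3, (0,)) lands near 0.5, while estimate_coefficient(maj, 3, (0, 1)) lands near 0, matching the true spectrum of majority.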
KM as L1 Approximation
1) Identify the coefficients larger than ε.
2) Estimate them via sampling; set the rest to 0.
At most 1/ε² coefficients can exceed ε.
[Figure: Fourier spectra of g and g′ = KM(g)]
Projection Preserves L1 Distance
L1 distance at most 2εt after projection: both cutoff lines stop within ε of each other.
[Figure: Fourier spectra of P + g and P + g′ with their cutoffs]
Projection Preserves L1 Distance
L1 distance at most 2εt after projection: both cutoff lines stop within ε of each other, since otherwise one spectrum (blue) would dominate the other (red).
[Figure: Fourier spectra of P + g and P + g′ with their cutoffs]
Projection Preserves L1 Distance
Projecting onto the L1 ball does not increase L1 distance, so the distance stays at most 2εt after projection.
[Figure: Fourier spectra of P + g and P + g′ after projection]
Sparse l1 Regression
Variables: the c(α)'s. Constraint: Σ_α |c(α)| ≤ t. Minimize: E_x|P(x) − f(x)|.
• |c(α) − c′(α)| ≤ 2ε for every α
• L1(P, P′) ≤ 2εt
• L2(P, P′)² ≤ 4ε²t
Can take ε = 1/t².
Agnostically Learning Decision Trees
Sparse l1 regression: find a sparse polynomial P minimizing E_x[|P(x) − f(x)|].
[G.-Kalai-Klivans'08]: can get within ε of the optimum in poly(t, 1/ε) iterations, giving an algorithm for sparse l1 regression.
First polynomial-time algorithm for agnostically learning sparse polynomials.
l1 Regression from l2 Regression
Function f: D → [-1,1], orthonormal basis B.
Sparse l2 regression: find a t-sparse polynomial P minimizing E_x[|P(x) − f(x)|²].
Sparse l1 regression: find a t-sparse polynomial P minimizing E_x[|P(x) − f(x)|].
[G.-Kalai-Klivans'08]: given a solution to l2 regression, one can solve l1 regression.
Agnostically Learning DNFs?
Problem: can we agnostically learn DNFs in polynomial time? (uniform distribution, with queries)
Noiseless setting: Jackson's Harmonic Sieve.
This would imply a weak learner for depth-3 circuits, beyond current Fourier techniques.