Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis...

69
Introduction to Machine Learning VapnikChervonenkis Theory Barnabás Póczos

Transcript of Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis...

Page 1: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Introduction to Machine Learning

Vapnik–Chervonenkis Theory

Barnabás Póczos

Page 2: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Empirical Risk and True Risk

2

Page 3: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Empirical Risk

3

Let us use the empirical counter part:

Shorthand:

True risk of f (deterministic):

Bayes risk:

Empirical risk:

Page 4: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Empirical Risk Minimization

4

Law of Large Numbers:

Empirical risk is converging to the true risk

Page 5: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Overfitting in Classification with ERM

5

Bayes classifier:

Picture from David Pal

Bayes risk:

Generative model:

Page 6: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

6

Picture from David Pal

Bayes risk:

n-order thresholded polynomials

Empirical risk:

Overfitting in Classification with ERM

Page 7: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.5

1

1.5

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.2

0.4

0.6

0.8

1

1.2

1.4

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1-0.2

0

0.2

0.4

0.6

0.8

1

1.2

1.4

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1-45

-40

-35

-30

-25

-20

-15

-10

-5

0

5

k=1 k=2

k=3 k=7

If we allow very complicated predictors, we could overfit

the training data.

Examples: Regression (Polynomial of order k-1 – degree k )

Overfitting in Regression

7

constant linear

quadratic6th order

Page 8: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Solutions to Overfitting

8

Page 9: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Solutions to Overfitting

Structural Risk Minimization

Notation:

Goal:(Model error, Approximation error)

Solution: Structural Risk Minimzation (SRM)

9

Risk Empirical risk

Page 10: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Big Picture

10

Bayes risk

Estimation error Approximation error

Bayes risk

Ultimate goal:

Approximation error

Estimation error

Bayes risk

Page 11: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Empirical risk is no longer a

good indicator of true risk

fixed # training data

If we allow very complicated predictors, we could overfit

the training data.

Effect of Model Complexity

11

Prediction error on training data

Page 12: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Classification

using the 0-1 loss

12

Page 13: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

The Bayes Classifier

13

Lemma I:

Lemma II:

Proofs: Lemma I: Trivial from definition

Lemma II: Surprisingly long calculation

Page 14: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

The Bayes Classifier

14

This is what the learning algorithm produces

We will need these definitions, please copy it!

Page 15: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

The Bayes Classifier

15

Theorem I: Bound on the Estimation error

The true risk of what the learning algorithm produces

This is what the learning algorithm produces

Page 16: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Theorem I: Bound on the Estimation error

The true risk of what the learning algorithm produces

Proof of Theorem 1

Proof:

Page 17: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

The Bayes Classifier

17

Theorem II: This is what the learning algorithm produces

Proof: Trivial

Page 18: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Corollary

18

Main message:

It’s enough to derive upper bounds for

Corollary:

Page 19: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Illustration of the Risks

19

Page 20: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

It is a random variable that we need to bound!

We will bound it with tail bounds!

20

It’s enough to derive upper bounds for

Page 21: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Hoeffding’s inequality (1963)

21

Special case

Page 22: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Binomial distributions

22

Our goal is to bound

Bernoulli(p)

Therefore, from Hoeffding we have:

Yuppie!!!

Page 23: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Inversion

23

From Hoeffding we have:

Therefore,

Page 24: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Union Bound

24

Our goal is to bound:

We already know:

Theorem: [tail bound on the ‘deviation’ in the worst case]

Worst case error

Proof:

This is not the worst classifier in terms of classification accuracy!

Worst case means that the empirical risk of classifier f is the furthest from its true risk!

Page 25: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Inversion of Union Bound

25

Therefore,

We already know:

Page 26: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Inversion of Union Bound

26

•The larger the N, the looser the bound

•This results is distribution free: True for all P(X,Y) distributions

• It is useless if N is big, or infinite… (e.g. all possible hyperplanes)

It can be fixed with McDiarmid inequality and VC dimension…

Page 27: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Concentration and

Expected Value

27

n

Page 28: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

The Expected Error

28

Our goal is to bound:

Theorem: [Expected ‘deviation’ in the worst case]

Worst case deviation

We already know:

Proof: we already know a tail bound.

(Tail bound, Concentration inequality)

(From that actually we get a bit weaker inequality… oh well)

This is not the worst classifier in terms of classification accuracy!

Worst case means that the empirical risk of classifier f is the furthest from its true risk!

Page 29: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Function classes with infinite

many elements

Page 30: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

McDiarmid’s

Bounded Difference Inequality

It follows that

30

Page 31: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Bounded Difference Condition

31

Let g denote the following function:

Observation:

Proof:

=> McDiarmid can be applied for g!

Our main goal is to bound

Lemma:

Page 32: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Bounded Difference

Condition

32

Corollary:

The Vapnik-Chervonenkis inequality does that

with the shatter coefficient (and VC dimension)!

Page 33: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Vapnik-Chervonenkis inequality

33

We already know:

Vapnik-Chervonenkis inequality:

Corollary: Vapnik-Chervonenkis theorem:

Our main goal is to bound

Page 34: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Shattering

34

Page 35: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

How many points can a linear

boundary classify exactly in 1D?

-+

2 pts 3 pts- + +

- - +

- + - ??There exists placement s.t.

all labelings can be classified

- +

35

The answer is 2

Page 36: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

- +3 pts 4 pts

- +

+-+

- +

-??

- +

-+

How many points can a linear

boundary classify exactly in 2D?

There exists a placement s.t.

all labelings can be classified

36

The answer is 3No matter how we place 4 points,

there is a labeling that cannot be

classified

Page 37: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

How many points can a linear

boundary classify exactly in d-dim?

37

The answer is d+1

How many points can a linear

boundary classify exactly in 3D?

The answer is 4

tetraeder

+

+ -

-

Page 38: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Growth function,

Shatter coefficient

Definition

0 0 0

0 1 0

1 1 1

1 0 0

0 1 1

0 1 0

1 1 1

(=5 in this example)

Growth function, Shatter coefficient

maximum number of behaviors on n points

38

Page 39: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Growth function,

Shatter coefficient

Definition

Growth function, Shatter coefficient

maximum number of behaviors on n points

- +

+

Example: Half spaces in 2D

+ +-

39

Page 40: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

VC-dimension

Definition

Growth function, Shatter coefficient

maximum number of behaviors on n points

Definition: VC-dimension

# b

eha

vio

rs

Definition: Shattering

Note:40

Page 41: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

VC-dimension

Definition

# b

eha

vio

rs

41

Page 42: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

- +

-+

VC-dimension

42

(such that you want to maximize the # of different behaviors)

Page 43: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Examples

43

Page 44: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

VC dim of decision stumps

(axis aligned linear separator) in 2d

What’s the VC dim. of decision stumps in 2d?

- +

+

- +

-

+ +

-

There is a placement of 3 pts that can be shattered => VC dim ≥ 3

44

Page 45: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

What’s the VC dim. of decision stumps in 2d?If VC dim = 3, then for all placements of 4 pts, there exists a labeling that

can’t be shattered

3 collinear1 in convex

hull of other 3quadrilateral

- + - -+

-

-

+-

+-

VC dim of decision stumps

(axis aligned linear separator) in 2d

45

=> VC dim = 3

Page 46: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

VC dim. of axis parallel

rectangles in 2dWhat’s the VC dim. of axis parallel rectangles in 2d?

- +

+

- +

-There is a placement of 3 pts that can be shattered => VC dim ≥ 3

46

Page 47: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

VC dim. of axis parallel

rectangles in 2d

There is a placement of 4 pts that can be shattered ) VC dim ≥ 447

Page 48: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

VC dim. of axis parallel

rectangles in 2dWhat’s the VC dim. of axis parallel rectangles in 2d?

++

-

-

--

-+ --+

-

--

-+

-

+

-

If VC dim = 4, then for all placements of 5 pts, there exists a labeling

that can’t be shattered

4 collinear

2 in convex hull 1 in convex hull

pentagon

48) VC dim = 4

Page 49: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Sauer’s Lemma

49

The VC dimension can be used to upper bound

the shattering coefficient.

Sauer’s lemma:

Corollary:

We already know that [Exponential in n]

[Polynomial in n]

Page 50: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Vapnik-Chervonenkis inequality

50

Vapnik-Chervonenkis inequality:

From Sauer’s lemma:

Since

Therefore,

[We don’t prove this]

Estimation error

Page 51: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Linear (hyperplane) classifiers

51

Estimation error

We already know that

Estimation error

Estimation error

Page 52: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Vapnik-Chervonenkis Theorem

52

We already know from McDiarmid:

Corollary: Vapnik-Chervonenkis theorem: [We don’t prove them]

Vapnik-Chervonenkis inequality:

We already know: Hoeffding + Union bound for finite function class:

Page 53: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

PAC Bound for the Estimation

Error

53

VC theorem:

Inversion:

Estimation error

Page 54: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

What you need to know

Complexity of the classifier depends on number of points that can

be classified exactly

Finite case – Number of hypothesis

Infinite case – Shattering coefficient, VC dimension

Page 55: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Thanks for your attention ☺

55

Page 56: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Attic

Page 57: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Proof of Sauer’s Lemma

Write all different behaviors on a sample

(x1,x2,…xn) in a matrix:

0 0 0

0 1 0

1 1 1

1 0 0

0 1 0

1 1 1

0 1 1

57

0 0 0

0 1 0

1 1 1

1 0 0

0 1 1

VC dim =2

Page 58: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Proof of Sauer’s Lemma

We will prove that

58

0 0 0

0 1 0

1 1 1

1 0 0

0 1 1

Shattered subsets of columns:

Therefore,

In this example: 5· 1+3+3=7, since VC=2, n=3

Page 59: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Proof of Sauer’s Lemma

Lemma 1

59

0 0 0

0 1 0

1 1 1

1 0 0

0 1 1

for any binary matrix with no repeated rows.

Shattered subsets of columns:

Lemma 2

In this example: 5· 6

In this example: 6· 1+3+3=7

Page 60: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Proof of Lemma 1

Lemma 1

60

0 0 0

0 1 0

1 1 1

1 0 0

0 1 1

Shattered subsets of columns:

Proof

In this example: 6· 1+3+3=7

Q.E.D.

Page 61: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Proof of Lemma 2

61

for any binary matrix with no repeated rows.

Lemma 2

Induction on the number of columnsProof:

Base case: A has one column. There are three cases:

=> 1 · 1

=> 1 · 1

=>2 · 2

Page 62: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Proof of Lemma 2

62

Inductive case: A has at least two columns.

We have,

By induction (less columns)

0 0 0

0 1 0

1 1 1

1 0 0

0 1 1

Page 63: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Proof of Lemma 2

63

because

0 0 0

0 1 0

1 1 1

1 0 0

0 1 1

Q.E.D.

Page 64: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Solution to Overfitting

2nd issue:

Solution:

64

Page 65: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Approximation with the

Hinge loss and quadratic loss

Picture is taken from R. Herbrich

Page 66: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Underfitting

Bayes risk = 0.166

Page 67: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Underfitting

Best linear classifier:

The empirical risk of the best linear classifier:

67

Page 68: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Underfitting

Best quadratic classifier:

Same as the Bayes risk ) good fit!

68

Page 69: Vapnik Chervonenkis Theory10701/slides/19_VC_Theory.pdf · 2017. 11. 13. · Vapnik–Chervonenkis Theory Barnabás Póczos . Empirical Risk and True Risk 2. Empirical Risk 3 Let

Structural Risk Minimization

69

Bayes risk

Estimation error Approximation error

Ultimate goal:

Approximation error

Estimation error

So far we studied when

estimation error ! 0, but we also want approximation error ! 0

Many different variants…

penalize too complex models to avoid overfitting