MACHINE LEARNING 3. Supervised Learning: Learning a Class from Examples
Based on E. Alpaydın (2004), Introduction to Machine Learning, © The MIT Press (V1.1)


Page 1: MACHINE LEARNING 3. Supervised Learning

Page 2: Learning a Class from Examples

Class C of a "family car".
Prediction: Is car x a family car?
Knowledge extraction: What do people expect from a family car?
Output: positive (+) and negative (−) examples.
Input representation (expert suggestions): x1 = price, x2 = engine power; ignore other attributes.

Page 3: Training Set X

$$\mathcal{X} = \{\mathbf{x}^t, r^t\}_{t=1}^{N}, \qquad r = \begin{cases} 1 & \text{if } \mathbf{x} \text{ is positive} \\ 0 & \text{if } \mathbf{x} \text{ is negative} \end{cases}, \qquad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$

Page 4: Class C

• Assume a class model: an axis-aligned rectangle.
• (p1 ≤ price ≤ p2) & (e1 ≤ engine power ≤ e2)
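A minimal sketch of this rectangle hypothesis in Python (the thresholds and the toy price/engine-power data below are made-up values, not from the slides):

```python
import numpy as np

# Toy training set X = {(x^t, r^t)}: columns are x1 = price, x2 = engine power.
# All values here are made up for illustration.
X = np.array([[12.0, 150.0],
              [18.0, 210.0],
              [35.0, 320.0],
              [ 9.0,  95.0]])
r = np.array([1, 1, 0, 0])  # 1 = family car (positive), 0 = negative

def h(x, p1, p2, e1, e2):
    """Axis-aligned rectangle hypothesis:
    1 if (p1 <= price <= p2) and (e1 <= engine power <= e2), else 0."""
    price, power = x
    return int(p1 <= price <= p2 and e1 <= power <= e2)

# One candidate hypothesis (assumed thresholds)
predictions = [h(x, p1=10, p2=20, e1=100, e2=250) for x in X]
print(predictions)  # [1, 1, 0, 0] -> consistent with the labels above
```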

Page 5: S, G, and the Version Space

S is the most specific hypothesis; G is the most general hypothesis.
Any h ∈ H between S and G is consistent with the training set, and together they make up the version space (Mitchell, 1997).
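As a small sketch, for the rectangle class the most specific hypothesis S is the tightest axis-aligned rectangle around the positive examples (same assumed toy data as above):

```python
import numpy as np

X = np.array([[12.0, 150.0], [18.0, 210.0], [35.0, 320.0], [9.0, 95.0]])
r = np.array([1, 1, 0, 0])

pos = X[r == 1]              # positive examples only
p1, e1 = pos.min(axis=0)     # S: the tightest rectangle covering all positives
p2, e2 = pos.max(axis=0)
print((p1, p2), (e1, e2))    # ((12.0, 18.0), (150.0, 210.0))
```

G, by contrast, would be the largest rectangle that still excludes every negative example.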

Page 6: Hypothesis Class H

$$h(\mathbf{x}) = \begin{cases} 1 & \text{if } h \text{ classifies } \mathbf{x} \text{ as positive} \\ 0 & \text{if } h \text{ classifies } \mathbf{x} \text{ as negative} \end{cases}$$

Error of h on X, the number of training examples that h misclassifies:

$$E(h \mid \mathcal{X}) = \sum_{t=1}^{N} \mathbf{1}\big(h(\mathbf{x}^t) \neq r^t\big)$$
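This error count can be computed directly; a hedged sketch reusing the assumed toy data and rectangle hypothesis from above:

```python
import numpy as np

X = np.array([[12.0, 150.0], [18.0, 210.0], [35.0, 320.0], [9.0, 95.0]])
r = np.array([1, 1, 0, 0])

def h(x, p1=10, p2=20, e1=100, e2=250):
    price, power = x
    return int(p1 <= price <= p2 and e1 <= power <= e2)

# E(h | X): count of training examples that h misclassifies
E = sum(h(x) != label for x, label in zip(X, r))
print(E)  # 0 for this consistent hypothesis
```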

Page 7: Generalization

Problem of generalization: how well will our hypothesis classify future examples?
In our example, a hypothesis is characterized by four numbers: (p1, p2, e1, e2).
Choose the best one: include all positive examples and no negative ones. There are infinitely many hypotheses for real-valued parameters.

Page 8: Doubt

In some applications a wrong decision is very costly.
We may reject an instance if it falls between S (most specific) and G (most general).

Page 9: VC Dimension

So far we assumed that the hypothesis space H includes the true class C.
H should be flexible enough, i.e. have enough capacity, to include C.
We need some measure of the hypothesis space's flexibility (its complexity).
We can then try to increase the complexity of the hypothesis space.

Page 10: VC Dimension

N points can be labeled in 2^N ways as +/−.
H shatters N points if, for every one of these labelings, there exists an h ∈ H consistent with it; VC(H) is the largest such N.
An axis-aligned rectangle shatters only 4 points, so VC(H) = 4 for this class!
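A brute-force check of this claim, assuming four points in a diamond configuration; it suffices to test the bounding box of the positively labeled points, since any separating rectangle must contain that box:

```python
import itertools
import numpy as np

# Four points in a "diamond" layout (an assumed shatterable configuration).
pts = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, -1.0], [-1.0, 0.0]])

def rectangle_consistent(points, labels):
    """True if some axis-aligned rectangle contains exactly the positives."""
    pos = points[labels == 1]
    if len(pos) == 0:
        return True  # an empty rectangle handles the all-negative labeling
    lo, hi = pos.min(axis=0), pos.max(axis=0)
    inside = np.all((points >= lo) & (points <= hi), axis=1)
    return np.array_equal(inside.astype(int), labels)

# Try all 2^4 = 16 labelings
shattered = all(
    rectangle_consistent(pts, np.array(lab))
    for lab in itertools.product([0, 1], repeat=4)
)
print(shattered)  # True -> these 4 points are shattered
```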

Page 11: Probably Approximately Correct (PAC) Learning

Fix a target probability of classification error (the planned future).
The actual error depends on the training sample (the past).
We want the actual error probability (the actual future) to be less than the target, with high probability.

Page 12: Probably Approximately Correct (PAC) Learning

How many training examples N should we have such that, with probability at least 1 − δ, h has error at most ε? (Blumer et al., 1989)

Let's calculate how many samples we need for S (the tightest rectangle). Each of the four strips between S and C has probability mass at most ε/4.
Pr that one instance misses a strip: ≤ 1 − ε/4
Pr that N instances miss a strip: ≤ (1 − ε/4)^N
Pr that N instances miss any of the 4 strips: ≤ 4(1 − ε/4)^N
We require 4(1 − ε/4)^N ≤ δ; using (1 − x) ≤ exp(−x), it suffices that 4 exp(−εN/4) ≤ δ, i.e. N ≥ (4/ε) ln(4/δ).
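A quick numeric check of the bound (the values of ε and δ are arbitrary illustrations):

```python
import math

def pac_sample_bound(eps, delta):
    """N >= (4/eps) * ln(4/delta) for the tightest-rectangle hypothesis S."""
    return math.ceil((4 / eps) * math.log(4 / delta))

print(pac_sample_bound(0.1, 0.05))  # 176 -> with N >= 176 samples, error <= 0.1
                                    # with probability >= 0.95
```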

Page 13: Noise

Imprecision in recording the input attributes.
Errors in labeling the data points (teacher noise).
Additional attributes not taken into account (hidden or latent): the same price/engine power may carry different labels due to, say, color; the effect of such attributes is modeled as noise.
The class boundary might then not be simple, so we may need a more complicated hypothesis space/model.

Page 14: Noise and Model Complexity

Use the simpler model because it is:
simpler to use (lower computational complexity),
easier to train (lower space complexity),
easier to explain (more interpretable),
and generalizes better (lower variance; Occam's razor).

Page 15: Occam's Razor

If the actual class is simple and there is mislabeling or noise, the simpler model will generalize better.
A simpler model makes more errors on the training set, but it will generalize better because it does not try to explain the noise in the training sample.
Simple explanations are more plausible.

Page 16: Multiple Classes

General case: K classes, e.g. family, sports, and luxury cars.
Classes can overlap.
We can use a different or the same hypothesis class for each class.
What if an instance falls into two classes, or none? It is sometimes worth rejecting it.

Page 17: Multiple Classes, C_i, i = 1, ..., K

$$\mathcal{X} = \{\mathbf{x}^t, \mathbf{r}^t\}_{t=1}^{N}, \qquad r_i^t = \begin{cases} 1 & \text{if } \mathbf{x}^t \in C_i \\ 0 & \text{if } \mathbf{x}^t \in C_j,\ j \neq i \end{cases}$$

Train K hypotheses h_i(x), i = 1, ..., K:

$$h_i(\mathbf{x}^t) = \begin{cases} 1 & \text{if } \mathbf{x}^t \in C_i \\ 0 & \text{if } \mathbf{x}^t \in C_j,\ j \neq i \end{cases}$$
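A minimal one-vs-all sketch under the rectangle model (the K = 3 classes and all data values are made up; each h_i is the most specific rectangle S_i for its class):

```python
import numpy as np

# Made-up 2-D data for K = 3 car classes: 0 = family, 1 = sports,
# 2 = luxury. Columns: x1 = price, x2 = engine power.
X = np.array([[12, 150], [15, 180], [20, 300], [22, 340],
              [60, 250], [70, 280]], dtype=float)
y = np.array([0, 0, 1, 1, 2, 2])
K = 3

# One rectangle hypothesis h_i per class: the bounding box of the
# examples of class i (its most specific hypothesis S_i).
boxes = [(X[y == i].min(axis=0), X[y == i].max(axis=0)) for i in range(K)]

def h(i, x):
    lo, hi = boxes[i]
    return int(np.all((x >= lo) & (x <= hi)))

x_new = np.array([14.0, 160.0])
votes = [h(i, x_new) for i in range(K)]
print(votes)  # [1, 0, 0] -> family car; if all zeros or several ones,
              # the instance could be rejected
```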

Page 18: Regression

The output is not a Boolean (yes/no) or a class label but a numeric value.
Training set of examples {x^t, r^t}.
Interpolation: fit a function (e.g. a polynomial) to the points.
Extrapolation: predict the output for an x outside the range of the training examples.
Regression: there is added noise; the assumption is that it reflects hidden variables.
Approximate the output by a model g(x).

Page 19: Regression

Empirical error on the training set:

$$E(g \mid \mathcal{X}) = \frac{1}{N} \sum_{t=1}^{N} \big[r^t - g(\mathbf{x}^t)\big]^2$$

The hypothesis space is the set of linear functions, g(x) = w1 x + w0.
Calculate the best parameters, w1* and w0*, that minimize the error by setting the partial derivatives ∂E/∂w1 and ∂E/∂w0 to zero.
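Setting those derivatives to zero yields the familiar closed-form least-squares solution; a sketch on assumed toy data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 5, 20)
r = 2.0 * x + 1.0 + rng.normal(scale=0.3, size=x.size)  # noisy line

# Closed-form solution for g(x) = w1*x + w0 from the zeroed derivatives
w1 = ((x * r).mean() - x.mean() * r.mean()) / ((x**2).mean() - x.mean()**2)
w0 = r.mean() - w1 * x.mean()
print(w1, w0)  # close to the true 2.0 and 1.0
```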

Page 20: Higher-Order Polynomials
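A hedged illustration with numpy's polyfit on assumed noisy data; the training error keeps shrinking as the order grows, which by itself says nothing about generalization:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 5, 20)
r = np.sin(x) + rng.normal(scale=0.2, size=x.size)

# Fit polynomials of increasing order; training error keeps dropping,
# but high orders start fitting the noise (overfitting).
for order in (1, 3, 9):
    coeffs = np.polyfit(x, r, deg=order)
    err = np.mean((r - np.polyval(coeffs, x)) ** 2)
    print(order, round(err, 4))
```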

Page 21: Model Selection & Generalization

Learning is an ill-posed problem: the data alone are not sufficient to find a unique solution; each sample removes the hypotheses inconsistent with it.
Hence the need for inductive bias: assumptions about H, e.g. rectangles in our example.
Generalization: how well a model performs on new data.
Overfitting: H is more complex than C or f.
Underfitting: H is less complex than C or f.

Page 22: Triple Trade-Off

There is a trade-off between three factors (Dietterich, 2003):
1. the complexity of H, c(H),
2. the training set size, N,
3. the generalization error, E, on new data.
As N↑, E↓. As c(H)↑, E first ↓ and then ↑.

Page 23: Cross-Validation

To estimate the generalization error, we need data unseen during training. We split the data into:
Training set (50%): to train models.
Validation set (25%): to select a model (e.g. the degree of the polynomial).
Test (publication) set (25%): to estimate the error of the final model.
Use resampling when there is little data.
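A minimal sketch of the 50/25/25 split (array sizes and data are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
r = rng.integers(0, 2, size=100)

idx = rng.permutation(len(X))          # shuffle before splitting
n_train, n_val = 50, 25                # 50% / 25% / 25%
train, val, test = np.split(idx, [n_train, n_train + n_val])

X_train, r_train = X[train], r[train]  # fit candidate models here
X_val, r_val = X[val], r[val]          # pick the best model here
X_test, r_test = X[test], r[test]      # report the error once, at the end
```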

Page 24: Dimensions of a Supervised Learner

1. Model: $g(\mathbf{x} \mid \theta)$
2. Loss function: $E(\theta \mid \mathcal{X}) = \sum_{t} L\big(r^t, g(\mathbf{x}^t \mid \theta)\big)$
3. Optimization procedure: $\theta^* = \arg\min_{\theta} E(\theta \mid \mathcal{X})$
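As a closing sketch, the three dimensions instantiated for the earlier line-fitting example; the model, squared-error loss, and gradient-descent optimizer are all illustrative choices, not prescribed by the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 5, 20)
r = 2.0 * x + 1.0 + rng.normal(scale=0.3, size=x.size)

def g(x, theta):               # 1. model: g(x | theta) = w1*x + w0
    w1, w0 = theta
    return w1 * x + w0

def E(theta):                  # 2. loss: sum of squared errors L(r, g)
    return np.sum((r - g(x, theta)) ** 2)

# 3. optimization: simple gradient descent on theta = (w1, w0)
theta = np.zeros(2)
for _ in range(2000):
    resid = r - g(x, theta)
    grad = np.array([-2 * np.sum(resid * x), -2 * np.sum(resid)])
    theta -= 1e-3 * grad
print(theta, E(theta))  # theta* close to (2.0, 1.0)
```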