
Page 1: PAC Learning, adapted from Tom M. Mitchell, Carnegie Mellon University.

PAC Learning

adapted from

Tom M. Mitchell

Carnegie Mellon University

Page 2:

Learning Issues

Under what conditions is successful learning

… possible?

… assured for a particular learning algorithm?

Page 3:

Sample Complexity

How many training examples are needed

… for a learner to converge (with high probability) to a successful hypothesis?

Page 4:

Computational Complexity

How much computational effort is needed

… for a learner to converge (with high probability) to a successful hypothesis?

Page 5:

The world

X is the sample space

Example: Two dice: {(1,1), (1,2), …, (6,5), (6,6)}

[Figure: scatter plot of the sample points in X]
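As a concrete illustration (not from the slides), the two-dice sample space can be enumerated directly:

```python
from itertools import product

# The sample space X for two six-sided dice: all ordered pairs (d1, d2)
X = list(product(range(1, 7), range(1, 7)))

print(len(X))       # 36 outcomes
print(X[0], X[-1])  # (1, 1) (6, 6)
```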

Page 6:

Weighted world

𝒟 is a distribution over X

Example: Biased dice: {(1,1; p11), (1,2; p12), …, (6,5; p65), (6,6; p66)}

[Figure: scatter plot of the weighted sample points]
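A sketch of such a weighted world. The particular bias below is hypothetical (each face's weight proportional to its value), standing in for the slides' p11 … p66:

```python
import random
from itertools import product

random.seed(0)
X = list(product(range(1, 7), range(1, 7)))

# Hypothetical bias: weight of face f is f itself; p_xy = w(x)*w(y), normalised.
w = {face: face for face in range(1, 7)}
total = sum(w[a] * w[b] for a, b in X)
p = {(a, b): w[a] * w[b] / total for a, b in X}

# Draw 5 i.i.d. outcomes from the weighted world
sample = random.choices(X, weights=[p[x] for x in X], k=5)
```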

Page 7:

An event

E is a subset of X

Example: Two dice: {(1,1), (1,2), …, (6,5), (6,6)}

[Figure: scatter plot of the sample points in X]

Page 8:

An event

E is a subset of X

Example: A pair in two dice: {(1,1), (2,2), (3,3), (4,4), (5,5), (6,6)}

[Figure: scatter plot with the event points highlighted]
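Under the uniform distribution, the probability of an event is just its size relative to the sample space; a quick check (illustrative code, not from the slides):

```python
from itertools import product

X = list(product(range(1, 7), range(1, 7)))
E = [(a, b) for a, b in X if a == b]  # the event "a pair"

# Uniform distribution: Pr(E) = |E| / |X|
prob = len(E) / len(X)
print(prob)  # 6/36 = 0.16666666666666666
```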

Page 9:

A Concept

C is an indicator function of an event E

Example: A pair in two dice: c(x,y) := (x == y)

[Figure: scatter plot of the sample points]
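The concept is just the indicator function of the event; a direct sketch (hypothetical code, not from the slides):

```python
def c(x, y):
    """Indicator of the event 'a pair': 1 if the two dice match, else 0."""
    return int(x == y)

print(c(3, 3), c(2, 5))  # 1 0
```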

Page 10:

A hypothesis

h is an approximation to a concept c

Example: A separating hyperplane

h(x,y) := 0.5 · [1 + sign(a·x + b·y + c)]

[Figure: scatter plot with a separating hyperplane]
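The slide's hyperplane hypothesis maps a sign in {−1, +1} to a label in {0, 1}. A minimal sketch, assuming sign(0) = +1 (a convention the slides leave unspecified) and hypothetical coefficient values:

```python
def h(x, y, a=1.0, b=-1.0, c=0.0):
    # sign(0) taken as +1 here; coefficients a, b, c are placeholders
    s = 1.0 if a * x + b * y + c >= 0 else -1.0
    return 0.5 * (1.0 + s)  # {-1, +1} -> {0, 1}

print(h(3, 3), h(6, 1), h(1, 6))  # 1.0 1.0 0.0
```
Note that a single hyperplane can only approximate a concept like "a pair"; that is exactly the slide's point: h is an approximation to c.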

Page 11:

The dataset

D is an i.i.d. sample from (X, 𝒟)

{ <x_i, c(x_i)> }, i = 1, …, m

m examples
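Such a dataset can be generated directly; an illustrative sketch (uniform 𝒟, target concept "a pair"; not from the slides):

```python
import random
from itertools import product

random.seed(0)
X = list(product(range(1, 7), range(1, 7)))
c = lambda x, y: int(x == y)  # target concept: "a pair"

m = 20
# D: m i.i.d. draws from X, each labelled by the target concept c
D = [((a, b), c(a, b)) for a, b in random.choices(X, k=m)]
```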

Page 12:

An Inductive learner

L is an algorithm that uses data D to produce h ∈ H

Example: The Perceptron Algorithm

h(x,y) := 0.5 · [1 + sign(a(D)·x + b(D)·y + c(D))]

[Figure: scatter plot with the learned separating hyperplane]
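The perceptron algorithm named on the slide can be sketched as follows: a minimal version for 2-D data with {0, 1} labels, run here on a hypothetical linearly separable toy set:

```python
def perceptron(D, epochs=100):
    """Learn a(D), b(D), c(D) for h(x,y) = 0.5*(1 + sign(a*x + b*y + c)).
    D is a list of ((x, y), label) pairs with labels in {0, 1}."""
    a = b = c = 0.0
    for _ in range(epochs):
        updated = False
        for (x, y), label in D:
            pred = 1 if a * x + b * y + c >= 0 else 0
            err = label - pred      # -1, 0, or +1
            if err:
                a += err * x        # classic perceptron weight update
                b += err * y
                c += err
                updated = True
        if not updated:             # converged: every example classified correctly
            break
    return a, b, c

# Hypothetical separable toy set: label 1 when x > y
D = [((2, 1), 1), ((3, 1), 1), ((1, 2), 0), ((1, 3), 0)]
a, b, c = perceptron(D)
```
On data that is not linearly separable (like the "pair" concept), the loop simply stops after `epochs` passes without converging, which is why an epoch cap is included.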

Page 13:

Error Measures

Training error of hypothesis h: how often h(x) ≠ c(x) over the training instances

True error of hypothesis h: how often h(x) ≠ c(x) over future instances drawn at random from 𝒟
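Both error measures can be computed directly in the dice world, since the sample space is small enough to average over exactly; a sketch with a deliberately poor hypothesis (illustrative, not from the slides):

```python
import random
from itertools import product

random.seed(1)
X = list(product(range(1, 7), range(1, 7)))
c = lambda x, y: int(x == y)       # target concept: "a pair"
h = lambda x, y: int(x == y == 6)  # deliberately bad hypothesis: only (6,6)

train = random.choices(X, k=10)
# Training error: fraction of training instances where h disagrees with c
train_err = sum(h(a, b) != c(a, b) for a, b in train) / len(train)

# True error under the uniform distribution: exact average over all of X
true_err = sum(h(a, b) != c(a, b) for a, b in X) / len(X)
print(true_err)  # 5/36: h misses the five pairs other than (6, 6)
```
The training error depends on which sample was drawn; the true error does not.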

Page 14:

True error

error_𝒟(h) := Pr_{x ∼ 𝒟} [ c(x) ≠ h(x) ]

Page 15:

True error

Page 16:

Learnability

How should we describe learnability?

By the number of training examples needed to learn a hypothesis for which error_𝒟(h) = 0?

Infeasible! The learner sees only a finite random sample, so it can never guarantee zero true error.

Page 17:

PAC Learnability

Weaken demands on the learner

true error < ε (the accuracy parameter)

failure probability < δ (the confidence parameter)

ε and δ can be arbitrarily small

Probably Approximately Correct (PAC) Learning

Page 18:

PAC Learnability

C is PAC-learnable by L

true error < ε with probability ≥ (1 − δ), after a reasonable number of examples and reasonable time per example

Reasonable = polynomial in 1/ε, 1/δ, n (the size of examples), and the encoding length of the target concept
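The slides do not state a bound explicitly, but the standard sample-complexity bound for a consistent learner over a finite hypothesis space H (from Mitchell's Machine Learning text) makes "polynomial in 1/ε and 1/δ" concrete:

```python
import math

def sample_complexity(eps, delta, H_size):
    """Consistent-learner bound for a finite hypothesis space:
    m >= (1/eps) * (ln |H| + ln(1/delta))."""
    return math.ceil((1 / eps) * (math.log(H_size) + math.log(1 / delta)))

# Grows only polynomially as eps and delta shrink
print(sample_complexity(0.1, 0.05, 1000))  # 100
```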

Page 19:

PAC Learnability

Pr[ error_𝒟(h) ≤ ε ] ≥ 1 − δ

Page 20:

C is PAC-Learnable

each target concept in C can be learned from a polynomial number of training examples

the processing time per example is also polynomially bounded

polynomial in terms of 1/ε, 1/δ, n (size of examples), and the encoding length of the target concept c