Computational Learning Theory: An Analysis of a Conjunction Learner
Machine Learning, Spring 2018
The slides are mainly from Vivek Srikumar
This section
1. Analyze a simple algorithm for learning conjunctions
2. Define the PAC model of learning
3. Make formal connections to the principle of Occam’s razor
Learning Conjunctions

Training data:
– <(1,1,1,1,1,1,…,1,1), 1>
– <(1,1,1,0,0,0,…,0,0), 0>
– <(1,1,1,1,1,0,...0,1,1), 1>
– <(1,0,1,1,1,0,...0,1,1), 0>
– <(1,1,1,1,1,0,...0,0,1), 1>
– <(1,0,1,0,0,0,...0,1,1), 0>
– <(1,1,1,1,1,1,…,0,1), 1>
– <(0,1,0,1,0,0,...0,1,1), 0>

The true function: f = x2 ∧ x3 ∧ x4 ∧ x5 ∧ x100

A simple learning algorithm (Elimination):
• Discard all negative examples
• Build a conjunction using the features that are common to all positive examples

Positive examples eliminate irrelevant features. On this data, the algorithm outputs

h = x1 ∧ x2 ∧ x3 ∧ x4 ∧ x5 ∧ x100

Clearly this algorithm produces a conjunction that is consistent with the data, that is, errS(h) = 0, if the target function is a monotone conjunction. Exercise: Why?
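Here is a minimal sketch of the Elimination algorithm, assuming examples are 0/1 tuples with 0/1 labels and a hypothesis is represented as the set of feature indices in the conjunction (the representation and the tiny data set are illustrative, not from the slides):

```python
def eliminate(examples):
    """Keep only the features that are 1 in every positive example."""
    positives = [x for x, y in examples if y == 1]   # discard all negative examples
    n = len(examples[0][0])
    h = set(range(n))                                # start from the full conjunction
    for x in positives:
        h &= {i for i in range(n) if x[i] == 1}      # positive examples eliminate features
    return h

def predict(h, x):
    """The conjunction fires iff every feature in h is on."""
    return int(all(x[i] == 1 for i in h))

# Toy run with n = 5 and target f = x2 AND x4 (0-indexed features 1 and 3)
data = [((1, 1, 1, 1, 1), 1), ((0, 1, 0, 1, 0), 1), ((1, 0, 1, 1, 1), 0)]
h = eliminate(data)
assert all(predict(h, x) == y for x, y in data)      # errS(h) = 0 on the training set
print(sorted(h))                                     # [1, 3]
```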
Learning Conjunctions: Analysis

Claim 1: h will only make mistakes on future positive examples. Why?

f = x2 ∧ x3 ∧ x4 ∧ x5 ∧ x100
h = x1 ∧ x2 ∧ x3 ∧ x4 ∧ x5 ∧ x100

A mistake will occur only if some feature z (in our example, x1) is present in h but not in f. This mistake can cause a positive example to be predicted as negative by h. Specifically: x1 = 0, x2 = 1, x3 = 1, x4 = 1, x5 = 1, x100 = 1.

The reverse situation can never happen. For an example to be predicted as positive by h, every feature in h must be present; and since every relevant feature of f was present in all positive training examples, h contains all of f's features, so f must label that example positive too.

[Figure: the set of examples h labels positive sits inside the set f labels positive; mistakes can only occur in the region between the two.]
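To see Claim 1 concretely, here is a brute-force check over a hypothetical small instance (n = 6, with a target and hypothesis chosen purely for illustration): whenever h's feature set contains f's, h never fires on an example that f labels negative.

```python
from itertools import product

f_feats = {1, 2, 3}      # target conjunction (0-indexed feature positions)
h_feats = {0, 1, 2, 3}   # learned hypothesis: adds one irrelevant feature

def conj(feats, x):
    return int(all(x[i] == 1 for i in feats))

for x in product((0, 1), repeat=6):
    if conj(h_feats, x):          # if h says positive ...
        assert conj(f_feats, x)   # ... f says positive too: no false positives
print("h only errs on positive examples")
```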
Learning Conjunctions: Analysis

Theorem: Suppose we are learning a conjunctive concept with n-dimensional Boolean features using m training examples. If

m > (n/ε)(ln(n) + ln(1/δ))

then, with probability > 1 − δ, the error of the learned hypothesis errD(h) will be less than ε.

If we see this many training examples, then the algorithm will produce a conjunction that, with high probability, will make few errors. Note that m is polynomial in n, 1/ε, and 1/δ.

Let's prove this assertion.
Proof Intuition

What kinds of examples would drive the hypothesis to make a mistake?

Positive examples where x1 is absent: f would say true and h would say false.

f = x2 ∧ x3 ∧ x4 ∧ x5 ∧ x100
h = x1 ∧ x2 ∧ x3 ∧ x4 ∧ x5 ∧ x100

None of these examples appeared during training; otherwise x1 would have been eliminated. If they never appeared during training, maybe their appearance in the future would also be rare!

Let's quantify our surprise at seeing such examples.
Learning Conjunctions: Analysis

Let p(z) be the probability that, in an example drawn from D, the feature z is 0 but the example has a positive label.
• That is, after training is done, p(z) is the probability that, in a randomly drawn example, the feature z causes a mistake.
• For any z in the target function, p(z) = 0.

Remember that there will only be mistakes on positive examples for this toy problem. For instance, with

f = x2 ∧ x3 ∧ x4 ∧ x5 ∧ x100
h = x1 ∧ x2 ∧ x3 ∧ x4 ∧ x5 ∧ x100

and the example <(0,1,1,1,1,0,...0,1,1), 1>, p(x1) is the probability that this situation occurs.

We know that

errD(h) ≤ Σ_{z ∈ h} p(z)

via direct application of the union bound: h errs exactly when at least one feature z in h is 0 on a positive example.

Union bound: for a set of events, the probability that at least one of them happens is at most the sum of the probabilities of the individual events.
Learning Conjunctions: Analysis
• Call a feature z bad if• Intuitively, a bad feature is one that has a significant probability
of not appearing with a positive example– (And, if it appears in all positive training examples, it can cause errors)
If there are no bad features, then errD(h) < !– Why? Because
28
n = dimensionality
Let us try to see when this will not happen
![Page 29: Supervised Learning: The Setup Computational Learning ...zhe/pdf/lec-12-pac-conjunctions-upload.pdf1 Computational Learning Theory: An Analysis of a Conjunction Learner Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022051905/5ff719d68c26112b767906aa/html5/thumbnails/29.jpg)
Learning Conjunctions: Analysis
• Call a feature z bad if• Intuitively, a bad feature is one that has a significant probability
of not appearing with a positive example– (And, if it appears in all positive training examples, it can cause errors)
If there are no bad features, then errD(h) < !– Why? Because
29
n = dimensionality
Let us try to see when this will not happen
![Page 30: Supervised Learning: The Setup Computational Learning ...zhe/pdf/lec-12-pac-conjunctions-upload.pdf1 Computational Learning Theory: An Analysis of a Conjunction Learner Machine Learning](https://reader033.fdocuments.net/reader033/viewer/2022051905/5ff719d68c26112b767906aa/html5/thumbnails/30.jpg)
Learning Conjunctions: Analysis
• Call a feature z bad if• Intuitively, a bad feature is one that has a significant probability
of not appearing with a positive example– (And, if it appears in all positive training examples, it can cause errors)
If there are no bad features, then errD(h) < !– Why? Because
30
n = dimensionality
Let us try to see when this will not happen
Learning Conjunctions: Analysis

What if there are bad features? Let z be a bad feature. What is the probability that it will not be eliminated by one training example?

A training example eliminates z exactly when it is a positive example with z = 0, which happens with probability p(z) ≥ ε/n. So

Pr(a bad feature is not eliminated by one example) = 1 − p(z) ≤ 1 − ε/n

For example, <(1,1,1,1,1,0,...0,1,1), 1>: there was one example of this kind in the training data.
Learning Conjunctions: Analysis

What we know so far (n = dimensionality):

Pr(a bad feature is not eliminated by one example) ≤ 1 − ε/n

But say we have m training examples. Then

Pr(a bad feature survives m examples) ≤ (1 − ε/n)^m

There are at most n bad features. So

Pr(any bad feature survives m examples) ≤ n(1 − ε/n)^m
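As a sanity check, here is a small Monte Carlo sketch (parameters chosen arbitrarily for illustration) of the middle inequality: a feature sitting exactly at the bad threshold, p(z) = ε/n, survives m independent examples with probability (1 − ε/n)^m.

```python
import random

n, eps, m, trials = 10, 0.5, 50, 100_000
p_z = eps / n          # a feature exactly at the "bad" threshold
survived = 0
for _ in range(trials):
    # each example independently eliminates z with probability p(z)
    if all(random.random() >= p_z for _ in range(m)):
        survived += 1
print(survived / trials)        # empirical estimate, about 0.077
print((1 - eps / n) ** m)       # the bound (1 - eps/n)^m, about 0.077
```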
Learning Conjunctions: Analysis

We want this probability to be small. Why? So that we can choose enough training examples that the probability that any bad feature z survives all of them is less than some δ.

That is, we want

n(1 − ε/n)^m < δ

We know that 1 − x < e^(−x). So it is sufficient to require

n · e^(−εm/n) < δ

Or equivalently,

m > (n/ε)(ln(n) + ln(1/δ))
Learning Conjunctions: Analysis

To guarantee a probability of failure (i.e., error > ε) that is less than δ, the number of examples we need is

m > (n/ε)(ln(n) + ln(1/δ))

which is polynomial in n, 1/ε, and 1/δ.

That is, if m has this property, then:
• With probability 1 − δ, there will be no bad features,
• Or equivalently, with probability 1 − δ, we will have errD(h) < ε.

What does this mean:
• If ε = 0.1 and δ = 0.1, then for n = 100, we need 6908 training examples
• If ε = 0.1 and δ = 0.1, then for n = 10, we need only 461 examples
• If ε = 0.1 and δ = 0.01, then for n = 10, we need 691 examples

What we have here is a PAC guarantee: our algorithm is Probably Approximately Correct. (The numbers above are checked in the sketch below.)
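These sample sizes follow directly from the bound; here is a quick check (a sketch, rounding up to a whole number of examples):

```python
from math import ceil, log

def sample_bound(n, eps, delta):
    """Smallest whole number of examples m with m > (n/eps)(ln n + ln(1/delta))."""
    return ceil((n / eps) * (log(n) + log(1.0 / delta)))

print(sample_bound(100, 0.1, 0.1))   # 6908
print(sample_bound(10, 0.1, 0.1))    # 461
print(sample_bound(10, 0.1, 0.01))   # 691
```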