Today’s Topics (only on final at a high level; Sec 19.5 and Sec 18.5 readings below are ‘skim only’)
12/8/15 CS 540 - Fall 2015 (Shavlik©), Lecture 30, Week 14
• HW5 must be turned in by 11:55pm Fri (soln out early Sat)
• Read Chapters 26 and 27 of textbook for Next Tuesday
• Exam (comprehensive, with focus on material since the midterm),
  Thurs 5:30-7:30pm, in this room; two pages of notes and a
  simple calculator (log, e, *, /, +, -) allowed
• Next Tues we’ll cover my Fall 2014 final (Spring 2013 next Weds?)
• A Short Introduction to Inductive Logic Programming (ILP) – Sec. 19.5 of textbook
- learning FOPC ‘rule sets’
- could, in a follow-up step, learn MLN weights on these rules
(ie, learn ‘structure’ then learn ‘wgts’)
• A Short Introduction to Computational Learning Theory (COLT) – Sec 18.5 of text
Inductive Logic Programming (ILP)
• Use mathematical logic to
  – Represent training examples (goes beyond fixed-length feature vectors)
  – Represent learned models (FOPC rule sets)
• ML work in the late ’70s through early ’90s was logic-based; then statistical ML ‘took over’
Examples in FOPC (not all have the same # of ‘features’)
on(ex1, block1, table)     color(ex1, block1, blue)     size(ex1, block1, large)
on(ex1, block2, block1)    color(ex1, block2, blue)     size(ex1, block2, small)
< a much larger number of facts are needed to describe example2 >
[Pictured: PosEx1 and PosEx2, two block configurations]
Learned concept:
  tower(?E) if on(?E, ?A, table), on(?E, ?B, ?A).
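To make this concrete, here is a minimal sketch (mine, not the lecture’s) of how a variable-arity example and the learned tower rule might look; encoding facts as Python tuples is an assumption for illustration.

```python
# Minimal sketch (an illustrative assumption, not the lecture's code):
# an example is just a set of ground facts, so examples need not share
# a fixed number of 'features'.
ex1 = {
    ("on", "block1", "table"),
    ("on", "block2", "block1"),
    ("color", "block1", "blue"),
    ("color", "block2", "blue"),
    ("size", "block1", "large"),
    ("size", "block2", "small"),
}

def is_tower(facts):
    """tower(?E) if on(?E, ?A, table), on(?E, ?B, ?A).
    Search for bindings of ?A and ?B satisfying both literals."""
    on = {(x, y) for (p, x, y) in facts if p == "on"}
    return any((a, "table") in on and (b, a) in on
               for (a, _) in on for (b, _) in on)

print(is_tower(ex1))  # True: block1 on table, block2 on block1
```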
Searching for a Good Rule (propositional-logic version)

P is always true
 ├─ P if A
 ├─ P if B
 │   ├─ P if B and C
 │   └─ P if B and D
 └─ P if C
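The lattice above is searched general-to-specific. A minimal greedy sketch of that idea, under the assumption that examples are dicts of boolean features (the lecture itself uses best-first beam search, shown later):

```python
# Greedy general-to-specific rule search (illustrative sketch only).
def covers(rule, example):
    """rule = set of feature names that must all be true."""
    return all(example[f] for f in rule)

def score(rule, pos, neg):
    """Scoring function from the wrapup slide: #pos - #neg - length."""
    return (sum(covers(rule, e) for e in pos)
            - sum(covers(rule, e) for e in neg)
            - len(rule))

def learn_rule(features, pos, neg):
    rule = set()                                  # 'P is always true'
    while True:
        candidates = [rule | {f} for f in features - rule]
        best = max(candidates, key=lambda r: score(r, pos, neg), default=None)
        if best is None or score(best, pos, neg) <= score(rule, pos, neg):
            return rule                           # no literal helps; stop
        rule = best
```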
All Possible Extensions of a Clause (capital letters are variables)
Assume we are expanding this node: q(X, Z) ← p(X, Y)
What are the possible extensions using r/3?
r(X,X,X) r(Y,Y,Y) r(Z,Z,Z) r(1,1,1)
r(X,Y,Z) r(Z,Y,X) r(X,X,Y) r(X,X,1)
r(X,Y,A) r(X,A,B) r(A,A,A) r(A,B,1)
and many more …
Choose from: old variables, constants, new vars
Huge branching factor in our search!
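A rough count of that branching factor, under the simplifying assumption of three old variables, up to three fresh variables, and one constant per slot (an overcount, since fresh variables are interchangeable):

```python
from itertools import product

old_vars  = ["X", "Y", "Z"]   # variables already in the clause
new_vars  = ["A", "B", "C"]   # fresh variables
constants = ["1"]             # known constants

choices = old_vars + new_vars + constants
extensions = [f"r({a},{b},{c})" for a, b, c in product(choices, repeat=3)]
print(len(extensions))        # 7**3 = 343 candidate literals for r/3 alone
```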
Example: ILP in the Blocks World
Consider this training set
[Pictured: POS and NEG block configurations]

Can you guess an FOPC rule?
Searching for a Good Rule (FOPC version; cap letters are vars)

Assume we have: tall(X), wide(Y), square(X), on(X,Y), red(X), green(X), blue(X), block(X)

true → POS
 ├─ tall(X) → POS
 ├─ blue(X) → POS
 ├─ on(X,Y) → POS
 └─ …

POSSIBLE RULE LEARNED: If on(X,Y) ∧ block(Y) ∧ blue(X) Then POS
  – hard to learn with fixed-length feature vectors!
Covering Algorithms (learn a rule, then recur; so disjunctive)

[Figure: left panel shows + and – examples with a box around the examples covered by Rule 1; right panel shows the examples still to cover, used to learn Rule 2]
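A minimal sketch of the covering (‘separate-and-conquer’) loop itself, assuming a `learn_rule` like the greedy sketch above:

```python
def covering(pos, neg, features):
    """Learn rules until (nearly) all positives are covered; the
    resulting rule set is an implicit disjunction."""
    rules = []
    while pos:
        rule = learn_rule(features, pos, neg)
        covered = [e for e in pos if covers(rule, e)]
        if not covered:
            break                         # no progress; give up
        rules.append(rule)
        pos = [e for e in pos if not covers(rule, e)]  # 'still to cover'
    return rules    # classify as POS if ANY learned rule fires
```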
Using Background Knowledge (BK) in ILP
• Now consider adding some domain knowledge about the task being learned
• For example:
    If Q, R, and W are all true
    Then you can infer Z is true
• Can also do arithmetic, etc., in BK rule bodies:
If SOME_TRIG_CALCS_OUTSIDE_OF_LOGIC Then openPassingLane(P1, P2, Radius, Angle)
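For instance, a BK predicate like the slide’s openPassingLane might be implemented as a computation outside the logic; the geometry below is hypothetical, since the slide only names the predicate:

```python
import math

def open_passing_lane(p1, p2, radius, angle_deg):
    """Hypothetical body for openPassingLane(P1, P2, Radius, Angle):
    a placeholder trig test on the lane from p1 to p2."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    lane_angle = math.degrees(math.atan2(dy, dx))
    # ...a real occlusion/geometry test would go here...
    return math.hypot(dx, dy) > radius and abs(lane_angle) < angle_deg

# The ILP learner then treats openPassingLane(P1, P2, R, A) as just
# another literal it may add when extending a clause.
```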
Searching for a Good Rule using Deduced Features (eg, Z)

P is always true
 ├─ P if A
 ├─ P if B
 │   ├─ P if B and C
 │   ├─ P if B and D
 │   └─ P if B & Z
 ├─ P if C
 └─ P if Z

Note that more BK can lead to slower learning!
But hopefully less search depth is needed
Controlling the Search for a Good Rule
• Choose a ‘seed’ positive example, then only consider properties that are true about this example
• Specify argument types and whether arguments are ‘input’ (+) or ‘output’ (-)
  – Only consider adding a literal if all of its input arguments are already present in the rule
  – For example: enemies(+person, -person)
    Only if a variable of type PERSON is already in the rule [eg, murdered(person)], consider adding that person’s enemies (a sketch of this filter follows below)
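A minimal sketch of that mode-declaration filter, using the slide’s enemies(+person, -person) declaration (the dict encoding is an assumption):

```python
modes = {"enemies": [("+", "person"), ("-", "person")]}

def legal_extension(pred, args, vars_in_rule):
    """vars_in_rule maps variable name -> type, eg {'P': 'person'}.
    A literal is legal only if every '+' argument is an already-bound
    variable of the right type."""
    for (io, typ), var in zip(modes[pred], args):
        if io == "+" and vars_in_rule.get(var) != typ:
            return False
    return True

print(legal_extension("enemies", ["P", "Q"], {"P": "person"}))  # True
print(legal_extension("enemies", ["R", "Q"], {"P": "person"}))  # False: R unbound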
Formal Specification of the ILP Task
Given a set of pos examples (P)
a set of neg examples (N)
some background knowledge (BK)
Do induce additional knowledge (AK) such that
    BK ∧ AK allows all/most in P to be proved
    BK ∧ AK allows none/few in N to be proved
Technically, the BK also contains all the facts about the pos and neg examples plus some rules
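Written compactly (the strict, noise-free version; the slide’s ‘all/most’ and ‘none/few’ relax both conditions):

```latex
\text{find } AK \text{ such that}\quad
\forall p \in P:\; BK \wedge AK \models p
\quad\text{and}\quad
\forall n \in N:\; BK \wedge AK \not\models n
```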
ILP Wrapup
• Use best-first search with a large beam
• Commonly used scoring function: #posExCovered – #negExCovered – ruleLength
• Performs ML without requiring fixed-length feature vectors
• Produces human-readable rules (straightforward to convert FOPC to English)
• Can be slow due to large search space
• Appealing ‘inner loop’ for probabilistic-logic learning
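A skeletal version of that best-first beam search, assuming an `extensions` function that generates a rule’s specializations and a scoring function like the one above:

```python
import heapq

def beam_search(initial_rule, extensions, score, beam_width=100, max_steps=50):
    """Keep the beam_width best rules at each depth; return the best seen."""
    beam, best = [initial_rule], initial_rule
    for _ in range(max_steps):
        candidates = [r2 for r in beam for r2 in extensions(r)]
        if not candidates:
            break
        beam = heapq.nlargest(beam_width, candidates, key=score)
        if score(beam[0]) > score(best):
            best = beam[0]
    return best
```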
COLT: Probably Approximately Correct (PAC) Learning
PAC theory of learning (Valiant ’84)
Given
  C        class of possible concepts
  c ∈ C    target concept
  H        hypothesis space (usually H = C)
  ε, δ     correctness bounds
  N        polynomial number of examples
Probably Approximately Correct (PAC) Learning
• Do: with probability ≥ 1 - δ, return an h in H whose accuracy is at least 1 - ε
• Do this for any probability distribution of the examples
• In other words
    Prob[ error(h, c) > ε ] < δ

[Figure: overlapping regions for hypothesis h and concept c; shaded regions are where errors occur]
How Many Examples Needed to be PAC?
Consider finite hypothesis spaces
Let Hbad ≡ { h1, …, hz }
• The set of hypotheses whose (‘testset’) error is > ε
• Goal: with high probability, eliminate all items in Hbad via (noise-free) training examples
How Many Examples Needed to be PAC?
How can an h look bad, even though it is correct on all the training examples?
• If we never see any examples in the shaded regions
• We’ll compute an N s.t. the odds of this are sufficiently low (recall, N = number of examples)

[Figure: the h/c error-region diagram from the previous slide]
Hbad
• Consider H1 ∈ Hbad and ex ∈ { N }, the set of N training examples
• What is the probability that H1 is consistent with ex?
    Prob[ consistentA(ex, H1) ] ≤ 1 - ε   (since H1 is bad, its error rate is at least ε)
Hbad (cont.)
What is the probability that H1 is consistent with all N examples?
Prob[ consistentB({ N }, H1) ] ≤ (1 - ε)^|N|
(by iid assumption)
Hbad (cont.)
What is the probability that some member of Hbad is consistent with the examples in { N }?
Prob[ consistentC({N}, Hbad) ]
  = Prob[ consistentB({N}, H1) ∨ … ∨ consistentB({N}, Hz) ]
  ≤ |Hbad| × (1 - ε)^|N|   // union bound: P(A ∨ B) = P(A) + P(B) - P(A ∧ B); ignore the last term in an upper-bound calc
  ≤ |H| × (1 - ε)^|N|      // since Hbad ⊆ H
Solving for #Examples, |N|
We want
Prob[ consistentC({N}, Hbad) ] ≤ |H| × (1 - ε)^|N| < δ
Recall that we want the probability of a bad concept surviving to be less than δ, our bound on learning a poor concept
Assume that if many consistent hypotheses survive, we get unlucky and choose a bad one (we’re doing a worst-case analysis)
Solving for |N| (number of examples needed to be confident of getting a good model)
Solving
|N| > [ log(1/δ) + log(|H|) ] / -ln(1 - ε)

Since ε ≤ -ln(1 - ε) over [0, 1), we get

|N| > [ log(1/δ) + log(|H|) ] / ε
(Aside: notice that this calculation assumed we could
always find a hypothesis that fits the training data)
Notice we made NO assumptions
about the prob dist of the data
(other than it does not change)
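Collecting the algebra from the last few slides into one chain (restated with natural logs throughout):

```latex
|H|\,(1-\epsilon)^{|N|} < \delta
\;\Longleftrightarrow\;
|N| > \frac{\ln|H| + \ln(1/\delta)}{-\ln(1-\epsilon)} ,
\qquad\text{and since } \epsilon \le -\ln(1-\epsilon) \text{ on } [0,1),\quad
|N| > \frac{\ln|H| + \ln(1/\delta)}{\epsilon} \text{ suffices.}
```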
Example: Number of Instances Needed
Assume
  F = 100 binary features
  H = all (pure) conjuncts
      [3^|F| possibilities (∀i: use fi, use ¬fi, or ignore fi), so log|H| = |F| log 3 ≈ |F|]
  ε = 0.01
  δ = 0.01

N = [ log(1/δ) + log(|H|) ] / ε = 100 × [ log(100) + 100 ] ≈ 10^4
But how many real-world concepts are pure conjuncts with noise-free training data?
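As a quick check of the arithmetic (natural logs; the slide’s ≈ 10^4 additionally approximates log|H| ≈ |F|):

```python
import math

F = 100                       # binary features
ln_H = F * math.log(3)        # |H| = 3**F pure conjuncts, so ln|H| ≈ 109.9
eps = delta = 0.01

N = (math.log(1 / delta) + ln_H) / eps
print(round(N))               # 11447 -- on the order of 10**4
```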
Agnostic Learning
• So far we’ve assumed we knew the concept class - but that is unrealistic on real-world data
• In agnostic learning we relax this assumption
• We instead aim to find a hypothesis arbitrarily close (ie, < ε error) to the best* hypothesis in our hypothesis space
• We now need |N| ≥ [ log(1/δ) + log(|H|) ] / 2ε²
  (denominator had been just ε before)

* ie, closest to the true concept
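Plugging the earlier numbers into both bounds shows what the changed denominator costs: for ε = 0.01 it is a factor of 1/(2ε) = 50 more examples.

```python
import math

ln_H, eps, delta = 100 * math.log(3), 0.01, 0.01
n_realizable = (math.log(1 / delta) + ln_H) / eps
n_agnostic   = (math.log(1 / delta) + ln_H) / (2 * eps ** 2)
print(round(n_realizable))   # ~11,447
print(round(n_agnostic))     # ~572,332 (50x more)
```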
Two Senses of Complexity
Sample complexity (number of examples needed)
vs.
Time complexity (time needed to find h ∈ H that is consistent with the training examples)
  – in CS, we usually only address time complexity
Complexity (cont.)
– Some concepts require a polynomial number of examples, but an exponential amount of time (in the worst case)
– Eg, optimally training neural networks is NP-hard (recall BP is a ‘greedy’ algorithm that finds a local min)
Some Other COLT Topics
• COLT + clustering, + k-NN, + RL, + SVMs, + ANNs, + ILP, etc.
• Average case analysis (vs. worst case)
• Learnability of natural languages (language innate?)
• Learnability in parallel
Summary of COLT
Strengths
• Formalizes learning task
• Allows for imperfections (eg, ε and δ in PAC)
• Work on boosting is an excellent case of ML theory influencing ML practice
• Shows what concepts are intrinsically hard to learn (eg, k-term DNF*)
* though a superset of this class is PAC learnable!
Summary of COLT
Weaknesses
• Most analyses are worst case
• Hence, bounds often much higher than what works in practice (see Domingos article assigned early this semester)
• Use of ‘prior knowledge’ not captured very well yet