Data Mining – Algorithms: Prism – Learning Rules via Separating and Covering
Rules
• Can be directly read off a decision tree – but those might not be the most compact or effective rules
• Common approach – take each class in turn and find a way of “covering” all instances in it, while excluding instances not in the class
Let’s Use My Weather Data Again
• Again, let’s make this a little more realistic than the book does
• Divide into training and test data
• Let’s save the last record as a test
• (Using my weather data, nominal … and assuming we’re working on the play? = yes class first …)
• We’re looking for a rule in the form
if ___ Then play? = yes
• Possible ways of filling the blank include:
– Outlook = sunny
– Outlook = overcast
– …
– Temperature = hot
– …
Find the Best Filler Using Training Data
• We look at the proportion of instances that match the left-hand side (LHS) that also match the right-hand side (RHS)
LHS   Matches LHS   Of those, match RHS   Ratio
Outlook = sunny 5 4 .80
Outlook = Overcast 4 2 .50
Outlook = Rainy 4 0 .00
Temp = Hot 4 1 .25
Temp = Mild 5 3 .60
Temp = Cool 4 2 .50
Humid = High 6 3 .50
Humid = Normal 7 3 .43
Windy = TRUE 5 4 .80
Windy = False 8 2 .25
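The counts in this table are easy to reproduce mechanically. Below is a minimal Python sketch, assuming the 13 training rows are exactly the ones shown in this deck's "My Weather (Nominal)" example tables (with the rainy/mild/high/TRUE test record held out); `ATTRS`, `TRAIN` and `candidate_table` are hypothetical names used only for illustration, not part of WEKA.

```python
# Training rows reconstructed from the deck's weather tables (test row excluded).
ATTRS = ("outlook", "temp", "humid", "windy")
TRAIN = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "yes"),
    ("overcast", "hot", "high", "FALSE", "no"),
    ("rainy", "mild", "high", "FALSE", "no"),
    ("rainy", "cool", "normal", "FALSE", "no"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "yes"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "no"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "no"),
]


def candidate_table(rows, target):
    """Map each test (attribute, value) to (t, p): t = rows matching the LHS,
    p = how many of those rows have class `target`."""
    table = {}
    for i, attr in enumerate(ATTRS):
        # dict.fromkeys keeps the distinct values in first-seen order
        for value in dict.fromkeys(row[i] for row in rows):
            matching = [row for row in rows if row[i] == value]
            p = sum(1 for row in matching if row[-1] == target)
            table[(attr, value)] = (len(matching), p)
    return table


if __name__ == "__main__":
    for (attr, value), (t, p) in candidate_table(TRAIN, "yes").items():
        print(f"{attr} = {value}: {t} {p} {p / t:.2f}")
```

The first line it prints is `outlook = sunny: 5 4 0.80`, matching the first row of the table.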
Refining the Rule
• The best filler is Outlook = sunny at .80 (Windy = TRUE ties it, also with 4 of 5)
• If this rule is not accurate enough for us (based on a threshold), we’re going to try to refine it by adding one or more clauses
• Now, we’re looking to fill in a clause in the following:
if outlook = sunny and _____ then play? = yes
• We consider the accuracy of all possible ways of filling this blank …
Find the Best Filler Using Training Data
• We look at the proportion of instances that match the left-hand side (LHS) that also match the right-hand side (RHS)
LHS   Matches LHS   Of those, match RHS   Ratio
Outlook = Sunny & Temp = Hot 2 1 .50
Outlook = Sunny & Temp = Mild 2 2 1.00
Outlook = Sunny & Temp = Cool 1 1 1.00
Outlook = Sunny & Humid = High 3 2 .67
Outlook = Sunny & Humid = Normal 2 2 1.00
Outlook = Sunny & Windy = TRUE 2 2 1.00
Outlook = Sunny & Windy = False 3 2 .67
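The refinement step can be sketched the same way: restrict attention to the instances the partial rule covers, score every test on an unused attribute by p/t, and break ties by the larger p (the tie-break given in the Figure 4.8 pseudo-code). Two assumptions are flagged in the comments: ties that remain after that go to the first candidate found (attributes in the order outlook, temp, humid, windy), and the hypothetical `min_accuracy` parameter stands in for the unspecified "accurate enough" threshold. The rows are again the 13 reconstructed from the slides.

```python
from fractions import Fraction

ATTRS = ("outlook", "temp", "humid", "windy")
TRAIN = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "yes"),
    ("overcast", "hot", "high", "FALSE", "no"),
    ("rainy", "mild", "high", "FALSE", "no"),
    ("rainy", "cool", "normal", "FALSE", "no"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "yes"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "no"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "no"),
]


def best_condition(rows, target, used):
    """Return (attr_index, value, t, p) for the test A = v with the highest
    p/t among attributes not in `used`; ties go to the larger p and then
    (an assumption) to the first candidate found."""
    best_key, best = None, None
    for i in range(len(ATTRS)):
        if i in used:
            continue
        for value in dict.fromkeys(row[i] for row in rows):  # first-seen order
            matching = [row for row in rows if row[i] == value]
            t = len(matching)
            p = sum(1 for row in matching if row[-1] == target)
            key = (Fraction(p, t), p)  # exact accuracy first, then coverage p
            if best_key is None or key > best_key:
                best_key, best = key, (i, value, t, p)
    return best


def grow_rule(rows, target, min_accuracy=Fraction(1)):
    """Add conditions until accuracy >= min_accuracy (hypothetical threshold
    parameter) or every attribute has been used."""
    rule, covered = [], list(rows)
    while len(rule) < len(ATTRS):
        p = sum(1 for row in covered if row[-1] == target)
        if Fraction(p, len(covered)) >= min_accuracy:
            break
        i, value, _, _ = best_condition(covered, target, {j for j, _ in rule})
        rule.append((i, value))
        covered = [row for row in covered if row[i] == value]
    return [(ATTRS[i], value) for i, value in rule]


if __name__ == "__main__":
    print(grow_rule(TRAIN, "yes"))                  # threshold 1 (perfect rule)
    print(grow_rule(TRAIN, "yes", Fraction(3, 4)))  # lowered threshold of .75
```

With the default threshold of 1 this picks Outlook = sunny and then Temp = Mild, as the slides do; lowering the threshold to .75 would stop the first rule at Outlook = sunny, accepting one misclassified training day.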
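Still More to Cover, Though
• Temp = Mild, Humid = Normal and Windy = TRUE all reach 1.00 on 2 instances; we choose Temp = Mild, giving: if outlook = sunny and temp = mild then play? = yes
• This rule only covers 2 of the 6 play? = yes days
– This approach looks more for pockets of success, whereas ID3 looks more at the big picture
• So we temporarily toss those 2 instances and work on another rule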
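Setting covered instances aside is just a filter over the training rows. A minimal sketch with the same reconstructed rows and a hypothetical `remove_covered` helper: after the first rule, 11 training rows remain (4 of them play? = yes), and recounting Windy = TRUE on them gives the 4 matches, 3 of them yes, that the next table reports.

```python
# Training rows reconstructed from the deck's weather tables (test row excluded).
ATTRS = ("outlook", "temp", "humid", "windy")
TRAIN = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "yes"),
    ("overcast", "hot", "high", "FALSE", "no"),
    ("rainy", "mild", "high", "FALSE", "no"),
    ("rainy", "cool", "normal", "FALSE", "no"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "yes"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "no"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "no"),
]


def covers(rule, row):
    """True if every (attribute, value) condition in `rule` holds for `row`."""
    return all(row[ATTRS.index(attr)] == value for attr, value in rule)


def remove_covered(rows, rule):
    """Keep only the rows that `rule` does not cover."""
    return [row for row in rows if not covers(rule, row)]


if __name__ == "__main__":
    rule1 = [("outlook", "sunny"), ("temp", "mild")]
    remaining = remove_covered(TRAIN, rule1)
    yes_left = sum(1 for row in remaining if row[-1] == "yes")
    windy_true = [row for row in remaining if covers([("windy", "TRUE")], row)]
    windy_yes = sum(1 for row in windy_true if row[-1] == "yes")
    print(len(remaining), yes_left, len(windy_true), windy_yes)  # 11 4 4 3
```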
Example: My Weather (Nominal)
Outlook   Temp   Humid   Windy   Play?
sunny     hot    high    FALSE   no
sunny     hot    high    TRUE    yes
overcast  hot    high    FALSE   no
rainy     mild   high    FALSE   no
rainy     cool   normal  FALSE   no
rainy     cool   normal  TRUE    no
overcast  cool   normal  TRUE    yes
sunny     cool   normal  FALSE   yes
rainy     mild   normal  FALSE   no
overcast  mild   high    TRUE    yes
overcast  hot    normal  FALSE   no
rainy     mild   high    TRUE    no     TEST
We’re Looking for Another Rule …
• In the form
if ___ Then play? = yes
• Again, possible ways of filling the blank include:
– Outlook = sunny
– Outlook = overcast
– …
– Temperature = hot
– …
• However, our data is a little different now
Find the Best Filler Using Training Data
• We look at the proportion of instances that match the left-hand side (LHS) that also match the right-hand side (RHS)
LHS   Matches LHS   Of those, match RHS   Ratio
Outlook = sunny 3 2 .67
Outlook = Overcast 4 2 .50
Outlook = Rainy 4 0 .00
Temp = Hot 4 1 .25
Temp = Mild 3 1 .33
Temp = Cool 4 2 .50
Humid = High 5 2 .40
Humid = Normal 6 2 .33
Windy = TRUE 4 3 .75
Windy = False 7 1 .14
Refining the Rule
• The best filler is now Windy = TRUE at .75
• If this rule is not accurate enough for us (based on a threshold), we’re going to try to refine it by adding one or more clauses
• Now, we’re looking to fill in a clause in the following:
if windy = TRUE and _____ then play? = yes
• We consider the accuracy of all possible ways of filling this blank …
Find the Best Filler Using Training Data
• We look at the proportion of instances that match the left-hand side (LHS) that also match the right-hand side (RHS)
LHS   Matches LHS   Of those, match RHS   Ratio
Windy = TRUE & Outlook = sunny 1 1 1.00
Windy = TRUE & Outlook = Overcast 2 2 1.00
Windy = TRUE & Outlook = Rainy 1 0 .00
Windy = TRUE & Temp = Hot 1 1 1.00
Windy = TRUE & Temp = Mild 1 1 1.00
Windy = TRUE & Temp = Cool 2 1 .50
Windy = TRUE & Humid = High 2 2 1.00
Windy = TRUE & Humid = Normal 2 1 .50
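Still More to Cover, Though
• Outlook = Overcast and Humid = High both reach 1.00 on 2 instances; we choose Humid = High, giving: if windy = TRUE and humid = high then play? = yes
• The rules so far cover 4 of the 6 play? = yes days
• So we temporarily toss the 2 instances covered by the second rule and work on another rule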
Example: My Weather (Nominal)
Outlook   Temp   Humid   Windy   Play?
sunny     hot    high    FALSE   no
overcast  hot    high    FALSE   no
rainy     mild   high    FALSE   no
rainy     cool   normal  FALSE   no
rainy     cool   normal  TRUE    no
overcast  cool   normal  TRUE    yes
sunny     cool   normal  FALSE   yes
rainy     mild   normal  FALSE   no
overcast  hot    normal  FALSE   no
rainy     mild   high    TRUE    no     TEST
We’re Looking for Another Rule …
• In the form
if ___ Then play? = yes
• Again, we’ll try all possible ways of filling
• … on our reduced data
Find the Best Filler Using Training Data
• We look at the proportion of instances that match the left-hand side (LHS) that also match the right-hand side (RHS)
LHS   Matches LHS   Of those, match RHS   Ratio
Outlook = sunny 2 1 .50
Outlook = Overcast 3 1 .33
Outlook = Rainy 4 0 .00
Temp = Hot 3 0 .00
Temp = Mild 2 0 .00
Temp = Cool 4 2 .50
Humid = High 3 0 .00
Humid = Normal 6 2 .33
Windy = TRUE 2 1 .50
Windy = False 7 1 .14
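Refining the Rule
• Outlook = sunny, Temp = Cool and Windy = TRUE all reach .50; Temp = Cool wins the tie because it covers more play? = yes instances (2 versus 1)
• If this rule is not accurate enough for us (based on a threshold – and at 50% it almost assuredly isn’t), we’re going to try to refine it by adding one or more clauses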
• Now, we’re looking to fill in a clause in the following:
if temp = cool and _____ then play? = yes
• We consider the accuracy of all possible ways of filling this blank …
Find the Best Filler Using Training Data
• We look at the proportion of instances that match the left-hand side (LHS) that also match the right-hand side (RHS)
LHS   Matches LHS   Of those, match RHS   Ratio
Temp = Cool & Outlook = sunny 1 1 1.00
Temp = Cool & Outlook = Overcast 1 1 1.00
Temp = Cool & Outlook = Rainy 2 0 .00
Temp = Cool & Humid = High 0 0 ---
Temp = Cool & Humid = Normal 4 2 .50
Temp = Cool & Windy = True 2 1 .50
Temp = Cool & Windy = False 2 1 .50
So Far, We Have 3 Rules …
• If Outlook = Sunny & Temp = Mild Then Play? = yes
• If Windy = TRUE & Humid = High Then Play? = yes
• If Temp = Cool & Outlook = Sunny Then Play? = yes
– (Outlook = Overcast tied Outlook = sunny at 1.00 as the filler; we chose sunny)
• Still more to cover, though
• The rules so far cover 5 of the 6 play? = yes days
• So we temporarily toss the 1 instance covered by the third rule and work on another rule
Example: My Weather (Nominal)
Outlook   Temp   Humid   Windy   Play?
sunny     hot    high    FALSE   no
overcast  hot    high    FALSE   no
rainy     mild   high    FALSE   no
rainy     cool   normal  FALSE   no
rainy     cool   normal  TRUE    no
overcast  cool   normal  TRUE    yes
rainy     mild   normal  FALSE   no
overcast  hot    normal  FALSE   no
rainy     mild   high    TRUE    no     TEST
Again, We’re Looking for Another Rule …
• In the form
if ___ Then play? = yes
• Again, we’ll try all possible ways of filling
• … on our reduced data
Find the Best Filler Using Training Data
• We look at the proportion of instances that match the left-hand side (LHS) that also match the right-hand side (RHS)
LHS   Matches LHS   Of those, match RHS   Ratio
Outlook = sunny 1 0 .00
Outlook = Overcast 3 1 .33
Outlook = Rainy 4 0 .00
Temp = Hot 3 0 .00
Temp = Mild 2 0 .00
Temp = Cool 3 1 .33
Humid = High 3 0 .00
Humid = Normal 5 1 .20
Windy = TRUE 2 1 .50
Windy = False 6 0 .00
Refining the Rule
• The best filler is now Windy = TRUE at .50
• If this rule is not accurate enough for us (based on a threshold – and at 50% it almost assuredly isn’t), we’re going to try to refine it by adding one or more clauses
• Now, we’re looking to fill in a clause in the following:
if Windy = True and _____ then play? = yes
• We consider the accuracy of all possible ways of filling this blank …
Find the Best Filler Using Training Data
• We look at the proportion of instances that match the left-hand side (LHS) that also match the right-hand side (RHS)
LHS   Matches LHS   Of those, match RHS   Ratio
Windy = True & Outlook = sunny 0 0 ---
Windy = True & Outlook = Overcast 1 1 1.00
Windy = True & Outlook = Rainy 1 0 .00
Windy = True & Temp = Hot 0 0 ---
Windy = True & Temp = Mild 0 0 ---
Windy = True & Temp = Cool 2 1 .50
Windy = True & Humid = High 0 0 ---
Windy = True & Humid = Normal 2 1 .50
We’ve Covered All Yes Instances
• Outlook = Overcast is the only perfect filler (1.00), giving the fourth rule
• We have 4 rules …
• If Outlook = Sunny & Temp = Mild Then Play? = yes
• If Windy = TRUE & Humid = High Then Play? = yes
• If Temp = Cool & Outlook = Sunny Then Play? = yes
• If Windy = TRUE & Outlook = Overcast Then Play? = yes
• It’s time to work on the next class
– (Remember to bring back all of the instances)
– (Since it is the last class, we might create a default rule: anything else is play? = no)
Find the Best Filler Using Training Data
• We look at the proportion of instances that match the left-hand side (LHS) that also match the right-hand side (play? = no)
LHS   Matches LHS   Of those, match RHS (no)   Ratio
Outlook = sunny 5 1 .20
Outlook = Overcast 4 2 .50
Outlook = Rainy 4 4 1.00
Temp = Hot 4 3 .75
Temp = Mild 5 2 .40
Temp = Cool 4 2 .50
Humid = High 6 3 .50
Humid = Normal 7 4 .57
Windy = TRUE 5 1 .20
Windy = False 8 6 .75
Still More to Cover, Though
• Outlook = Rainy is perfect (1.00), giving: if outlook = rainy then play? = no
• This rule only covers 4 of the 7 play? = no days
• So we temporarily toss those 4 instances and work on another rule
Example: My Weather (Nominal)
Outlook   Temp   Humid   Windy   Play?
sunny     hot    high    FALSE   no
sunny     hot    high    TRUE    yes
overcast  hot    high    FALSE   no
overcast  cool   normal  TRUE    yes
sunny     mild   high    FALSE   yes
sunny     cool   normal  FALSE   yes
sunny     mild   normal  TRUE    yes
overcast  mild   high    TRUE    yes
overcast  hot    normal  FALSE   no
rainy     mild   high    TRUE    no     TEST
Find the Best Filler Using Training Data
• We look at the proportion of instances that match the left-hand side (LHS) that also match the right-hand side (play? = no)
LHS   Matches LHS   Of those, match RHS (no)   Ratio
Outlook = sunny 5 1 .20
Outlook = Overcast 4 2 .50
Outlook = Rainy 0 0 ---
Temp = Hot 4 3 .75
Temp = Mild 3 0 .00
Temp = Cool 2 0 .00
Humid = High 5 2 .40
Humid = Normal 4 1 .25
Windy = TRUE 4 0 .00
Windy = False 5 3 .60
Refining the Rule
• The best filler is Temp = Hot at .75 (Windy = False is next at .60)
• If this rule is not accurate enough for us (based on a threshold), we’re going to try to refine it by adding one or more clauses
• Now, we’re looking to fill in a clause in the following:
if Temp = Hot and _____ then play? = no
• We consider the accuracy of all possible ways of filling this blank …
Find the Best Filler Using Training Data
• We look at the proportion of instances that match the left-hand side (LHS) that also match the right-hand side (play? = no)
LHS   Matches LHS   Of those, match RHS (no)   Ratio
Temp = Hot & Outlook = sunny 2 1 .50
Temp = Hot & Outlook = Overcast 2 2 1.00
Temp = Hot & Outlook = Rainy 0 0 ---
Temp = Hot & Humid = High 3 2 .67
Temp = Hot & Humid = Normal 1 1 1.00
Temp = Hot & Windy = True 1 0 .00
Temp = Hot & Windy = False 3 3 1.00
We’ve Done It!
• Windy = False reaches 1.00 on 3 instances, beating the other perfect fillers, Outlook = Overcast (2 instances) and Humid = Normal (1 instance)
• The 2 rules so far cover all 7 of the play? = no days
• So we have a set of 6 rules based on this training data
– Rule 1: If Outlook = Sunny & Temp = Mild Then Play? = yes
– Rule 2: If Windy = TRUE & Humid = High Then Play? = yes
– Rule 3: If Temp = Cool & Outlook = Sunny Then Play? = yes
– Rule 4: If Windy = TRUE & Outlook = Overcast Then Play? = yes
– Rule 5: If Outlook = Rainy Then Play? = no
– Rule 6: If Temp = Hot & Windy = False Then Play? = no
• Note that the rules for a given category are considered an ordered set, but no order is implied between categories, so there may be a conflict!
Now, Suppose We Must Predict the Test Instance
• Rainy, mild, high, TRUE
• Rule 2 concludes play? = yes (incorrectly)
• Rule 5 concludes play? = no (correctly)
• One possible way of dealing with this conflict is to favor the rule that has the greatest coverage (most instances in support of it) in the training data
• In this case, Rule 2 has 2 instances in support and Rule 5 has 4, so Rule 5 wins and we predict play? = no (correctly)
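The support comparison can be checked with a short sketch. It assumes "support" means training rows that match a rule's LHS and also have the rule's class, which yields the 2 and 4 quoted here; `RULES` lists the six hand-derived rules in the order they were found, and `fires` and `support` are hypothetical helpers, not WEKA functions.

```python
# Training rows reconstructed from the deck's weather tables (test row excluded).
ATTRS = ("outlook", "temp", "humid", "windy")
TRAIN = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "yes"),
    ("overcast", "hot", "high", "FALSE", "no"),
    ("rainy", "mild", "high", "FALSE", "no"),
    ("rainy", "cool", "normal", "FALSE", "no"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "yes"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "no"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "no"),
]

# The six hand-derived rules as (conditions, class), numbered 1..6 in order.
RULES = [
    ([("outlook", "sunny"), ("temp", "mild")], "yes"),      # Rule 1
    ([("windy", "TRUE"), ("humid", "high")], "yes"),        # Rule 2
    ([("temp", "cool"), ("outlook", "sunny")], "yes"),      # Rule 3
    ([("windy", "TRUE"), ("outlook", "overcast")], "yes"),  # Rule 4
    ([("outlook", "rainy")], "no"),                         # Rule 5
    ([("temp", "hot"), ("windy", "FALSE")], "no"),          # Rule 6
]


def fires(conditions, instance):
    """True if the instance satisfies every condition of a rule's LHS."""
    return all(instance[ATTRS.index(attr)] == value for attr, value in conditions)


def support(rule, rows):
    """Assumed definition: training rows matching the LHS and the rule's class."""
    conditions, cls = rule
    return sum(1 for row in rows if fires(conditions, row) and row[-1] == cls)


if __name__ == "__main__":
    test = ("rainy", "mild", "high", "TRUE")  # its true class (no) is not used
    firing = [n for n, (conds, _) in enumerate(RULES, start=1) if fires(conds, test)]
    for n in firing:
        print(f"Rule {n} -> {RULES[n - 1][1]} (support {support(RULES[n - 1], TRAIN)})")
    winner = max(firing, key=lambda n: support(RULES[n - 1], TRAIN))
    print("prediction:", RULES[winner - 1][1])
```

It reports Rules 2 and 5 firing with supports 2 and 4 and predicts `no`.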
WEKA results – first look near the bottom
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances 12 85.7143 %
Incorrectly Classified Instances 2 14.2857 %
• On the cross-validation, it got 12 out of 14 tests correct
• Wins BIG over other approaches tried so far!
More Detailed Results
=== Confusion Matrix ===
 a b   <-- classified as
 5 1 | a = yes
 1 7 | b = no
• Here we see that the program predicted play? = yes 6 times, and was correct on 5 of those
• The program predicted play? = no 8 times, and was correct on 7 of those
• There were 6 instances whose actual value was play? = yes; the program correctly predicted 5 of them
• There were 8 instances whose actual value was play? = no; the program correctly predicted 7 of them
• All in all, uniformly good prediction
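Each of those four statements is a row or column sum of the confusion matrix. A small Python sketch of the arithmetic; the nested-dictionary layout (outer key = actual class, inner key = predicted class) is just one convenient choice, not WEKA's API.

```python
# WEKA's confusion matrix from the slide: outer key = actual class,
# inner key = predicted class.
MATRIX = {
    "yes": {"yes": 5, "no": 1},
    "no": {"yes": 1, "no": 7},
}


def summarize(matrix):
    """Return (correct, total, {class: (hits, times predicted, actual count)})."""
    classes = list(matrix)
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(matrix[c][c] for c in classes)  # diagonal
    per_class = {}
    for c in classes:
        predicted = sum(matrix[actual][c] for actual in classes)  # column sum
        actual_count = sum(matrix[c].values())                   # row sum
        per_class[c] = (matrix[c][c], predicted, actual_count)
    return correct, total, per_class


if __name__ == "__main__":
    correct, total, per_class = summarize(MATRIX)
    print(f"{correct}/{total} correct = {100 * correct / total:.4f} %")
    for c, (hits, predicted, actual_count) in per_class.items():
        print(f"play={c}: predicted {predicted} times ({hits} right), "
              f"{actual_count} actual ({hits} found)")
```

The first line printed is `12/14 correct = 85.7143 %`, the same figure as WEKA's summary.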
Again, part of our purpose is to have a take-home message for humans
• Not 14 take-home messages!
• So instead of reporting each of the things learned on each of the 14 training sets …
• … the program runs again on all of the data and builds a pattern for that: a take-home message
WEKA - Take-Home
=== Classifier model (full training set) ===

Prism rules
----------
If outlook = sunny and temperature = mild then yes
If outlook = sunny and temperature = cool then yes
If windy = TRUE and outlook = overcast then yes
If outlook = sunny and windy = TRUE then yes
If outlook = rainy then no
If temperature = hot and windy = FALSE then no
Let’s Try WEKA Prism on njcrimenominal
• Try 10-fold
=== Confusion Matrix ===
a b <-- classified as
5 2 | a = bad
 6 19 | b = ok
• This is 24 of 32 correct (75%), the same accuracy as with Naïve Bayes
• We note that OneR chose unemployment as the attribute to use; with Prism, it is the first thing tested for each class, but if it is not high or low, other attributes are taken into account …
Prism’s rules for njcrimenominal:
=== Classifier model (full training set) ===

Prism rules
If unemploy = hi then bad
If popdens = med and education = low then bad
If pop = med and popdens = med then bad
If unemploy = med and education = low and pop = low then bad
If education = med and unemploy = med and twoparent = med then bad
If unemploy = low then ok
If education = hi then ok
If pop = med and popdens = low then ok
If twoparent = low and unemploy = med and popdens = low then ok
Figure 4.8 Pseudo-code for Prism basic rule learner.
For each class C
  Initialize E to the instance set
  While E contains instances in class C
    Create a rule R with an empty left-hand side that predicts class C
    Until R is perfect (or there are no more attributes to use) do
      For each attribute A not mentioned in R, and each value v,
        Consider adding the condition A = v to the LHS of R
        Select A and v to maximize the accuracy p/t
          (break ties by choosing the condition with the largest p)
      Add A = v to R
    Remove the instances covered by R from E
(Here t is the number of instances the rule would cover with A = v added, and p is how many of those are in class C: the "Matches LHS" and "Of those, match RHS" columns in the tables above.)
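The Figure 4.8 pseudo-code translates almost line for line into Python. This is a sketch, not WEKA's implementation, and it assumes: the 13 training rows reconstructed from the slides, classes processed in the order yes then no, and ties left after "largest p" going to the first candidate found. The slides broke some of those leftover ties differently by hand (for instance Humid = High instead of Outlook = overcast for the second yes rule), so the rule list below is not identical to the hand-derived one, although each rule is perfect on the training data and together they cover every training instance.

```python
from fractions import Fraction

ATTRS = ("outlook", "temp", "humid", "windy")
TRAIN = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "yes"),
    ("overcast", "hot", "high", "FALSE", "no"),
    ("rainy", "mild", "high", "FALSE", "no"),
    ("rainy", "cool", "normal", "FALSE", "no"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "yes"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "no"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "no"),
]


def covers(rule, row):
    """True if every (attribute index, value) condition holds for row."""
    return all(row[i] == v for i, v in rule)


def prism(rows, classes):
    """Separate-and-cover rule learning following the Figure 4.8 pseudo-code."""
    rules = []
    for cls in classes:
        remaining = list(rows)  # E is re-initialized for every class
        while any(row[-1] == cls for row in remaining):
            rule, covered = [], remaining
            # Grow R until it is perfect or every attribute is used
            while len(rule) < len(ATTRS) and any(row[-1] != cls for row in covered):
                used = {i for i, _ in rule}
                best_key, best = None, None
                for i in range(len(ATTRS)):
                    if i in used:
                        continue
                    for value in dict.fromkeys(row[i] for row in covered):
                        match = [row for row in covered if row[i] == value]
                        p = sum(1 for row in match if row[-1] == cls)
                        key = (Fraction(p, len(match)), p)  # p/t, then largest p
                        if best_key is None or key > best_key:
                            best_key, best = key, (i, value)
                rule.append(best)
                covered = [row for row in covered if covers([best], row)]
            rules.append((rule, cls))
            # Separate: remove every instance the new rule covers from E
            remaining = [row for row in remaining if not covers(rule, row)]
    return rules


def describe(rule, cls):
    lhs = " and ".join(f"{ATTRS[i]} = {v}" for i, v in rule)
    return f"If {lhs} then {cls}"


if __name__ == "__main__":
    for rule, cls in prism(TRAIN, ("yes", "no")):
        print(describe(rule, cls))
```

Under these assumptions it prints six rules, four for yes and two for no; the two play? = no rules match the two hand-derived ones.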
Prism – Numeric Values
• Prism cannot handle numeric attributes
• It is easy to imagine a simple rule learner that could handle them (in regular attributes)
– See the example in the introductory section, where thresholds are chosen for numeric attributes as part of adding clauses to rules
• No chance of ever handling numeric prediction (a numeric class)
Prism – Discussion
• Prism tries to fit the training data 100%
• This presents a serious risk of overfitting!
• A simple variation is to lower the accuracy threshold
– May need experimentation to find a suitable threshold
• Needs conflict resolution between classes if more than one class is predicted
• Needs a means of dealing with instances for which no class is predicted
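The last two needs can be handled together in a small prediction routine. This is a sketch under stated assumptions: conflicts are settled by training support, as suggested for the test instance, and an instance that no rule covers gets a default class of play? = no, following the "anything else is play? = no" default rule mentioned when switching classes. `predict`, `fires` and `support` are hypothetical helpers, and overcast/mild/normal/FALSE is a hypothetical query chosen because none of the six hand-derived rules covers it.

```python
ATTRS = ("outlook", "temp", "humid", "windy")
TRAIN = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "yes"),
    ("overcast", "hot", "high", "FALSE", "no"),
    ("rainy", "mild", "high", "FALSE", "no"),
    ("rainy", "cool", "normal", "FALSE", "no"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "yes"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "no"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "no"),
]
RULES = [  # the six hand-derived rules as (conditions, class)
    ([("outlook", "sunny"), ("temp", "mild")], "yes"),
    ([("windy", "TRUE"), ("humid", "high")], "yes"),
    ([("temp", "cool"), ("outlook", "sunny")], "yes"),
    ([("windy", "TRUE"), ("outlook", "overcast")], "yes"),
    ([("outlook", "rainy")], "no"),
    ([("temp", "hot"), ("windy", "FALSE")], "no"),
]


def fires(conditions, instance):
    return all(instance[ATTRS.index(attr)] == value for attr, value in conditions)


def support(rule, rows):
    """Assumed definition: training rows matching the LHS and the rule's class."""
    conditions, cls = rule
    return sum(1 for row in rows if fires(conditions, row) and row[-1] == cls)


def predict(instance, rules, rows, default="no"):
    """Return (class, reason). Assumed policy: no rule fires -> default class;
    rules for several classes fire -> class of the best-supported rule."""
    firing = [rule for rule in rules if fires(rule[0], instance)]
    if not firing:
        return default, "default"
    if len({cls for _, cls in firing}) == 1:
        return firing[0][1], "agreed"
    best = max(firing, key=lambda rule: support(rule, rows))
    return best[1], "conflict resolved by support"


if __name__ == "__main__":
    print(predict(("rainy", "mild", "high", "TRUE"), RULES, TRAIN))
    print(predict(("overcast", "mild", "normal", "FALSE"), RULES, TRAIN))  # hypothetical
```

A majority-class default would be an equally reasonable choice here; the sketch simply follows the slides' suggestion.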