Classification & Regression
University of Notre Dame, CSE 40647 (Spring 2014)
Data Preprocessing
Classification & Regression
Rule-Based Classifiers
• Technique for classifying records using a collection of “if…then…” rules
• Rule set R = (r1 ∨ r2 ∨ ⋯ ∨ rk)
– The ri are the classification rules, or disjuncts
r1: Gives Birth = no ∧ Aerial Creature = yes ⟶ Birds
r2: Gives Birth = no ∧ Aquatic Creature = yes ⟶ Fishes
r3: Gives Birth = yes ∧ Body Temperature = warm-blooded ⟶ Mammals
r4: Gives Birth = no ∧ Aerial Creature = no ⟶ Reptiles
r5: Aquatic Creature = semi ⟶ Amphibians
Classification Rules
• Each rule is expressed in the following way:
ri: (Condition) ⟶ yi
– Left-hand side is called rule antecedent or precondition
Condition = (A1 op v1) ∧ (A2 op v2) ∧ ⋯ ∧ (Ak op vk)
– op is chosen from the set {=, ≠, <, >, ≤, ≥}
– The right-hand side yi is called the rule consequent; it contains the predicted class
• A rule r covers a record x if its precondition matches the attributes of x
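The covering test can be sketched as a small Python helper; the names here (`covers`, `OPS`, the `hawk` record) are illustrative choices for this transcript, not from any particular library:

```python
# Minimal sketch of rule coverage, assuming a record is a dict and a
# precondition is a list of (attribute, op, value) conjuncts.
import operator

OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       ">": operator.gt, "<=": operator.le, ">=": operator.ge}

def covers(precondition, record):
    """A rule covers a record if every conjunct (A op v) holds."""
    return all(OPS[op](record[attr], val) for attr, op, val in precondition)

# r1: Gives Birth = no AND Aerial Creature = yes -> Birds
r1 = ([("Gives Birth", "=", "no"), ("Aerial Creature", "=", "yes")], "Birds")

hawk = {"Gives Birth": "no", "Aquatic Creature": "no", "Aerial Creature": "yes"}
print(covers(r1[0], hawk))  # True: r1 covers the hawk
```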
Classification Rules
r1: Gives Birth = no ∧ Aerial Creature = yes ⟶ Birds
r2: Gives Birth = no ∧ Aquatic Creature = yes ⟶ Fishes
r3: Gives Birth = yes ∧ Body Temperature = warm-blooded ⟶ Mammals
r4: Gives Birth = no ∧ Aerial Creature = no ⟶ Reptiles
r5: Aquatic Creature = semi ⟶ Amphibians
Name | Body Temperature | Skin Cover | Gives Birth | Aquatic Creature | Aerial Creature | Has Legs | Hibernates
Hawk | Warm-blooded | Feather | No | No | Yes | Yes | No
Grizzly Bear | Warm-blooded | Fur | Yes | No | No | Yes | Yes
• r1 covers the first vertebrate (the hawk)
• Is the second instance (the grizzly bear) covered?
Evaluating Rules
• Given a dataset D and a classification rule r: A → y, we can evaluate it based on the following two metrics:

Coverage(r) = |A| / |D|
Accuracy(r) = |A ∩ y| / |A|

|A|: number of records that satisfy the antecedent
|A ∩ y|: number of records that satisfy both antecedent and consequent
|D|: total number of records
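As a sketch, both metrics can be computed in a few lines of Python; `evaluate_rule` and the four toy weather records below are invented for illustration (the slide's actual dataset is not reproduced in this transcript):

```python
# Coverage(r) = |A| / |D| and Accuracy(r) = |A ∩ y| / |A|, computed on a
# made-up toy dataset of (record, label) pairs.

def evaluate_rule(dataset, antecedent, consequent):
    """dataset: list of (record, label) pairs; antecedent: predicate on a record."""
    covered = [(rec, lab) for rec, lab in dataset if antecedent(rec)]
    correct = sum(1 for _, lab in covered if lab == consequent)
    coverage = len(covered) / len(dataset)
    accuracy = correct / len(covered) if covered else 0.0
    return coverage, accuracy

data = [({"Outlook": "sunny", "Temperature": 85}, "no"),
        ({"Outlook": "sunny", "Temperature": 90}, "no"),
        ({"Outlook": "rainy", "Temperature": 70}, "yes"),
        ({"Outlook": "overcast", "Temperature": 75}, "yes")]

rule = lambda r: r["Outlook"] == "sunny" and r["Temperature"] >= 80
cov, acc = evaluate_rule(data, rule, "no")
print(cov, acc)  # 0.5 1.0
```

On this toy dataset the rule covers 2 of 4 records (coverage 50%), and both covered records carry the predicted class (accuracy 100%).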
Example
Outlook = sunny ∧ Temperature ≥ 80 ⟶ no

Coverage = 50%, Accuracy = 100%
How a Rule-Based Classifier Works
• The lemur triggers r3 and is classified as a mammal
• The turtle triggers both r4 and r5, which give conflicting predictions; the conflict must be resolved
• The shark triggers no rule at all, yet we must ensure that the classifier can still make a reliable prediction
r1: Gives Birth = no ∧ Aerial Creature = yes ⟶ Birds
r2: Gives Birth = no ∧ Aquatic Creature = yes ⟶ Fishes
r3: Gives Birth = yes ∧ Body Temperature = warm-blooded ⟶ Mammals
r4: Gives Birth = no ∧ Aerial Creature = no ⟶ Reptiles
r5: Aquatic Creature = semi ⟶ Amphibians
Name | Body Temperature | Skin Cover | Gives Birth | Aquatic Creature | Aerial Creature | Has Legs | Hibernates
Lemur | Warm-blooded | Fur | Yes | No | No | Yes | Yes
Turtle | Cold-blooded | Scales | No | Semi | No | Yes | No
Shark | Cold-blooded | Scales | Yes | Yes | No | No | No
Important Properties
• Mutually exclusive rules: No two rules in a rule set R are triggered by the same record. This ensures that each record triggers at most one rule.
• Exhaustive Rules: Every record is covered by at least one rule in R.
• Together, these properties ensure that every record is covered by exactly one rule.
• If a set of rules is not exhaustive, we can assign the remaining cases to a default class (usually the majority class):
– r_d: ( ) ⟶ y_d
Important Properties
• If a set of rules is not mutually exclusive (as in the previous example), we must resolve the conflict in one of two ways:
– Ordered rules: Rules are sorted by priority, and predictions are made accordingly
– Unordered rules: The consequent of each triggered rule counts as a vote for the record; votes are tallied, and the class with the most votes is assigned to the record. This is less susceptible to errors.
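The unordered (voting) scheme can be sketched as follows; `classify_by_vote`, the two lambda rules, and the `"unknown"` fallback for uncovered records are assumptions for illustration:

```python
# Unordered-rule conflict resolution: each triggered rule casts one vote
# for its consequent, and the majority class wins.
from collections import Counter

def classify_by_vote(rules, record, default="unknown"):
    """rules: list of (predicate, class) pairs."""
    votes = Counter(cls for pred, cls in rules if pred(record))
    # Counter.most_common breaks ties by first-encountered order
    return votes.most_common(1)[0][0] if votes else default

rules = [
    (lambda r: r["Aquatic Creature"] == "semi", "Amphibians"),
    (lambda r: r["Gives Birth"] == "no" and r["Aerial Creature"] == "no", "Reptiles"),
]
turtle = {"Gives Birth": "no", "Aquatic Creature": "semi", "Aerial Creature": "no"}
print(classify_by_vote(rules, turtle))  # both rules fire, so the tally is 1-1
```

With a larger rule set, multiple rules voting for the same class would outweigh a single conflicting rule, which is why voting is less error-prone than trusting one rule.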
9
Rule-Ordering Schemes
• Rule-based ordering
– Individual rules are ranked based on their quality
• Class-based ordering
– Rules that belong to the same class are grouped together
Building Classification Rules
• Direct methods:
– Extract rules directly from data
– E.g., RIPPER, CN2, 1R
• Indirect methods:
– Extract rules from other classification models (e.g., decision trees, neural networks)
– E.g., C4.5rules
Direct Methods for Rule Extraction
• Sequential covering:
1. Start from an empty rule
2. Grow a rule using the Learn-One-Rule function
3. Remove the records covered by the rule
4. Repeat steps (2) and (3) until a stopping criterion is met
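The steps above can be sketched as a loop; `learn_one_rule` here is a greedy stand-in that picks the single attribute = value test with the best accuracy, not the full Learn-One-Rule procedure from the slides, and all names are made up:

```python
# Minimal sequential-covering sketch: grow a rule, remove the records it
# covers, repeat while positive examples remain.

def learn_one_rule(records, target):
    """Greedy stand-in: pick the (attribute, value) test with best accuracy."""
    best = None  # (accuracy, coverage, attribute, value)
    for rec, _ in records:
        for attr, val in rec.items():
            covered = [(r, l) for r, l in records if r.get(attr) == val]
            acc = sum(l == target for _, l in covered) / len(covered)
            cand = (acc, len(covered), attr, val)
            if best is None or cand > best:
                best = cand
    return best[2], best[3]

def sequential_covering(records, target):
    rules, remaining = [], list(records)
    while any(lab == target for _, lab in remaining):
        attr, val = learn_one_rule(remaining, target)  # step 2: grow a rule
        rules.append((attr, val, target))
        # step 3: remove the records covered by the new rule
        remaining = [(r, l) for r, l in remaining if r.get(attr) != val]
    return rules

data = [({"Outlook": "sunny"}, "no"),
        ({"Outlook": "rainy"}, "yes"),
        ({"Outlook": "sunny"}, "no")]
print(sequential_covering(data, "no"))  # [('Outlook', 'sunny', 'no')]
```

Each iteration removes at least the record that produced the chosen test, so the loop always terminates.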
Aspects of Sequential Covering
• Rule growing
• Instance elimination
• Rule evaluation
• Stop criterion
• Rule pruning
Instance Elimination
• Why eliminate instances?
– Otherwise the next rule learned would be identical to the previous one
• Why eliminate positive instances?
– To ensure that the next rule is different
• Why eliminate negative instances?
– To prevent underestimating the accuracy of the next rule
Rule Evaluation
Accuracy = f+ / n

Laplace = (f+ + 1) / (n + k)

m-estimate = (f+ + k·p+) / (n + k)

n: number of examples covered by the rule
f+: number of positive examples covered by the rule
k: total number of classes
p+: prior probability of the positive class
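The three metrics translate directly into code; the function and argument names mirror the slide's symbols, and the example counts are made up:

```python
# Rule-quality metrics, with the slide's symbols as arguments:
#   n     = examples covered by the rule
#   f_pos = positive examples covered by the rule
#   k     = number of classes
#   p_pos = prior probability of the positive class

def accuracy(f_pos, n):
    return f_pos / n

def laplace(f_pos, n, k):
    return (f_pos + 1) / (n + k)

def m_estimate(f_pos, n, k, p_pos):
    return (f_pos + k * p_pos) / (n + k)

# A rule covering 3 records, all positive, in a 2-class problem:
print(accuracy(3, 3))            # 1.0
print(laplace(3, 3, 2))          # 0.8
print(m_estimate(3, 3, 2, 0.5))  # 0.8
```

Note that with two classes and a uniform prior (p+ = 1/2), the m-estimate reduces to the Laplace estimate, as the example shows; both shrink the optimistic 100% accuracy of a small rule toward the prior.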
Rule Evaluation
Suppose the rule r: A → + covers p0 positive records and n0 negative records. After adding a new conjunct B, the extended rule r′: A ∧ B → + covers p1 positive records and n1 negative records. Then:

FOIL's information gain = p1 × [ log2( p1 / (p1 + n1) ) − log2( p0 / (p0 + n0) ) ]

Note that this metric is proportional to both p1 and p1 / (p1 + n1); it therefore assigns higher scores to rules with high support and high accuracy.
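The gain can be computed as a one-liner; the example cover counts below are made up for illustration:

```python
# FOIL's information gain for extending rule A -> + with a conjunct B,
# in the slide's notation: (p0, n0) are the positive/negative cover
# counts of r, (p1, n1) those of r' = A AND B.
import math

def foil_gain(p0, n0, p1, n1):
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

# r covers 10 positives and 10 negatives; after adding B, r' covers
# 8 positives and 2 negatives (higher accuracy, still good support):
print(round(foil_gain(10, 10, 8, 2), 2))  # 5.42
```

The gain is positive here because the extended rule raises accuracy from 0.5 to 0.8 while still covering 8 positives.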
Stopping Criterion and Rule Pruning
• Stopping criterion:
– Compute the gain of the new rule
– If the gain is not significant, discard the rule
• Rule pruning:
– Done to reduce generalization error
– Remove one of the conjuncts in the rule
– Compare the error on a validation set before and after pruning
– If the error improves, prune the conjunct
Summary of Direct Method
1. Grow a single rule
2. Remove the instances covered by the rule
3. Prune the rule if needed
4. Add the rule to the current rule set
5. Repeat
RIPPER
• For a binary class problem:
– Choose one of the classes as the negative class
– Learn rules for the positive (typically the minority) class
– The negative class becomes the default class
• For a multi-class problem:
– Order the classes by their frequencies
– Learn the rule set for the smallest class first, treating the rest as the negative class
– Repeat with the next smallest class as the positive class
RIPPER
• Growing a rule:
– Start from an empty rule
– Add conjuncts as long as they improve FOIL's information gain
– Stop when the rule no longer covers any negative examples
– Prune the rule based on the metric v = (p − n) / (p + n)
• Where p is the number of positive examples covered in the validation set
• n is the number of negative examples covered
– If v improves after pruning, remove the conjunct
– E.g., for the rule ABCD → y, check whether D should be pruned, followed by CD, then BCD
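The pruning check can be sketched as follows; the helper names and the validation cover counts are invented for illustration:

```python
# Sketch of RIPPER-style reduced-error pruning on a validation set,
# using the metric v = (p - n) / (p + n).

def v_metric(p, n):
    """p, n: positive / negative validation examples covered by the rule."""
    return (p - n) / (p + n)

def should_prune(p_full, n_full, p_pruned, n_pruned):
    """Drop the last conjunct if v improves; for a rule ABCD -> y,
    RIPPER checks D first, then CD, then BCD."""
    return v_metric(p_pruned, n_pruned) > v_metric(p_full, n_full)

# Full rule ABCD covers p=20, n=10; after dropping D, ABC covers p=22, n=8:
print(v_metric(20, 10), v_metric(22, 8))  # about 0.333 vs 0.467
print(should_prune(20, 10, 22, 8))        # True
```

Pruning conjuncts in reverse order (D, then CD, then BCD) mirrors the order in which the rule was grown, so the most recently added and least reliable conjuncts are tested first.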
RIPPER
• Building a rule set:
– After generating a rule, remove all positive and negative examples covered by it
– Add the rule to the rule set as long as it does not violate the stopping conditions
– If the new rule increases the description length* of the rule set by at least d bits, RIPPER stops adding rules (d is 64 bits by default)
– Another stopping condition: the error rate of the rule must not exceed 50%

* Description length = the number of bits needed to encode the current rule set and its exceptions