Decision Tree Algorithms
Rule Based
Suitable for automatic generation
McGraw-Hill/Irwin ©2007 The McGraw-Hill Companies, Inc. All rights reserved
Decision Trees
• Logical branching
• Historical:
  – ID3 – early rule-generating system
• Branches:
  – Different possible values
• Nodes:
  – From which branches emanate
Goal-Driven Data Mining
• Define goal
  – Identify fraudulent cases
• Develop rules identifying attributes attaining that goal
  – IF attorney = Smith, THEN better check
Tree Structure
• Sorts out data
  – IF-THEN rules
  – Loan variables:
    • Age: {young, middle, old}
    • Income: {low, average, high}
    • Risk: {low, average, high}
• Exhaustive tree enumerates all combinations
  – 27 combinations – classify all
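As a quick sanity check, the combination count of the exhaustive tree can be enumerated with the standard library (the list names below are just the slide's loan example):

```python
from itertools import product

# Loan example: three variables, three values each (from the slide)
ages = ["young", "middle", "old"]
incomes = ["low", "average", "high"]
risks = ["low", "average", "high"]

# One leaf of the exhaustive tree per (age, income, risk) combination
combos = list(product(ages, incomes, risks))
print(len(combos))  # 27
```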
Types of Trees
• Classification tree
  – Variable values are classes
  – Finite conditions
• Regression tree
  – Variable values are continuous numbers
  – Prediction or estimation
Rule Induction
• Automatically processes data
  – Classification (logical, easier)
  – Regression (estimation, messier)
• Searches through data for patterns & relationships
  – Pure knowledge discovery
• Assumes no prior hypothesis
• Disregards human judgment
Example
• Three variables:
  – Age
  – Income
  – Risk
• Outcomes:
  – On-time
  – Late
Combinations

Variable  Value    Cases  OT  Late  Pr(OT)
Age       Young    12     8   4     0.67
          Middle   5      4   1     0.80
          Old      3      3   0     1.00
Income    Low      5      3   2     0.60
          Average  9      7   2     0.78
          High     6      5   1     0.83
Risk      High     9      5   4     0.55
          Average  1      0   1     0.00
          Low      10     10  0     1.00
Basis for Classification
• If a category has all outcomes of a certain kind, that makes a good rule
  – IF Income = High, they always paid
• ENTROPY: measure of content
  – Actually a measure of randomness
Entropy Formula

Information = -[p/(p+n)] log2[p/(p+n)] - [n/(p+n)] log2[n/(p+n)]

The lower the measure, the greater the information content.
It can be used to automatically select the variable with the most productive rule potential.
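The formula can be sketched as a small Python function (the name `information` and the convention of treating 0·log2(0) as 0 are choices of this sketch, not from the slides):

```python
import math

def information(p, n):
    """Entropy of a node holding p on-time and n late cases.
    0.0 means a pure node (best rule potential); 1.0 means a 50/50 split."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count > 0:  # treat 0 * log2(0) as 0
            frac = count / total
            result -= frac * math.log2(frac)
    return result

print(round(information(8, 4), 3))  # Young: 8 on-time, 4 late -> 0.918
print(round(information(3, 0), 3))  # Old: pure node -> 0.0
```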
Entropy
• Young: [-(8/12) log2(8/12) - (4/12) log2(4/12)] × 12/20 = (0.390 + 0.528) × 0.6 = 0.551
• Middle: [-(4/5) log2(4/5) - (1/5) log2(1/5)] × 5/20 = (0.258 + 0.464) × 0.25 = 0.180
• Old: [-(3/3) log2(3/3) - 0] × 3/20 = 0.000

SUM (Age): 0.731
Income: 0.782
Risk: 0.446
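The weighted entropies for all three variables can be reproduced from the combinations table; the `splits` dict below simply transcribes the (on-time, late) counts for each value:

```python
import math

def information(p, n):
    """Entropy of a node with p on-time and n late cases."""
    total = p + n
    return -sum(c/total * math.log2(c/total) for c in (p, n) if c > 0)

# (on-time, late) counts per value, from the combinations table (20 cases)
splits = {
    "Age":    [(8, 4), (4, 1), (3, 0)],
    "Income": [(3, 2), (7, 2), (5, 1)],
    "Risk":   [(5, 4), (0, 1), (10, 0)],
}

weighted = {}
for var, groups in splits.items():
    total = sum(p + n for p, n in groups)
    # Entropy of each value, weighted by its share of the cases
    weighted[var] = sum((p + n)/total * information(p, n) for p, n in groups)
    print(var, round(weighted[var], 3))
```

Risk has the lowest weighted entropy (0.446), so it is the most productive variable to split on first.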
Rule
1. IF Risk = Low THEN OT
2. ELSE Late
All Rules
1. IF Risk = Low THEN OT
2. IF Risk NOT Low & Age = Middle THEN Late
3. IF Risk NOT Low & Age NOT Middle & Income = High THEN Late
4. ELSE OT
Sample Case
• Age 36 → Middle
• Income $70K/year → Average
• Risk:
  – Assets $42K
  – Debts $40K
  – Wants $5K → Average
• Rule 2 applies, says Late
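The four induced rules and the sample case can be sketched as a plain function (the string labels and the function name `classify` are choices of this sketch):

```python
def classify(age, income, risk):
    """Crisp rule set induced above; arguments are category labels."""
    if risk == "low":
        return "on-time"   # Rule 1
    if age == "middle":
        return "late"      # Rule 2
    if income == "high":
        return "late"      # Rule 3
    return "on-time"       # Rule 4 (ELSE)

# Sample case: age 36 (middle), $70K/year (average), average risk
print(classify("middle", "average", "average"))  # late (Rule 2)
```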
Fuzzy Decision Trees
• Have assumed distinct (crisp) outcomes
• Many data points are not that clear
• Fuzzy: membership function represents belief (between 0 and 1)
• Fuzzy relationships have been incorporated into decision tree algorithms
Fuzzy Example

Age:    Young 0.3   Middle 0.9   Old 0.2
Income: Low 0.0     Average 0.8  High 0.3
Risk:   Low 0.1     Average 0.8  High 0.3

• Definitions:
  – Sum will not necessarily equal 1.0
  – If ambiguous, select the alternative with the larger membership value
  – Aggregate with the mean
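Selecting the alternative with the larger membership value is a one-liner in Python; the `age` dict below carries the example's membership values:

```python
# Fuzzy memberships for the sample case's age (from the example)
age = {"young": 0.3, "middle": 0.9, "old": 0.2}

# Resolve ambiguity: pick the label with the largest membership value
label = max(age, key=age.get)
print(label)  # middle
```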
Fuzzy Model
• IF Risk = Low THEN OT
  – Membership function: 0.1
• IF Risk NOT Low & Age = Middle THEN Late
  – Risk NOT Low: MAX(0.8, 0.3) = 0.8
  – Age Middle: 0.9
  – Membership function: mean = 0.85
Fuzzy Model cont.
• IF Risk NOT Low & Age NOT Middle & Income = High THEN Late
  – Risk NOT Low: MAX(0.8, 0.3) = 0.8
  – Age NOT Middle: MAX(0.3, 0.2) = 0.3
  – Income High: 0.3
  – Membership function: mean = 0.467
Fuzzy Model cont.
• IF Risk NOT Low & Age NOT Middle & Income NOT High THEN OT
  – Risk NOT Low: MAX(0.8, 0.3) = 0.8
  – Age NOT Middle: MAX(0.3, 0.2) = 0.3
  – Income NOT High: MAX(0.0, 0.8) = 0.8
  – Membership function: mean = 0.633
Fuzzy Model cont.
• The largest membership value is 0.85, for Rule 2
• Conclusion: Late
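The per-rule membership arithmetic can be reproduced in a short sketch; the helper names `not_value` (the slide's MAX over the remaining alternatives for NOT) and `mean` (the slide's aggregation) are choices of this sketch:

```python
# Fuzzy membership values from the example
age    = {"young": 0.3, "middle": 0.9, "old": 0.2}
income = {"low": 0.0, "average": 0.8, "high": 0.3}
risk   = {"low": 0.1, "average": 0.8, "high": 0.3}

def not_value(memberships, excluded):
    """'NOT x' read as the largest membership among the other alternatives."""
    return max(v for k, v in memberships.items() if k != excluded)

def mean(*vals):
    return sum(vals) / len(vals)

# Antecedent strength of each induced rule for the sample case
strengths = {
    "Rule 1": risk["low"],
    "Rule 2": mean(not_value(risk, "low"), age["middle"]),
    "Rule 3": mean(not_value(risk, "low"), not_value(age, "middle"),
                   income["high"]),
    "Rule 4": mean(not_value(risk, "low"), not_value(age, "middle"),
                   not_value(income, "high")),
}
for rule, s in strengths.items():
    print(rule, round(s, 3))
```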
Applications
• Inventory Prediction
• Clinical Databases
• Software Development Quality
Inventory Prediction
• Groceries
  – Maybe over 100,000 SKUs
  – Barcode data input
• Data mining to discover patterns
  – Random sample of over 1.6 million records
  – 30 months
  – 95 outlets
  – Test sample of 400,000 records
• Rule induction more workable than regression
  – 28,000 rules
  – Very accurate – up to 27% improvement
Clinical Database
• Headache
  – Over 60 possible causes
• Exclusive reasoning uses negative rules
  – Used when a symptom is absent
• Inclusive reasoning uses positive rules
• Probabilistic rule induction expert system
  – Headache: training sample of over 50,000 cases, 45 classes, 147 attributes
  – Meningitis: 1,200 samples on 41 attributes, 4 outputs
Clinical Database cont.
• Used AQ15, C4.5
  – Average accuracy 82%
• Expert system
  – Average accuracy 92%
• Rough set rule system
  – Average accuracy 70%
• Using both positive & negative rules from rough sets
  – Average accuracy over 90%
Software Development Quality
• Telecommunications company
• Goal: find patterns in modules under development that are likely to contain faults discovered by customers
  – Typical module: several million lines of code
  – Probability of fault averaged 0.074
• Apply greater effort to those modules
  – Specification, testing, inspection
Software Quality
• Preprocessed data
• Reduced data
• Used CART (Classification & Regression Trees)
  – Could specify prior probabilities
• First model: 9 rules, 6 variables
  – Better at cross-validation
  – But variable values not available until late
• Second model: 4 rules, 2 variables
  – About the same accuracy, data available earlier
Decision Trees
• Very effective & useful
• Automatic machine learning
  – Thus unbiased (but omits judgment)
• Can handle very large data sets
  – Not much affected by missing data
• Lots of software available