Decision Tree - ID3


Transcript of Decision Tree - ID3

Page 1: Decision Tree - ID3

Decision Tree (ID3)
Xueping Peng

[email protected]

Page 2: Decision Tree - ID3

Outline

- What is decision tree
- How to Use Decision Tree
- How to Generate a Decision Tree
- Sum Up and Some Drawbacks

Page 3: Decision Tree - ID3

What is decision tree (1/3)

A decision tree is a hierarchical tree structure that is used to classify records into classes based on a series of questions (or rules) about their attributes.

The attributes can be variables of any type: binary, nominal, ordinal, or quantitative.

The classes must be of a qualitative type (categorical, binary, or ordinal).

In short, given data whose attributes are labeled with classes, a decision tree produces a sequence of rules (or series of questions) that can be used to recognize the class.

Page 4: Decision Tree - ID3

What is decision tree (2/3)

Attributes: Gender, Car Ownership, Travel Cost ($)/km, Income Level. Class: Transportation Mode.

Gender   Car Ownership   Travel Cost ($)/km   Income Level   Transportation Mode
Male     0               Cheap                Low            Bus
Male     1               Cheap                Medium         Bus
Female   1               Cheap                Medium         Train
Female   0               Cheap                Low            Bus
Male     1               Cheap                Medium         Bus
Male     0               Standard             Medium         Train
Female   1               Standard             Medium         Train
Female   1               Expensive            High           Car
Male     2               Expensive            Medium         Car
Female   2               Expensive            High           Car

Page 5: Decision Tree - ID3

What is decision tree (3/3)

Page 6: Decision Tree - ID3

How to Use Decision Tree

Person Name   Gender   Car Ownership   Travel Cost ($)/km   Income Level   Transportation Mode
Alex          Male     1               Standard             High           ?
Buddy         Male     0               Cheap                Medium         ?
Cherry        Female   1               Cheap                High           ?

Test Data

What transportation mode would Alex, Buddy and Cherry use?
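The tree that the ID3 run on the following slides arrives at has Travel Cost ($)/km at the root, with the Cheap branch split on Gender and then on Car Ownership. A minimal Python sketch of walking that tree for the three test records (the predict helper and the dict layout are ours; at the last split we assume Car Ownership, the choice the worked example's answers require):

```python
def predict(person):
    """Walk the tree built on the generation slides below:
    root: Travel Cost; Cheap branch: Gender, then Car Ownership."""
    if person["Cost"] == "Expensive":
        return "Car"
    if person["Cost"] == "Standard":
        return "Train"
    # Cheap branch
    if person["Gender"] == "Male":
        return "Bus"
    return "Bus" if person["Cars"] == 0 else "Train"

test = {
    "Alex":   {"Gender": "Male",   "Cars": 1, "Cost": "Standard", "Income": "High"},
    "Buddy":  {"Gender": "Male",   "Cars": 0, "Cost": "Cheap",    "Income": "Medium"},
    "Cherry": {"Gender": "Female", "Cars": 1, "Cost": "Cheap",    "Income": "High"},
}
for name, person in test.items():
    print(name, predict(person))  # Alex Train, Buddy Bus, Cherry Train
```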

Page 7: Decision Tree - ID3

How to Generate a Decision Tree (1/13): Description of ID3
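The procedure this slide describes can be sketched in a few lines of Python; this is a minimal recursive version (the names id3, rows, attrs, and target are ours, and entropy and information gain are defined on the slides that follow):

```python
import math
from collections import Counter

def entropy(labels):
    """H(S) = -sum_j p_j log2 p_j over the class proportions."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def id3(rows, attrs, target):
    """Grow a decision tree as a nested dict: {attribute: {value: subtree}}."""
    labels = [r[target] for r in rows]
    # Stop on a pure node, or when no attributes are left to split on
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]

    def gain(a):
        g = entropy(labels)
        for v in {r[a] for r in rows}:
            sub = [r[target] for r in rows if r[a] == v]
            g -= len(sub) / len(rows) * entropy(sub)
        return g

    best = max(attrs, key=gain)  # attribute with the highest information gain
    rest = [a for a in attrs if a != best]
    return {best: {v: id3([r for r in rows if r[best] == v], rest, target)
                   for v in {r[best] for r in rows}}}
```

Called on the page 4 table, this returns a nested-dict tree like the one developed over the next slides.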

Page 8: Decision Tree - ID3

How to Generate a Decision Tree (2/13)

Which is the best choice? We have 29 positive examples and 35 negative ones. Should we use attribute A1 or attribute A2 at this node?

Page 9: Decision Tree - ID3

How to Generate a Decision Tree (3/13)

Use Entropy to Measure the Degree of Impurity

Entropy: H(S) = - Σj pj log2 pj

where pj is the probability (relative frequency) of class j in S.
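Computed directly from the class proportions, this is a one-liner; a minimal sketch (the entropy helper name is ours):

```python
import math
from collections import Counter

def entropy(labels):
    """H(S) = -sum_j p_j log2 p_j, with p_j the relative frequency of class j."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# The class column of the page 4 table: 4 Bus, 3 Train, 3 Car
print(entropy(["Bus"] * 4 + ["Train"] * 3 + ["Car"] * 3))  # ~1.571
```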

Page 10: Decision Tree - ID3

How to Generate a Decision Tree (4/13)

What does Entropy mean?

Entropy is the minimum number of bits needed to encode the classification of a randomly drawn member of S. If p+ = 1, the receiver already knows the class, no message is sent, and Entropy = 0. If p+ = 0.5, 1 bit is needed.

An optimal-length code assigns -log2 p bits to a message having probability p. The idea is to assign shorter codes to the more probable messages and longer codes to the less likely ones. Thus, the expected number of bits to encode + or - for a random member of S is:

H(S) = p+ (-log2 p+) + p- (-log2 p-)
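A quick numeric check of these cases (binary_entropy is a helper name of ours; the convention 0 * log2 0 = 0 handles the p+ = 1 case):

```python
import math

def binary_entropy(p_pos):
    """H(S) for two classes, using the convention 0 * log2(0) = 0."""
    h = 0.0
    for p in (p_pos, 1 - p_pos):
        if p > 0:
            h -= p * math.log2(p)
    return h

print(binary_entropy(1.0))      # 0.0 -> receiver already knows the class
print(binary_entropy(0.5))      # 1.0 -> one full bit is needed
print(binary_entropy(29 / 64))  # ~0.993, the H(S) used on the next slides
```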

Page 11: Decision Tree - ID3

How to Generate a Decision Tree (5/13)

Information Gain measures the expected reduction in entropy caused by partitioning the examples according to the given attribute.

IG(S|A): the number of bits saved when encoding the target value of an arbitrary member of S, knowing the value of attribute A.

Expected reduction in entropy caused by knowing the value of A:

IG(S|A) = H(S) - Σj Prob(A=vj) H(S | A=vj)
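A direct transcription of this formula into Python might look as follows (the info_gain name and the rows-of-dicts layout are our choices; rows is a list of records, attr the candidate splitting attribute, target the class column):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr, target):
    """IG(S|A) = H(S) - sum_j Prob(A=v_j) * H(S | A=v_j)."""
    labels = [r[target] for r in rows]
    gain = entropy(labels)
    for v in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == v]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain
```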

Page 12: Decision Tree - ID3

How to Generate a Decision Tree (6/13)

Which is the best choice? We have 29 positive examples and 35 negative ones. Should we use attribute A1 or attribute A2 at this node?

IG(A1) = 0.993 - 26/64 * 0.70 - 38/64 * 0.74 ≈ 0.269
IG(A2) = 0.993 - 51/64 * 0.93 - 13/64 * 0.61 ≈ 0.128

A1 yields the larger gain, so it is the better choice at this node.
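Plugging the rounded branch entropies straight back into the formula confirms both values (plain arithmetic, no assumptions beyond the numbers above):

```python
# Branch sizes 26/64 and 38/64 for A1, 51/64 and 13/64 for A2
h_s = 0.993  # entropy of the full [29+, 35-] sample
print(round(h_s - 26/64 * 0.70 - 38/64 * 0.74, 3))  # IG(A1) ~ 0.269
print(round(h_s - 51/64 * 0.93 - 13/64 * 0.61, 3))  # IG(A2) ~ 0.128
```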

Page 13: Decision Tree - ID3

How to Generate a Decision Tree (7/13)

Specific Conditional Entropy H(Y|X=v)

Y is the class, X is an attribute, and v is a value of X. H(Y|X=v) is the entropy of Y among only those records in which X has value v.

e.g. H(Class|Travel Cost=Cheap) = -0.8 * log2 0.8 - 0.2 * log2 0.2 = 0.722
H(Class|Travel Cost=Expensive) = -1 * log2 1 = 0
H(Class|Travel Cost=Standard) = -1 * log2 1 = 0

Page 14: Decision Tree - ID3

How to Generate a Decision Tree (8/13)

Conditional Entropy H(Y|X)

H(Y|X) is the average specific conditional entropy of Y:

H(Y|X) = Σj Prob(X=vj) H(Y | X=vj)

e.g. H(Class|Travel Cost)
= Prob(Travel Cost=Cheap) * H(Class|Travel Cost=Cheap)
+ Prob(Travel Cost=Expensive) * H(Class|Travel Cost=Expensive)
+ Prob(Travel Cost=Standard) * H(Class|Travel Cost=Standard)
= 0.5 * 0.722 + 0.3 * 0 + 0.2 * 0 = 0.361
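The same number falls out of the training table directly; a short sketch (the rows list is just the (Travel Cost, Mode) pairs from page 4):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# (Travel Cost, Transportation Mode) pairs from the page 4 table
rows = [("Cheap", "Bus"), ("Cheap", "Bus"), ("Cheap", "Train"), ("Cheap", "Bus"),
        ("Cheap", "Bus"), ("Standard", "Train"), ("Standard", "Train"),
        ("Expensive", "Car"), ("Expensive", "Car"), ("Expensive", "Car")]

h_cond = 0.0
for v in ("Cheap", "Expensive", "Standard"):
    modes = [m for c, m in rows if c == v]
    h_v = entropy(modes)                    # specific conditional entropy H(Class|X=v)
    h_cond += len(modes) / len(rows) * h_v  # weighted by Prob(X=v)
    print(v, round(h_v, 3))                 # 0.722, 0.0, 0.0
print("H(Class|Travel Cost) =", round(h_cond, 3))  # 0.361
```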

Page 15: Decision Tree - ID3

How to Generate a Decision Tree (9/13)

Information Gain IG(Y|X)

IG(Y|X) = H(Y) - H(Y|X)

e.g. H(Class) = -0.4 * log2 0.4 - 0.3 * log2 0.3 - 0.3 * log2 0.3 = 1.571
IG(Class|Travel Cost) = H(Class) - H(Class|Travel Cost) = 1.571 - 0.361 = 1.210

Results of the first iteration:

Attribute   Gender   Car Ownership   Travel Cost ($)/km   Income Level
IG          0.125    0.534           1.210                0.695
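All four first-iteration gains can be reproduced from the page 4 table with the info_gain routine sketched earlier; a self-contained version (the short column names Cars, Cost, and Mode are our abbreviations):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr):
    labels = [r["Mode"] for r in rows]
    gain = entropy(labels)
    for v in {r[attr] for r in rows}:
        subset = [r["Mode"] for r in rows if r[attr] == v]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

# The ten training rows from page 4
data = [
    {"Gender": "Male",   "Cars": 0, "Cost": "Cheap",     "Income": "Low",    "Mode": "Bus"},
    {"Gender": "Male",   "Cars": 1, "Cost": "Cheap",     "Income": "Medium", "Mode": "Bus"},
    {"Gender": "Female", "Cars": 1, "Cost": "Cheap",     "Income": "Medium", "Mode": "Train"},
    {"Gender": "Female", "Cars": 0, "Cost": "Cheap",     "Income": "Low",    "Mode": "Bus"},
    {"Gender": "Male",   "Cars": 1, "Cost": "Cheap",     "Income": "Medium", "Mode": "Bus"},
    {"Gender": "Male",   "Cars": 0, "Cost": "Standard",  "Income": "Medium", "Mode": "Train"},
    {"Gender": "Female", "Cars": 1, "Cost": "Standard",  "Income": "Medium", "Mode": "Train"},
    {"Gender": "Female", "Cars": 1, "Cost": "Expensive", "Income": "High",   "Mode": "Car"},
    {"Gender": "Male",   "Cars": 2, "Cost": "Expensive", "Income": "Medium", "Mode": "Car"},
    {"Gender": "Female", "Cars": 2, "Cost": "Expensive", "Income": "High",   "Mode": "Car"},
]

for attr in ("Gender", "Cars", "Cost", "Income"):
    print(attr, round(info_gain(data, attr), 3))
# Gender ~0.125, Cars ~0.534, Cost ~1.210, Income ~0.695
```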

Page 16: Decision Tree - ID3

How to Generate a Decision Tree (10/13)

(Figure: the tree after the first split. Travel Cost ($)/km, the attribute with the highest gain, becomes the root node; the branch that is still impure is marked as the next node to split.)

Page 17: Decision Tree - ID3

How to Generate a Decision Tree (11/13)

Second iteration: the gains are recomputed using only the records that reach the impure (Cheap) branch.

Page 18: Decision Tree - ID3

How to Generate a Decision Tree (12/13)

Results of the second iteration (gains computed on the Cheap branch only):

Attribute   Gender   Car Ownership   Income Level
IG          0.322    0.171           0.171

Gender has the highest gain, so the Cheap branch is split on Gender and the decision tree is updated.
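Restricting the computation to the five Cheap rows reproduces these gains (same entropy helper as before; the cheap list is the Cheap subset of the page 4 table):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# The five Cheap rows that reach the impure node after the first split
cheap = [
    {"Gender": "Male",   "Cars": 0, "Income": "Low",    "Mode": "Bus"},
    {"Gender": "Male",   "Cars": 1, "Income": "Medium", "Mode": "Bus"},
    {"Gender": "Female", "Cars": 1, "Income": "Medium", "Mode": "Train"},
    {"Gender": "Female", "Cars": 0, "Income": "Low",    "Mode": "Bus"},
    {"Gender": "Male",   "Cars": 1, "Income": "Medium", "Mode": "Bus"},
]

labels = [r["Mode"] for r in cheap]
for attr in ("Gender", "Cars", "Income"):
    gain = entropy(labels)
    for v in {r[attr] for r in cheap}:
        sub = [r["Mode"] for r in cheap if r[attr] == v]
        gain -= len(sub) / len(cheap) * entropy(sub)
    print(attr, round(gain, 3))
# Gender ~0.322, Cars ~0.171, Income ~0.171
```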

Page 19: Decision Tree - ID3

How to Generate a Decision Tree (13/13)

Third iteration: the remaining impure node (the Female side of the Cheap branch) is split once more, and the decision tree is updated; every leaf is now pure.

Page 20: Decision Tree - ID3

To Sum Up

ID3 is a strong system that:
- Uses hill-climbing search, guided by the information gain measure, through the space of decision trees
- Outputs a single hypothesis
- Never backtracks; it converges to locally optimal solutions
- Uses all training examples at each step, contrary to methods that make decisions incrementally
- Uses statistical properties of all examples, so the search is less sensitive to errors in individual training examples

Page 21: Decision Tree - ID3

Some Drawbacks

- It can only deal with nominal data
- It may not be robust in the presence of noise
- It is not able to deal with noisy data sets