Decision Tree - ID3


Decision Tree (ID3)

Xueping Peng

Xueping.peng@uts.edu.au

Outline

- What is a decision tree
- How to use a decision tree
- How to generate a decision tree
- Sum up and some drawbacks

What is a decision tree? (1/3)

A decision tree is a hierarchical tree structure used to classify instances based on a series of questions (or rules) about their attributes.

The attributes can be any type of variable: binary, nominal, ordinal, or quantitative.

The class must be a qualitative variable (categorical, binary, or ordinal).

In short, given data whose records are labeled with classes, a decision tree produces a sequence of rules (or a series of questions) that can be used to recognize the class.

What is a decision tree? (2/3)

The training data below has four attributes; the class is the Transportation Mode.

Gender   Car Ownership   Travel Cost ($)/km   Income Level   Transportation Mode (class)
Male     0               Cheap                Low            Bus
Male     1               Cheap                Medium         Bus
Female   1               Cheap                Medium         Train
Female   0               Cheap                Low            Bus
Male     1               Cheap                Medium         Bus
Female   0               Standard             Medium         Train
Female   1               Standard             Medium         Train
Female   1               Expensive            High           Car
Male     2               Expensive            Medium         Car
Female   2               Expensive            High           Car
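Since the later slides compute entropies and gains from this table by hand, it helps to have the rows in executable form. A direct transcription, assuming Python, with dict keys mirroring the column names ("Mode" is my shorthand for the Transportation Mode class):

```python
# Training data transcribed from the table above; "Mode" is the class label.
training_data = [
    {"Gender": "Male",   "Car Ownership": 0, "Travel Cost": "Cheap",     "Income Level": "Low",    "Mode": "Bus"},
    {"Gender": "Male",   "Car Ownership": 1, "Travel Cost": "Cheap",     "Income Level": "Medium", "Mode": "Bus"},
    {"Gender": "Female", "Car Ownership": 1, "Travel Cost": "Cheap",     "Income Level": "Medium", "Mode": "Train"},
    {"Gender": "Female", "Car Ownership": 0, "Travel Cost": "Cheap",     "Income Level": "Low",    "Mode": "Bus"},
    {"Gender": "Male",   "Car Ownership": 1, "Travel Cost": "Cheap",     "Income Level": "Medium", "Mode": "Bus"},
    {"Gender": "Female", "Car Ownership": 0, "Travel Cost": "Standard",  "Income Level": "Medium", "Mode": "Train"},
    {"Gender": "Female", "Car Ownership": 1, "Travel Cost": "Standard",  "Income Level": "Medium", "Mode": "Train"},
    {"Gender": "Female", "Car Ownership": 1, "Travel Cost": "Expensive", "Income Level": "High",   "Mode": "Car"},
    {"Gender": "Male",   "Car Ownership": 2, "Travel Cost": "Expensive", "Income Level": "Medium", "Mode": "Car"},
    {"Gender": "Female", "Car Ownership": 2, "Travel Cost": "Expensive", "Income Level": "High",   "Mode": "Car"},
]
```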

What is a decision tree? (3/3)

How to Use a Decision Tree

Test data:

Person Name   Gender   Car Ownership   Travel Cost ($)/km   Income Level   Transportation Mode
Alex          Male     1               Standard             High           ?
Buddy         Male     0               Cheap                Medium         ?
Cherry        Female   1               Cheap                High           ?

What transportation mode would Alex, Buddy, and Cherry use? (Figure: the decision tree with Alex, Buddy, and Cherry traced to its leaves.)
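As a sketch of what "using" the tree means, here is the tree derived in the slides that follow (root on Travel Cost, then Gender, then Car Ownership on the Cheap branch), hard-coded as nested conditionals and applied to the three test records. The classify name and the hard-coded branches are my reading of the later slides, not code from them:

```python
# The learned tree as nested if/else; assumes the structure derived in
# slides 9/13-13/13: root = Travel Cost; Cheap -> Gender -> Car Ownership.
def classify(row):
    if row["Travel Cost"] == "Expensive":
        return "Car"
    if row["Travel Cost"] == "Standard":
        return "Train"
    # Travel Cost == "Cheap"
    if row["Gender"] == "Male":
        return "Bus"
    return "Train" if row["Car Ownership"] >= 1 else "Bus"

tests = {
    "Alex":   {"Gender": "Male",   "Car Ownership": 1, "Travel Cost": "Standard", "Income Level": "High"},
    "Buddy":  {"Gender": "Male",   "Car Ownership": 0, "Travel Cost": "Cheap",    "Income Level": "Medium"},
    "Cherry": {"Gender": "Female", "Car Ownership": 1, "Travel Cost": "Cheap",    "Income Level": "High"},
}
for name, row in tests.items():
    print(name, "->", classify(row))   # Alex -> Train, Buddy -> Bus, Cherry -> Train
```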

How to Generate a Decision Tree (1/13)

Description of ID3: build the tree top-down. At each node, choose the attribute with the highest information gain, partition the examples by that attribute's values, and recurse on each partition until every subset is pure or no attributes remain.
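One way to make the description concrete is a compact sketch of the recursion. All names here are mine, and entropy and information gain are only formalized on the next slides; this is a sketch, not the slides' own code:

```python
import math
from collections import Counter

def _H(labels):
    """Entropy of a list of class labels (formalized on slide 3/13)."""
    n = len(labels)
    return sum(-c / n * math.log2(c / n) for c in Counter(labels).values())

def _gain(rows, attr, target):
    """Information gain of splitting rows on attr (formalized on slide 5/13)."""
    n = len(rows)
    split = Counter(r[attr] for r in rows)
    remainder = sum(cnt / n * _H([r[target] for r in rows if r[attr] == v])
                    for v, cnt in split.items())
    return _H([r[target] for r in rows]) - remainder

def id3(rows, attributes, target):
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:          # all examples agree -> pure leaf
        return labels[0]
    if not attributes:                 # attributes exhausted -> majority-class leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: _gain(rows, a, target))
    rest = [a for a in attributes if a != best]
    return {best: {v: id3([r for r in rows if r[best] == v], rest, target)
                   for v in set(r[best] for r in rows)}}
```

Calling id3(training_data, ["Gender", "Car Ownership", "Travel Cost", "Income Level"], "Mode") on the table above reproduces the tree the following slides build by hand.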

How to Generate a Decision Tree (2/13)

Which is the best choice? We have 29 positive examples and 35 negative ones. Should we use attribute A1 or attribute A2 to split this node?

How to Generate a Decision Tree (3/13)

Use entropy to measure the degree of impurity:

H(S) = − Σj pj log2(pj)

where pj is the probability (relative frequency) of class j in S.

How to Generate a Decision Tree (4/13)

What does entropy mean?

Entropy is the minimum number of bits needed to encode the classification of a randomly drawn member of S:

- If p+ = 1, the receiver already knows the class, so no message is sent and Entropy = 0.
- If p+ = 0.5, 1 bit is needed.

An optimal-length code assigns −log2(p) bits to a message with probability p. The idea is to assign shorter codes to more probable messages and longer codes to less likely ones. Thus, the expected number of bits to encode + or − for a random member of S is:

H(S) = p+ (−log2 p+) + p− (−log2 p−)
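A minimal sketch of this definition in Python; the entropy helper is reused by the later snippets:

```python
import math

def entropy(probs):
    """H = sum of -p * log2(p) over the class probabilities; p == 0 terms contribute 0."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

print(entropy([1.0, 0.0]))   # 0.0 -> receiver already knows the class
print(entropy([0.5, 0.5]))   # 1.0 -> one bit needed per example
```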

How to Generate a Decision Tree (5/13)

Information gain measures the expected reduction in entropy caused by partitioning the examples according to a given attribute.

IG(S, A) is the number of bits saved when encoding the target value of an arbitrary member of S, knowing the value of attribute A.

Expected reduction in entropy caused by knowing the value of A:

IG(S, A) = H(S) − Σj Prob(A = vj) H(S | A = vj)

How to Generate a Decision Tree (6/13)

Which is the best choice? We have 29 positive examples and 35 negative ones. Should we use attribute A1 or attribute A2 to split this node?

IG(A1) = 0.993 − 26/64 × 0.706 − 38/64 × 0.742 = 0.266
IG(A2) = 0.993 − 51/64 × 0.937 − 13/64 × 0.619 = 0.121
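These figures can be checked with the entropy helper above. The per-branch class counts (A1: 21+/5− vs 8+/30−; A2: 18+/33− vs 11+/2−) are an assumption on my part: they are the standard textbook splits that match the per-branch entropies shown here:

```python
# two_class_H and branch_gain are illustrative names, not from the slides.
def two_class_H(pos, n):
    return entropy([pos / n, (n - pos) / n])

H_S = two_class_H(29, 64)                 # ~0.993 for the 29+/35- set

def branch_gain(branches, n=64):
    # branches: list of (branch_size, positives_in_branch) pairs
    return H_S - sum(size / n * two_class_H(pos, size) for size, pos in branches)

print(round(branch_gain([(26, 21), (38, 8)]), 3))    # 0.266 -> choose A1
print(round(branch_gain([(51, 18), (13, 11)]), 3))   # 0.121
```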

How to Generate a Decision Tree (7/13)

Specific conditional entropy H(Y | X = v): Y is the class, X is an attribute, and v is a value of X. H(Y | X = v) is the entropy of Y among only those records in which X has value v. For example:

H(Class | Travel Cost = Cheap) = −0.8 log2(0.8) − 0.2 log2(0.2) = 0.722

H(Class | Travel Cost = Expensive) = −1 log2(1) = 0

H(Class | Travel Cost = Standard) = −1 log2(1) = 0
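Reproducing these from the training_data list defined earlier, reusing the entropy helper (H_given is an illustrative name):

```python
from collections import Counter

def H_given(rows, attr, value, target="Mode"):
    """Specific conditional entropy H(target | attr = value)."""
    labels = [r[target] for r in rows if r[attr] == value]
    return entropy([c / len(labels) for c in Counter(labels).values()])

for v in ("Cheap", "Expensive", "Standard"):
    print(v, round(H_given(training_data, "Travel Cost", v), 3))
# Cheap 0.722, Expensive 0.0, Standard 0.0
```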

How to Generate a Decision Tree (8/13)

Conditional entropy H(Y | X) is the average specific conditional entropy of Y:

H(Y | X) = Σj Prob(X = vj) H(Y | X = vj)

e.g. H(Class | Travel Cost)
= prob(Travel Cost = Cheap) × H(Class | Travel Cost = Cheap)
+ prob(Travel Cost = Expensive) × H(Class | Travel Cost = Expensive)
+ prob(Travel Cost = Standard) × H(Class | Travel Cost = Standard)
= 0.5 × 0.722 + 0.3 × 0 + 0.2 × 0 = 0.361
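The same average, as a small extension of the H_given sketch above:

```python
def H_cond(rows, attr, target="Mode"):
    """Conditional entropy H(target | attr): specific entropies weighted by Prob(attr = v)."""
    n = len(rows)
    return sum(
        sum(1 for r in rows if r[attr] == v) / n * H_given(rows, attr, v, target)
        for v in {r[attr] for r in rows}
    )

print(round(H_cond(training_data, "Travel Cost"), 3))   # 0.361
```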

How to Generate a Decision Tree (9/13)

Information gain: IG(Y | X) = H(Y) − H(Y | X). e.g.:

H(Class) = −0.4 log2(0.4) − 0.3 log2(0.3) − 0.3 log2(0.3) = 1.571

IG(Class | Travel Cost) = H(Class) − H(Class | Travel Cost) = 1.571 − 0.361 = 1.210

Results of the first iteration:

Attribute   Gender   Car Ownership   Travel Cost ($)/km   Income Level
IG          0.371    0.534           1.210                0.695
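Putting the helpers above together reproduces this table from training_data:

```python
from collections import Counter

def info_gain(rows, attr, target="Mode"):
    """IG(target | attr) = H(target) - H(target | attr)."""
    n = len(rows)
    H_Y = entropy([c / n for c in Counter(r[target] for r in rows).values()])
    return H_Y - H_cond(rows, attr, target)

for a in ("Gender", "Car Ownership", "Travel Cost", "Income Level"):
    print(a, round(info_gain(training_data, a), 3))
# Gender 0.371, Car Ownership 0.534, Travel Cost 1.21, Income Level 0.695
```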

How to Generate a Decision Tree (10/13)

Travel Cost ($)/km has the highest information gain, so it becomes the root node, and the examples are split by its three values. (Figure: root node and split nodes after the first iteration.)

How to Generate a Decision Tree (11/13)

Second iteration: the Standard and Expensive branches are already pure (all Train and all Car), so only the Travel Cost = Cheap branch needs a further split.

How to Generate a Decision Tree (12/13)

Results of the second iteration, on the Travel Cost = Cheap subset:

Attribute   Gender   Car Ownership   Income Level
IG          0.322    0.171           0.171

Gender has the highest gain, so the Cheap branch is split on Gender, and the decision tree is updated. (Figure: updated tree after the second split.)
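The same info_gain helper, applied to the Travel Cost = Cheap subset, reproduces these numbers:

```python
cheap = [r for r in training_data if r["Travel Cost"] == "Cheap"]
for a in ("Gender", "Car Ownership", "Income Level"):
    print(a, round(info_gain(cheap, a), 3))
# Gender 0.322, Car Ownership 0.171, Income Level 0.171
```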

How to Generate a Decision Tree (13/13)

Third iteration: the Travel Cost = Cheap, Gender = Female subset is still impure, so it is split once more (Car Ownership and Income Level separate it equally well), and the decision tree is updated. (Figure: final tree.)

To Sum Up

ID3 is a strong system that:

- Uses hill-climbing search, guided by the information gain measure, through the space of decision trees
- Outputs a single hypothesis
- Never backtracks: it converges to locally optimal solutions
- Uses all training examples at each step, contrary to methods that make decisions incrementally
- Uses statistical properties of all the examples, so the search is less sensitive to errors in individual training examples

Some Drawbacks

- It can only deal with nominal data
- It is not robust in the presence of noise, and so cannot handle noisy data sets