Data Mining Entropy
Uploaded by abhishekgupta
RID  Age     Income  Student  Credit_rating  Buys_computer
1    low     high    no       fair           no
2    low     high    no       excellent      no
3    medium  high    no       fair           yes
4    high    medium  no       fair           yes
5    high    low     yes      fair           yes
6    high    low     yes      excellent      no
7    medium  low     yes      excellent      yes
8    low     medium  no       fair           no
9    low     low     yes      fair           yes
10   high    medium  yes      fair           yes
11   low     medium  yes      excellent      yes
12   medium  medium  no       excellent      yes
13   medium  high    yes      fair           yes
14   high    medium  no       excellent      no

Overall
p1       0.64
p2       0.36
logp1    -0.64
logp2    -1.49
Entropy  0.94

Here p1 and p2 are the fractions of rows with Buys_computer = yes and Buys_computer = no, and the logs are base 2.
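As a quick check of the overall figure, the entropy can be recomputed from the class counts alone: 9 of the 14 rows are yes and 5 are no. This is a minimal sketch; the function name `entropy` is an illustrative choice, not something from the original spreadsheet.

```python
import math

def entropy(counts):
    """Shannon entropy in bits of a class distribution given as raw counts."""
    total = sum(counts)
    # Zero counts are skipped, so the 0 * log2(0) term is treated as 0.
    return sum((c / total) * math.log2(total / c) for c in counts if c > 0)

# 9 rows with Buys_computer = yes, 5 rows with Buys_computer = no.
overall = entropy([9, 5])
print(round(overall, 2))  # 0.94
```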
Age
         Low    Medium  High
p1       0.40   1.00    0.60
p2       0.60   0.00    0.40
logp1    -1.32  0.00    -0.74
logp2    -0.74  0.00    -1.32
Entropy  0.97   0.00    0.97

Income
         Low    Medium  High
p1       0.75   0.67    0.50
p2       0.25   0.33    0.50
logp1    -0.42  -0.58   -1.00
logp2    -2.00  -1.58   -1.00
Entropy  0.81   0.92    1.00

Student
         Yes    No
p1       0.86   0.43
p2       0.14   0.57
logp1    -0.22  -1.22
logp2    -2.81  -0.81
Entropy  0.59   0.99

Credit rating
         Fair   Excellent
p1       0.75   0.50
p2       0.25   0.50
logp1    -0.42  -1.00
logp2    -2.00  -1.00
Entropy  0.81   1.00
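The per-value entropies in these tables can be reproduced by grouping the 14 rows on one attribute and computing the entropy of Buys_computer inside each group. The sketch below assumes the table is typed in as plain tuples; `COLUMNS`, `ROWS`, and `entropy_by_value` are hypothetical names introduced only for illustration.

```python
import math
from collections import Counter, defaultdict

COLUMNS = ("age", "income", "student", "credit_rating", "buys_computer")
# The 14 rows of the table, in RID order.
ROWS = [tuple(line.split()) for line in """
low high no fair no
low high no excellent no
medium high no fair yes
high medium no fair yes
high low yes fair yes
high low yes excellent no
medium low yes excellent yes
low medium no fair no
low low yes fair yes
high medium yes fair yes
low medium yes excellent yes
medium medium no excellent yes
medium high yes fair yes
high medium no excellent no
""".strip().splitlines()]

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

def entropy_by_value(rows, attribute):
    """Map each value of `attribute` to the entropy of the class within it."""
    idx = COLUMNS.index(attribute)
    groups = defaultdict(list)
    for row in rows:
        groups[row[idx]].append(row[-1])  # last column is the class label
    return {value: entropy(labels) for value, labels in groups.items()}

for attribute in COLUMNS[:-1]:
    by_value = entropy_by_value(ROWS, attribute)
    print(attribute, {v: round(h, 2) for v, h in by_value.items()})
```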
When age is low

Starting with age as the root node, we consider all the other variables.

Income
         Low  Medium  High
p1       1    0.5     0
p2       0    0.5     1
logp1    0    -1      0
logp2    0    -1      0
Entropy  0    1       0

Student
         No  Yes
p1       0   1
p2       1   0
logp1    0   0
logp2    0   0
Entropy  0   0

Credit rating
         Fair         Excellent
p1       0.333333333  0.5
p2       0.666666667  0.5
logp1    -1.5849625   -1
logp2    -0.5849625   -1
Entropy  0.918295834  1
We split on student here, since its entropy is the lowest. Both student branches have entropy 0, so this part of the tree is complete.
When age is high
Income
         Low  Medium     High
p1       0.5  0.666667   0
p2       0.5  0.333333   0
logp1    -1   -0.584963  0
logp2    -1   -1.584963  0
Entropy  1    0.918296   0

Student
         No  Yes
p1       0.5  0.666667
p2       0.5  0.333333
logp1    -1   -0.584963
logp2    -1   -1.584963
Entropy  1    0.918296

Credit rating
         Fair  Excellent
p1       1     0
p2       0     1
logp1    0     0
logp2    0     0
Entropy  0     0
Hence the entropy of credit rating is the lowest. We split on it and, since both resulting entropies are 0, we terminate here.
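The two sub-splits (age low and age high) can be checked in the same way: restrict the table to one age group, compute the size-weighted entropy of each remaining attribute, and pick the smallest. This is a sketch under the assumption that the table is typed in as tuples; `weighted_entropy` and `best_split` are illustrative names, not part of the original.

```python
import math
from collections import Counter, defaultdict

COLUMNS = ("age", "income", "student", "credit_rating", "buys_computer")
# The 14 rows of the table, in RID order.
ROWS = [tuple(line.split()) for line in """
low high no fair no
low high no excellent no
medium high no fair yes
high medium no fair yes
high low yes fair yes
high low yes excellent no
medium low yes excellent yes
low medium no fair no
low low yes fair yes
high medium yes fair yes
low medium yes excellent yes
medium medium no excellent yes
medium high yes fair yes
high medium no excellent no
""".strip().splitlines()]

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

def weighted_entropy(rows, attribute):
    """Entropy of the class after splitting on `attribute`, weighted by group size."""
    idx = COLUMNS.index(attribute)
    groups = defaultdict(list)
    for row in rows:
        groups[row[idx]].append(row[-1])
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())

def best_split(rows, candidates):
    """Attribute with the lowest weighted entropy among `candidates`."""
    return min(candidates, key=lambda a: weighted_entropy(rows, a))

for age in ("low", "high"):
    subset = [r for r in ROWS if r[0] == age]
    choice = best_split(subset, ("income", "student", "credit_rating"))
    print(age, "->", choice, weighted_entropy(subset, choice))
```

For age low this selects student and for age high it selects credit_rating, each with a weighted entropy of 0, so both branches end in pure leaves.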
The final tree is:

AGE
  <=30 (low): STUDENT
    yes: YES
    no: NO
  31-40 (medium): YES
  >40 (high): CREDIT RATING
    excellent: NO
    fair: YES
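The tree can also be written as a small lookup structure and used to classify a record. This is only a sketch: the nested-tuple layout and the `predict` function are illustrative assumptions, the keys low, medium, and high are the table's labels for the <=30, 31-40, and >40 branches, and the student = no leaf is read off the table, where every low-age non-student has Buys_computer = no.

```python
# An internal node is (attribute, {value: subtree}); a leaf is the class label.
TREE = ("age", {
    "low": ("student", {"yes": "yes", "no": "no"}),
    "medium": "yes",
    "high": ("credit_rating", {"fair": "yes", "excellent": "no"}),
})

def predict(node, record):
    """Follow the branches matching `record` until a leaf label is reached."""
    while not isinstance(node, str):
        attribute, branches = node
        node = branches[record[attribute]]
    return node

# RID 6 from the table: high age, student, excellent credit rating.
row6 = {"age": "high", "income": "low", "student": "yes", "credit_rating": "excellent"}
print(predict(TREE, row6))  # no
```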
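Weighted entropy of each attribute over all 14 rows (each value's probability times the entropy within that value):

P(Low) 0.36  P(Medium) 0.29  P(High) 0.36
Entropy (Age) 0.69

P(Low) 0.29  P(Medium) 0.43  P(High) 0.29
Entropy (Income) 0.91

P(Yes) 0.50  P(No) 0.50
Entropy (Student) 0.79

P(Fair) 0.57  P(Excellent) 0.43
Entropy (Credit Rating) 0.89

Age has the lowest weighted entropy, so it is taken as the root node.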
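These weighted entropies follow from the (yes, no) counts behind each attribute value. The sketch below tallies those counts by hand from the 14-row table; `ROOT_COUNTS` and `weighted_entropy` are hypothetical names used only for illustration.

```python
import math

def entropy(counts):
    """Entropy in bits of a class distribution given as raw counts."""
    total = sum(counts)
    return sum((c / total) * math.log2(total / c) for c in counts if c > 0)

def weighted_entropy(partition):
    """`partition` maps each attribute value to its (yes, no) counts."""
    total = sum(sum(c) for c in partition.values())
    return sum(sum(c) / total * entropy(c) for c in partition.values())

# (yes, no) counts of Buys_computer per attribute value, from the table.
ROOT_COUNTS = {
    "age": {"low": (2, 3), "medium": (4, 0), "high": (3, 2)},
    "income": {"low": (3, 1), "medium": (4, 2), "high": (2, 2)},
    "student": {"yes": (6, 1), "no": (3, 4)},
    "credit_rating": {"fair": (6, 2), "excellent": (3, 3)},
}

scores = {a: weighted_entropy(p) for a, p in ROOT_COUNTS.items()}
best = min(scores, key=scores.get)
print({a: round(s, 2) for a, s in scores.items()})
print(best)  # age
```

Picking the attribute with the smallest weighted entropy is the same as picking the one with the largest information gain, since the overall entropy being subtracted from is the same for every attribute.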
When age is low:

P(low) 0.2  P(medium) 0.4  P(high) 0.4
Entropy(income) 0.4

P(no) 0.6  P(yes) 0.4
Entropy(student) 0

P(fair) 0.6  P(excellent) 0.4
Entropy(credit rating) 0.950978
When age is high:

P(low) 0.4  P(medium) 0.6  P(high) 0
Entropy(income) 0.950978

P(no) 0.4  P(yes) 0.6
Entropy(student) 0.950978

P(fair) 0.6  P(excellent) 0.4
Entropy(credit rating) 0