Data Mining Entropy

Entropy Assignment

RID  Age     Income  Student  Credit_rating  Buys_computer
1    low     high    no       fair           no
2    low     high    no       excellent      no
3    medium  high    no       fair           yes
4    high    medium  no       fair           yes
5    high    low     yes      fair           yes
6    high    low     yes      excellent      no
7    medium  low     yes      excellent      yes
8    low     medium  no       fair           no
9    low     low     yes      fair           yes
10   high    medium  yes      fair           yes
11   low     medium  yes      excellent      yes
12   medium  medium  no       excellent      yes
13   medium  high    yes      fair           yes
14   high    medium  no       excellent      no

Overall (all 14 rows; p1 = P(yes) = 9/14, p2 = P(no) = 5/14, logs base 2, Entropy = -(p1*logp1 + p2*logp2)):
p1        0.64
p2        0.36
logp1    -0.64
logp2    -1.49
Entropy   0.94
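A minimal Python sketch of the overall entropy calculation, assuming base-2 logarithms (which match the logp values above). The `entropy` helper and the `buys` list are illustrative names, not part of the original assignment.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy in bits: -sum(p * log2(p)) over the class proportions.

    Classes that do not occur are never counted, which matches the
    usual convention 0 * log2(0) = 0.
    """
    n = len(labels)
    h = -sum((c / n) * log2(c / n) for c in Counter(labels).values())
    return h + 0.0  # turn -0.0 into 0.0 for a pure list


# Buys_computer column of the table, RID 1..14
buys = ["no", "no", "yes", "yes", "yes", "no", "yes",
        "no", "yes", "yes", "yes", "yes", "yes", "no"]

print(round(entropy(buys), 2))
```

With 9 yes and 5 no this reproduces the overall entropy of about 0.94 bits.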

Rows with Age = low: RID 1, 2, 8, 9 and 11.

The final tree is drawn at the end, after the entropy calculations.

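Entropy of Buys_computer for each attribute value (all 14 rows; p1 = P(yes), p2 = P(no); a logp of 0.00 next to p = 0 follows the convention 0*log2(0) = 0):

Age       Low     Medium  High
p1        0.40    1.00    0.60
p2        0.60    0.00    0.40
logp1    -1.32    0.00   -0.74
logp2    -0.74    0.00   -1.32
Entropy   0.97    0.00    0.97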

Income    Low     Medium  High
p1        0.75    0.67    0.50
p2        0.25    0.33    0.50
logp1    -0.42   -0.58   -1.00
logp2    -2.00   -1.58   -1.00
Entropy   0.81    0.92    1.00

Student   Yes     No
p1        0.86    0.43
p2        0.14    0.57
logp1    -0.22   -1.22
logp2    -2.81   -0.81
Entropy   0.59    0.99

Credit rating   Fair    Excellent
p1              0.75    0.50
p2              0.25    0.50
logp1          -0.42   -1.00
logp2          -2.00   -1.00
Entropy         0.81    1.00
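The per-value tables can be reproduced with a short sketch like the one below. It assumes the 14 rows are stored as tuples in the column order of the data table, with Buys_computer last; `ROWS`, `ATTRS` and `value_entropies` are hypothetical names used only for illustration.

```python
from collections import Counter, defaultdict
from math import log2

ATTRS = ["Age", "Income", "Student", "Credit_rating"]

# One tuple per RID 1..14: (Age, Income, Student, Credit_rating, Buys_computer)
ROWS = [
    ("low", "high", "no", "fair", "no"),
    ("low", "high", "no", "excellent", "no"),
    ("medium", "high", "no", "fair", "yes"),
    ("high", "medium", "no", "fair", "yes"),
    ("high", "low", "yes", "fair", "yes"),
    ("high", "low", "yes", "excellent", "no"),
    ("medium", "low", "yes", "excellent", "yes"),
    ("low", "medium", "no", "fair", "no"),
    ("low", "low", "yes", "fair", "yes"),
    ("high", "medium", "yes", "fair", "yes"),
    ("low", "medium", "yes", "excellent", "yes"),
    ("medium", "medium", "no", "excellent", "yes"),
    ("medium", "high", "yes", "fair", "yes"),
    ("high", "medium", "no", "excellent", "no"),
]


def entropy(labels):
    """Entropy in bits of a list of class labels (absent classes contribute 0)."""
    n = len(labels)
    h = -sum((c / n) * log2(c / n) for c in Counter(labels).values())
    return h + 0.0  # turn -0.0 into 0.0 for pure groups


def value_entropies(rows, attr_index):
    """Map each value of one attribute to the entropy of Buys_computer among rows with that value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr_index]].append(row[-1])
    return {value: entropy(labels) for value, labels in groups.items()}


for i, name in enumerate(ATTRS):
    print(name, {v: round(h, 2) for v, h in value_entropies(ROWS, i).items()})
```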

Starting with Age as the root node, we consider all the other variables on each branch.

When age is low

Income    Low     Medium  High
p1        1       0.5     0
p2        0       0.5     1
logp1     0      -1       0
logp2     0      -1       0
Entropy   0       1       0

Student   No      Yes
p1        0       1
p2        1       0
logp1     0       0
logp2     0       0
Entropy   0       0

Credit rating   Fair           Excellent
p1              0.333333333    0.5
p2              0.666666667    0.5
logp1          -1.5849625     -1
logp2          -0.5849625     -1
Entropy         0.918295834    1

We take Student here since its entropy is the lowest (0 on both branches). Because both branches end with entropy 0, this part of the tree is finished.
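The same per-value calculation restricted to the five rows where Age = low can be sketched as follows. The helper names are hypothetical and the rows are copied from the data table. Note that Credit rating = excellent holds one yes and one no on this branch, so its entropy comes out as 1.

```python
from collections import Counter, defaultdict
from math import log2

ATTRS = ["Age", "Income", "Student", "Credit_rating"]
ROWS = [  # (Age, Income, Student, Credit_rating, Buys_computer), RID 1..14
    ("low", "high", "no", "fair", "no"),
    ("low", "high", "no", "excellent", "no"),
    ("medium", "high", "no", "fair", "yes"),
    ("high", "medium", "no", "fair", "yes"),
    ("high", "low", "yes", "fair", "yes"),
    ("high", "low", "yes", "excellent", "no"),
    ("medium", "low", "yes", "excellent", "yes"),
    ("low", "medium", "no", "fair", "no"),
    ("low", "low", "yes", "fair", "yes"),
    ("high", "medium", "yes", "fair", "yes"),
    ("low", "medium", "yes", "excellent", "yes"),
    ("medium", "medium", "no", "excellent", "yes"),
    ("medium", "high", "yes", "fair", "yes"),
    ("high", "medium", "no", "excellent", "no"),
]


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values()) + 0.0


def value_entropies(rows, attr_index):
    """Entropy of Buys_computer for each value of one attribute."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr_index]].append(row[-1])
    return {value: entropy(labels) for value, labels in groups.items()}


# Keep only the Age = low branch (RID 1, 2, 8, 9, 11)
low = [row for row in ROWS if row[0] == "low"]
for i, name in enumerate(ATTRS[1:], start=1):
    print(name, {v: round(h, 6) for v, h in value_entropies(low, i).items()})
```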

When age is high

Income    Low     Medium     High
p1        0.5     0.666667   0
p2        0.5     0.333333   0
logp1    -1      -0.584963   0
logp2    -1      -1.584963   0
Entropy   1       0.918296   0

(No row with high age has high income, so the High column is empty and is given entropy 0.)

Student   No      Yes
p1        0.5     0.666667
p2        0.5     0.333333
logp1    -1      -0.584963
logp2    -1      -1.584963
Entropy   1       0.918296

Credit rating   Fair    Excellent
p1              1       0
p2              0       1
logp1           0       0
logp2           0       0
Entropy         0       0

Hence the entropy of Credit rating is the lowest. We take it, and since both of its branches end with entropy 0, we terminate here.
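For the Age = high branch, a similar sketch (hypothetical helper names, rows copied from the data table) gives entropy 1 for Student = no and about 0.918 for Student = yes, since the yes branch holds two yes and one no.

```python
from collections import Counter, defaultdict
from math import log2

ATTRS = ["Age", "Income", "Student", "Credit_rating"]
ROWS = [  # (Age, Income, Student, Credit_rating, Buys_computer), RID 1..14
    ("low", "high", "no", "fair", "no"),
    ("low", "high", "no", "excellent", "no"),
    ("medium", "high", "no", "fair", "yes"),
    ("high", "medium", "no", "fair", "yes"),
    ("high", "low", "yes", "fair", "yes"),
    ("high", "low", "yes", "excellent", "no"),
    ("medium", "low", "yes", "excellent", "yes"),
    ("low", "medium", "no", "fair", "no"),
    ("low", "low", "yes", "fair", "yes"),
    ("high", "medium", "yes", "fair", "yes"),
    ("low", "medium", "yes", "excellent", "yes"),
    ("medium", "medium", "no", "excellent", "yes"),
    ("medium", "high", "yes", "fair", "yes"),
    ("high", "medium", "no", "excellent", "no"),
]


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values()) + 0.0


def value_entropies(rows, attr_index):
    """Entropy of Buys_computer for each value of one attribute."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr_index]].append(row[-1])
    return {value: entropy(labels) for value, labels in groups.items()}


# Keep only the Age = high branch (RID 4, 5, 6, 10, 14)
high = [row for row in ROWS if row[0] == "high"]
for i, name in enumerate(ATTRS[1:], start=1):
    print(name, {v: round(h, 6) for v, h in value_entropies(high, i).items()})
```

Values that never occur on the branch (Income = high here) simply do not appear in the result; this plays the role of the empty column with entropy 0.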

Final tree:

Age
  <=30 (low)     -> Student
                      no  -> NO
                      yes -> YES
  31-40 (medium) -> YES
  >40 (high)     -> Credit rating
                      excellent -> NO
                      fair      -> YES
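The whole procedure (pick the attribute with the lowest weighted entropy, split on it, stop when a branch has entropy 0) is essentially the ID3 algorithm. A compact recursive sketch under that assumption is shown below; `id3`, `split` and `classify` are hypothetical names, and the majority-vote fallback is an added safeguard that this data set never reaches.

```python
from collections import Counter, defaultdict
from math import log2

ATTRS = ["Age", "Income", "Student", "Credit_rating"]
ROWS = [  # (Age, Income, Student, Credit_rating, Buys_computer), RID 1..14
    ("low", "high", "no", "fair", "no"),
    ("low", "high", "no", "excellent", "no"),
    ("medium", "high", "no", "fair", "yes"),
    ("high", "medium", "no", "fair", "yes"),
    ("high", "low", "yes", "fair", "yes"),
    ("high", "low", "yes", "excellent", "no"),
    ("medium", "low", "yes", "excellent", "yes"),
    ("low", "medium", "no", "fair", "no"),
    ("low", "low", "yes", "fair", "yes"),
    ("high", "medium", "yes", "fair", "yes"),
    ("low", "medium", "yes", "excellent", "yes"),
    ("medium", "medium", "no", "excellent", "yes"),
    ("medium", "high", "yes", "fair", "yes"),
    ("high", "medium", "no", "excellent", "no"),
]


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values()) + 0.0


def split(rows, attr_index):
    """Group rows by their value of one attribute."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr_index]].append(row)
    return groups


def weighted_entropy(rows, attr_index):
    n = len(rows)
    return sum(len(g) / n * entropy([r[-1] for r in g])
               for g in split(rows, attr_index).values())


def id3(rows, attrs):
    """A leaf is a class label; an inner node is (attribute name, {value: subtree})."""
    labels = [row[-1] for row in rows]
    if len(set(labels)) == 1:  # entropy 0: stop with a leaf
        return labels[0]
    if not attrs:  # nothing left to split on: majority vote (added fallback)
        return Counter(labels).most_common(1)[0][0]
    best = min(attrs, key=lambda i: weighted_entropy(rows, i))
    rest = [i for i in attrs if i != best]
    return (ATTRS[best], {v: id3(g, rest) for v, g in split(rows, best).items()})


def classify(tree, sample):
    """Follow the branches matching the sample until a leaf is reached."""
    while isinstance(tree, tuple):
        attr, children = tree
        tree = children[sample[attr]]
    return tree


tree = id3(ROWS, list(range(len(ATTRS))))
print(tree)
print(classify(tree, {"Age": "high", "Income": "low", "Student": "yes", "Credit_rating": "fair"}))
```

The printed tree has Age at the root, Student under low, a YES leaf under medium and Credit_rating under high, matching the tree drawn above.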

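Weighted entropy of each attribute over all 14 rows (each value's entropy from the tables above, weighted by P(value), the share of rows with that value):

Age:            P(Low) 0.36, P(Medium) 0.29, P(High) 0.36
                Entropy (Age) 0.69
Income:         P(Low) 0.29, P(Medium) 0.43, P(High) 0.29
                Entropy (Income) 0.91
Student:        P(Yes) 0.50, P(No) 0.50
                Entropy (Student) 0.79
Credit rating:  P(Fair) 0.57, P(Excellent) 0.43
                Entropy (Credit Rating) 0.89

Age has the lowest weighted entropy, that is, the highest information gain (0.94 - 0.69 = 0.25), so it becomes the root node.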
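A sketch of the root-selection step, assuming information gain is defined as the overall entropy minus the weighted entropy (so the lowest weighted entropy wins). `weighted_entropy` and the other names are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2

ATTRS = ["Age", "Income", "Student", "Credit_rating"]
ROWS = [  # (Age, Income, Student, Credit_rating, Buys_computer), RID 1..14
    ("low", "high", "no", "fair", "no"),
    ("low", "high", "no", "excellent", "no"),
    ("medium", "high", "no", "fair", "yes"),
    ("high", "medium", "no", "fair", "yes"),
    ("high", "low", "yes", "fair", "yes"),
    ("high", "low", "yes", "excellent", "no"),
    ("medium", "low", "yes", "excellent", "yes"),
    ("low", "medium", "no", "fair", "no"),
    ("low", "low", "yes", "fair", "yes"),
    ("high", "medium", "yes", "fair", "yes"),
    ("low", "medium", "yes", "excellent", "yes"),
    ("medium", "medium", "no", "excellent", "yes"),
    ("medium", "high", "yes", "fair", "yes"),
    ("high", "medium", "no", "excellent", "no"),
]


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values()) + 0.0


def weighted_entropy(rows, attr_index):
    """Sum over values of P(value) * entropy of Buys_computer within that value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr_index]].append(row[-1])
    n = len(rows)
    return sum(len(labels) / n * entropy(labels) for labels in groups.values())


base = entropy([row[-1] for row in ROWS])
scores = {name: weighted_entropy(ROWS, i) for i, name in enumerate(ATTRS)}
for name, h in scores.items():
    print(f"{name}: weighted entropy {h:.2f}, information gain {base - h:.2f}")
root = min(scores, key=scores.get)
print("Root attribute:", root)
```

Picking the minimum weighted entropy is equivalent to picking the maximum gain, because the overall entropy is the same for every attribute; here both select Age.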

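Age is low (RID 1, 2, 8, 9, 11):

Income:         P(low) 0.2, P(medium) 0.4, P(high) 0.4
                Entropy (Income) 0.4
Student:        P(no) 0.6, P(yes) 0.4
                Entropy (Student) 0
Credit rating:  P(fair) 0.6, P(excellent) 0.4
                Entropy (Credit Rating) 0.950978

Hence the entropy of Student is the lowest, so Student is chosen for this branch.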
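As a worked check, the weighted entropies for the Age = low branch follow from $E(A) = \sum_v P(v)\,H(v)$, using the branch entropies from the age-low tables (Income: 0, 1, 0; Student: 0, 0; Credit rating: 0.918296 for fair, 1 for excellent):

```latex
\begin{aligned}
E(\text{Income})        &= 0.2(0) + 0.4(1) + 0.4(0) = 0.4\\
E(\text{Student})       &= 0.6(0) + 0.4(0) = 0\\
E(\text{Credit rating}) &= 0.6(0.918296) + 0.4(1) = 0.550978 + 0.4 = 0.950978
\end{aligned}
```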

Age is high (RID 4, 5, 6, 10, 14):

Income:         P(low) 0.4, P(medium) 0.6, P(high) 0
                Entropy (Income) 0.950978
Student:        P(no) 0.4, P(yes) 0.6
                Entropy (Student) 0.950978
Credit rating:  P(fair) 0.6, P(excellent) 0.4
                Entropy (Credit Rating) 0

Hence the entropy of Credit rating is the lowest. We take it, and since both of its branches end with entropy 0, we terminate here.
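The same weighted-entropy formula, $E(A) = \sum_v P(v)\,H(v)$, applied to the Age = high branch with its branch entropies (Income: 1, 0.918296, empty; Student: 1 for no, 0.918296 for yes; Credit rating: 0, 0) works out as:

```latex
\begin{aligned}
E(\text{Income})        &= 0.4(1) + 0.6(0.918296) + 0(0) = 0.950978\\
E(\text{Student})       &= 0.4(1) + 0.6(0.918296) = 0.950978\\
E(\text{Credit rating}) &= 0.6(0) + 0.4(0) = 0
\end{aligned}
```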