CLASSIFICATION: DECISION TREES
Gökhan Akçapınar
Seminar in Methodology and Statistics John Nerbonne, Çağrı Çöltekin
University of Groningen – May, 2012
Outline
• Research question
• Background knowledge
• Data collection
• Classification with decision trees
• R example
Research Problem
• Predict student performance from their activity data in a wiki environment.

Wikis
• «A wiki is a website whose users can add, modify, or delete its content via a web browser.»

Wiki software
• Wikis are typically powered by wiki software and are often created collaboratively by multiple users.

Wiki in Education
• Wikis are mostly used for group work and collaboration.
• Students create content and produce knowledge.

Assessment in Wiki?
• Assessing wiki work and rating individual performance are the main problems in introducing wikis.
• If teachers cannot assess wiki work, we cannot expect wikis to be adopted in education, despite the potential learning gains for students.

Why is assessment difficult?
Sample wiki page
History Logs / Revisions
WikLog
Metrics (Attributes)
• PageCount: The number of pages created by the user.
• EditCount: The number of edits conducted by the user.
• LinkCount: The number of links created by the user.
• WordCount: The number of words created by the user.
Sample Data

| ID | PageCount | EditCount | LinkCount | WordCount | Final Grade |
|----|-----------|-----------|-----------|-----------|-------------|
| 1 | 55 | 334 | 30 | 5251 | B1 |
| 2 | 5 | 194 | 0 | 430 | F |
| 3 | 37 | 267 | 243 | 9494 | A1 |
| 4 | 75 | 402 | 138 | 1635 | A2 |
| 5 | 24 | 183 | 1 | 2 | F |
| 6 | 40 | 232 | 83 | 1872 | C1 |
| 7 | 8 | 128 | 13 | 1622 | F |
| 8 | 28 | 283 | 29 | 1361 | B2 |
| 9 | 27 | 99 | 10 | 432 | D2 |
| 10 | 32 | 113 | 9 | 1001 | F |
Class / Output Variable

| ID | Final Grade | Performance |
|----|-------------|-------------|
| 1 | B1 | High |
| 2 | F | Low |
| 3 | A1 | High |
| 4 | A2 | High |
| 5 | F | Low |
| 6 | C1 | Medium |
| 7 | F | Low |
| 8 | B2 | Medium |
| 9 | D2 | Low |
| 10 | F | Low |

Grade scale: A1 A2 B1 B2 C1 C2 D1 D2 F2 F3, grouped into three classes: High, Medium, Low. In the sample, A1, A2 and B1 fall in High, B2 and C1 in Medium, and D2 and F in Low.
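The recoding from final grade to performance class can be sketched in R as a lookup vector. This is only an illustration: the lookup covers the grades that occur in the ten sample rows, and the names `grade_to_performance`, `final_grade` and `performance` are made up for the example. Grades whose class the slides do not show (for instance C2 and D1) are deliberately left unmapped.

```r
# Grade-to-class lookup, limited to grades that appear in the ten sample rows.
# Assumption: grades not shown with a class on the slides stay unmapped (NA).
grade_to_performance <- c(
  A1 = "High", A2 = "High", B1 = "High",
  B2 = "Medium", C1 = "Medium",
  D2 = "Low", F = "Low"
)

# Final grades of students 1-10, as listed on the slide.
final_grade <- c("B1", "F", "A1", "A2", "F", "C1", "F", "B2", "D2", "F")

# Recode to an ordered set of class labels.
performance <- factor(
  unname(grade_to_performance[final_grade]),
  levels = c("Low", "Medium", "High")
)

print(data.frame(ID = 1:10, FinalGrade = final_grade, Performance = performance))
print(table(performance))
```

Storing the class as a factor makes it explicit that the target is categorical, which is what a classification method expects.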
Research Problem

| ID | PageCount | EditCount | LinkCount | WordCount | Performance |
|----|-----------|-----------|-----------|-----------|-------------|
| 1 | 55 | 334 | 30 | 5251 | High |
| 2 | 5 | 194 | 0 | 430 | Low |
| 3 | 37 | 267 | 243 | 9494 | High |
| 4 | 75 | 402 | 138 | 1635 | High |
| 5 | 24 | 183 | 1 | 2 | Low |
| 6 | 40 | 232 | 83 | 1872 | Medium |
| 7 | 8 | 128 | 13 | 1622 | Low |
| 8 | 28 | 283 | 29 | 1361 | Medium |
| 9 | 27 | 99 | 10 | 432 | Low |
| 10 | 32 | 113 | 9 | 1001 | Low |

| ID | PageCount | EditCount | LinkCount | WordCount | Performance |
|----|-----------|-----------|-----------|-----------|-------------|
| 11 | 80 | 547 | 193 | 1269 | ? |
| 12 | 65 | 271 | 273 | 2132 | ? |
| 13 | 47 | 252 | 231 | 1213 | ? |
| 14 | 106 | 278 | 399 | 2675 | ? |
| 15 | 55 | 266 | 49 | 5713 | ? |
Prediction: Classification or Numeric Prediction?
• The objective of prediction is to estimate the unknown value of a variable.
• In education, the values can be knowledge, score, or mark.
• This value can be a numerical/continuous value (regression task) or a categorical/discrete value (classification task).
• Classification: 1, 0, 0, 1, … or A, D, B, F, …
• Numeric prediction: 23, 56, 87, 5, … or 3, 3, 1, 2, …
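In R the distinction shows up in the type of the target variable. The sketch below uses the example values from the slide; `task_for` is a hypothetical helper written for this illustration, not a function of any package. It shows that class labels coded as digits look like numbers until they are declared as a factor.

```r
# Class labels coded as digits are still categories: declare them as factors.
passed <- factor(c(1, 0, 0, 1), levels = c(0, 1), labels = c("fail", "pass"))
grade  <- factor(c("A", "D", "B", "F"))
score  <- c(23, 56, 87, 5)

# Hypothetical helper: name the task implied by the type of the target.
task_for <- function(y) {
  if (is.factor(y)) {
    "classification"
  } else if (is.numeric(y)) {
    "numeric prediction"
  } else {
    stop("unsupported target type")
  }
}

cat("passed:", task_for(passed), "\n")
cat("grade: ", task_for(grade), "\n")
cat("score: ", task_for(score), "\n")

# The same 0/1 labels left as plain numbers would be read as a numeric target.
cat("raw 0/1:", task_for(c(1, 0, 0, 1)), "\n")
```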
Classification
• Classification is a procedure in which individual items are placed into groups based on quantitative information regarding one or more characteristics inherent in the items and based on a training set of previously labeled items.
Classification: A Two-Step Process
• Model construction (induction): a tree induction algorithm learns a model, here a decision tree, from the training set.
• Using the model in prediction (deduction): the learned model is applied to the test set.
Classification Techniques
• Decision Tree based Methods
• Rule-based Methods
• Memory based reasoning
• Neural Networks
• Naïve Bayes and Bayesian Belief Networks
• Support Vector Machines
Example of a Decision Tree
Training data: the ten labeled students from the Research Problem table. Model: a decision tree with splitting attributes EditCount and WordCount.
• EditCount < 200: Low
• EditCount > 200: test WordCount
  • WordCount > 3000: High
  • WordCount < 3000: Medium
Example of a Decision Tree
Training data: the same ten labeled students. Model: a decision tree with splitting attributes PageCount and LinkCount.
• PageCount > 55: High
• PageCount < 55: test LinkCount
  • LinkCount < 20: Low
  • LinkCount > 20: Medium
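The two example trees (Edit/Word and Page/Link) can be written as small R functions and checked against the ten training rows. This is a sketch under stated assumptions: the placement of the Low, Medium and High leaves is inferred from the training data because the slide layout did not survive, values equal to a threshold are sent down the "<" branch, and the function names are invented for the example.

```r
# Training rows as shown on the slides (decimal commas dropped).
train <- data.frame(
  ID        = 1:10,
  PageCount = c(55, 5, 37, 75, 24, 40, 8, 28, 27, 32),
  EditCount = c(334, 194, 267, 402, 183, 232, 128, 283, 99, 113),
  LinkCount = c(30, 0, 243, 138, 1, 83, 13, 29, 10, 9),
  WordCount = c(5251, 430, 9494, 1635, 2, 1872, 1622, 1361, 432, 1001),
  Performance = c("High", "Low", "High", "High", "Low",
                  "Medium", "Low", "Medium", "Low", "Low"),
  stringsAsFactors = FALSE
)

# Tree 1: split on EditCount, then WordCount.
# Assumption: a value equal to the threshold follows the "<" branch.
tree_edit_word <- function(d) {
  ifelse(d$EditCount <= 200, "Low",
         ifelse(d$WordCount > 3000, "High", "Medium"))
}

# Tree 2: split on PageCount, then LinkCount (same tie-breaking assumption).
tree_page_link <- function(d) {
  ifelse(d$PageCount > 55, "High",
         ifelse(d$LinkCount <= 20, "Low", "Medium"))
}

# Share of rows where the predicted class equals the recorded class.
accuracy <- function(pred, truth) mean(pred == truth)

acc1 <- accuracy(tree_edit_word(train), train$Performance)
acc2 <- accuracy(tree_page_link(train), train$Performance)
cat("Edit/Word tree training accuracy:", acc1, "\n")
cat("Page/Link tree training accuracy:", acc2, "\n")
```

Under these assumptions neither tree fits all ten rows, which illustrates that different splitting attributes give different models for the same data.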
Example of Apply Model to Test Data
• Using the model in prediction (deduction): the decision tree learned from the training set is applied to the test set.
Apply Model to Test Data

Test data:

| ID | PageCount | EditCount | LinkCount | WordCount | Performance |
|----|-----------|-----------|-----------|-----------|-------------|
| 15 | 55 | 266 | 49 | 5713 | ? |

Start from the root of the tree and follow the branch that matches the test row at each node:
• EditCount < 200: Low
• EditCount > 200: test WordCount
  • WordCount > 3000: High
  • WordCount < 3000: Medium
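The walk from the root to a leaf can be sketched in R with the Edit/Word tree stored as a nested list. The representation, the field names (`attribute`, `threshold`, `low`, `high`) and the `classify` function are choices made for this example. As before, the leaf placement is inferred from the training rows, and a value equal to a threshold is assumed to follow the lower branch.

```r
# Tree as a nested list: internal nodes hold an attribute, a threshold and two
# children; leaves are plain class labels.
# Assumption: leaf placement is inferred from the training rows.
tree <- list(
  attribute = "EditCount", threshold = 200,
  low  = "Low",
  high = list(
    attribute = "WordCount", threshold = 3000,
    low  = "Medium",
    high = "High"
  )
)

# Walk from the root to a leaf, printing each test along the way.
classify <- function(node, row) {
  while (is.list(node)) {
    value <- row[[node$attribute]]
    go_high <- value > node$threshold
    cat(sprintf("%s = %g %s %g\n", node$attribute, value,
                if (go_high) ">" else "<=", node$threshold))
    node <- if (go_high) node$high else node$low
  }
  node
}

# Unlabeled students 11-15 from the slides.
test <- data.frame(
  ID        = 11:15,
  PageCount = c(80, 65, 47, 106, 55),
  EditCount = c(547, 271, 252, 278, 266),
  LinkCount = c(193, 273, 231, 399, 49),
  WordCount = c(1269, 2132, 1213, 2675, 5713)
)

# Student 15, the row traced on the slides.
label_15 <- classify(tree, test[test$ID == 15, ])
cat("Predicted performance for ID 15:", label_15, "\n")

# All five unlabeled students.
test$Performance <- vapply(
  seq_len(nrow(test)),
  function(i) classify(tree, test[i, ]),
  character(1)
)
print(test[, c("ID", "Performance")])
```

For student 15 the path is EditCount 266 > 200, then WordCount 5713 > 3000, which ends in the High leaf under the assumed tree.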
Choosing the Splitting Attribute
• Typical goodness functions:
  • information gain (ID3/C4.5)
  • information gain ratio
  • Gini index
• Which is the best attribute?
  • The one which will result in the smallest tree
  • Choose the attribute that produces the “purest” nodes
• Strategy: choose the attribute that results in the greatest information gain
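To make "purest" concrete, the following R sketch computes two common impurity measures, entropy and the Gini index, for a node described by its class counts. The three nodes are made-up illustrations, not data from the talk. Both measures are 0 for a pure node and largest for an evenly mixed one.

```r
# Entropy (in bits) of a vector of class counts; 0 * log2(0) is taken as 0.
entropy <- function(counts) {
  p <- counts / sum(counts)
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Gini index of a vector of class counts.
gini <- function(counts) {
  p <- counts / sum(counts)
  1 - sum(p^2)
}

# Illustrative two-class nodes (counts per class), from pure to evenly mixed.
nodes <- list(pure = c(10, 0), skewed = c(9, 1), mixed = c(5, 5))

impurity <- data.frame(
  node    = names(nodes),
  entropy = sapply(nodes, entropy),
  gini    = sapply(nodes, gini)
)
print(impurity, row.names = FALSE)
```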
Information Gain (ID3/C4.5)
• Select the attribute with the highest information gain
• Expected information (entropy) needed to classify a tuple in D:

$$Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)$$

• Information needed (after using A to split D into v partitions) to classify D:

$$Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times Info(D_j)$$

• Information gained by branching on attribute A:

$$Gain(A) = Info(D) - Info_A(D)$$
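The three quantities can be sketched in R working directly from class counts, where p_i is the share of tuples in D that belong to class i. The function names are invented for this example. The counts are those of the attribute "Outlook" in the weather example that follows in the deck (yes/no counts per attribute value).

```r
# Info(D): entropy in bits of a vector of class counts.
info <- function(counts) {
  p <- counts / sum(counts)
  p <- p[p > 0]          # treat 0 * log2(0) as 0
  -sum(p * log2(p))
}

# Info_A(D): weighted average of Info(D_j) over the partitions D_1..D_v.
# `partitions` is a list with one class-count vector per partition.
info_after_split <- function(partitions) {
  sizes <- sapply(partitions, sum)
  sum(sizes / sum(sizes) * sapply(partitions, info))
}

# Gain(A) = Info(D) - Info_A(D).
gain <- function(partitions) {
  total <- Reduce(`+`, partitions)   # class counts of D before the split
  info(total) - info_after_split(partitions)
}

# Counts (yes, no) for the three values of "Outlook" in the weather example.
outlook <- list(sunny = c(2, 3), overcast = c(4, 0), rain = c(3, 2))

cat(sprintf("Info(D)         = %.3f bits\n", info(Reduce(`+`, outlook))))
cat(sprintf("Info_Outlook(D) = %.3f bits\n", info_after_split(outlook)))
cat(sprintf("Gain(Outlook)   = %.3f bits\n", gain(outlook)))
```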
When do I play tennis?

| Outlook | Temperature | Humidity | Windy | Play? |
|---------|-------------|----------|-------|-------|
| sunny | hot | high | false | No |
| sunny | hot | high | true | No |
| overcast | hot | high | false | Yes |
| rain | mild | high | false | Yes |
| rain | cool | normal | false | Yes |
| rain | cool | normal | true | No |
| overcast | cool | normal | true | Yes |
| sunny | mild | high | false | No |
| sunny | cool | normal | false | Yes |
| rain | mild | normal | false | Yes |
| sunny | mild | normal | true | Yes |
| overcast | mild | high | true | Yes |
| overcast | hot | normal | false | Yes |
| rain | mild | high | true | No |
Example Tree for “Play?”
• Outlook = sunny: test Humidity
  • Humidity = high: No
  • Humidity = normal: Yes
• Outlook = overcast: Yes
• Outlook = rain: test Windy
  • Windy = false: Yes
  • Windy = true: No
Which attribute to select?
Example: attribute “Outlook”
• “Outlook” = “Sunny”:

$$\text{info}([2,3]) = \text{entropy}(2/5,\, 3/5) = -\tfrac{2}{5}\log_2\tfrac{2}{5} - \tfrac{3}{5}\log_2\tfrac{3}{5} = 0.971 \text{ bits}$$

• “Outlook” = “Overcast” (with $0 \log_2 0$ taken as 0):

$$\text{info}([4,0]) = \text{entropy}(1,\, 0) = -1\log_2 1 - 0\log_2 0 = 0 \text{ bits}$$

• “Outlook” = “Rainy”:

$$\text{info}([3,2]) = \text{entropy}(3/5,\, 2/5) = -\tfrac{3}{5}\log_2\tfrac{3}{5} - \tfrac{2}{5}\log_2\tfrac{2}{5} = 0.971 \text{ bits}$$

• Expected information for attribute:

$$\text{info}([2,3],[4,0],[3,2]) = \tfrac{5}{14} \times 0.971 + \tfrac{4}{14} \times 0 + \tfrac{5}{14} \times 0.971 = 0.693 \text{ bits}$$
Computing the information gain
• Information gain: (information before split) - (information after split)

$$\text{gain}(\text{"Outlook"}) = \text{info}([9,5]) - \text{info}([2,3],[4,0],[3,2]) = 0.940 - 0.693 = 0.247 \text{ bits}$$

• Compute for attribute “Humidity”
Example: attribute “Humidity”
• “Humidity” = “High”:

$$\text{info}([3,4]) = \text{entropy}(3/7,\, 4/7) = -\tfrac{3}{7}\log_2\tfrac{3}{7} - \tfrac{4}{7}\log_2\tfrac{4}{7} = 0.985 \text{ bits}$$

• “Humidity” = “Normal”:

$$\text{info}([6,1]) = \text{entropy}(6/7,\, 1/7) = -\tfrac{6}{7}\log_2\tfrac{6}{7} - \tfrac{1}{7}\log_2\tfrac{1}{7} = 0.592 \text{ bits}$$

• Expected information for attribute:

$$\text{info}([3,4],[6,1]) = \tfrac{7}{14} \times 0.985 + \tfrac{7}{14} \times 0.592 = 0.788 \text{ bits}$$

• Information Gain:

$$\text{gain}(\text{"Humidity"}) = \text{info}([9,5]) - \text{info}([3,4],[6,1]) = 0.940 - 0.788 = 0.152 \text{ bits}$$
Computing the information gain
• Information gain for attributes from weather data:
  • gain(“Outlook”) = 0.247 bits
  • gain(“Temperature”) = 0.029 bits
  • gain(“Humidity”) = 0.152 bits
  • gain(“Windy”) = 0.048 bits
• “Outlook” has the highest gain, so it becomes the root, with branches sunny, overcast and rain.
Continuing to split
• Within the “Outlook” = “sunny” branch:
  • gain(“Temperature”) = 0.571 bits
  • gain(“Humidity”) = 0.971 bits
  • gain(“Windy”) = 0.020 bits
The final decision tree
Splitting stops when data can’t be split any further
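The whole procedure, from choosing the root to stopping at pure nodes, can be sketched as a short recursive function in R. This is a minimal ID3-style illustration on the weather table from the slides, not the implementation used in the talk. The names `id3`, `info_gain` and `print_tree` are invented here, and "can't be split any further" is read as "the node is pure or no attributes are left".

```r
# Weather data, typed in from the "When do I play tennis?" slide.
weather <- data.frame(
  Outlook     = c("sunny", "sunny", "overcast", "rain", "rain", "rain", "overcast",
                  "sunny", "sunny", "rain", "sunny", "overcast", "overcast", "rain"),
  Temperature = c("hot", "hot", "hot", "mild", "cool", "cool", "cool",
                  "mild", "cool", "mild", "mild", "mild", "hot", "mild"),
  Humidity    = c("high", "high", "high", "high", "normal", "normal", "normal",
                  "high", "normal", "normal", "normal", "high", "normal", "high"),
  Windy       = c("false", "true", "false", "false", "false", "true", "true",
                  "false", "false", "false", "true", "true", "false", "true"),
  Play        = c("No", "No", "Yes", "Yes", "Yes", "No", "Yes",
                  "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"),
  stringsAsFactors = FALSE
)

# Entropy (bits) of a vector of class labels.
entropy <- function(labels) {
  p <- as.numeric(table(labels)) / length(labels)
  -sum(p * log2(p))
}

# Information gain of splitting data frame d on one attribute.
info_gain <- function(d, attribute, target) {
  parts <- split(d[[target]], d[[attribute]])
  after <- sum(sapply(parts, function(x) length(x) / nrow(d) * entropy(x)))
  entropy(d[[target]]) - after
}

# Recursive sketch: returns a class label (leaf) or a list holding the chosen
# attribute, the gains considered at that node, and one subtree per value.
id3 <- function(d, attrs, target = "Play") {
  labels <- d[[target]]
  # Stop when the node is pure or no attributes are left (majority vote).
  if (length(unique(labels)) == 1 || length(attrs) == 0) {
    return(names(which.max(table(labels))))
  }
  gains <- sapply(attrs, function(a) info_gain(d, a, target))
  best <- attrs[which.max(gains)]
  branches <- lapply(split(d, d[[best]]), id3,
                     attrs = setdiff(attrs, best), target = target)
  list(attribute = best, gains = round(gains, 3), branches = branches)
}

# Print the tree with one indented line per test and leaf.
print_tree <- function(node, indent = "") {
  if (is.character(node)) {
    cat(indent, "-> ", node, "\n", sep = "")
    return(invisible(NULL))
  }
  for (value in names(node$branches)) {
    cat(indent, node$attribute, " = ", value, "\n", sep = "")
    print_tree(node$branches[[value]], paste0(indent, "  "))
  }
}

tree <- id3(weather, c("Outlook", "Temperature", "Humidity", "Windy"))
print(tree$gains)                  # gains considered at the root
print(tree$branches$sunny$gains)   # gains considered in the sunny branch
print_tree(tree)
```

The gains stored at the root and in the sunny branch can be compared with the values on the slides, and the printed tree with the example tree for "Play?".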
Rpart()
• install.packages(c('rpart', 'readxl'))  # readxl is a substitute: the slide's read.xls() call names no package
• library(rpart)
• library(readxl)
• data <- as.data.frame(read_excel("C:/tree_data.xls", col_names = TRUE))  # first row holds the column names
• results <- rpart(Performance ~ PageCount + EditCount + LinkCount + WordCount, data = data, method = "class", parms = list(split = 'information'))  # classification tree, splits chosen by information gain
• printcp(results)  # complexity-parameter table
• plot(results)  # draw the tree
• text(results)  # label the splits and leaves
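For readers without the spreadsheet, the same call can be tried on the ten sample rows typed in directly. This is a sketch, not the analysis from the talk. The data frame names are invented, and the `rpart.control` settings are an assumption added for illustration: with its default `minsplit = 20`, rpart would not split a ten-row data set at all. Cross-validation is switched off (`xval = 0`) because the sample is so small.

```r
library(rpart)

# The ten labeled students from the slides.
wiki <- data.frame(
  PageCount = c(55, 5, 37, 75, 24, 40, 8, 28, 27, 32),
  EditCount = c(334, 194, 267, 402, 183, 232, 128, 283, 99, 113),
  LinkCount = c(30, 0, 243, 138, 1, 83, 13, 29, 10, 9),
  WordCount = c(5251, 430, 9494, 1635, 2, 1872, 1622, 1361, 432, 1001),
  Performance = factor(
    c("High", "Low", "High", "High", "Low",
      "Medium", "Low", "Medium", "Low", "Low"),
    levels = c("Low", "Medium", "High")
  )
)

# Assumption: control settings relaxed so that a ten-row sample can be split.
fit <- rpart(
  Performance ~ PageCount + EditCount + LinkCount + WordCount,
  data = wiki, method = "class",
  parms = list(split = "information"),
  control = rpart.control(minsplit = 2, minbucket = 1, cp = 0, xval = 0)
)
print(fit)

# The five unlabeled students from the slides.
new_students <- data.frame(
  PageCount = c(80, 65, 47, 106, 55),
  EditCount = c(547, 271, 252, 278, 266),
  LinkCount = c(193, 273, 231, 399, 49),
  WordCount = c(1269, 2132, 1213, 2675, 5713)
)

predicted <- predict(fit, newdata = new_students, type = "class")
print(data.frame(ID = 11:15, Predicted = predicted))
```

A tree grown this far on ten rows fits the sample closely and should not be read as a validated model. The learned splits need not match the hand-drawn example trees.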