Hands on Weka
2014/11/11, Petra Kralj Novak
kt.ijs.si/PetraKralj/IPS_DM_1415/HandsOnWeka-Part1.pdf
http://kt.ijs.si/petra_kralj/dmkd.html
Data Mining Tools
• Weka http://www.cs.waikato.ac.nz/ml/weka/
• Orange http://orange.biolab.si/
• Knime http://www.knime.org/
• Taverna http://www.taverna.org.uk/
• Rapid Miner http://rapid-i.com/content/view/181/196/
• ClowdFlows http://clowdflows.org/
Weka (Waikato Environment for Knowledge Analysis)
• Collection of machine learning algorithms for data mining tasks
• The algorithms
– Can be applied directly to a dataset
– Can be called from Java code (library)
• Weka contains tools for
– Data pre-processing
– Classification
– Regression
– Clustering
– Association rules
– Visualization
• Weka is open source software issued under the GNU General Public License
Exercise 1: ID3 in Weka
1. Build a decision tree with the ID3 algorithm on the lenses dataset and evaluate it on a separate test set
Weka: Install
Download version 3.6
http://www.cs.waikato.ac.nz/ml/weka/
Weka: Run Explorer
Choose Explorer
Exercise 1: ID3 in Weka
• In the Weka data mining tool, induce a decision tree for the lenses dataset with the ID3 algorithm.
• Data:
– lensesTrain.arff
– lensesTest.arff
• Compare the outcome with the manually obtained results.
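To check a manually built tree against the tool, it can help to see the ID3 selection step in code. The following is a minimal sketch, not Weka's implementation: it picks the attribute with the highest information gain and recurses. The rows and the attribute names `tear_rate`, `astigmatic` and `lenses` are hypothetical toy values and are not the contents of lensesTrain.arff.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, attr, target):
    """Entropy of the target minus the weighted entropy after splitting on attr."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def id3(rows, attrs, target):
    """Return a nested dict tree; leaves are class labels."""
    labels = [r[target] for r in rows]
    # Stop when the node is pure or no attributes are left (majority label).
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, a, target))
    rest = [a for a in attrs if a != best]
    return {best: {v: id3([r for r in rows if r[best] == v], rest, target)
                   for v in sorted({r[best] for r in rows})}}


# Hypothetical toy rows (NOT the real lenses data).
ROWS = [
    {"tear_rate": "reduced", "astigmatic": "no", "lenses": "none"},
    {"tear_rate": "reduced", "astigmatic": "yes", "lenses": "none"},
    {"tear_rate": "normal", "astigmatic": "no", "lenses": "soft"},
    {"tear_rate": "normal", "astigmatic": "yes", "lenses": "hard"},
]
TREE = id3(ROWS, ["tear_rate", "astigmatic"], "lenses")
print(TREE)
```

On these toy rows the gain of `tear_rate` is 1.0 bit and the gain of `astigmatic` is 0.5 bit, so `tear_rate` becomes the root.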
Load the data
Load the data - 2
lensesTrain.arff
The data are loaded
Target variable
Choose “Classify”
Choose algorithm
trees > Id3
[Screenshot with numbered steps 1 to 5; the test set file is lensesTest.arff]
Decision tree
Classification accuracy
Confusion matrix
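Classification accuracy can be read directly off a confusion matrix: it is the sum of the diagonal divided by the total number of examples. A small sketch, assuming rows are actual classes and columns are predicted classes; the counts are hypothetical and not taken from the Weka output for the lenses data.

```python
def accuracy(confusion):
    """Fraction of correctly classified examples: diagonal sum / total."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Hypothetical counts: rows = actual class, columns = predicted class.
CONFUSION = [
    [3, 0, 0],
    [0, 1, 1],
    [1, 0, 2],
]
ACC = accuracy(CONFUSION)
print(f"accuracy = {ACC:.3f}")  # prints "accuracy = 0.750"
```

Here 6 of the 8 examples lie on the diagonal, so the accuracy is 0.75.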
Exercise 2: CAR dataset
• 1728 examples
• 6 attributes
– 6 nominal
– 0 numeric
• Nominal target variable
– 4 classes: unacc, acc, good, v-good
– Distribution of classes: unacc (70%), acc (22%), good (4%), v-good (4%)
• No missing values
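With such a skewed class distribution, a useful reference point is the accuracy of always predicting the most frequent class. The sketch below uses only the rounded percentages listed above; the function name `majority_baseline` is made up for illustration.

```python
# Class shares as listed above (rounded percentages).
CLASS_SHARE = {"unacc": 0.70, "acc": 0.22, "good": 0.04, "v-good": 0.04}


def majority_baseline(shares):
    """Return the most frequent class and the accuracy of always predicting it."""
    label = max(shares, key=shares.get)
    return label, shares[label]


LABEL, BASELINE = majority_baseline(CLASS_SHARE)
print(LABEL, BASELINE)  # prints "unacc 0.7"
```

A classifier on this dataset is only informative if its accuracy is clearly above this roughly 70% baseline.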
Preparing the data for WEKA - 1
Data in a spreadsheet (e.g. MS Excel)
- Rows are examples
- Columns are attributes
- The last column is the target variable
Preparing the data for WEKA - 2
Save as “.csv”
- Careful with dots “.”, commas “,” and semicolons “;”!
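The separator issue can be handled outside the spreadsheet as well. The following sketch reads delimited text with an explicit delimiter and writes a minimal ARFF file in which every attribute is nominal. The function name `csv_to_arff` and the sample rows are hypothetical; values that contain spaces or commas would need quoting, which this sketch does not handle.

```python
import csv
import io


def csv_to_arff(text, relation, delimiter=","):
    """Convert CSV text with a header row into ARFF text (all attributes nominal)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, data = rows[0], rows[1:]
    lines = ["@relation " + relation, ""]
    for i, name in enumerate(header):
        # The nominal domain is the sorted set of values seen in column i.
        values = sorted({row[i] for row in data})
        lines.append("@attribute " + name + " {" + ",".join(values) + "}")
    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines) + "\n"


# Hypothetical semicolon-separated export (some spreadsheet locales produce this).
SAMPLE = "buying;safety;class\nlow;high;acc\nhigh;low;unacc\n"
ARFF = csv_to_arff(SAMPLE, "car", delimiter=";")
print(ARFF)
```

Passing the delimiter explicitly avoids guessing whether the export used commas or semicolons.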
Load the data
Target variable
Car.csv
Choose algorithm J48
Building and evaluating the tree
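The summary lists cross-validation as the evaluation method used with J48. As a rough illustration of the idea, the sketch below splits example indices into k folds so that every example is used for testing exactly once. The choice of 10 folds is an assumption, and the sketch neither shuffles nor stratifies, which real tools typically do.

```python
def k_fold_indices(n_examples, k):
    """Yield (train, test) index lists; each example is tested exactly once."""
    indices = list(range(n_examples))
    for fold in range(k):
        test = indices[fold::k]  # every k-th example, offset by the fold number
        test_set = set(test)
        train = [i for i in indices if i not in test_set]
        yield train, test


# 1728 examples as in the CAR dataset; 10 folds is an assumed setting.
FOLDS = list(k_fold_indices(1728, 10))
print([len(test) for _, test in FOLDS])
```

A model is built on each training part and evaluated on the matching test part; the reported accuracy is then computed over all test predictions together.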
Classified as
Actual values
Classification accuracy
Right mouse click
Tree pruning
Set the minimal number of objects per leaf to 15
Parameters of the algorithm (right mouse click)
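One way to picture the effect of this parameter is as a check applied to every candidate split. The sketch below assumes a simplified C4.5-style rule: a split is accepted only if at least two of its branches contain the minimal number of examples. This is an illustration under that assumption, not J48's actual code, and the branch sizes are hypothetical.

```python
def split_allowed(branch_sizes, min_obj):
    """Simplified check: at least two branches must hold min_obj or more examples."""
    return sum(1 for n in branch_sizes if n >= min_obj) >= 2


# Hypothetical branch sizes for two candidate splits.
print(split_allowed([40, 25, 3], min_obj=15))  # True: two branches reach 15
print(split_allowed([60, 5, 3], min_obj=15))   # False: only one branch reaches 15
```

Raising the threshold rejects more splits, which is why the resulting tree has fewer nodes and leaves.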
Reduced number of leaves and nodes
Easier to interpret
Lower classification accuracy
Naïve Bayes classifier
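For nominal attributes, a Naïve Bayes classifier multiplies the class prior by the per-attribute conditional probabilities and picks the class with the highest score. The sketch below is a minimal version with Laplace (add-one) smoothing; it is not Weka's implementation, and the rows and the attribute names `safety` and `price` are hypothetical, not taken from the CAR dataset.

```python
from collections import Counter, defaultdict


def train_nb(rows, target):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(r[target] for r in rows)
    value_counts = defaultdict(Counter)  # (attribute, class) -> value counts
    domains = defaultdict(set)           # attribute -> set of seen values
    for r in rows:
        for attr, value in r.items():
            if attr != target:
                value_counts[(attr, r[target])][value] += 1
                domains[attr].add(value)
    return class_counts, value_counts, domains


def predict_nb(model, example):
    """Return the best class and the unnormalised, Laplace-smoothed scores."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n_cls in class_counts.items():
        score = n_cls / total  # class prior
        for attr, value in example.items():
            count = value_counts[(attr, cls)][value]
            score *= (count + 1) / (n_cls + len(domains[attr]))
        scores[cls] = score
    return max(scores, key=scores.get), scores


# Hypothetical toy rows (NOT the real CAR data).
ROWS = [
    {"safety": "high", "price": "low", "class": "acc"},
    {"safety": "high", "price": "high", "class": "acc"},
    {"safety": "low", "price": "low", "class": "unacc"},
    {"safety": "low", "price": "high", "class": "unacc"},
    {"safety": "high", "price": "high", "class": "unacc"},
]
MODEL = train_nb(ROWS, "class")
PRED, SCORES = predict_nb(MODEL, {"safety": "high", "price": "low"})
print(PRED, SCORES)
```

For the query example the score is 0.4 × 0.75 × 0.5 = 0.15 for `acc` and 0.6 × 0.4 × 0.4 = 0.096 for `unacc`, so `acc` is predicted.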
Summary
• Weka
• ID3, separate test set
• Data preparation
• J48 (C4.5), cross-validation, tree pruning
• Naïve Bayes