Short overview of Weka
Classifications
Clusters
Association rules
Attribute selections
Visualisation
Weka: Explorer
Weka: Memory issues

Windows: edit the RunWeka.ini file in the Weka installation directory, changing `maxheap=128m` to `maxheap=1280m`.
Linux: launch Weka using the command ($WEKAHOME is the Weka installation directory): `java -Xmx1280m -jar $WEKAHOME/weka.jar`
ISIDA ModelAnalyser
Features:
• Imports output files of general data mining programs, e.g. Weka
• Visualizes chemical structures
• Computes statistics for classification models
• Builds consensus models by combining different individual models
Foreword

For time reasons, not all exercises will be performed during the session, nor will they be presented in full. The numbering of the exercises refers to their numbering in the textbook.
Ensemble Learning
Igor Baskin, Gilles Marcou and Alexandre Varnek
Hunting season …
Single hunter
Courtesy of Dr D. Fourches
Hunting season …
Many hunters
What is the probability that a wrong decision will be taken by majority voting?

[Plot: probability of a wrong majority decision, each voter acting independently with error rate μ < 0.5, as a function of the number of voters (1 to 19), for μ = 0.1, 0.2, 0.3 and 0.4]

More voters, less chance of taking a wrong decision!
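For independent voters this is a binomial tail probability; the following short Python sketch (an illustration, not part of the exercises) reproduces the behaviour shown in the plot:

```python
from math import comb

def majority_wrong(mu, n):
    """Probability that a majority of n independent voters, each wrong
    with probability mu, takes the wrong decision (n odd, so no ties)."""
    return sum(comb(n, k) * mu**k * (1 - mu)**(n - k)
               for k in range(n // 2 + 1, n + 1))

# A single voter with mu = 0.3 errs 30% of the time;
# 11 such voters deciding by majority err only about 8% of the time.
print(majority_wrong(0.3, 1))   # 0.3
print(majority_wrong(0.3, 11))  # ~0.078
```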
The Goal of Ensemble Learning

Combine base-level models which are diverse in their decisions and complement each other.

Different possibilities to generate an ensemble of models on one and the same initial data set:
• Compounds
• Descriptors
• Machine Learning Methods
- Bagging and Boosting
- Random Subspace
- Stacking
Principle of Ensemble Learning

[Diagram: the training set, a compounds/descriptor matrix (compounds C1…Cn, descriptors D1…Dm), is perturbed into matrices 1, 2, …; a learning algorithm builds a model M1, M2, …, Me from each perturbed set, and the ensemble is combined into a consensus model]
Ensembles Generation: Bagging
• Compounds
• Descriptors
• Machine Learning Methods
- Bagging and Boosting
- Random Subspace
- Stacking
Bagging

Bagging = Bootstrap Aggregation
Introduced by Breiman in 1996. Based on bootstrapping with replacement. Useful for unstable algorithms (e.g. decision trees).

Leo Breiman (1928-2005)
Leo Breiman (1996). Bagging predictors. Machine Learning. 24(2):123-140.
Bootstrap

[Diagram: a sample Si is drawn from the training set S (compounds C1, C2, C3, C4, …, Cn with descriptors D1…Dm); e.g. Si = {C3, C2, C2, C4, C4, …}]

• All compounds have the same probability of being selected
• Each compound can be selected several times or not at all (i.e. compounds are sampled randomly with replacement)

Efron, B., & Tibshirani, R. J. (1993). An Introduction to the Bootstrap. New York: Chapman & Hall.
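Sampling with replacement as described above takes one line with Python's standard library (a sketch of the idea, not Weka's code):

```python
import random

random.seed(0)  # reproducible illustration

def bootstrap_sample(training_set):
    """Sample with replacement: the result has the same size as the
    original set, but compounds may repeat or be left out entirely."""
    return random.choices(training_set, k=len(training_set))

compounds = ["C1", "C2", "C3", "C4", "C5"]
print(bootstrap_sample(compounds))
```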
Bagging

[Diagram: bootstrap samples S1, S2, …, Se (data with perturbed sets of compounds, e.g. S1 = {C4, C2, C8, C2, C1, …}) are drawn from the training set C1…Cn; a learning algorithm builds a model M1, M2, …, Me from each sample, and the ensemble is combined into a consensus model by voting (classification) or averaging (regression)]
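The consensus step at the bottom of the scheme can be sketched as follows (toy constant models stand in for M1…Me; this is an illustration, not Weka's implementation):

```python
from collections import Counter

def consensus_classify(models, x):
    """Classification: majority vote over the models' predicted labels."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

def consensus_regress(models, x):
    """Regression: average of the models' predicted values."""
    predictions = [model(x) for model in models]
    return sum(predictions) / len(predictions)

ensemble = [lambda x: "active", lambda x: "inactive", lambda x: "active"]
print(consensus_classify(ensemble, None))                       # active
print(consensus_regress([lambda x: 1.0, lambda x: 3.0], None))  # 2.0
```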
Classification - Descriptors

ISIDA descriptors: sequences, unlimited/restricted augmented atoms.
Nomenclature: txYYlluu
• x: type of the fragmentation
• YY: fragments content
• l, u: minimum and maximum number of constituent atoms

Classification - Data

Acetylcholine Esterase inhibitors (27 actives, 1000 inactives)
Classification - Files

• train-ache.sdf / test-ache.sdf: molecular files for the training/test set
• train-ache-t3ABl2u3.arff / test-ache-t3ABl2u3.arff: descriptor and property values for the training/test set
• ache-t3ABl2u3.hdr: descriptors' identifiers
• AllSVM.txt: SVM predictions on the test set using multiple fragmentations
Regression - Descriptors

ISIDA descriptors: sequences, unlimited/restricted augmented atoms.
Nomenclature: txYYlluu
• x: type of the fragmentation
• YY: fragments content
• l, u: minimum and maximum number of constituent atoms

Regression - Data

Log of solubility (818 in the training set, 817 in the test set)
Regression - Files

• train-logs.sdf / test-logs.sdf: molecular files for the training/test set
• train-logs-t1ABl2u4.arff / test-logs-t1ABl2u4.arff: descriptor and property values for the training/test set
• logs-t1ABl2u4.hdr: descriptors' identifiers
• AllSVM.txt: SVM predictions on the test set using multiple fragmentations
Exercise 1
Development of one individual rule-based model (JRip method in WEKA)
Exercise 1
Load train-ache-t3ABl2u3.arff
Exercise 1
Load test-ache-t3ABl2u3.arff
Exercise 1
Setup one JRip model
Exercise 1: rules interpretation
187. (C*C),(C*C*C),(C*C-C),(C*N),(C*N*C),(C-C),(C-C-C),xC*
188. (C-N),(C-N-C),(C-N-C),(C-N-C),xC
189. (C*C),(C*C),(C*C*C),(C*C*C),(C*C*N),xC
Exercise 1: randomization
What happens if we randomize the data and rebuild a JRip model?
Exercise 1: surprising result!

Changing the ordering of the data changes the rules
Exercise 2a: Bagging
• Reinitialize the dataset
• In the classifier tab, choose the meta classifier Bagging
Exercise 2a: Bagging
Set the base classifier as JRip
Build an ensemble of 1 model
Exercise 2a: Bagging
• Save the Result buffer as JRipBag1.out
• Re-build the bagging model using 3 and 8 iterations
• Save the corresponding Result buffers as JRipBag3.out and JRipBag8.out
• Build models using from 1 to 10 iterations
Bagging

[Plot: ROC AUC of the consensus model (classification, AChE) as a function of the number of bagging iterations, 0 to 10; ROC AUC axis from 0.68 to 0.88]
Bagging Of Regression Models
Ensembles Generation: Boosting
• Compounds
• Descriptors
• Machine Learning Methods
- Bagging and Boosting
- Random Subspace
- Stacking
Boosting

Boosting works by training a set of classifiers sequentially and combining them for prediction, where each later classifier focuses on the mistakes of the earlier ones.

AdaBoost - classification
Regression boosting

Yoav Freund, Robert Schapire, Jerome Friedman
Yoav Freund, Robert E. Schapire: Experiments with a new boosting algorithm. In: Thirteenth International Conference on Machine Learning, San Francisco, 148-156, 1996.
J.H. Friedman (1999). Stochastic Gradient Boosting. Computational Statistics and Data Analysis. 38:367-378.
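The reweighting at the heart of AdaBoost can be sketched as one round in Python (a schematic illustration of the idea, not Weka's AdaBoostM1 code):

```python
from math import log, exp

def adaboost_round(weights, errors):
    """One AdaBoost reweighting round.
    weights: current example weights; errors: True where the
    current base model misclassified that example."""
    eps = sum(w for w, e in zip(weights, errors) if e) / sum(weights)
    alpha = 0.5 * log((1 - eps) / eps)          # the model's vote weight
    new_w = [w * exp(alpha if e else -alpha)    # up-weight the mistakes
             for w, e in zip(weights, errors)]
    total = sum(new_w)
    return [w / total for w in new_w], alpha

w, alpha = adaboost_round([0.25, 0.25, 0.25, 0.25],
                          [True, False, False, False])
print(w)  # the single misclassified example now carries weight 0.5
```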
Boosting for Classification. AdaBoost

[Diagram: starting from the training set C1…Cn with uniform weights w, a learning algorithm builds model M1; examples misclassified by M1 (marked e) receive increased weights in set S2, from which M2 is built, and so on up to Mb; the ensemble is combined into a consensus model by weighted averaging and thresholding]
Developing Classification Model
Load train-ache-t3ABl2u3.arff
In the Classify tab, load test-ache-t3ABl2u3.arff
Exercise 2b: Boosting
In the classifier tab, choose the meta classifier AdaBoostM1
Set up an ensemble of one JRip model
Exercise 2b: Boosting
• Save the Result buffer as JRipBoost1.out
• Re-build the boosting model using 3 and 8 iterations
• Save the corresponding Result buffers as JRipBoost3.out and JRipBoost8.out
• Build models using from 1 to 10 iterations
Boosting for Classification. AdaBoost

[Plot: ROC AUC (classification, AChE) as a function of Log(Number of boosting iterations); ROC AUC axis from 0.74 to 0.83]
Bagging vs Boosting

[Two plots comparing Bagging and Boosting as the number of iterations grows from 1 to 1000 (log scale), performance axis from 0.70 to 1.00: left, base learner DecisionStump; right, base learner JRip]
Conjecture: Bagging vs Boosting
Bagging leverages unstable base learners that are weak because of overfitting (JRip, MLR)
Boosting leverages stable base learners that are weak because of underfitting (DecisionStump, SLR)
Ensembles Generation: Random Subspace
• Compounds
• Descriptors
• Machine Learning Methods
- Bagging and Boosting
- Random Subspace
- Stacking
Random Subspace Method

Introduced by Ho in 1998. Modification of the training data proceeds in the attribute (descriptor) space. Useful for high-dimensional data.

Tin Kam Ho
Tin Kam Ho (1998). The Random Subspace Method for Constructing Decision Forests. IEEE Transactions on Pattern Analysis and Machine Intelligence. 20(8):832-844.
Random Subspace Method: Random Descriptor Selection

• All descriptors have the same probability of being selected
• Each descriptor can be selected only once
• Only a certain fraction of the descriptors is selected in each run

[Diagram: the training set (compounds C1…Cn) with the initial pool of descriptors D1, D2, D3, D4, …, Dm becomes a training set with randomly selected descriptors, e.g. D3, D2, Dm, D4]
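The selection rule above is sampling without replacement; in Python it is a single standard-library call (illustrative sketch, not Weka's code):

```python
import random

random.seed(0)  # reproducible illustration

def random_subspace(descriptors, fraction=0.5):
    """Pick a random subset of descriptors: equal probability for each,
    none taken twice, and only a fraction of the pool per run."""
    k = max(1, int(len(descriptors) * fraction))
    return random.sample(descriptors, k)

pool = ["D1", "D2", "D3", "D4", "D5", "D6"]
print(random_subspace(pool))
```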
Random Subspace Method

[Diagram: data sets S1, S2, …, Se with randomly selected descriptors (e.g. {D4, D2, D3}, {D1, D2, D3}, {D4, D2, D1} out of D1, D2, D3, D4, …, Dm) are derived from the training set; a learning algorithm builds a model M1, M2, …, Me from each, and the ensemble is combined into a consensus model by voting (classification) or averaging (regression)]
Developing Regression Models
Load train-logs-t1ABl2u4.arff
In the Classify tab, load test-logs-t1ABl2u4.arff
Exercise 7
Choose the meta method Random Sub-Space.
Exercise 7
Base classifier: Multi-Linear Regression without descriptor selection
Build an ensemble of 1 model
… then build an ensemble of 10 models.
Exercise 7
1 model
10 models
Exercise 7
Random Forest

Random Forest = Bagging + Random Subspace
A particular implementation of bagging where the base-level algorithm is a random tree.

Leo Breiman (1928-2005)
Leo Breiman (2001). Random Forests. Machine Learning. 45(1):5-32.
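Following this summary, each tree sees both a bootstrap sample of compounds and a random subset of descriptors. (In Breiman's actual algorithm the descriptor subset is redrawn at every tree split; the sketch below uses the simpler per-model view, with hypothetical names.)

```python
import random

def forest_perturbation(compounds, descriptors, fraction=0.5):
    """Combine the two perturbations: bootstrap the compounds
    (with replacement) and select a random descriptor subset
    (without replacement)."""
    rows = random.choices(compounds, k=len(compounds))
    cols = random.sample(descriptors, max(1, int(len(descriptors) * fraction)))
    return rows, cols

rows, cols = forest_perturbation(["C1", "C2", "C3", "C4"],
                                 ["D1", "D2", "D3", "D4"])
print(rows, cols)
```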
Ensembles Generation: Stacking
• Compounds
• Descriptors
• Machine Learning Methods
- Bagging and Boosting
- Random Subspace
- Stacking
Stacking

Introduced by Wolpert in 1992. Stacking combines base learners by means of a separate meta-learning method, using their predictions on held-out data obtained through cross-validation.
Stacking can be applied to models obtained using different learning algorithms.

David H. Wolpert
Wolpert, D. (1992). Stacked Generalization. Neural Networks, 5(2):241-259.
Breiman, L. (1996). Stacked Regression. Machine Learning, 24.
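A minimal stacking sketch (the base and meta learners here are toy stand-ins, not Weka's MLR/PLS/M5P): out-of-fold predictions of the base learners become the meta-level descriptors, and the meta-learner, here a two-feature least-squares fit, combines them.

```python
# Toy base learners: each takes a list of (x, y) training pairs and
# returns a prediction function.
def mean_learner(train):
    m = sum(y for _, y in train) / len(train)
    return lambda x: m

def nn_learner(train):  # 1-nearest-neighbour on a scalar x
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]

def cv_meta_features(xs, ys, learners, k=5):
    """Out-of-fold base-model predictions become the meta-level
    descriptors, as in Wolpert's stacked generalization."""
    meta = []
    for i in range(len(xs)):
        train = [(x, y) for j, (x, y) in enumerate(zip(xs, ys)) if j % k != i % k]
        meta.append([learner(train)(xs[i]) for learner in learners])
    return meta

def fit_meta_weights(meta, ys):
    """Least-squares weights for two meta-features (2x2 normal equations)."""
    a = sum(r[0] * r[0] for r in meta); b = sum(r[0] * r[1] for r in meta)
    d = sum(r[1] * r[1] for r in meta)
    p = sum(r[0] * y for r, y in zip(meta, ys))
    q = sum(r[1] * y for r, y in zip(meta, ys))
    det = a * d - b * b
    return (d * p - b * q) / det, (a * q - b * p) / det

xs = [float(i) for i in range(10)]
ys = [2 * x + 1 for x in xs]
meta = cv_meta_features(xs, ys, [mean_learner, nn_learner])
w1, w2 = fit_meta_weights(meta, ys)
consensus = lambda r: w1 * r[0] + w2 * r[1]
```

The consensus fit weights each base learner by how useful its held-out predictions are, so the stack outperforms the weaker base model alone.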
Stacking

[Diagram: one and the same data set S (compounds C1…Cn, descriptors D1…Dm) is given to different learning algorithms L1, L2, …, Le; the resulting models M1, M2, …, Me are combined into a consensus model by a machine-learning meta-method (e.g. MLR)]
Exercise 9
Choose meta method Stacking
Exercise 9
• Delete the classifier ZeroR
• Add PLS classifier (default parameters)
• Add Regression Tree M5P (default parameters)
• Add Multi-Linear Regression without descriptor selection
Exercise 9
Select Multi-Linear Regression as meta-method
Exercise 9
Exercise 9
Rebuild the stacked model using:
• kNN (default parameters)
• Multi-Linear Regression without descriptor selection
• PLS classifier (default parameters)
• Regression Tree M5P
Exercise 9
Exercise 9 - Stacking

Regression models for LogS:

| Learning algorithm | R (correlation coefficient) | RMSE |
| --- | --- | --- |
| MLR | 0.8910 | 1.0068 |
| PLS | 0.9171 | 0.8518 |
| M5P (regression trees) | 0.9176 | 0.8461 |
| 1-NN (one nearest neighbour) | 0.8455 | 1.1889 |
| Stacking of MLR, PLS, M5P | 0.9366 | 0.7460 |
| Stacking of MLR, PLS, M5P, 1-NN | 0.9392 | 0.7301 |
Conclusion

Ensemble modelling converts several weak classifiers (classification/regression problems) into a strong one.
There exist several ways to generate the individual models: compounds, descriptors, machine learning methods.
Thank you… and
Ducks and hunters, thanks to D. Fourches
Questions?
Exercise 1
Development of one individual rule-based model for classification (inhibition of AChE)

One individual rule-based model is very unstable: the rules change as a function of the ordering of the compounds in the dataset
Ensemble modelling

[Diagram: Model 1, Model 2, Model 3 and Model 4 combined into an ensemble]
Ensemble modelling

[Diagram: models built with different methods (MLR, SVM, NN, kNN) combined into an ensemble]