Cross Validation Improvements in TMVA
Mohammad Uzair
Supervisors: Dr. Lorenzo MONETA, Mr. Hans Kim ALBERTSSON BRANN
CERN SUMMER STUDENT FINAL PRESENTATION
About Me:
● Name : Mohammad Uzair
● Country : Pakistan
● Studies : Final year of bachelor’s degree in Computer Science
● University : National University of Sciences and Technology (NUST)
● Interested in : Artificial Intelligence and Machine Learning in Computer Vision
Tasks Assigned: Improve Cross Validation in TMVA
● Write a new tutorial introducing Cross Validation in TMVA and make it available through SWAN.
● Improve the presentation of the tutorial by extending the plotting functionality (introducing a feature to draw an average ROC curve).
● Improve CV fold generation for unbalanced datasets (introducing Stratified Splitting).
Validation in Machine Learning
● Divides the dataset into training and test sets.
● Estimates the expected error of the model on unseen data.
● Gives us an honest assessment of the trained model.
[Figure: a dataset of N=20 samples divided into a training set (70 %) and a test set (30 %).]
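The 70 % / 30 % split of 20 samples described above can be sketched in plain Python. This is only an illustration: the function name `holdout_split` and the fixed seed are assumptions made for the example and are not part of TMVA.

```python
import random

def holdout_split(n_samples, train_fraction=0.7, seed=0):
    """Shuffle the sample indices and split them into train and test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed only for reproducibility
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]

# N=20 samples, as on the slide.
train_idx, test_idx = holdout_split(20)
print(len(train_idx), len(test_idx))
```

With N=20 this yields 14 training samples and 6 test samples. Every sample lands in exactly one of the two sets.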
Drawbacks of Validation
● Test errors can be highly variable, depending on how much data we use for each set and on which samples end up in it.
● The model is developed on only a subset of the data. Ideally we want 100% of the data for training and 100% for validation.
K-Fold Cross Validation
[Figure: the dataset (N=20) is divided into folds. Fold 1 is used as the validation fold while the remaining samples form the training set, then fold 2, then fold 3, and so on.]
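A minimal sketch of the k-fold rotation, assuming 5 folds (the slides do not state the number of folds) and a hypothetical helper `k_fold_indices`. It is plain Python and does not use or mirror the TMVA implementation.

```python
import random

def k_fold_indices(n_samples, n_folds, seed=0):
    """Yield (train, validation) index lists, one pair per fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every n_folds-th shuffled index, starting at position i.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, validation in enumerate(folds):
        # All other folds together form the training set for this round.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation

splits = list(k_fold_indices(n_samples=20, n_folds=5))
for fold_number, (train, validation) in enumerate(splits, start=1):
    print(f"fold {fold_number}: {len(train)} training, {len(validation)} validation")
```

Over the 5 rounds every sample is used for validation exactly once and for training 4 times, which addresses the drawback of developing the model on only one fixed subset.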
Tutorial Introducing TMVA Cross Validation
● The existing tutorial was outdated.
● It was not easy for new users to understand.
● A new, updated tutorial as a Python Jupyter notebook, with proper explanations so that it is easy to use and understand, to be made available through SWAN.
Jupyter Notebook for basic Tutorial on TMVA Cross Validation
[Figure: the tutorial notebook]
Extending the plotting functionality
Improve the presentation of the tutorial
● Currently, we can only show the ROC curves of individual folds.
● Often, we are interested in the average behaviour.
● A feature for visualising the average ROC curve was added.
● This feature was added to the tutorial to improve its presentation.
Average ROC Curve
[Figure: two plots of the average ROC curve, one with drawFolds = False and one with drawFolds = True.]
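One common way to average ROC curves over folds is to interpolate each fold's curve onto a shared grid and take the point-wise mean. The sketch below illustrates that idea in plain Python. It is an assumption about how such an average can be computed, not a description of the TMVA code or of its `drawFolds` option, and the helper names and the two toy curves are invented for the example.

```python
from bisect import bisect_right

def interpolate(xs, ys, x):
    """Linearly interpolate y at x on a curve whose xs are non-decreasing."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    hi = bisect_right(xs, x)  # first point strictly to the right of x
    lo = hi - 1
    t = (x - xs[lo]) / (xs[hi] - xs[lo])
    return ys[lo] + t * (ys[hi] - ys[lo])

def average_roc(fold_curves, n_points=11):
    """Average per-fold (fpr, tpr) curves point-wise on a common fpr grid."""
    grid = [i / (n_points - 1) for i in range(n_points)]
    mean_tpr = [
        sum(interpolate(fpr, tpr, x) for fpr, tpr in fold_curves) / len(fold_curves)
        for x in grid
    ]
    return grid, mean_tpr

# Two toy fold curves as (false positive rates, true positive rates).
fold_curves = [
    ([0.0, 0.5, 1.0], [0.0, 0.8, 1.0]),
    ([0.0, 0.5, 1.0], [0.0, 0.6, 1.0]),
]
grid, mean_tpr = average_roc(fold_curves, n_points=3)
print(list(zip(grid, mean_tpr)))
```

The shared grid is needed because the folds generally produce ROC points at different thresholds, so their curves cannot be averaged point by point directly.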
Improve CV fold generation
Random Splitting VS Stratified Splitting
● The splitting strategy determines how the input data is distributed across the folds.
● Random Splitting simply splits the data at random into equally sized folds.
● The data in a fold can be distributed differently from the whole dataset.
● Problems arise in particular with unbalanced classes (which occur more easily in multi-class classification).
● Stratified Splitting ensures that each fold follows the same distribution as the whole dataset.
● Random Splitting just randomly splits the data equally.
[Figure: 30 samples in 3 classes (10, 15 and 5 samples) split into Fold 1 to Fold 5, with 6 random samples in each fold.]
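The example of 30 samples in 3 classes (10, 15 and 5) can be tried out with a few lines of plain Python. The class names "A", "B", "C" and the helper `random_folds` are hypothetical and chosen only for this illustration. The per-fold class counts depend on the seed, so no particular output is claimed here.

```python
import random
from collections import Counter

# 30 samples in 3 classes of 10, 15 and 5 samples (class names are made up).
labels = ["A"] * 10 + ["B"] * 15 + ["C"] * 5

def random_folds(labels, n_folds, seed=0):
    """Assign samples to equally sized folds, ignoring their class labels."""
    indices = list(range(len(labels)))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]

folds = random_folds(labels, n_folds=5)
for number, fold in enumerate(folds, start=1):
    counts = Counter(labels[i] for i in fold)
    print(f"Fold {number}: {sorted(counts.items())}")
```

Each fold holds 6 samples, but nothing constrains how many of them come from each class, so a fold may contain no samples of the smallest class at all.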
● Stratified Splitting randomly splits the data while ensuring that each fold is a good representative of the whole.
[Figure: 30 samples in 3 classes (10, 15 and 5 samples) split into Fold 1 to Fold 5, with 6 random samples in each fold.]
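A minimal sketch of stratified fold assignment for the same example, assuming a simple approach: shuffle the samples within each class and deal them out to the folds in turn. The class names and the helper `stratified_folds` are hypothetical, and this is not a description of how TMVA implements the feature.

```python
import random
from collections import Counter, defaultdict

# 30 samples in 3 classes of 10, 15 and 5 samples (class names are made up).
labels = ["A"] * 10 + ["B"] * 15 + ["C"] * 5

def stratified_folds(labels, n_folds, seed=0):
    """Deal the samples of each class out to the folds in round-robin order."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(n_folds)]
    position = 0  # running counter keeps the fold sizes balanced across classes
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)  # random choice of which samples go to which fold
        for index in members:
            folds[position % n_folds].append(index)
            position += 1
    return folds

folds = stratified_folds(labels, n_folds=5)
for number, fold in enumerate(folds, start=1):
    counts = Counter(labels[i] for i in fold)
    print(f"Fold {number}: {sorted(counts.items())}")
```

Every fold receives 2 samples of class A, 3 of class B and 1 of class C, which matches the 10 : 15 : 5 proportions of the whole dataset. Which individual samples end up in which fold is still random.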
Stratified Splitting in TMVA
Future Improvements:
● Weighted k-fold splitting
● Investigate the feasibility of integrating the TMVA GUI with the tutorials
Special Thanks to my Supervisors:
Dr. Lorenzo MONETA
Mr. Hans Kim ALBERTSSON BRANN