Transcript of lecture slides: Random Forests & XGBoost, Fartash Faghri, University of Toronto, CSC2515, Fall 2019.

Page 1:

Random Forests & XGBoost

Fartash Faghri

University of Toronto

CSC2515, Fall 2019

Page 2:

Decision Trees (HW1)

- Handle tabular data
- Features can be of any type (discrete, categorical, raw text, etc.)
- Features can be of different types
- No need to "normalize" features
- Too many features? DTs can be efficient by looking at only a few.
- Easy to interpret
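A minimal sketch of the points above using scikit-learn (the dataset and settings are illustrative, not from the slides): decision tree splits compare a raw feature against a threshold, so no normalization is needed.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Tabular data whose features live on different scales -- no normalization applied.
X, y = load_iris(return_X_y=True)

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)  # each split is a per-feature threshold test, so scale is irrelevant
train_acc = clf.score(X, y)
```

Interpretability comes from the learned rules themselves; `sklearn.tree.export_text(clf)` prints the tree as plain if/else threshold tests.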

Page 4:

Random Forests


Average multiple decision trees

(XGBoost slides)
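A toy sketch of "average multiple decision trees" on synthetic data (sklearn's `RandomForestClassifier` does this internally, with extra randomization): train each tree on a bootstrap sample, then average the predicted probabilities.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=200, random_state=0)

probas = []
for seed in range(10):
    idx = rng.randint(0, len(X), size=len(X))  # bootstrap sample (with replacement)
    tree = DecisionTreeClassifier(random_state=seed).fit(X[idx], y[idx])
    probas.append(tree.predict_proba(X))

avg_pred = np.mean(probas, axis=0).argmax(axis=1)  # average probabilities, then decide
```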

Page 6:

One DT Overfits

Page 7:

Averaging DTs in a Random Forest

Page 8:

max_depth: The maximum depth of the tree.
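A quick illustration of this knob (synthetic data, arbitrary values): limiting `max_depth` restrains each tree, trading training fit against overfitting.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, random_state=0)

shallow = RandomForestClassifier(max_depth=1, random_state=0).fit(X, y)  # stumps
deep = RandomForestClassifier(max_depth=None, random_state=0).fit(X, y)  # grow fully

shallow_acc = shallow.score(X, y)
deep_acc = deep.score(X, y)  # deeper trees fit the training data more closely
```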

Page 9:

Page 10:

Randomization methods

Each DT is trained on:

- A random subset of the training data (sklearn.ensemble.RandomForestClassifier)
- A random subset of the features (sklearn.ensemble.RandomForestClassifier)
- Noisy split thresholds (sklearn.ensemble.ExtraTreesClassifier)

*Trees can be constructed, and used for prediction, in parallel.
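The three bullets above map onto scikit-learn roughly as follows (hyperparameter values are illustrative); `n_jobs=-1` gives the parallel construction and prediction noted on the slide.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

X, y = make_classification(n_samples=200, random_state=0)

rf = RandomForestClassifier(
    bootstrap=True,       # each tree sees a random subset of the training data
    max_features="sqrt",  # each split considers a random subset of the features
    n_jobs=-1,            # build trees (and predict) in parallel across CPU cores
    random_state=0,
).fit(X, y)

et = ExtraTreesClassifier(random_state=0).fit(X, y)  # also randomizes split thresholds
```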

Page 12:

Page 13:

Page 14:

Data preparation

One-hot encoding:

Page 15:

Data preparation

Why should we prepare data?

Any other encoding?

We also use one-hot encoding in the cross-entropy loss; remember why? (Tutorial 2)

Page 16:

Feature importance

Feature importance is calculated as the decrease in node impurity, weighted by the probability of reaching that node.
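In scikit-learn this impurity-based score is exposed as `feature_importances_`; a minimal sketch (the dataset choice is illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(random_state=0).fit(X, y)

# One score per feature; the weighted impurity decreases are normalized to sum to 1.
importances = rf.feature_importances_
```

Note that impurity-based importances are computed on training data and can favor high-cardinality features; permutation importance is a common alternative.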

Page 17:

Page 18:

Page 19:

AdaBoost

Lecture 4 (Slides 43-70):

- Additive models
- Exponential loss

Weak learners can be any function, e.g. decision trees and decision stumps.

Page 20:

Gradient Boosting

Additive model with loss L:

GB approximately solves this objective iteratively and greedily:

Elements of Statistical Learning (Chapter 10.10)
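The formulas on this slide did not survive extraction; in the notation of Elements of Statistical Learning (Ch. 10), the additive model and its greedy stagewise fit are presumably:

```latex
% Additive model: a weighted sum of M basis functions (weak learners)
f(x) = \sum_{m=1}^{M} \beta_m \, b(x; \gamma_m)

% Full objective: minimize the total loss L over all weights and learner parameters
\min_{\{\beta_m, \gamma_m\}_1^M} \; \sum_{i=1}^{N} L\!\left(y_i, \sum_{m=1}^{M} \beta_m \, b(x_i; \gamma_m)\right)

% Greedy stagewise step: at iteration m, add one learner while holding f_{m-1} fixed
(\beta_m, \gamma_m) = \arg\min_{\beta, \gamma} \sum_{i=1}^{N} L\!\left(y_i, f_{m-1}(x_i) + \beta \, b(x_i; \gamma)\right),
\qquad f_m(x) = f_{m-1}(x) + \beta_m \, b(x; \gamma_m)
```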

Page 21:

XGBoost (A Scalable Tree Boosting System)

- Gradient tree boosting with regularization
- Parallel construction on CPU cores
- Distributed training on a cluster of machines (large models)
- Out-of-core computing (large datasets that do not fit in memory)

Page 22:

XGBoost (Getting Started)
