
Page 1:

Decision Trees, MDL, Boosting

Machine Learning 10-701

Tom M. Mitchell, Carnegie Mellon University

Recommended reading:

• Decision trees: Mitchell Chapter 3

• MDL: Mitchell Chapter 6.6; Bishop Chapter 10.10

• Boosting: “The Boosting Approach to Machine Learning: An Overview,” R. Schapire, MSRI Workshop on Nonlinear Estimation and Classification, 2002. (see class website)

Page 2:

Each internal node: test one attribute Xi

Each branch from a node: selects one value for Xi

Each leaf node: predict Y (or P(Y|X ∈ leaf))

How would you represent AB ∨ CD(¬E), i.e., (A∧B) ∨ (C∧D∧¬E)?
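For concreteness, here is a hedged sketch (not from the slides; the Node class and attribute names are my own) of how (A∧B) ∨ (C∧D∧¬E) maps onto such a tree, with one attribute tested per internal node and a prediction at each leaf:

# Hypothetical sketch: a decision tree computing (A ∧ B) ∨ (C ∧ D ∧ ¬E).
# Each internal node tests one boolean attribute, each branch selects one
# value of that attribute, and each leaf predicts Y.

class Node:
    def __init__(self, attribute=None, branches=None, prediction=None):
        self.attribute = attribute      # attribute Xi tested here (None at a leaf)
        self.branches = branches or {}  # attribute value -> child Node
        self.prediction = prediction    # Y predicted at a leaf

def leaf(y):
    return Node(prediction=y)

# Subtree deciding C ∧ D ∧ ¬E, reached whenever A ∧ B fails.
cde = Node('C', {True: Node('D', {True: Node('E', {True: leaf(False),
                                                   False: leaf(True)}),
                                  False: leaf(False)}),
                 False: leaf(False)})

tree = Node('A', {True: Node('B', {True: leaf(True), False: cde}),
                  False: cde})

def predict(node, x):
    """Follow branches from the root until a leaf is reached."""
    while node.prediction is None:
        node = node.branches[x[node.attribute]]
    return node.prediction

print(predict(tree, {'A': True, 'B': True, 'C': False, 'D': False, 'E': False}))  # True
print(predict(tree, {'A': False, 'B': False, 'C': True, 'D': True, 'E': True}))   # False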

Page 4:

node = Root

[ID3, C4.5, …]
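This page outlines the greedy top-down induction loop (start at node = Root, pick the "best" attribute, split on its values, and recurse), as in ID3/C4.5. Below is a hedged, self-contained Python sketch of that loop under my own naming conventions; the entropy and information-gain helpers anticipate the definitions on the next slides, and examples are assumed to be (attribute-dict, label) pairs:

from math import log2
from collections import Counter

def sample_entropy(examples):
    """H(S) = -sum_v p_v log2 p_v over the label values v present in S."""
    labels = [y for _, y in examples]
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(examples, attr):
    """Expected reduction in entropy from splitting on attribute `attr`."""
    n = len(examples)
    remainder = 0.0
    for value in {x[attr] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[attr] == value]
        remainder += len(subset) / n * sample_entropy(subset)
    return sample_entropy(examples) - remainder

def id3(examples, attributes):
    """Greedy top-down induction in the spirit of ID3 (a sketch, not Mitchell's code)."""
    labels = [y for _, y in examples]
    if len(set(labels)) == 1:               # examples perfectly classified: stop
        return {'predict': labels[0]}
    if not attributes:                      # no attributes left: predict the majority label
        return {'predict': Counter(labels).most_common(1)[0][0]}
    # A <- the "best" decision attribute for this node (highest information gain)
    best = max(attributes, key=lambda a: information_gain(examples, a))
    node = {'test': best, 'branches': {}}
    # For each value of A, create a descendant and sort the training examples to it
    for value in {x[best] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[best] == value]
        node['branches'][value] = id3(subset, [a for a in attributes if a != best])
    return node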

Page 5:

Entropy

Entropy H(X) of a random variable X:

  H(X) = − Σi P(X = i) log2 P(X = i)

H(X) is the expected number of bits needed to encode a randomly drawn value of X (under the most efficient code).

Why? Information theory:
• The most efficient code assigns −log2 P(X = i) bits to encode the message X = i
• So the expected number of bits is Σi P(X = i) · (−log2 P(X = i)) = H(X)
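A worked check (my example, not from the slide): a fair coin has H(X) = −0.5 log2 0.5 − 0.5 log2 0.5 = 1 bit, while a coin with P(heads) = 0.9 has H(X) = −0.9 log2 0.9 − 0.1 log2 0.1 ≈ 0.47 bits; the less uniform the distribution, the fewer bits the optimal code needs on average.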

Page 6:

Sample Entropy

For a sample S containing a fraction p⊕ of positive and p⊖ of negative examples:

  H(S) = − p⊕ log2 p⊕ − p⊖ log2 p⊖
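A quick numeric check of this definition, using the sample_entropy and information_gain helpers from the induction sketch above (my own names, not from the slides):

S = [({'A': True}, '+'), ({'A': True}, '+'), ({'A': False}, '-'), ({'A': False}, '-')]
print(sample_entropy(S))          # 1.0 bit: p+ = p- = 0.5
print(information_gain(S, 'A'))   # 1.0: splitting on A gives two pure subsets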

Page 16:

Assume X values known, labels Y encoded
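The MDL formulas themselves did not extract; as a hedged reminder of the principle covered in the recommended reading (Mitchell Ch. 6.6), MDL prefers the hypothesis that minimizes the bits needed to describe the tree plus the bits needed to describe its exceptions, which coincides with the MAP hypothesis under codes matched to the priors:

h_{MDL} = \arg\min_{h \in H} \left[ L_{C_1}(h) + L_{C_2}(D \mid h) \right]

h_{MAP} = \arg\max_{h} P(D \mid h)\, P(h)
        = \arg\min_{h} \left[ -\log_2 P(D \mid h) - \log_2 P(h) \right]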

Page 25:

Boosting

• Idea: given a weak learner, run it multiple times on (reweighted) training data, then let learned classifiers vote
• On each iteration, weight each training example by how incorrectly it was classified
• Practically useful
• Theoretically interesting

[Schapire, 1989]

Page 28:

Training error of the final classifier H(x) = sign(f(x)), where f(x) = Σt αt ht(x), is bounded by:

  (1/m) Σi δ(H(xi) ≠ yi)  ≤  (1/m) Σi exp(−yi f(xi))  =  ∏t Zt

where Zt is the normalizer of the weight update on round t:

  Zt = Σi Dt(i) exp(−αt yi ht(xi))

We can minimize this bound by choosing αt and ht on each iteration to minimize Zt.
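Why the bound holds (a sketch of the standard argument from the cited Schapire overview, not extracted from the slide), using D1(i) = 1/m and the update D_{t+1}(i) = Dt(i) exp(−αt yi ht(xi)) / Zt:

\frac{1}{m}\sum_i \delta(H(x_i) \neq y_i)
  \;\le\; \frac{1}{m}\sum_i e^{-y_i f(x_i)}          % 0/1 loss is at most the exponential loss
  \;=\; \left(\prod_t Z_t\right) \sum_i D_{T+1}(i)   % unroll the weight updates
  \;=\; \prod_t Z_t                                   % D_{T+1} sums to 1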

What αt to choose for hypothesis ht?

Page 29:

What αt to choose for hypothesis ht?

We can minimize this bound by choosing αt on each iteration to minimize Zt.

For a boolean target function, this is accomplished by:

  αt = ½ ln( (1 − εt) / εt )

where εt = Σi Dt(i) δ(ht(xi) ≠ yi) is the weighted training error of ht on round t.
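Putting the pieces together, a hedged AdaBoost sketch (my own code, under the usual conventions yi ∈ {−1, +1}; weak_learner is a hypothetical argument that returns a classifier h with h(x) ∈ {−1, +1} trained on the weighted sample):

import math

def adaboost(examples, weak_learner, T):
    """examples: list of (x, y) pairs with y in {-1, +1}; returns the voted classifier."""
    m = len(examples)
    D = [1.0 / m] * m                          # D1(i) = 1/m
    ensemble = []                              # list of (alpha_t, h_t)
    for _ in range(T):
        h = weak_learner(examples, D)          # fit h_t to the weighted training data
        eps = sum(D[i] for i, (x, y) in enumerate(examples) if h(x) != y)
        eps = min(max(eps, 1e-10), 1 - 1e-10)  # guard against eps = 0 or 1
        alpha = 0.5 * math.log((1 - eps) / eps)
        # Reweight: mistakes get multiplied by exp(+alpha), correct examples by exp(-alpha)
        D = [D[i] * math.exp(-alpha * y * h(x)) for i, (x, y) in enumerate(examples)]
        Z = sum(D)                             # Z_t, the normalizer in the bound above
        D = [d / Z for d in D]
        ensemble.append((alpha, h))
    def H(x):                                  # final classifier: sign of the weighted vote
        return 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1
    return H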

Page 30:

Boosting: Experimental Results

Comparison of C4.5, boosted C4.5, and boosted decision stumps (depth-1 trees) on 27 benchmark datasets

[Freund & Schapire, 1996]

Page 32:

Boosting and Logistic Regression

Logistic regression assumes:

  P(Y = 1 | X) = 1 / (1 + exp(−f(X))),  where f(X) = w0 + Σj wj Xj

and tries to maximize the conditional data likelihood ∏i P(yi | xi, w).

With labels yi ∈ {−1, +1}, this is equivalent to minimizing the log loss:

  Σi ln(1 + exp(−yi f(xi)))


Boosting minimizes a similar loss function!
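Side by side (standard forms; the exact formulas on the slide did not extract), both losses depend only on the margin yi f(xi) and penalize confident mistakes, with the exponential loss growing much faster for large negative margins:

\text{log loss (logistic regression):} \quad \sum_i \ln\left(1 + e^{-y_i f(x_i)}\right)

\text{exponential loss (boosting):} \quad \sum_i e^{-y_i f(x_i)}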

Page 34:

Logistic regression and Boosting

Logistic regression:
• Minimize loss fn Σi ln(1 + exp(−yi f(xi)))
• Define f(x) = w0 + Σj wj xj, where the features xj are predefined
• Weights wj learned jointly

Boosting:
• Minimize loss fn Σi exp(−yi f(xi))
• Define f(x) = Σt αt ht(x), where the weak hypotheses ht(x) are defined dynamically to fit the data
• Weights αt learned incrementally

Page 35:

What you should know:

• Decision trees
  – ID3, C4.5
  – Rule extraction from trees
  – Overfitting and tree/rule post-pruning
  – Extensions…

• Minimum description length approach
  – And its Bayesian interpretation

• Boosting
  – Practical approach to improving accuracy
  – Exponential loss function; relationship to logistic regression