DECISION TREES
Decision trees
One possible representation for hypotheses
Choosing an attribute
Idea: a good attribute splits the examples into subsets that are (ideally) "all positive" or "all negative"
Which is a better choice? Patrons
Using information theory
Implement Choose-Attribute in the DTL algorithm based on information content, measured by entropy
Entropy is a measure of the uncertainty of a random variable: more uncertainty leads to higher entropy, and more knowledge leads to lower entropy.
Entropy
For a training set containing p positive examples and n negative examples:
$$I\!\left(\frac{p}{p+n},\ \frac{n}{p+n}\right) = -\frac{p}{p+n}\log_2\frac{p}{p+n} \;-\; \frac{n}{p+n}\log_2\frac{n}{p+n}$$
Entropy Examples
Fair coin flip: $I(\tfrac{1}{2}, \tfrac{1}{2}) = 1$ bit
Biased coin flip: $I(p, 1-p) < 1$ bit for any $p \neq \tfrac{1}{2}$ (e.g., $I(0.99, 0.01) \approx 0.08$ bits)
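A minimal Python sketch of the entropy formula above (the function name and the coin examples are illustrative, not from the slides):

```python
import math

def entropy(p, n):
    """Entropy, in bits, of a set with p positive and n negative examples."""
    h = 0.0
    for count in (p, n):
        q = count / (p + n)
        if q > 0:                # 0 * log2(0) is taken to be 0
            h -= q * math.log2(q)
    return h

print(entropy(1, 1))    # fair coin: 1.0 bit
print(entropy(99, 1))   # heavily biased coin: ~0.08 bits
```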
Information Gain
Measures the reduction in entropy achieved by the split.
Choose the split that achieves the greatest reduction (maximizes information gain).
Disadvantage: tends to prefer splits that result in a large number of partitions, each being small but pure.
$$\mathrm{GAIN}_{\mathrm{split}} = \mathrm{Entropy}(p) \;-\; \sum_{i=1}^{k}\frac{n_i}{n}\,\mathrm{Entropy}(i)$$
The parent node $p$ is split into $k$ partitions; $n_i$ is the number of records in partition $i$.
Information Gain Example
Consider the attributes Patrons and Type:
Patrons has the highest Information Gain of all attributes and so is chosen by the DTL algorithm as the root
$$\mathrm{Gain}(\mathit{Patrons}) = 1 - \left[\tfrac{2}{12}\,I(0,1) + \tfrac{4}{12}\,I(1,0) + \tfrac{6}{12}\,I\!\left(\tfrac{2}{6},\tfrac{4}{6}\right)\right] \approx 0.541 \text{ bits}$$
$$\mathrm{Gain}(\mathit{Type}) = 1 - \left[\tfrac{2}{12}\,I\!\left(\tfrac{1}{2},\tfrac{1}{2}\right) + \tfrac{2}{12}\,I\!\left(\tfrac{1}{2},\tfrac{1}{2}\right) + \tfrac{4}{12}\,I\!\left(\tfrac{2}{4},\tfrac{2}{4}\right) + \tfrac{4}{12}\,I\!\left(\tfrac{2}{4},\tfrac{2}{4}\right)\right] = 0 \text{ bits}$$
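A quick numeric check of these two computations, as a hedged Python sketch (the branch counts come from the 12-example restaurant data set; variable names are illustrative):

```python
import math

def entropy(p, n):
    return -sum(q * math.log2(q) for q in (p/(p+n), n/(p+n)) if q > 0)

# (positive, negative) counts per branch; the parent has p = n = 6,
# so its entropy is 1 bit.
patrons = [(0, 2), (4, 0), (2, 4)]          # None, Some, Full
rtype   = [(1, 1), (1, 1), (2, 2), (2, 2)]  # French, Italian, Thai, Burger

def gain(branches, total=12):
    remainder = sum((p + n) / total * entropy(p, n) for p, n in branches)
    return entropy(6, 6) - remainder

print(gain(patrons))  # ~0.541 bits
print(gain(rtype))    # 0.0 bits
```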
Learned Restaurant Tree
Decision tree learned from the 12 examples:
Substantially simpler than the full tree: Raining and Reservation were not necessary to classify all the data.
Stopping Criteria
Stop expanding a node when all the records belong to the same class
Stop expanding a node when all the records have similar attribute values
Overfitting
Overfitting results in decision trees that are more complex than necessary
Training error does not provide a good estimate of how well the tree will perform on previously unseen records (need a test set)
How to Address Overfitting 1… Pruning
Grow the decision tree to its entirety, then trim its nodes in a bottom-up fashion. If the generalization error is reduced after trimming, replace the sub-tree by a leaf node ($\chi^2$ test; see page 706). The class label of the leaf node is determined from the majority class of instances in the sub-tree.
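A minimal sketch of this bottom-up pruning under assumed data structures: a dict-based tree with a stored majority class, and a held-out validation set standing in for the generalization-error estimate (the $\chi^2$ test mentioned above is an alternative criterion):

```python
# Assumed tree representation for this sketch: an internal node is
# {"attr": name, "branches": {value: subtree}, "majority": label};
# a leaf is just a class label. val_set is a list of (example, label) pairs.

def classify(tree, example):
    while isinstance(tree, dict):
        tree = tree["branches"][example[tree["attr"]]]
    return tree

def errors(tree, val_set):
    return sum(classify(tree, x) != y for x, y in val_set)

def prune(tree, val_set):
    if not isinstance(tree, dict):
        return tree
    # Bottom-up: prune the children first.
    for v, sub in tree["branches"].items():
        subset = [(x, y) for x, y in val_set if x[tree["attr"]] == v]
        tree["branches"][v] = prune(sub, subset)
    # Replace the sub-tree by a leaf labelled with its majority class
    # if that does not increase the validation error.
    leaf = tree["majority"]
    if errors(leaf, val_set) <= errors(tree, val_set):
        return leaf
    return tree
```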
How to Address Overfitting 2… Early Stopping Rule
Stop the algorithm before it grows a fully-grown tree. Stopping conditions:
Stop if the number of instances is less than some user-specified threshold.
Stop if the class distribution of the instances is independent of the available features (e.g., using a $\chi^2$ test).
Stop if expanding the current node does not improve impurity measures (e.g., information gain).
How to Address Overfitting…
Is the early stopping rule strictly better than pruning (i.e., generating the full tree and then cutting it back)?
Remaining Challenges…
Continuous values: need to be split into discrete categories. Sort all values, then consider split points between two examples in sorted order that have different classifications.
Missing values: affect how an example is classified, the information gain calculations, and the test set error rate. Pretend that the example has all possible values for the missing attribute, weighted by their frequency among all the examples in the current node.
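A hedged sketch of the candidate-split selection just described (function and variable names are assumptions for illustration):

```python
def candidate_splits(values, labels):
    """Midpoints between consecutive sorted values whose classes differ."""
    pairs = sorted(zip(values, labels))
    splits = []
    for (v1, y1), (v2, y2) in zip(pairs, pairs[1:]):
        if y1 != y2 and v1 != v2:
            splits.append((v1 + v2) / 2)
    return splits

print(candidate_splits([85, 80, 72, 65, 70], ["no", "no", "yes", "yes", "no"]))
# [67.5, 71.0, 76.0]
```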
Summary
Advantages of decision trees: inexpensive to construct; extremely fast at classifying unknown records; easy to interpret for small-sized trees; accuracy is comparable to other classification techniques for many simple data sets.
Learning performance = prediction accuracy measured on test set
K-NEAREST NEIGHBORS
K-Nearest Neighbors
What value do we assign to the green sample?
K-Nearest Neighbors
1-NN: for a given query point $q$, assign the class of the nearest neighbour.
k-NN: compute the $k$ nearest neighbours and assign the class by majority vote.
(Figure: the query point classified with k = 1 and with k = 3.)
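A minimal k-NN classifier along these lines (a sketch; the data set and function names are assumptions):

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """train: list of (point, label) pairs; points are tuples of floats."""
    nearest = sorted(train, key=lambda pl: math.dist(pl[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]   # majority vote among the k nearest

train = [((0, 0), "o"), ((0, 1), "o"), ((1, 0), "o"),
         ((5, 5), "+"), ((5, 6), "+")]
print(knn_classify(train, (4.5, 5.2), k=3))   # "+"
```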
Decision Regions for 1-NN
Effect of k
(Figure: decision regions for k = 1 and k = 5.)
K-Nearest Neighbors
Euclidean distance:
$$D(x, y) = \sqrt{\sum_{i=1}^{d}(x_i - y_i)^2}$$
Weighted Euclidean distance:
$$D(x, y) = \sqrt{\sum_{i=1}^{d} w_i\,(x_i - y_i)^2}$$
where $d$ is the dimensionality of the data.
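The two distances as a Python sketch (w is a per-dimension weight vector; names are illustrative):

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def weighted_euclidean(x, y, w):
    return math.sqrt(sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w)))

# Setting an irrelevant feature's weight to 0 removes it from the metric.
print(weighted_euclidean((1, 100), (2, -50), w=(1, 0)))   # 1.0
```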
Weighting the Distance to Remove Irrelevant Features
(Figure sequence: + and o training examples with a query point "?", shown as the weight on the irrelevant dimension is progressively reduced.)
Nearest Neighbors Search
Let $P$ be a set of $n$ training points. Given a query point $q$, find the nearest neighbour of $q$ in $P$.
Naïve approach: compute the distance from the query point to every other point in the database, keeping track of the "best so far". Running time is O(n).
Data structure approach: construct a data structure which makes this search more efficient.
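The naïve O(n) scan as a Python sketch:

```python
import math

def nearest_neighbor(P, q):
    """Check every training point, keeping the best seen so far."""
    best, best_d = None, float("inf")
    for p in P:
        d = math.dist(p, q)
        if d < best_d:
            best, best_d = p, d
    return best, best_d

print(nearest_neighbor([(0, 0), (3, 4), (1, 1)], (2, 2)))   # ((1, 1), 1.414...)
```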
Quadtree
A quadtree is a tree data structure in which each internal node has up to four children.
Every node in the quadtree corresponds to a square.
If a node $v$ has children, then their corresponding squares are the four quadrants of the square of $v$.
The leaves of a quadtree form a quadtree subdivision of the square of the root.
The children of a node are labelled NE, NW, SW, and SE to indicate to which quadrant they correspond.
Quadtree Construction
Input: point set P
while some cell C contains more than 1 point do
Split cell C
end
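A minimal point-quadtree sketch following this loop (class and field names are assumptions; it assumes distinct points, otherwise the splitting would not terminate):

```python
class QuadNode:
    def __init__(self, x0, y0, x1, y1, points):
        self.bounds = (x0, y0, x1, y1)   # half-open square [x0, x1) x [y0, y1)
        self.points = points
        self.children = []
        if len(points) > 1:              # split cell C
            xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
            for qx0, qy0, qx1, qy1 in [(xm, ym, x1, y1), (x0, ym, xm, y1),  # NE, NW
                                       (x0, y0, xm, ym), (xm, y0, x1, ym)]: # SW, SE
                inside = [(x, y) for x, y in points
                          if qx0 <= x < qx1 and qy0 <= y < qy1]
                self.children.append(QuadNode(qx0, qy0, qx1, qy1, inside))

root = QuadNode(0, 0, 100, 100, [(10, 10), (80, 80), (85, 90)])
```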
(Figure: worked example of quadtree construction on points a–l, showing the successive cell splits and the resulting tree.)
Nearest Neighbor Search
Quadtree Query
(Figure: a query descends by comparing against the split point (X1, Y1); each child covers one quadrant, labelled by the predicates P < X1 / P ≥ X1 and P < Y1 / P ≥ Y1.)
Quadtree Query
In many cases this works: descending to the query's quadrant finds the nearest neighbour.
(Figure: query and nearest neighbour in the same cell.)
Quadtree – Pitfall 1
In some cases it doesn't: there could be points in adjacent buckets that are closer.
(Figure: the nearest neighbour lies in a neighbouring quadrant, not in the query's cell.)
Quadtree – Pitfall 2
Could result in query time exponential in the number of dimensions.
Quadtree summary: simple data structure; versatile, easy to implement; often space and time inefficient.
kd-trees (k-dimensional trees)
Main ideas: one-dimensional splits; instead of splitting in the middle, choose the split "carefully" (many variations); nearest neighbour queries are the same as for quadtrees.
2-dimensional kd-trees
Algorithm: choose the x or y coordinate (alternate between them); choose the median of that coordinate, which defines a horizontal or vertical line; recurse on both sides until there is only one point left, which is stored as a leaf.
We get a binary tree: size O(n), construction time O(n log n), depth O(log n).
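A minimal 2-d kd-tree construction sketch following this algorithm (the dict-based representation is an assumption for illustration):

```python
def build_kdtree(points, depth=0):
    if len(points) <= 1:
        return {"leaf": points}          # one point stored per leaf
    axis = depth % 2                     # alternate between x and y
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "axis": axis,
        "split": points[mid][axis],      # the median defines the line
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid:], depth + 1),
    }

tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
```

Note that re-sorting at every level as done here costs O(n log² n); the O(n log n) construction bound requires presorting each coordinate once.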
Nearest Neighbor with KD Trees
We traverse the tree looking for the nearest neighbor of the query point.
Examine nearby points first: explore the branch of the tree that is closest to the query point first.
When we reach a leaf node, compute the distance to each point in the leaf.
Then we can backtrack and try the other branch at each node visited.
Each time a new closest point is found, we can update the distance bound.
Using the distance bound and the bounds of the data below each node, we can prune parts of the tree that could NOT include the nearest neighbor.
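A hedged sketch of this query over the dict-based tree from the construction sketch above: descend to the nearer branch first, then backtrack, pruning any branch whose splitting line is farther away than the best distance found so far.

```python
import math

def kd_nearest(node, q, best=None, best_d=float("inf")):
    if "leaf" in node:
        for p in node["leaf"]:                    # scan the points in the leaf
            d = math.dist(p, q)
            if d < best_d:
                best, best_d = p, d
        return best, best_d
    axis, split = node["axis"], node["split"]
    near, far = ((node["left"], node["right"]) if q[axis] < split
                 else (node["right"], node["left"]))
    best, best_d = kd_nearest(near, q, best, best_d)   # nearby branch first
    if abs(q[axis] - split) < best_d:             # prune if the line is too far
        best, best_d = kd_nearest(far, q, best, best_d)
    return best, best_d

# Usage, with build_kdtree from the construction sketch above:
tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(kd_nearest(tree, (8, 5)))   # ((9, 6), 1.414...)
```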
Summary of K-Nearest Neighbor
Stores all training data in memory, a large space requirement.
Can improve query time by representing the data within a k-d tree.
k-d trees are only efficient when there are many more examples than dimensions, preferably at least $2^d$ examples for $d$ dimensions.