
Comparing Univariate and Multivariate Decision Trees

Olcay Taner Yıldız

Ethem Alpaydın

Department of Computer Engineering

Bogazici University

E-mail: [email protected]


Univariate Trees (ID3)

• Constructs decision trees in a top-down manner.

• Selects the best attribute to test at the root node by using a statistical test.

• Descendants of the root node are created for each possible value of the attribute: two for numeric attributes, as xi < a and xi > a; m for symbolic attributes, as xi = ak, k = 1, …, m.
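The selection step for a numeric attribute can be sketched as below. This is an illustrative sketch using information gain as the statistical test, not the authors' implementation; the helper names are chosen here:

```python
import math
from collections import Counter

def entropy_of(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_numeric_split(xs, labels):
    """Search thresholds a between consecutive sorted values; return
    (a, gain) for the binary split xi < a vs. xi >= a that maximizes
    information gain."""
    base = entropy_of(labels)
    pairs = sorted(zip(xs, labels))
    best_a, best_gain = None, -1.0
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        a = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x < a]
        right = [y for x, y in pairs if x >= a]
        # expected entropy after the split, weighted by branch size
        rem = (len(left) * entropy_of(left)
               + len(right) * entropy_of(right)) / len(pairs)
        if base - rem > best_gain:
            best_a, best_gain = a, base - rem
    return best_a, best_gain
```

For example, `best_numeric_split([1, 2, 3, 4], ['a', 'a', 'b', 'b'])` finds the threshold 2.5 with gain 1.0, since that split separates the classes perfectly.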


ID3 Continued

• Partition Merit Criteria
  – Information Gain: Entropy = −Sumi(pi log pi)
  – Weak Theory Learning Measure
  – Gini Index

• Avoiding Overfitting
  – Pre-pruning
  – Post-pruning
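The two impurity-based merit criteria can be written out as small functions over class counts at a node (an illustrative sketch; the function names are chosen here):

```python
import math

def entropy(class_counts):
    """Entropy = -sum_i p_i log2 p_i over the class proportions."""
    n = sum(class_counts)
    return -sum((c / n) * math.log2(c / n) for c in class_counts if c > 0)

def gini(class_counts):
    """Gini index = 1 - sum_i p_i^2 over the class proportions."""
    n = sum(class_counts)
    return 1.0 - sum((c / n) ** 2 for c in class_counts)
```

Both measures are zero for a pure node and maximal for a uniform class distribution, e.g. `entropy([5, 5])` is 1.0 and `gini([5, 5])` is 0.5.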


Univariate versus Multivariate


Classification and Regression Trees (CART)

• Each instance is first normalized.

• The algorithm takes a set of coefficients W = (w1, …, wn) and searches for the best split of the form v = Sumi(wi xi) ≤ c, for i = 1 to n.

• The algorithm cycles through the attributes x1, …, xn, at each step searching for an improved split.

• At each cycle, CART searches for the best split of the form v − δ(xi + γ) ≤ c. The search for δ is carried out for γ = −0.25, 0.0, 0.25.

• The best δ and γ are used to update the linear combination.
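The coefficient-cycling step can be sketched as follows. This is a simplified illustration under assumptions of my own: impurity is measured with the Gini index, δ is searched over a small fixed grid (the original uses a more careful line search), and the function names are invented here:

```python
import numpy as np

GAMMAS = (-0.25, 0.0, 0.25)

def gini_of_split(mask, y):
    """Weighted Gini impurity of the two-way partition induced by `mask`."""
    total = 0.0
    for side in (mask, ~mask):
        n = int(side.sum())
        if n == 0:
            continue
        _, counts = np.unique(y[side], return_counts=True)
        p = counts / n
        total += (n / len(y)) * (1.0 - float(np.sum(p ** 2)))
    return total

def cycle_once(X, y, w, c, deltas=np.linspace(-2.0, 2.0, 81)):
    """One cycle over the attributes x1..xn.  For each xi, search for the
    (delta, gamma) that most improves the split v - delta*(xi + gamma) <= c,
    then fold it back into the linear form: that split equals
    sum_j w'_j x_j <= c + delta*gamma with w'_i = w_i - delta."""
    w = np.asarray(w, dtype=float).copy()
    for i in range(X.shape[1]):
        v = X @ w
        best = (gini_of_split(v <= c, y), 0.0, 0.0)  # keeping the current split is allowed
        for gamma in GAMMAS:
            for delta in deltas:
                imp = gini_of_split(v - delta * (X[:, i] + gamma) <= c, y)
                if imp < best[0]:
                    best = (imp, float(delta), gamma)
        _, delta, gamma = best
        w[i] -= delta          # update the coefficient of xi
        c += delta * gamma     # the constant term is absorbed into c
    return w, c
```

Because the current split is always a candidate, a cycle can only keep or reduce the impurity of the split.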


CART continued

• Univariate vs Multivariate Splits

• Symbolic to Numeric Feature conversion: Color: (red, green, blue) becomes red: 100, green: 010, blue: 001.

• Feature Selection: the most important single variable is the one whose deletion causes the greatest deterioration.
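The 1-of-m conversion for symbolic features above can be sketched as (illustrative; the function name is chosen here):

```python
def one_hot(value, categories):
    """Encode a symbolic value as a 1-of-m binary vector,
    e.g. 'red' over (red, green, blue) becomes [1, 0, 0]."""
    return [1 if value == cat else 0 for cat in categories]
```

Each symbolic attribute with m possible values thus becomes m binary inputs, so it can enter the linear combination alongside the numeric attributes.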


Accuracy Comparison

[Bar chart: accuracy (0–100%) of NM, ID3, and FSCART on the datasets BRE, BUP, CAR, DER, ECO, FLA, GLA, HEP, IRI, IRO, MON, SEG, VOT, WIN, ZOO. Axes: Dataset vs. Accuracy.]


Accuracy ID3>FSCART

[Bar chart: accuracy (0–100%) of ID3 and FSCART on the datasets where ID3 outperforms FSCART: BUP, DER, ECO, GLA, IRI, IRO, SEG, VOT, ZOO. Axes: Dataset vs. Accuracy.]


Accuracy FSCART>ID3

[Bar chart: accuracy (0–100%) of ID3 and FSCART on the datasets where FSCART outperforms ID3: BRE, CAR, FLA, HEP, MON, WIN. Axes: Dataset vs. Accuracy.]


Conclusions for ID3

• For the three partition merit criteria (Entropy, Weak Theory Learning Measure, Gini Index), there is no significant difference in accuracy, node size, or learning time.

• Pruning increases accuracy; post-pruning outperforms pre-pruning in both accuracy and node size, at the expense of more computation time.


Conclusions for CART

• When feature selection is applied, CART accuracy is statistically significantly increased and node size is decreased in 13 of 15 datasets.

• The multivariate method CART does not always increase accuracy, nor does it always reduce node size.


Questions