Sequential Genetic Search for Ensemble Feature Selection (IJCAI'2005 presentation transcript)

Page 1: Sequential Genetic Search for Ensemble Feature Selection

Alexey Tsymbal, Padraig Cunningham
Department of Computer Science, Trinity College Dublin, Ireland

Mykola Pechenizkiy
Department of Computer Science, University of Jyväskylä, Finland

IJCAI'2005, Edinburgh, Scotland, August 1-5, 2005

Page 2: Contents

Introduction
– Classification and ensemble classification
Ensemble feature selection
– strategies
– sequential genetic search
Our GAS-SEFS strategy
– Genetic Algorithm-based Sequential Search for Ensemble Feature Selection
Experiment design
Experimental results
Conclusions and future work

Page 3: Classification

The task of classification: given J classes, n training observations, and p features, with training instances (x_i, y_i) where x_i are the attribute values and y_i is the class label, the goal is to predict the class y_0 of a new instance x_0.

[Figure: a classifier is learned from the training set and assigns class membership to a new instance to be classified.]

Examples:
– prognosis of recurrence of breast cancer;
– diagnosis of thyroid diseases;
– antibiotic resistance prediction.

Page 4: Ensemble classification

[Figure: during learning, the training set T gives rise to subsets T1, T2, ..., TS, from which base classifiers h1, h2, ..., hS are built; during application, a new instance (x, ?) is classified by the combined model h* = F(h1, h2, ..., hS), producing (x, y*).]

How to prepare inputs for generation of the base classifiers?

Page 5: Ensemble classification

[Figure: the same learning/application diagram as on the previous slide, with the base classifiers h1, ..., hS combined by h* = F(h1, h2, ..., hS).]

How to combine the predictions of the base classifiers?

Page 6: Ensemble feature selection

How to prepare inputs for generation of the base classifiers?
– Sampling the training set
– Manipulation of input features
– Manipulation of output targets (class values)

Goal of traditional feature selection:
– find and remove features that are unhelpful or misleading to learning (building one feature subset for a single classifier)

Goal of ensemble feature selection:
– find and remove features that are unhelpful or destructive to learning, building different feature subsets for a number of classifiers
– find feature subsets that will promote diversity (disagreement) between the classifiers

Page 7: Search in EFS

Search space: 2^(#Features x #Classifiers)

Search strategies include:
– Ensemble Forward Sequential Selection (EFSS)
– Ensemble Backward Sequential Selection (EBSS)
– Hill-Climbing (HC)
– Random Subspace Method (RSM)
– Genetic Ensemble Feature Selection (GEFS)

Fitness function: Fitness_i = acc_i + alpha * div_i, where acc_i is the accuracy of base classifier i, div_i is its diversity with respect to the other ensemble members, and alpha weights diversity against accuracy.
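To make the formula concrete, here is a minimal self-contained sketch (an assumption, not the authors' implementation): accuracy is the fraction of correct predictions on a validation set, and diversity is simplified to the mean plain disagreement with the other ensemble members; the diversity measures actually used in the paper are defined on the next slide, and alpha is the diversity weight discussed in the additional slides.

```python
# Minimal sketch of Fitness_i = acc_i + alpha * div_i; helper names and the
# "plain disagreement" simplification are assumptions for illustration only.
from typing import List

def accuracy(preds: List[int], truth: List[int]) -> float:
    return sum(p == t for p, t in zip(preds, truth)) / len(truth)

def disagreement(a: List[int], b: List[int]) -> float:
    return sum(x != y for x, y in zip(a, b)) / len(a)

def fitness(preds: List[int], others: List[List[int]], truth: List[int], alpha: float) -> float:
    acc = accuracy(preds, truth)                                        # acc_i
    div = (sum(disagreement(preds, o) for o in others) / len(others)
           if others else 0.0)                                          # div_i
    return acc + alpha * div

# Toy usage: two existing members, one candidate, alpha = 1.0
truth   = [0, 1, 1, 0, 1]
member1 = [0, 1, 0, 0, 1]
member2 = [0, 0, 1, 0, 1]
cand    = [1, 1, 1, 0, 0]
print(fitness(cand, [member1, member2], truth, alpha=1.0))   # -> 1.2
```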

Page 8: Measuring Diversity

The kappa statistic for a pair of classifiers i and j is computed from their l x l coincidence matrix over the N test instances (N_ab counts instances that classifier i assigns to class a and classifier j to class b):

  Theta_1 = (1/N) * sum_{a=1..l} N_aa
  Theta_2 = sum_{a=1..l} (N_a* / N) * (N_*a / N)
  kappa_{i,j} = (Theta_1 - Theta_2) / (1 - Theta_2)
  div_kappa_{i,j} = 1 - kappa_{i,j}

The fail/non-fail disagreement measure is the percentage of test instances for which the classifiers make different predictions but for which one of them is correct:

  div_dis_{i,j} = (N^01 + N^10) / (N^00 + N^01 + N^10 + N^11)

where N^uv is the number of instances on which classifier i is correct (u = 1) or wrong (u = 0) and classifier j is correct (v = 1) or wrong (v = 0).
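A minimal Python sketch of these two pairwise measures, assuming each classifier is represented by its vector of predicted labels on the test set and taking the kappa-based diversity as 1 - kappa as reconstructed above; the function names are illustrative, not the authors' code.

```python
from collections import Counter

def fail_nonfail_disagreement(pred_i, pred_j, truth):
    """Fraction of instances where exactly one of the two classifiers is correct."""
    n01 = n10 = 0
    for pi, pj, t in zip(pred_i, pred_j, truth):
        if pi == t and pj != t:
            n10 += 1
        elif pj == t and pi != t:
            n01 += 1
    return (n01 + n10) / len(truth)

def kappa_diversity(pred_i, pred_j, labels):
    """div_kappa = 1 - kappa, computed from the l x l coincidence matrix."""
    n = len(pred_i)
    counts = Counter(zip(pred_i, pred_j))            # N_ab: i predicts a, j predicts b
    theta1 = sum(counts[(c, c)] for c in labels) / n
    row = {c: sum(counts[(c, b)] for b in labels) for c in labels}   # N_c*
    col = {c: sum(counts[(a, c)] for a in labels) for c in labels}   # N_*c
    theta2 = sum((row[c] / n) * (col[c] / n) for c in labels)
    kappa = (theta1 - theta2) / (1 - theta2)
    return 1 - kappa
```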

Page 9: Random Subspace Method

RSM itself is a simple but effective technique for EFS:
– the lack of accuracy in the ensemble members is compensated for by their diversity;
– it does not suffer from the curse of dimensionality;
– RSM is used as a base in other EFS strategies, including Genetic Ensemble Feature Selection.

Initial feature subsets are generated using RSM, followed by a number of refining passes on each feature subset while there is improvement in fitness (a sketch follows below).
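A minimal sketch of this RSM-plus-refinement idea (an assumed structure for illustration, not the authors' implementation): each initial subset is a random subspace, and a refining pass greedily flips single features while fitness keeps improving.

```python
import random

def random_subspace(n_features: int, subset_size: int) -> set:
    """RSM: draw a random feature subset of the given size
    (choose subset_size < n_features so the full set is excluded)."""
    return set(random.sample(range(n_features), subset_size))

def refine(subset: set, n_features: int, fitness) -> set:
    """Refining passes: try flipping each feature in or out, keep improvements,
    and repeat while fitness keeps improving."""
    best, best_fit = set(subset), fitness(subset)
    improved = True
    while improved:
        improved = False
        for f in range(n_features):
            cand = set(best)
            cand.symmetric_difference_update({f})        # flip feature f in or out
            if cand and len(cand) < n_features:           # avoid empty or full subsets
                cand_fit = fitness(cand)
                if cand_fit > best_fit:
                    best, best_fit, improved = cand, cand_fit, True
    return best

# Toy usage with a made-up fitness that prefers subsets of three features:
# refine(random_subspace(10, 4), 10, lambda s: -abs(len(s) - 3))
```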

Page 10: Genetic Ensemble Feature Selection

Genetic search is an important direction in FS research:
– GA is an effective global optimization technique.

GA for EFS:
– Kuncheva, 1993: ensemble accuracy instead of accuracies of base classifiers
  • the fitness function is biased towards a particular integration method
  • preventive measures are needed to avoid overfitting
– Alternative: use individual accuracy and diversity
  • overfitting of an individual is more desirable than overfitting of the ensemble
– Opitz, 1999: explicitly used diversity in the fitness function
  • RSM for the initial population
  • new candidates by crossover and mutation
  • roulette-wheel selection (selection probability proportional to fitness)

Page 11: Genetic Ensemble Feature Selection

[Figure: the genetic search cycle over a population of genotypes (bit strings encoding feature subsets for base classifiers): a coding scheme maps genotypes to the phenotype space, fitness is evaluated, selection picks parents, and recombination (crossover) and mutation produce new genotypes; the current ensemble of base classifiers is drawn from this population.]

Page 12: Basic Idea behind GA for EFS

[Figure: RSM initializes the current population of feature subsets; in each generation the GA evaluates fitness (accuracy plus diversity measured within the current population) and produces a new population; one generation yields the whole ensemble of base classifiers BC_1, ..., BC_EnsembleSize.]

Fitness function: Fitness_i = acc_i + alpha * div_i
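The following sketch spells out this GA loop under stated assumptions (bit-string individuals, the fitness function treated as a black box over a single subset, the log(1+fitness) selection mentioned on Page 15, and a simple elitist replacement). It illustrates the scheme in the figure, not the authors' exact procedure.

```python
import math
import random

def ga_for_efs(n_features, pop_size, n_generations, ensemble_size, fitness):
    """Sketch: evolve a population of feature-subset bit strings; the ensemble
    is the best `ensemble_size` individuals of the final population."""
    def random_subspace():
        # RSM-style initialisation: a random bit string, never the full feature set
        ind = [random.randint(0, 1) for _ in range(n_features)]
        ind[random.randrange(n_features)] = 0
        return ind

    population = [random_subspace() for _ in range(pop_size)]
    for _ in range(n_generations):
        fits = [fitness(ind) for ind in population]
        weights = [math.log1p(max(f, 0.0)) + 1e-9 for f in fits]   # selection ~ log(1 + fitness)
        children = []
        for _ in range(pop_size):
            a, b = random.choices(population, weights=weights, k=2)
            cut = random.randrange(1, n_features)
            child = a[:cut] + b[cut:]                               # one-point crossover
            i = random.randrange(n_features)
            child[i] = 1 - child[i]                                  # bit-flip mutation
            children.append(child)
        # simple elitist replacement: keep the best of parents and children
        population = sorted(population + children, key=fitness, reverse=True)[:pop_size]
    return sorted(population, key=fitness, reverse=True)[:ensemble_size]
```

In the paper the fitness of an individual combines its accuracy with its diversity with respect to the current population; here that dependence is hidden inside the `fitness` callable for brevity.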

Page 13: Basic Idea behind GAS-SEFS

[Figure: base classifiers BC_1, ..., BC_i are already in the ensemble; a new genetic process GA_{i+1}, initialized with RSM, evaluates candidate feature subsets by accuracy and by diversity with the already selected ensemble members, and selects the next base classifier BC_{i+1} by fitness.]

Fitness function: Fitness_i = acc_i + alpha * div_i

Page 14: GAS-SEFS (1 of 2)

GAS-SEFS (Genetic Algorithm-based Sequential Search for Ensemble Feature Selection):
– instead of maintaining a set of feature subsets in each generation as GA does, it applies a series of genetic processes, one for each base classifier, sequentially (see the sketch below);
– after each genetic process one base classifier is selected into the ensemble;
– GAS-SEFS uses the same fitness function, but
  • diversity is calculated with respect to the base classifiers already formed by the previous genetic processes;
  • in the first genetic process, accuracy only is used;
– GAS-SEFS uses the same genetic operators as GA.
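Putting these bullets together, here is a compact sketch of the sequential scheme under the same assumptions as the GA sketch on Page 12; it is an illustration, not the published algorithm shown on Page 25. The `run_ga`, `accuracy`, and `diversity` callables are placeholders the reader would supply.

```python
def gas_sefs(ensemble_size, accuracy, diversity, alpha, run_ga):
    """GAS-SEFS sketch: one full genetic process per base classifier.

    run_ga(fitness) is assumed to run one GA (e.g. the earlier ga_for_efs
    sketch) and return its single best feature subset.
    """
    ensemble = []
    for _ in range(ensemble_size):
        if not ensemble:
            # first genetic process: accuracy only
            fitness = lambda subset: accuracy(subset)
        else:
            # later processes: accuracy plus diversity w.r.t. already chosen members
            fitness = lambda subset: accuracy(subset) + alpha * diversity(subset, ensemble)
        ensemble.append(run_ga(fitness))
    return ensemble
```

With the GA sketch from Page 12, `run_ga` could be, for example, `lambda fit: ga_for_efs(n_features, pop_size, n_generations, 1, fit)[0]`, so that each genetic process contributes its single best feature subset.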

Page 15: GAS-SEFS (2 of 2)

GA and GAS-SEFS peculiarities:
– full feature sets are not allowed in RSM;
– the crossover operator may not produce a full feature subset;
– individuals for crossover are selected randomly with probability proportional to log(1 + fitness) instead of just fitness;
– the generation of children identical to their parents is prohibited;
– to provide better diversity in the length of feature subsets, two different mutation operators are used (sketched below):
  • Mutate1_0 deletes features randomly with a given probability;
  • Mutate0_1 adds features randomly with a given probability.
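A short sketch of the two mutation operators and the log-scaled parent selection described above; the parameter names and the guards against empty or full subsets are illustrative assumptions, not the authors' code.

```python
import math
import random

def mutate_1_0(subset, p):
    """Mutate1_0: delete each selected feature with probability p."""
    out = {f for f in subset if random.random() >= p}
    return out or set(subset)                 # sketch-level guard: never return an empty subset

def mutate_0_1(subset, n_features, p):
    """Mutate0_1: add each non-selected feature with probability p."""
    out = set(subset) | {f for f in range(n_features)
                         if f not in subset and random.random() < p}
    return out if len(out) < n_features else set(subset)   # keep full feature sets out

def select_parents(population, fitnesses, k=2):
    """Selection probability proportional to log(1 + fitness)."""
    weights = [math.log1p(max(f, 0.0)) + 1e-9 for f in fitnesses]
    return random.choices(population, weights=weights, k=k)
```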

Page 16: Computational complexity

The complexity of GA-based search does not depend on the number of features.

GAS-SEFS: S * S' * N_gen considered feature subsets; GA: S' * N_gen, where S is the number of base classifiers, S' is the number of individuals (feature subsets) in one generation, and N_gen is the number of generations.

EFSS and EBSS: on the order of S * N * N' considered feature subsets, where S is the number of base classifiers, N is the total number of features, and N' is the number of features included or deleted on average in an FSS or BSS search.
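Reading the counts above as numbers of considered feature subsets and plugging in the settings from the experimental design (Page 18): with S' = 40 individuals per generation (20 crossover offspring plus 20 mutated), N_gen = 10 generations, and S = 10 base classifiers, GA considers 40 x 10 = 400 feature subsets and GAS-SEFS considers 10 x 40 x 10 = 4000, which matches the 400 and 4000 totals quoted there.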

Page 17: Integration of classifiers

Motivation for the dynamic integration: each classifier is best in some sub-areas of the whole data set, where its local error is comparatively less than the corresponding errors of the other classifiers.

[Figure: integration approaches arranged by Static vs Dynamic and Selection vs Combination: Static Selection (CVM), Weighted Voting (WV), Dynamic Selection (DS), and Dynamic Voting with Selection (DVS).]
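As a rough illustration of the static/dynamic distinction (a sketch under simplifying assumptions, not the paper's exact procedures): weighted voting fixes one weight per classifier globally, while dynamic selection picks, for each new instance, the classifier whose estimated local error (e.g. estimated from nearby validation instances) is lowest.

```python
from collections import Counter

def weighted_voting(predictions, weights):
    """Static combination (WV): one global weight per base classifier."""
    votes = Counter()
    for pred, w in zip(predictions, weights):
        votes[pred] += w
    return votes.most_common(1)[0][0]

def dynamic_selection(predictions, local_errors):
    """Dynamic selection (DS): per instance, trust the classifier with the
    lowest estimated local error."""
    best = min(range(len(predictions)), key=lambda i: local_errors[i])
    return predictions[best]

# Toy usage: three base classifiers, one test instance
print(weighted_voting(["yes", "no", "yes"], [0.4, 0.3, 0.3]))        # -> "yes"
print(dynamic_selection(["yes", "no", "yes"], [0.30, 0.05, 0.25]))    # -> "no"
```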

Page 18: Experimental Design

Parameter settings for GA and GAS-SEFS:
– mutation rate: 50%;
– population size: 10;
– search length of 40 feature subsets/individuals per generation:
  • 20 are offspring of the current population of 10 classifiers generated by crossover,
  • 20 are mutated offspring (10 with each mutation operator);
– 10 generations of individuals were produced;
– 400 (GA) and 4000 (GAS-SEFS) feature subsets in total.

To evaluate GA and GAS-SEFS:
– 5 integration methods;
– Simple Bayes as the base classifier;
– stratified random sampling with 60%/20%/20% of instances in the training/validation/test sets;
– 70 test runs on each of 21 UCI data sets for each strategy and diversity measure.

Page 19: GA vs GAS-SEFS on two groups of data sets

[Chart: ensemble accuracies (roughly 0.810 to 0.840) for GA and GAS-SEFS on two groups of data sets, (1) fewer than 9 features and (2) 9 or more features, for four ensemble sizes (3, 5, 7, 10); integration by DVS, diversity by fail/non-fail disagreement. Series: GA_gr1, GAS-SEFS_gr1, GA_gr2, GAS-SEFS_gr2.]

Page 20: GA vs GAS-SEFS for Five Integration Methods

[Chart: ensemble accuracies (roughly 0.65 to 0.95) for five integration methods (SS, WV, DS, DV, DVS) on Tic-Tac-Toe, comparing GA and GAS-SEFS, with ensemble size 10.]

Page 21: Conclusions and Future Work

Diversity in an ensemble of classifiers is very important.
We have considered two genetic search strategies for EFS.
The new strategy, GAS-SEFS, consists in employing a series of genetic search processes, one for each base classifier.
GAS-SEFS results in better ensembles with greater accuracy:
– especially for data sets with relatively larger numbers of features;
– one reason: each of the core GA processes leads to significant overfitting of the corresponding ensemble member.
GAS-SEFS is significantly more time-consuming than GA:
– GAS-SEFS = ensemble_size * GA.
[Oliveira et al., 2003] obtained better results for single FSS based on Pareto-front dominating solutions:
– adaptation of this technique to EFS is an interesting topic for further research.

Page 22: Thank you!

Alexey Tsymbal, Padraig Cunningham
Dept of Computer Science, Trinity College Dublin, Ireland
[email protected], [email protected]

Mykola Pechenizkiy
Department of Computer Science and Information Systems, University of Jyväskylä, Finland
[email protected]

Page 23: Additional Slides

Page 24: References

• [Kuncheva, 1993] Ludmila I. Kuncheva. Genetic algorithm for feature selection for parallel classifiers. Information Processing Letters 46: 163-168, 1993.
• [Kuncheva and Jain, 2000] Ludmila I. Kuncheva and Lakhmi C. Jain. Designing classifier fusion systems by genetic algorithms. IEEE Transactions on Evolutionary Computation 4(4): 327-336, 2000.
• [Oliveira et al., 2003] Luiz S. Oliveira, Robert Sabourin, Flavio Bortolozzi, and Ching Y. Suen. A methodology for feature selection using multi-objective genetic algorithms for handwritten digit string recognition. International Journal of Pattern Recognition and Artificial Intelligence 17(6): 903-930, 2003.
• [Opitz, 1999] David Opitz. Feature selection for ensembles. In Proceedings of the 16th National Conference on Artificial Intelligence, pages 379-384, AAAI Press, 1999.

Page 25: GAS-SEFS Algorithm

[Figure: pseudo-code of the GAS-SEFS algorithm.]

Page 26: Other interesting findings

• Alpha (the diversity weight):
  – values were different for different data sets;
  – for both GA and GAS-SEFS, alpha for the dynamic integration methods is bigger than for the static ones (2.2 vs 0.8 on average);
  – GAS-SEFS needs slightly higher values of alpha than GA (1.8 vs 1.5 on average).
• GAS-SEFS always starts with a classifier based on accuracy only, and the subsequent classifiers need more diversity than accuracy.
• The number of selected features falls as the ensemble size grows:
  – this is especially clear for GAS-SEFS, as the base classifiers need more diversity.
• Integration methods (for both GA and GAS-SEFS):
  – the static methods, SS and WV, and the dynamic DS start to overfit the validation set already after 5 generations and show lower accuracies;
  – accuracies of DV and DVS continue to grow up to 10 generations.

Page 27: Paper Summary

[Chart: ensemble accuracies (roughly 0.810 to 0.840) for GA_gr1, GAS-SEFS_gr1, GA_gr2, GAS-SEFS_gr2 at ensemble sizes 3, 5, 7, 10, as on Page 19.]

• A new strategy for genetic ensemble feature selection, GAS-SEFS, is introduced.
• In contrast with the previously considered algorithm (GA), it is sequential: a series of genetic processes, one for each base classifier.
• It is more time-consuming, but gives better accuracy.
• Each base classifier has a considerable level of overfitting with GAS-SEFS, but the ensemble accuracy grows.
• Experimental comparisons demonstrate clear superiority on 21 UCI datasets, especially for datasets with many features (gr1 vs gr2).

Page 28: Simple Bayes as Base Classifier

Bayes theorem:
  P(C|X) = P(X|C) * P(C) / P(X)

Naïve assumption: attribute independence
  P(x1, ..., xk | C) = P(x1|C) * ... * P(xk|C)

If the i-th attribute is categorical: P(xi|C) is estimated as the relative frequency of samples having value xi as the i-th attribute in class C.

If the i-th attribute is continuous: P(xi|C) is estimated through a Gaussian density function.

Computationally easy in both cases (a minimal sketch follows below).
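A minimal sketch of this classifier for categorical attributes (continuous attributes would use the Gaussian density mentioned above). It is illustrative, not the implementation used in the experiments; the class and method names are assumptions, and a simple smoothing term is added where the slide uses plain relative frequency.

```python
from collections import Counter, defaultdict

class SimpleBayes:
    """Naive Bayes for categorical attributes with frequency-based estimates."""

    def fit(self, X, y):
        self.class_counts = Counter(y)
        self.n = len(y)
        # counts[class][attribute index][value] = frequency in the training set
        self.counts = defaultdict(lambda: defaultdict(Counter))
        for xi, c in zip(X, y):
            for j, v in enumerate(xi):
                self.counts[c][j][v] += 1
        return self

    def predict(self, x):
        best_class, best_score = None, float("-inf")
        for c, nc in self.class_counts.items():
            score = nc / self.n                                   # P(C)
            for j, v in enumerate(x):
                # P(x_j|C); smoothed to avoid zeros (the slide uses plain relative frequency)
                score *= (self.counts[c][j][v] + 1) / (nc + 1)
            if score > best_score:
                best_class, best_score = c, score
        return best_class

# Toy usage
clf = SimpleBayes().fit([["sunny", "hot"], ["rainy", "cool"], ["sunny", "cool"]],
                        ["no", "yes", "yes"])
print(clf.predict(["sunny", "cool"]))   # -> "yes"
```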

Page 29: Data sets' characteristics

Data set            Instances  Classes  Categ. features  Num. features
Balance                  625       3          0               4
Breast Cancer            286       2          9               0
Car                     1728       4          6               0
Diabetes                 768       2          0               8
Glass Recognition        214       6          0               9
Heart Disease            270       2          0              13
Ionosphere               351       2          0              34
Iris Plants              150       3          0               4
LED                      300      10          7               0
LED17                    300      10         24               0
Liver Disorders          345       2          0               6
Lymphography             148       4         15               3
MONK-1                   432       2          6               0
MONK-2                   432       2          6               0
MONK-3                   432       2          6               0
Soybean                   47       4          0              35
Thyroid                  215       3          0               5
Tic-Tac-Toe              958       2          9               0
Vehicle                  846       4          0              18
Voting                   435       2         16               0
Zoo                      101       7         16               0

Page 30: GA vs GAS-SEFS for Five Integration Methods

[Chart: ensemble accuracies (roughly 0.65 to 0.95) for GA (left) and GAS-SEFS (right) for five integration methods (SS, WV, DS, DV, DVS) and four ensemble sizes (3, 5, 7, 10) on Tic-Tac-Toe.]