
Page 1: GAs and Feature Weighting


GAs and Feature Weighting

Rebecca Fiebrink
MUMT 611

31 March 2005

Page 2: GAs and Feature Weighting


Outline

• Genetic algorithms
  – History of GAs
  – How GAs work
    • Simple model
    • Variations
• Feature selection and weighting
  – Feature selection
  – Feature weighting
• Using GAs for feature weighting
  – Siedlecki and Sklansky
  – Punch
  – The ACE project

Page 3: GAs and Feature Weighting


Part 1: Genetic Algorithms

Page 4: GAs and Feature Weighting


History of GAs

• 1859: Charles Darwin, Origin of Species
• 1950s – 60s: Computers simulate evolution
• 1960s – 70s: John Holland invents genetic algorithms
  – Adaptation in Natural and Artificial Systems, book published 1975
  – Context of “adaptive systems,” not just “optimization”
• 1980s – Present: Further exploration, widespread adoption
  – Kenneth De Jong
  – Goldberg: Genetic Algorithms in Search, Optimization, and Machine Learning (“classic” textbook published 1989)
• Related areas: evolutionary programming, genetic programming

(Carnegie Mellon GA online FAQ, Cantu-Paz 2000)

Page 5: GAs and Feature Weighting


How GAs work

• The problem is a search space
  – Variable parameters are the dimensions
  – The problem has minima and maxima within this space
  – Minima and maxima are hard to find
• Potential solutions describe points in this space
  – It is “easy” to calculate the “goodness” of any one solution and compare it to other solutions
• Maintain a set of potential solutions
  – Guess a set of initial solutions
  – Combine these solutions with each other
  – Keep “good” solutions and discard “bad” solutions
  – Repeat the combination and selection process for some time

Page 6: GAs and Feature Weighting


A simple GA

(Flowchart) Start → Terminate? → if yes, Stop; if no, Select Parents → Produce Offspring → Mutate Offspring → Evaluate Population → back to Terminate?
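
A minimal sketch of this loop in Python (illustrative only; the population size, truncation selection, and per-bit mutation rate below are assumptions, not details from the slides):

import random

def run_ga(fitness, chrom_len, pop_size=50, generations=100, mutation_rate=0.01):
    # Initial population of random binary chromosomes
    pop = [[random.randint(0, 1) for _ in range(chrom_len)] for _ in range(pop_size)]
    for _ in range(generations):                         # Terminate? (fixed generation count here)
        ranked = sorted(pop, key=fitness, reverse=True)  # Evaluate population
        parents = ranked[:pop_size // 2]                 # Select parents (keep the better half)
        offspring = []
        while len(offspring) < pop_size:
            p1, p2 = random.sample(parents, 2)           # Produce offspring (single-point crossover)
            point = random.randrange(1, chrom_len)
            child = p1[:point] + p2[point:]
            child = [1 - g if random.random() < mutation_rate else g
                     for g in child]                     # Mutate offspring (occasional bit flips)
            offspring.append(child)
        pop = offspring
    return max(pop, key=fitness)

# Example: maximize the number of 1s in a 20-bit chromosome
print(run_ga(fitness=sum, chrom_len=20))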

Page 7: GAs and Feature Weighting


GA terminology

• Population: The set of all current potential solutions
• Deme: A subset of the population that interbreeds (might be the whole population)
• Chromosome: A solution structure (often binary); contains one or more genes
• Gene: A feature or parameter
• Allele: A specific value for a gene
• Phenotype: A member of the population; a real-valued solution vector
• Generation: A complete cycle of evaluation, selection, crossover, and mutation in a deme
• Locus: A particular position (bit) on the chromosome string
• Crossover: The process of combining two chromosomes; the simplest method is single-point crossover, where two chromosomes swap parts on either side of a random locus
• Mutation: A random change to a phenotype (i.e., changing an allele)

Page 8: GAs and Feature Weighting


Crossover Illustrated

Parent 1: 1011010111101
Parent 2: 1101010010100

(single-point crossover after locus 6)

Child 1: 101101 0010100
Child 2: 110101 0111101
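
A sketch of single-point crossover, assuming chromosomes are stored as bit strings; with the crossover point fixed at locus 6 it reproduces the children shown above:

import random

def single_point_crossover(parent1, parent2, point=None):
    # Swap everything after a random locus (or a given one)
    if point is None:
        point = random.randrange(1, len(parent1))
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2

c1, c2 = single_point_crossover("1011010111101", "1101010010100", point=6)
print(c1, c2)   # 1011010010100 1101010111101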

Page 9: GAs and Feature Weighting


GA Variations

• Variables:
  – Population size
  – Selection method
  – Crossover method
  – Mutation rate
  – How to encode a solution in a chromosome

• No single choice is best for all problems (Wolpert & Macready 1997, cited in Cantu-Paz 2000).

Page 10: GAs and Feature Weighting


More variations: Parallel GAs

• Categories:
  – Single-population Master/Slave
  – Multiple population
  – Fine-grained
  – Hybrids

(Diagram: one master node distributing work to several slave nodes)

(Cantu-Paz 2000)
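
A minimal sketch of the single-population master/slave idea, assuming the expensive step is fitness evaluation: the master holds the whole population and a Python process pool plays the role of the slaves (the toy fitness function is a placeholder):

import random
from multiprocessing import Pool

def fitness(chromosome):
    # Stand-in for an expensive evaluation, e.g. training and testing a classifier
    return sum(chromosome)

if __name__ == "__main__":
    population = [[random.randint(0, 1) for _ in range(16)] for _ in range(40)]
    with Pool(processes=4) as slaves:
        scores = slaves.map(fitness, population)   # evaluations run in parallel on the workers
    print(max(scores))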

Page 11: GAs and Feature Weighting


Applications of GAs

• NP problems (e.g., Traveling Salesman)
• Airfoil design
• Noise control
• Fluid dynamics
• Circuit partitioning
• Image processing
• Liquid crystals
• Water networks
• Music

(Miettinen et al. 1999, Coley 1999)

Page 12: GAs and Feature Weighting


Part 2: Feature Selection and Weighting

Page 13: GAs and Feature Weighting


Features

• What are they?
  – A classifier’s “handle” to the problem
  – Numerical or binary values representing independent or dependent attributes of each instance to be classified
• What are they in music?
  – Spectral centroid
  – Number of instruments
  – Composer
  – Presence of banjo
  – Beat histogram
  – …

Page 14: GAs and Feature Weighting


Feature Selection – Why?

• “Curse of dimensionality”:
  – The size of the training set must grow exponentially with the dimensionality of the feature space
• The set of best features to use may vary according to classification scheme, goal, or data set
• We need a way to select only a good subset of all possible features
  – May or may not be interested in the “best” subset

Page 15: GAs and Feature Weighting


Feature Selection – How?

• Common approaches
  – Dimensionality reduction (principal component analysis or factor analysis)
  – Experimentation: Choose a subset, train a classifier with it, and evaluate its performance.
    • Example: <centroid, length, avg_dynamic> (sketched in code after this list)
      – A piece might have vector <3.422, 523, 98> (values)
      – Try selections <1, 1, 0>, <0, 1, 1>, … ? (1 = yes, 0 = no)
      – Which works best?
• Setbacks in experimentation:
  – For n potential features, there are 2^n possible subsets: a huge search space!
  – Evaluating each classifier takes time
  – Choose vectors using sequential search, branch and bound, or GAs
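
A sketch of the experimentation idea using the example above (the feature names and values come from the slide; the masking and subset count below are just illustrations):

from itertools import product

features = ["centroid", "length", "avg_dynamic"]
piece = [3.422, 523, 98]                     # one piece's feature vector

selection = [1, 1, 0]                        # 1 = use the feature, 0 = drop it
masked = [v for v, keep in zip(piece, selection) if keep]
print(masked)                                # [3.422, 523]

# Every candidate selection vector for n features:
subsets = list(product([0, 1], repeat=len(features)))
print(len(subsets))                          # 2^3 = 8; for n features this is 2^n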

Page 16: GAs and Feature Weighting


Feature Weighting

• Among a group of selected (useful) features, some may be more useful than others

• Weight the features for optimal classifier performance

• Experimental approach: Similar to feature selection
  – Example: Say <1, 0, 1> is the optimal selection for the <centroid, length, avg_dynamic> feature set
    • Try weights like <.5, 0, 1>, <1, 0, .234983>, … ?

• Practical and theoretical constraints of feature selection are magnified

Page 17: GAs and Feature Weighting


Part 3: Using GAs for Feature Weighting

Page 18: GAs and Feature Weighting


GAs and Feature Weighting

• A chromosome is a vector of weights
  – Each gene corresponds to a feature weight
    • E.g., <1, 0, 1, 0, 1, 1, 1> for selection
    • E.g., <.3, 0, .89, 0, .39, .91, .03> for weighting
• The fitness of a chromosome is a measure of its performance in training and validating an actual classifier (sketched below)
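
A sketch of how such a chromosome might be scored, assuming a simple 1-nearest-neighbour classifier and a held-out validation split (an illustration of the idea, not the exact method used in the papers):

def fitness(weights, train_X, train_y, val_X, val_y):
    # Weighted squared distance: a zero gene drops a feature, other genes stretch its axis
    def dist(a, b):
        return sum(w * (x - z) ** 2 for w, x, z in zip(weights, a, b))

    correct = 0
    for x, y in zip(val_X, val_y):
        nearest = min(range(len(train_X)), key=lambda i: dist(train_X[i], x))
        correct += (train_y[nearest] == y)
    return correct / len(val_y)   # validation accuracy as the chromosome's fitness

# Toy usage: two features, the second one is pure noise, so it gets weight 0
train_X = [[1.0, 0.3], [1.1, 0.9], [5.0, 0.2], [5.2, 0.8]]
train_y = ["low", "low", "high", "high"]
val_X   = [[1.05, 0.5], [5.1, 0.4]]
val_y   = ["low", "high"]
print(fitness([1.0, 0.0], train_X, train_y, val_X, val_y))   # 1.0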

Page 19: GAs and Feature Weighting


Siedlecki and Sklansky

• 1989: A note on genetic algorithms for large-scale feature selection

• Choose a feature subset for a kNN classifier
• Propose using GAs when the feature set > 20
• Binary chromosomes (0/1 for each feature)
• Goal: seek the smallest/least costly subset of features for which classifier performance is above a certain level

Page 20: GAs and Feature Weighting


Results

• Compared GAs with exhaustive search, branch and bound, and (p, q)-search on a sample problem
  – Branch and bound and (p, q)-search did not come close enough (within 1% error) to finding a set that performed optimally
  – Exhaustive search used 2^24 evaluations
  – GA found optimal or near-optimal solutions with 1000–3000 evaluations, a huge savings over both exhaustive search and branch and bound

• GA exhibits slightly more than linear increase of time complexity with added features

• GA also outperforms other methods on a real problem with 30 features

Page 21: GAs and Feature Weighting


Punch et al.

• 1993: Further research on feature selection and classification using genetic algorithms

• Approach: Use GAs for feature weighting for a kNN classifier: classifier “dimension warping”
• Each chromosome is a vector of real-valued weights
  – Mapped exponentially to the range [.01, 10) or 0 (one possible form is sketched after this list)
• Also experimented with “hidden features” representing the combination (multiplication) of 2 other features
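
One possible reading of that exponential mapping (the formula and the cut-off below are assumptions; the paper only specifies the target range [.01, 10) or 0):

def gene_to_weight(g, zero_threshold=0.05):
    # g is a raw gene value in [0, 1); very small values switch the feature off entirely
    if g < zero_threshold:
        return 0.0
    return 0.01 * (1000.0 ** g)   # maps the remaining range exponentially onto roughly [.01, 10)

print(gene_to_weight(0.0))    # 0.0  (feature dropped)
print(gene_to_weight(0.5))    # ~0.316
print(gene_to_weight(0.999))  # just under 10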

Page 22: GAs and Feature Weighting


Results

• Feature weighting can outperform simple selection, especially on a large data set with noise
  – Best performance when feature weighting follows binary selection
• 14 days to compute results
• Parallel version: nearly linear speedup when strings are passed to other processors for fitness evaluation

Page 23: GAs and Feature Weighting


Recent work

• Minaei-Bidgoli, Kortemeyer, and Punch, 2004: Optimizing classification ensembles via a genetic algorithm for a web-based educational system
  – Incorporate GA feature weighting in a multiple-classifier (voting) system
  – Results show over 10% improvement in accuracy of the classifier ensemble when GA optimization is used

Page 24: GAs and Feature Weighting


MIR Application

• ACE project
  – Accept arbitrary type and number of features, arbitrary taxonomy, training and testing data, and a target classification time
  – Classify data using the best means available; e.g., make use of multiple classifiers, feature weighting
  – Parallel GAs for feature weighting can improve solution time and quality

Page 25: GAs and Feature Weighting


Conclusions

• GAs are powerful and interesting tools, but they are complex and not always well-understood

• Feature selection/weighting involves selecting from a huge space of possibilities, but it is a necessary task

• GAs are useful for tackling the feature weighting problem

• Faster machines, parallel computing, and better understanding of GA behavior can all benefit the feature weighting problem

• This is relevant to MIR: we want to do good and efficient classification!