Genetic Algorithm for Variable Selection Jennifer Pittman ISDS Duke University.
-
Upload
essence-withem -
Category
Documents
-
view
227 -
download
0
Transcript of Genetic Algorithm for Variable Selection Jennifer Pittman ISDS Duke University.
![Page 1: Genetic Algorithm for Variable Selection Jennifer Pittman ISDS Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062307/551bf4cb550346a34f8b45d8/html5/thumbnails/1.jpg)
Genetic Algorithm for Variable Selection
Jennifer PittmanISDS
Duke University
![Page 2: Genetic Algorithm for Variable Selection Jennifer Pittman ISDS Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062307/551bf4cb550346a34f8b45d8/html5/thumbnails/2.jpg)
Genetic Algorithms Step by Step
Jennifer PittmanISDS
Duke University
![Page 3: Genetic Algorithm for Variable Selection Jennifer Pittman ISDS Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062307/551bf4cb550346a34f8b45d8/html5/thumbnails/3.jpg)
Example: Protein Signature Selection in Mass Spectrometry
htt
p:/
/ww
w.u
ni-
main
z.d
e/~
frosc
00
0/f
bg_p
o3.h
tml
molecular weight
rela
tive in
tensi
ty
![Page 4: Genetic Algorithm for Variable Selection Jennifer Pittman ISDS Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062307/551bf4cb550346a34f8b45d8/html5/thumbnails/4.jpg)
Genetic Algorithm (Holland)
• heuristic method based on ‘ survival of the fittest ’
• in each iteration (generation) possible solutions or
individuals represented as strings of numbers
• useful when search space very large or too complex
for analytic treatment
00010101 00111010 1111000000010001 00111011 1010010100100100 10111001 01111000
11000101 01011000 01101010
3021 3058 3240
![Page 5: Genetic Algorithm for Variable Selection Jennifer Pittman ISDS Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062307/551bf4cb550346a34f8b45d8/html5/thumbnails/5.jpg)
Flowchart of GA
©
htt
p:/
/ww
w.s
pect
rosc
opyn
ow
.co
m • individuals allowed to
reproduce (selection),
crossover, mutate
• all individuals in population
evaluated by fitness function
![Page 6: Genetic Algorithm for Variable Selection Jennifer Pittman ISDS Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062307/551bf4cb550346a34f8b45d8/html5/thumbnails/6.jpg)
htt
p:/
/ib-p
ola
nd.v
irtu
ala
ve.n
et/
ee/g
en
eti
c1/3
gen
eti
calg
ori
thm
s.ht
m
![Page 7: Genetic Algorithm for Variable Selection Jennifer Pittman ISDS Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062307/551bf4cb550346a34f8b45d8/html5/thumbnails/7.jpg)
Initialization
• proteins corresponding to 256 mass spectrometry
values from 3000-3255 m/z • assume optimal signature contains 3 peptides
represented by their m/z values in binary encoding
• population size ~M=L/2 where L is signature length
(a simplified example)
![Page 8: Genetic Algorithm for Variable Selection Jennifer Pittman ISDS Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062307/551bf4cb550346a34f8b45d8/html5/thumbnails/8.jpg)
00010101 00111010 11110000
00010101 00111010 1111000000010001 00111011 1010010100100100 10111001 01111000
11000101 01011000 01101010
Initial Population
M = 12
L = 24
![Page 9: Genetic Algorithm for Variable Selection Jennifer Pittman ISDS Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062307/551bf4cb550346a34f8b45d8/html5/thumbnails/9.jpg)
Searching
• search space defined by all possible encodings of
solutions
• selection, crossover, and mutation perform
‘pseudo-random’ walk through search space
• operations are non-deterministic yet directed
![Page 10: Genetic Algorithm for Variable Selection Jennifer Pittman ISDS Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062307/551bf4cb550346a34f8b45d8/html5/thumbnails/10.jpg)
Phenotype Distribution
http://www.ifs.tuwien.ac.at/~aschatt/info/ga/genetic.html
![Page 11: Genetic Algorithm for Variable Selection Jennifer Pittman ISDS Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062307/551bf4cb550346a34f8b45d8/html5/thumbnails/11.jpg)
Evaluation and Selection
• evaluate fitness of each solution in current
population (e.g., ability to classify/discriminate)
[involves genotype-phenotype decoding] • selection of individuals for survival based on
probabilistic function of fitness
• may include elitist step to ensure survival of
fittest individual
• on average mean fitness of individuals increases
![Page 12: Genetic Algorithm for Variable Selection Jennifer Pittman ISDS Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062307/551bf4cb550346a34f8b45d8/html5/thumbnails/12.jpg)
Roulette Wheel Selection
©http://www.softchitech.com/ec_intro_html
![Page 13: Genetic Algorithm for Variable Selection Jennifer Pittman ISDS Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062307/551bf4cb550346a34f8b45d8/html5/thumbnails/13.jpg)
Crossover
• combine two individuals to create new individuals
for possible inclusion in next generation • main operator for local search (looking close to
existing solutions)
• perform each crossover with probability pc {0.5,…,0.8} • crossover points selected at random • individuals not crossed carried over in population
![Page 14: Genetic Algorithm for Variable Selection Jennifer Pittman ISDS Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062307/551bf4cb550346a34f8b45d8/html5/thumbnails/14.jpg)
Initial Strings Offspring
Single-Point
Two-Point
Uniform
11000101 01011000 01101010
00100100 10111001 01111000
11000101 01011000 01101010
11000101 01011000 01101010
00100100 10111001 01111000
00100100 10111001 01111000
10100100 10011001 01101000
00100100 10011000 01111000
00100100 10111000 01101010
11000101 01011001 01111000
11000101 01111001 01101010
01000101 01111000 01111010
![Page 15: Genetic Algorithm for Variable Selection Jennifer Pittman ISDS Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062307/551bf4cb550346a34f8b45d8/html5/thumbnails/15.jpg)
Mutation
• each component of every individual is modified with
probability pm
• main operator for global search (looking at new
areas of the search space)
• individuals not mutated carried over in population
• pm usually small {0.001,…,0.01}
rule of thumb = 1/no. of bits in chromosome
![Page 16: Genetic Algorithm for Variable Selection Jennifer Pittman ISDS Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062307/551bf4cb550346a34f8b45d8/html5/thumbnails/16.jpg)
©http://www.softchitech.com/ec_intro_html
![Page 17: Genetic Algorithm for Variable Selection Jennifer Pittman ISDS Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062307/551bf4cb550346a34f8b45d8/html5/thumbnails/17.jpg)
00010101 00111010 11110000
00010001 00111011 10100101
00100100 10111001 01111000
11000101 01011000 01101010
3021 3058 3240
3017 3059 3165
3036 3185 3120
3197 3088 3106
0.67
0.23
0.45
0.94
phenotype genotype fitness
1
4 2
3 1
3
4
4
00010101 00111010 11110000
00100100 10111001 01111000
11000101 01011000 01101010
11000101 01011000 01101010
selection
![Page 18: Genetic Algorithm for Variable Selection Jennifer Pittman ISDS Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062307/551bf4cb550346a34f8b45d8/html5/thumbnails/18.jpg)
00010101 00111010 11110000
00100100 10111001 01111000
11000101 01011000 01101010
11000101 01011000 01101010
one-point crossover (p=0.6)
0.3
0.8
00010101 00111001 01111000
00100100 10111010 11110000
11000101 01011000 01101010
11000101 01011000 01101010mutation (p=0.05)
00010101 00111001 01111000
00100100 10111010 11110000
11000101 01011000 01101010
11000101 01011000 01101010
00010101 00110001 01111010
10100110 10111000 11110000
11000101 01111000 01101010
11010101 01011000 00101010
![Page 19: Genetic Algorithm for Variable Selection Jennifer Pittman ISDS Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062307/551bf4cb550346a34f8b45d8/html5/thumbnails/19.jpg)
3021 3058 3240
3017 3059 3165
3036 3185 3120
3197 3088 3106
00010101 00111010 11110000
00010001 00111011 10100101
00100100 10111001 01111000
11000101 01011000 01101010
0.67
0.23
0.45
0.94
00010101 00110001 01111010
10100110 10111000 11110000
11000101 01111000 01101010
11010101 01011000 00101010
starting generation
next generation
phenotypegenotype fitness
3021 3049 3122
3166 3184 3240
3197 3120 3106
3213 3088 3042
0.81
0.77
0.42
0.98
![Page 20: Genetic Algorithm for Variable Selection Jennifer Pittman ISDS Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062307/551bf4cb550346a34f8b45d8/html5/thumbnails/20.jpg)
0 20 40 60 80 100 120
10
50
100GA Evolution
Generations
Acc
ura
cy in
Perc
en
t
http://www.sdsc.edu/skidl/projects/bio-SKIDL/
![Page 21: Genetic Algorithm for Variable Selection Jennifer Pittman ISDS Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062307/551bf4cb550346a34f8b45d8/html5/thumbnails/21.jpg)
genetic algorithm learning
http://www.demon.co.uk/apl385/apl96/skom.htm
0 50 100 150 200
-70
-
60
-
50
-4
0
Generations
Fitn
ess
cri
teri
a
![Page 22: Genetic Algorithm for Variable Selection Jennifer Pittman ISDS Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062307/551bf4cb550346a34f8b45d8/html5/thumbnails/22.jpg)
Fitn
ess
valu
e (
scale
d)
iteration
![Page 23: Genetic Algorithm for Variable Selection Jennifer Pittman ISDS Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062307/551bf4cb550346a34f8b45d8/html5/thumbnails/23.jpg)
• Holland, J. (1992), Adaptation in natural and artificial systems , 2nd Ed. Cambridge: MIT Press. • Davis, L. (Ed.) (1991), Handbook of genetic algorithms. New York: Van Nostrand Reinhold.
• Goldberg, D. (1989), Genetic algorithms in search, optimization and machine learning. Addison-Wesley.
References
• Fogel, D. (1995), Evolutionary computation: Towards a new philosophy of machine intelligence. Piscataway: IEEE Press.
• Bäck, T., Hammel, U., and Schwefel, H. (1997), ‘Evolutionary computation: Comments on the history and the current state’, IEEE Trans. On Evol. Comp. 1, (1)
![Page 24: Genetic Algorithm for Variable Selection Jennifer Pittman ISDS Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062307/551bf4cb550346a34f8b45d8/html5/thumbnails/24.jpg)
• http://www.spectroscopynow.com
• http://www.cs.bris.ac.uk/~colin/evollect1/evollect0/index.htm
• IlliGAL (http://www-illigal.ge.uiuc.edu/index.php3)
Online Resources
• GAlib (http://lancet.mit.edu/ga/)
![Page 25: Genetic Algorithm for Variable Selection Jennifer Pittman ISDS Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062307/551bf4cb550346a34f8b45d8/html5/thumbnails/25.jpg)
iteration
Perc
en
t im
pro
vem
en
t over
hill
clim
ber
![Page 26: Genetic Algorithm for Variable Selection Jennifer Pittman ISDS Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062307/551bf4cb550346a34f8b45d8/html5/thumbnails/26.jpg)
Schema and GAs• a schema is template representing set of bit strings
1**100*1 { 10010011, 11010001, 10110001, 11110011, … }• every schema s has an estimated average fitness f(s):
Et+1 k [f(s)/f(pop)] Et
• schema s receives exponentially increasing or decreasing
numbers depending upon ratio f(s)/f(pop)• above average schemas tend to spread through
population while below average schema disappear
(simultaneously for all schema – ‘implicit parallelism’)
![Page 27: Genetic Algorithm for Variable Selection Jennifer Pittman ISDS Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062307/551bf4cb550346a34f8b45d8/html5/thumbnails/27.jpg)
MALDI-TOF
©www.protagen.de/pics/main/maldi2.html