Dr Sanchez - Slides - 6.PDF


University of Maryland, Center for Advanced Life Cycle Engineering

INTRODUCTION TO MACHINE LEARNING – VI

Dr. Gustavo Sánchez

Goal for today

⦿ Exploring basic concepts about Evolutionary Computing

Evolutionary Computing

⦿ “This book offers a thorough introduction to Evolutionary Computing (EC), including the basics of all traditional variants”

Evolutionary Computing

⦿ In 1948, Alan Turing proposed to use an algorithm that today would be called “evolutionary”

Turing, A. (1948) “Intelligent Machinery”, in Collected Works of A.M. Turing: Mechanical Intelligence, Elsevier Science, 1992

Evolutionary Computing

⦿ Evolutionary Computing (EC) is a research area within Computer Science which draws inspiration from the process of natural evolution to solve computing problems

What is Evolution?

⦿ A population of individuals exists in an environment with limited resources

⦿ Competition causes selection of fit individuals as seeds (Parents) for the next generation through Recombination and Mutation

[Figure: naive model of evolution]

What is Evolution?

⦿ The new individuals (Offspring) compete again for survival

⦿ Over time this natural selection causes a rise in the fitness of the population

Genotype and Phenotype

⦿ The information required to build a living organism is coded in its Genotype (DNA)

⦿ The Phenotype comprises those features, physical and/or behavioral, that determine the organism's fitness

⦿ Genotype ⇒ Phenotype is a very complex mapping

Genotype and Phenotype

⦿ Each individual represents a unique combination of phenotypic traits that will be evaluated by the environment

⦿ If it evaluates favorably, its genes are propagated via offspring; otherwise they are discarded

Recombination

[Figure labels: Parent 1, Parent 2, Offspring]

Mutation

⦿ Darwin's insight was that random mutations occur during recombination

⦿ These mutations can be:

– Catastrophic: offspring is not viable (most likely)

– Neutral: new feature does not influence fitness

– Advantageous: useful new feature occurs

[Figure: Charles Darwin, England, 1809–1882]

The voyage of the Beagle, 1831–1836

[Figure labels: Ecuador, Venezuela]

Recombination/Mutation and Selection

Recombination and mutation create diversity and thereby facilitate novelty

Selection reduces diversity and acts as a force pushing adaptation

Fitness Function

⦿ Represents the requirements that the population should adapt to (the objective function)

⦿ A single fitness value is assigned to each phenotype, which will be the basis for selection

⦿ The more discriminating the fitness function (i.e., the more distinct values it assigns), the better

Population

⦿ Set of possible solutions (genotypes)

⦿ Usually has a fixed size

⦿ Some sophisticated algorithms impose a spatial structure on the population, e.g., a grid

⦿ Selection operators usually take the whole population into account

Parent Selection

⦿ Selection probabilities are assigned to individuals depending on their fitness

⦿ High-fitness individuals are more likely to become parents than low-fitness individuals

⦿ However, even the worst individual usually has a non-zero probability of becoming a parent

Survivor Selection

⦿ Often deterministic

⦿ Fitness based: e.g., rank parents + offspring and take the best

⦿ Age based: make as many offspring as parents and delete all parents
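
The two survivor-selection schemes above can be sketched as follows. This is a minimal illustration, not the lecturer's implementation: the function names, the parameter mu (number of survivors), and the toy numeric individuals are assumptions made for the example, and fitness is assumed to be maximized.

```python
def fitness_based_survivors(parents, offspring, fitness, mu):
    """Rank parents + offspring together and keep the mu fittest (maximization)."""
    pool = list(parents) + list(offspring)
    return sorted(pool, key=fitness, reverse=True)[:mu]


def age_based_survivors(parents, offspring):
    """Generational replacement: all parents are deleted, the offspring take over."""
    if len(offspring) != len(parents):
        raise ValueError("age-based replacement expects as many offspring as parents")
    return list(offspring)


# Toy individuals are plain numbers whose fitness is their own value (assumption).
parents = [3, 9, 5]
offspring = [4, 10, 1]
print(fitness_based_survivors(parents, offspring, fitness=lambda x: x, mu=3))  # [10, 9, 5]
print(age_based_survivors(parents, offspring))  # [4, 10, 1]
```

Note that the fitness-based variant can keep good parents alive indefinitely, while the age-based variant can lose the best individual found so far.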

Initialization / Termination

⦿ Initialization: usually random. Need to ensure an even spread. Can include previous solutions, or use problem-specific heuristics

⦿ Termination condition is checked every generation:

– Reaching some (known/hoped for) fitness

– Reaching some number of generations

– Reaching some minimum level of diversity

– Reaching some specified number of generations without fitness improvement
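
The four termination conditions listed above could be combined into a single check that runs once per generation. The sketch below is only one possible arrangement: the function name, all parameter names and defaults, and the idea that the caller supplies a scalar diversity measure are assumptions, and fitness is assumed to be maximized.

```python
def should_terminate(generation, best_history, diversity, *,
                     target_fitness=None, max_generations=100,
                     min_diversity=0.0, patience=20):
    """Return True as soon as any one of the four stopping conditions holds.

    best_history holds the best fitness of each generation so far (maximization).
    diversity is any caller-supplied scalar measure of population diversity.
    """
    best = best_history[-1]
    # 1. A known or hoped-for fitness level has been reached.
    if target_fitness is not None and best >= target_fitness:
        return True
    # 2. The generation budget is used up.
    if generation >= max_generations:
        return True
    # 3. Diversity has dropped to the minimum level.
    if diversity <= min_diversity:
        return True
    # 4. No fitness improvement during the last `patience` generations.
    if len(best_history) > patience and best <= best_history[-patience - 1]:
        return True
    return False


print(should_terminate(5, [1, 2, 3], 0.5, target_fitness=3))  # True
print(should_terminate(5, [1, 2, 3], 0.5))  # False
```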

Landscape Metaphor

⦿ A population with n traits exists in an (n+1)-dimensional space (landscape), with height corresponding to fitness

⦿ Each different individual represents a single point on this landscape

⦿ The population is therefore a cloud of points, moving on the landscape over time as it evolves

[Figures: Landscape Metaphor (three slides)]

Landscape Metaphor

Early phase: random population distribution

Mid-phase: population arranged around/on hills

Late phase: population concentrated on high hills

Typical Run

[Figure: best fitness in population vs. number of generations, annotated "Progress in 1st half" and "Progress in 2nd half"]

Evolution and Optimization

EVOLUTION ↔ OPTIMIZATION

Environment ↔ Problem

Individual ↔ Decision Vector

Fitness Function ↔ Objective Function

Evolutionary vs Mathematical Optimization

Exploration vs Exploitation

⦿ Exploration: Discovering promising areas in the search space, i.e., gaining information on the problem

⦿ Exploitation: Optimizing within a promising area, i.e., using previous information

Exploration

[Figure annotations: "We want to find this treasure"; "Our budget is limited"]

Exploitation

[Figure annotation: "We will not find the treasure!"]

Exploration vs Exploitation

Genetic Algorithms

⦿ Developed: USA in the 1970’s (Holland)

⦿ Typically applied to:

– discrete optimization

⦿ Attributed features:

– slow convergence

– good for combinatorial problems

– emphasizes combining information from good parents (crossover)

– many variants, e.g., reproduction models, operators

Simple Genetic Algorithm (SGA)

⦿ Holland’s original GA is now known as the Simple Genetic Algorithm (SGA)

⦿ Genotype representation: Binary Strings

SGA operators: crossover

⦿ Choose a random point on the two parents

⦿ Split parents at this crossover point

⦿ Create children by exchanging tails
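
The three crossover steps can be sketched in a few lines. The function name and the optional rng argument are assumptions made for this example; parents are assumed to be equal-length sequences (strings or lists) with at least two genes, so that a cut point strictly inside the string exists.

```python
import random


def one_point_crossover(parent1, parent2, rng=random):
    """Split both parents at one random point and exchange their tails."""
    if len(parent1) != len(parent2):
        raise ValueError("parents must have the same length")
    # Cut strictly inside the string so that each child mixes both parents.
    point = rng.randrange(1, len(parent1))
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2


rng = random.Random(0)  # seeded only to make the demonstration repeatable
print(one_point_crossover("00000", "11111", rng))
```

With all-zero and all-one parents, the position where the children switch from one symbol to the other shows where the cut fell.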

SGA operators: mutation

⦿ Alter each gene independently with a probability p_m

⦿ p_m is called the mutation rate

– Typically between 1/pop_size and 1/chromosome_length
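
For a binary chromosome, altering a gene means flipping the bit. A minimal sketch, assuming the chromosome is a list of 0/1 integers (the function name and the rng argument are illustrative, not from the slides):

```python
import random


def bit_flip_mutation(chromosome, pm, rng=random):
    """Flip each bit independently with probability pm (the mutation rate)."""
    return [1 - gene if rng.random() < pm else gene for gene in chromosome]


chromosome = [0, 1, 1, 0, 1, 0, 0, 1]
# pm = 1/chromosome_length flips one bit per chromosome on average.
print(bit_flip_mutation(chromosome, pm=1 / len(chromosome)))
```

Because every gene is tested independently, a single call may flip no bits at all or several bits at once.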

SGA: parent selection

⦿ Main idea: better individuals get higher chance

– Implementation: roulette wheel technique

⦿ Assign to each individual a part of the roulette wheel

⦿ Spin the wheel n times to select n individuals

Example: fitness(A) = 3, fitness(B) = 1, fitness(C) = 2

– A: 3/6 = 50%

– B: 1/6 = 17%

– C: 2/6 = 33%
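
The roulette wheel can be sketched with cumulative fitness sums: each individual owns a slice of the interval [0, total fitness] proportional to its fitness, and each spin lands in exactly one slice. The function name and the rng argument are assumptions; fitness values are assumed to be non-negative with a positive sum.

```python
import random
from collections import Counter


def roulette_wheel_select(population, fitnesses, n, rng=random):
    """Spin the wheel n times; each slice is proportional to the individual's fitness."""
    total = sum(fitnesses)
    selected = []
    for _ in range(n):
        spin = rng.uniform(0, total)
        cumulative = 0.0
        for individual, fit in zip(population, fitnesses):
            cumulative += fit
            if spin <= cumulative:
                selected.append(individual)
                break
        else:
            # Guard against floating-point round-off at the very end of the wheel.
            selected.append(population[-1])
    return selected


# The slide's example: A, B, C with fitness 3, 1, 2.
picks = roulette_wheel_select(["A", "B", "C"], [3, 1, 2], 6000, random.Random(42))
print(Counter(picks))  # counts should be roughly in the ratio 3 : 1 : 2
```

Python's standard library offers the same behaviour through `random.choices(population, weights=fitnesses, k=n)`; the explicit loop is shown only to make the wheel visible.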

Simple Genetic Algorithm (SGA)

Representation: Binary strings

Recombination: 1-point

Mutation: Bitwise bit-flipping with fixed probability p_m

Parent selection: Fitness-proportionate

Survivor selection: All children replace parents
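
Putting the five components of the summary together gives a complete loop. The sketch below is one possible reading of the SGA, not a reference implementation: the function name, the crossover probability pc, the default parameter values, and the OneMax test problem (maximize the number of 1-bits) are all assumptions that do not appear in the slides.

```python
import random


def run_sga(fitness, length=20, pop_size=30, pc=0.7, pm=None,
            generations=50, seed=0):
    """Minimal SGA: binary strings, 1-point crossover, bit-flip mutation,
    fitness-proportionate parent selection, children replace all parents."""
    if pop_size % 2:
        raise ValueError("pop_size must be even so that parents can be paired")
    rng = random.Random(seed)
    pm = 1.0 / length if pm is None else pm
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        # Parent selection: probability proportional to fitness.
        # The tiny offset keeps the weights valid if every fitness is zero.
        weights = [fitness(ind) + 1e-9 for ind in pop]
        parents = rng.choices(pop, weights=weights, k=pop_size)
        children = []
        for i in range(0, pop_size, 2):
            a, b = parents[i], parents[i + 1]
            if rng.random() < pc:  # 1-point crossover: exchange tails
                cut = rng.randrange(1, length)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            children += [a, b]
        # Bit-flip mutation, then generational replacement of all parents.
        pop = [[1 - g if rng.random() < pm else g for g in c] for c in children]
        # Bookkeeping only: remember the best individual ever seen.
        best = max(pop + [best], key=fitness)
    return best


best = run_sga(fitness=sum)  # OneMax: fitness is the number of 1-bits
print(sum(best), best)
```

Since all children replace their parents, the population itself can lose its best member from one generation to the next; the `best` variable records it separately without feeding it back into the population.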

Application to Prognostics

[Figure: the blue solid line represents real data and the red dotted line represents the predicted data]

Evolutionary Strategy

⦿ Developed: Germany in the 1970’s

⦿ Early names: Rechenberg, Schwefel

⦿ Typically applied to:

– real-valued optimization

⦿ Attributed features:

– acceptable results

– comparatively well-developed theory

– self-adaptation of mutation parameters

Evolutionary Strategy

⦿ Basic algorithm: “two-individuals ES”

– Vectors from $\mathbb{R}^n$ directly as chromosomes

– Population size 1

– Operator: only mutation creating one child

– Greedy selection

Evolutionary Strategy

⦿ Set $t = 0$

⦿ Create initial point $x^t = \langle x_1^t, \dots, x_n^t \rangle$

⦿ REPEAT UNTIL (termination condition satisfied)

– Draw $z_i$ from a normal distribution for all $i = 1, \dots, n$

– $y^t = x^t + z$

– IF $f(x^t) < f(y^t)$ THEN $x^{t+1} = x^t$ (the parent is kept only when it is strictly better, so $f$ is being minimized)

– ELSE $x^{t+1} = y^t$

– Set $t = t + 1$
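
The loop above translates almost line by line into code. In this sketch the function name, the fixed step size sigma, the fixed iteration budget used as the termination condition, and the sphere test function are assumptions for illustration; f is minimized, as the comparison in the pseudocode implies.

```python
import random


def two_membered_es(f, x0, sigma=0.5, iterations=1000, seed=0):
    """One parent, one mutated child per step, greedy selection (minimization)."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iterations):  # termination: fixed budget (assumption)
        # y = x + z, with each z_i drawn from N(0, sigma^2)
        y = [xi + rng.gauss(0.0, sigma) for xi in x]
        # Keep the parent only if it is strictly better; otherwise take the child.
        if not f(x) < f(y):
            x = y
    return x


def sphere(v):
    """Hypothetical test objective with its minimum 0 at the origin."""
    return sum(vi * vi for vi in v)


start = [3.0, -2.0]
result = two_membered_es(sphere, start)
print(sphere(start), sphere(result))
```

Greedy selection guarantees that the objective value never gets worse from one step to the next.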

Evolutionary Strategy

⦿ z values are drawn from the normal distribution $N(0, \sigma^2)$

– $\sigma$ is called the mutation step size

⦿ The 1/5 success rule resets $\sigma$ after every k iterations by

– $\sigma = \sigma / c$ if $p_s > 1/5$

– $\sigma = \sigma \cdot c$ if $p_s < 1/5$

where $p_s$ is the rate of successful mutations and $0.8 \le c \le 1$ ($\sigma$ is left unchanged if $p_s = 1/5$)
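
Because c is at most 1, dividing by c enlarges the step size and multiplying by c shrinks it: many successes suggest the steps are too cautious, few successes suggest they overshoot. A minimal sketch of the update, with the function name and the default c = 0.9 chosen only for illustration:

```python
def one_fifth_rule(sigma, success_rate, c=0.9):
    """Adapt the mutation step size from the recent rate of successful mutations."""
    if not 0.8 <= c <= 1.0:
        raise ValueError("c is expected to lie in [0.8, 1]")
    if success_rate > 0.2:
        return sigma / c   # too many successes: take larger steps
    if success_rate < 0.2:
        return sigma * c   # too few successes: take smaller steps
    return sigma           # exactly 1/5: leave the step size unchanged


print(one_fifth_rule(1.0, success_rate=0.5))
print(one_fifth_rule(1.0, success_rate=0.1))
```

In a full run, the success rate would be measured over the last k iterations and the function called once per block of k iterations.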

Genetic Programming

⦿ Developed: USA in the 1990’s (Koza)

⦿ Typically applied to:

– machine learning tasks (prediction, classification…)

⦿ Attributed features:

– competes with decision trees, neural nets, etc.

– needs huge populations (thousands of individuals)

– slow

Tree-based representation

(NOC = 2) AND (S > 80000)

[Figure: expression tree with root AND, whose two subtrees are = (leaves NOC, 2) and > (leaves S, 80000)]
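
One way to hold such an expression in code is as nested tuples of the form (operator, left, right), evaluated recursively. The tuple encoding, the names OPS, TREE and evaluate, and the sample variable values are assumptions made for this sketch.

```python
import operator

# Internal nodes: (operator, left, right). Leaves: variable names or constants.
OPS = {
    "AND": lambda a, b: a and b,
    "=": operator.eq,
    ">": operator.gt,
}

# (NOC = 2) AND (S > 80000)
TREE = ("AND", ("=", "NOC", 2), (">", "S", 80000))


def evaluate(node, env):
    """Evaluate an expression tree given variable values in env."""
    if isinstance(node, tuple):
        op, left, right = node
        return OPS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(node, str):
        return env[node]  # variable leaf
    return node           # constant leaf


print(evaluate(TREE, {"NOC": 2, "S": 90000}))  # True
print(evaluate(TREE, {"NOC": 3, "S": 90000}))  # False
```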

Tree-based representation

⦿ Tree shaped chromosomes are complex structures

⦿ Trees may vary in depth and width

Tree-based representation

⦿ Most common mutation: replace a randomly chosen subtree by another randomly generated subtree
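
Subtree mutation can be sketched on trees stored as nested tuples (operator, left, right). Everything specific here is an assumption for illustration: the arithmetic primitives, the helper random_tree, the depth limit, and the way the mutation point is chosen by a random walk down from the root, which does not pick every node with equal probability.

```python
import random


def random_tree(depth, rng):
    """Grow a random expression tree from hypothetical primitives."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(["x", "y", 1, 2])  # leaf: variable or constant
    return (rng.choice(["+", "*"]),
            random_tree(depth - 1, rng),
            random_tree(depth - 1, rng))


def subtree_mutation(tree, rng, max_depth=2, p_here=0.3):
    """Replace one randomly chosen subtree by a newly generated subtree."""
    # Stop at a leaf, or at the current node with probability p_here.
    if not isinstance(tree, tuple) or rng.random() < p_here:
        return random_tree(max_depth, rng)
    op, left, right = tree
    # Otherwise descend into one child; the other child is reused unchanged.
    if rng.random() < 0.5:
        return (op, subtree_mutation(left, rng, max_depth, p_here), right)
    return (op, left, subtree_mutation(right, rng, max_depth, p_here))


rng = random.Random(1)
parent = ("+", ("*", "x", "y"), 2)
print(subtree_mutation(parent, rng))
```

Since tuples are immutable, the parent tree is left untouched and the mutated tree shares every unchanged subtree with it.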

Tree-based representation

[Figure labels: Parent 1, Parent 2, Child 1, Child 2]

Genetic Programming

Representation: Tree structures

Recombination: Exchange of subtrees

Mutation: Random change in trees

Parent selection: Fitness proportional

Survivor selection: All children replace parents

Evolutionary Computing Variants

⦿ Historically, different variants have been associated with specific representations

– Integer strings: Genetic Algorithms

– Real-valued vectors: Evolution Strategies

– Trees: Genetic Programming

Evolutionary Computing Variants

⦿ Today these differences are irrelevant

⦿ The best practice is to choose a good representation to suit the problem, and variation operators to suit the representation

⦿ Selection operators are independent of representation

Evolutionary Computing Demo

DEMO

Machine Learning Applications to PHM