Soft Computing Multiobjective Optimization Richard P. Simpson.

Soft Computing

Multiobjective Optimization

Richard P. Simpson

A Landscape of Interest: Inverted Shekel’s Foxholes

Recall the paper we discussed on Landscape Smoothing and its complex objective function:

Calculate float_error: accumulates the error of the 10 sums while in float state.

Calculate diff_total: sort the chromosome values and calculate the sum of the squares of the differences between the sequence 1, 2, …, 16 and the sorted chromosome.

Calculate int_error: accumulates the error of the 10 sums while in integer state.

Minimize diff_total + float_error + 2*int_error

Multiple Objective functions

Minimize diff_total + float_error+ 2*int_error

The above formula is really just a weighted sum of three different objective functions.

diff_total, float_error, and int_error

The method used above is just one of many available methods and, I might add, one that has some problems.

Multi-Objective Evolutionary Algorithms (MOEAs)

MOEAs allow us to:

search for solutions to complex, high-dimensional real-world applications that have multi-objective goals
find solutions to problems using little problem domain knowledge
search in parallel easily
find several trade-off solutions in a single run of the algorithm (assuming niching is used)
attack certain single-objective problems from a different perspective (landscape smoothing)

So what is the general problem?

The multiobjective optimization problem (MOP) can be defined as the problem of finding [Osyczka 1985] a vector of decision variables which satisfies constraints and optimizes a vector function whose elements represent the objective functions.

Hence, the term “optimize” means finding such a solution which would give the values of all objective functions acceptable to the designer.

Formally

Find the vector X = [x1, x2, …, xn] that satisfies the m inequality constraints gi(X) ≥ 0 for i = 1, …, m and the p equality constraints hi(X) = 0 for i = 1, …, p, and that optimizes the vector function f(X) = [f1(X), f2(X), …, fk(X)].

Since we rarely have an X that minimizes/maximizes all the fi at the same time, the meaning of optimum is not well defined.

What is optimum is often problem dependent. Let's first look at some previous research in this area.

What does optimum mean here?

Having several objective functions implies that we are trying to find a good compromise rather than a single optimal solution.

Francis Ysidro Edgeworth first proposed a meaning for “optimum” in 1881 which was generalized in 1896 by Vilfredo Pareto

Pareto optimality

Pareto optimality is an optimality criterion for optimization problems with multi-criteria objectives (multi-criteria optimization). A state (a set of object parameters) is said to be Pareto optimal if there is no other state dominating it with respect to a set of objective functions. A state X dominates a state Y if X is better than Y in at least one objective function and not worse with respect to all other objective functions.

Pareto Optimality

Another way of saying this is: X is Pareto optimal if there exists no feasible vector X’ which would decrease some criterion without causing a simultaneous increase in at least one other criterion.

This concept almost always gives not a single solution, but rather a set of solutions called the Pareto optimal set (aka Pareto front)
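The dominance definition above can be sketched in a few lines of Python. This is only an illustrative sketch: it assumes every objective is to be minimized, and the names `dominates`, `pareto_set`, and the sample points are made up for the example.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a dominates b, assuming every objective is minimized."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_set(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical objective vectors (f1, f2), both minimized.
points = [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0), (3.0, 4.0), (5.0, 5.0)]
front = pareto_set(points)
print(front)
```

Here (3, 4) is dropped because (2, 3) is better in both objectives, and (5, 5) is dropped because every other point dominates it; the three remaining points are mutually incomparable trade-offs.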

Current Multi-objective Optimization (from Carlos A. Coello Coello)

There are over 30 mathematical programming techniques for multi-objective optimization.

These methods tend to generate elements of the Pareto front one at a time.

Most are sensitive to the shape of the Pareto front (they may not work if the front is concave or disconnected).

The first implementation of an evolutionary approach was by Schaffer in 1984.

After that the field was practically inactive until around 1995, when it took off.

Popularity of Evolutionary algorithms in multi-objective optimization

[Figure: Citations by year (up to 2001), from Carlos A. Coello Coello]

Classifying EMOO approaches (Evolutionary Multi-Objective Optimization)

First Generation Techniques
  Non-Pareto approaches
  Pareto approaches

Second Generation Techniques
  PAES
  SPEA
  NSGA-II
  MOMGA
  micro-GA

Non-Pareto Techniques

These are methods that do not use information about Pareto fronts explicitly.

Incapable of producing certain portions of the Pareto front.

Efficient and easy to implement, but appropriate to handle only a few objectives.

Aggregate Objective Model (weighted sum method)

Aggregated fitness functions are basically just a weighted sum of the objective functions. This is what we did in Landscape smoothing.

The weighted sum creates a single objective function from the multi-objective fitness function.

Determining the weights to use in this sum is nontrivial and is almost always problem dependent.

Aggregate Function

The weighted sum is basically in the following form:

$$\min \sum_{i=1}^{k} w_i f_i(x)$$

where $w_i \ge 0$ represents the weights; often we assume

$$\sum_{i=1}^{k} w_i = 1$$
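A minimal Python sketch of the aggregate function follows. The helper name `weighted_sum`, the two toy objectives, and the candidate grid are assumptions made for illustration; the weights are normalized so that they sum to 1, as is often assumed.

```python
from typing import Callable, List, Sequence


def weighted_sum(objectives: Sequence[Callable[[float], float]],
                 weights: Sequence[float]) -> Callable[[float], float]:
    """Combine several objectives into one by a normalized weighted sum."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    norm = [w / total for w in weights]  # enforce sum of weights == 1

    def aggregate(x: float) -> float:
        return sum(w * f(x) for w, f in zip(norm, objectives))

    return aggregate


# Two toy objectives that pull in different directions (illustrative only).
def f1(x: float) -> float:
    return x ** 2


def f2(x: float) -> float:
    return (x - 2.0) ** 2


agg = weighted_sum([f1, f2], [1.0, 1.0])
candidates: List[float] = [i / 10 for i in range(21)]  # 0.0, 0.1, ..., 2.0
best = min(candidates, key=agg)
print(best, agg(best))
```

With equal weights the compromise lands midway between the two individual optima (x = 0 and x = 2). Changing the weights moves the compromise, which is why choosing them is problem dependent.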

Applications

Design of DSP system (Arslan, 1996)
Water quality control (Garrett, 1999)
System-level synthesis (Blickle, 1996)
Design of optimal filters for lamps (Eklund, 1999)
Landscape Smoothing (Simpson, 2004)

Vector Evaluated Genetic Algorithm (VEGA)

This work was performed by J. D. Schaffer in 1985 and can be found in the paper: Schaffer, J.D., "Multiple objective optimization with vector evaluated genetic algorithms."

In this method appropriate fractions of the next generation, or subpopulations, were selected from the whole of the old generation according to each of the objectives separately.

Crossover and mutation were applied as usual after combining the subpopulations.

VEGA

[Figure: generation(i) → fill each section of the new population using a separate objective function → shuffle → apply genetic operators → generation(i+1)]
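The selection step of VEGA can be sketched as follows. This is a simplified illustration, not Schaffer's implementation: it assumes minimization, fills each section with binary tournaments on a single objective (a stand-in for whatever selection operator the original used), and the name `vega_select` and the toy objectives are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def vega_select(population: List[float],
                objectives: Sequence[Callable[[float], float]],
                rng: random.Random) -> List[float]:
    """Fill one equal-sized section per objective, then shuffle the sections together."""
    section_size = len(population) // len(objectives)
    pool: List[float] = []
    for f in objectives:
        # Each section is selected using only this objective (assumed: binary tournament).
        for _ in range(section_size):
            a, b = rng.sample(population, 2)
            pool.append(a if f(a) <= f(b) else b)
    rng.shuffle(pool)  # mix the sections before crossover and mutation are applied
    return pool


rng = random.Random(0)
population = [float(i) for i in range(20)]
objectives = [lambda x: x ** 2, lambda x: (x - 19.0) ** 2]
mating_pool = vega_select(population, objectives, rng)
print(len(mating_pool))
```

Crossover and mutation would then be applied to `mating_pool` as in an ordinary GA. Because each section looks at only one objective, individuals that are good compromises but best at nothing tend to be under-selected.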

Advantages and Disadvantages

Efficient and easy to implement
It does not have an explicit mechanism to maintain diversity.
It doesn’t necessarily produce non-dominated vectors.

Sample Application of VEGA

Combinational circuit design at the gate level (Coello, 2000)
Design of multiplierless IIR filters (Wilson, 1993)
Aerodynamic optimization (Rogers, 2000)
Groundwater pollution containment (Ritzel, 1994)

Lexicographic Ordering

Here the user is asked to rank the objectives in order of importance.

The optimal solution is then obtained by minimizing the objective functions, starting with the most important one and proceeding according to the assigned order.
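One way to sketch lexicographic ordering in Python is to compare tuples of objective values, since Python compares tuples position by position. The function name `lexicographic_best` and the sample data are assumptions for illustration, and this version only breaks exact ties on the more important objective.

```python
from typing import Callable, List, Sequence, Tuple

Candidate = Tuple[float, float]


def lexicographic_best(candidates: List[Candidate],
                       ranked_objectives: Sequence[Callable[[Candidate], float]]) -> Candidate:
    """Minimize the objectives in order of importance (most important first)."""
    return min(candidates, key=lambda c: tuple(f(c) for f in ranked_objectives))


# Hypothetical candidates; the first objective is ranked as more important.
candidates = [(1.0, 9.0), (1.0, 4.0), (2.0, 0.0)]
ranked = [lambda c: c[0], lambda c: c[1]]
winner = lexicographic_best(candidates, ranked)
print(winner)
```

Note that (2, 0) loses even though it is far better on the second objective: the second objective is consulted only to break ties on the first.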

Sample applications

Symbolic layout compaction (Fourman, 1985)
Robot path planning (Gacogne, 1999)
Personnel scheduling (El Moudani et al., 2001)

Target Vector Approaches

Definition of a set of goals (or targets) that we wish to achieve for each objective function.

The EA is set up to minimize differences between the current solution and these goals.

These can also be considered aggregating approaches, but in this case concave portions of the Pareto front can be obtained.
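As a rough sketch of the idea, the fitness to minimize can be a distance between a solution's objective values and the goal vector. Euclidean distance is just one possible choice and is an assumption here, as are the names `goal_distance`, `goals`, and the sample objective vectors.

```python
import math
from typing import List, Sequence, Tuple


def goal_distance(objective_values: Sequence[float], goals: Sequence[float]) -> float:
    """Euclidean distance between achieved objective values and the targets."""
    return math.sqrt(sum((v - g) ** 2 for v, g in zip(objective_values, goals)))


# Hypothetical objective vectors of three solutions and a target for each objective.
solutions: List[Tuple[float, float]] = [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0)]
goals = (2.0, 2.0)
closest = min(solutions, key=lambda s: goal_distance(s, goals))
print(closest, goal_distance(closest, goals))
```

The EA would use `goal_distance` as the single fitness value to minimize, so the quality of the result depends heavily on how the goals are chosen.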


Efficient and easy to implement
Definition of goals may be difficult in some cases
Some methods have been known to introduce misleading selection pressure under certain circumstances.

Goals must lie in the feasible region so that the solutions generated are members of the Pareto optimal set.

Sample Applications

Intensities of emission lines of trace elements (Wienke, 1992)

Optimization of a fishery bio-economic model ( Mardle et al., 2000)

Optimization of the counterweight balancing of a robot arm (Coello, 1998)

Pareto-based Techniques

Suggested by Goldberg (1989) to solve the problems with Schaffer’s VEGA.

Use of non-dominated ranking and selection to move the population towards the Pareto front

Requires a ranking procedure and a technique to maintain diversity in the population (otherwise, the GA will tend to converge to a single solution).

Multi-Objective Genetic Algorithm (MOGA)

Proposed by Fonseca and Fleming (1993); see “Genetic Algorithms for Multiobjective Optimization: Formulation, Discussion and Generalization”.

This approach consists of a scheme in which the rank of an individual corresponds to the number of individuals in the current population by which it is dominated.

It uses fitness sharing and mating restrictions.

MOGA Ranking

A vector X = (u1, u2, …, un) is superior to (dominates) another vector Y = (v1, v2, …, vn) if ui <= vi for every i = 1, …, n and there exists some i in 1, …, n such that ui < vi.

If X is superior to Y then Y is inferior to X.

Let x be an individual in the population t; then rank(x, t) = 1 + p(x), where p(x) is the number of individuals in population t that x is inferior to. Note that if x is a Pareto point then it is inferior to no one, hence its rank is 1.
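The ranking rule rank(x, t) = 1 + p(x) can be sketched directly in Python. The helper names and the sample population are illustrative assumptions; objectives are minimized, matching the ui <= vi convention above.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is superior to b: no worse everywhere, strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def moga_ranks(objs: List[Sequence[float]]) -> List[int]:
    """rank = 1 + number of individuals in the population that dominate this one."""
    return [1 + sum(dominates(q, p) for q in objs) for p in objs]


# Hypothetical objective vectors of a small population.
objs = [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0), (3.0, 4.0), (5.0, 5.0)]
ranks = moga_ranks(objs)
print(ranks)
```

The three non-dominated points get rank 1, (3, 4) is dominated only by (2, 3) and gets rank 2, and (5, 5) is dominated by all four others and gets rank 5. Ranks 3 and 4 are not represented, which is why the fitness assignment step has to cope with missing rank values.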

MOGA Ranking

Assigning fitness according to rank

Sort the population according to rank. Note that some rank values may not be represented.

Assign fitnesses to individuals by interpolation from the best (rank 1) to the worst in the usual way, according to some function, usually linear.

Average the fitnesses of individuals with the same rank, so that all of them will be sampled at the same rate. Note that this procedure keeps the global population fitness constant while maintaining appropriate selective pressure, as defined by the function used.

Ranking example

Suppose that we have 10 individuals in the population that have ranks of 1, 2, 3, 1, 1, 2, 5, 3, 2, 5.

Since there are ranks of 1, 2, 3, and 5, we could create a roulette wheel by obtaining a fitness for each rank as follows.

Sort them, obtaining 1, 1, 1, 2, 2, 2, 3, 3, 5, 5.

Map these to fitnesses via a function, say f(x) = 6 - x, giving 5, 5, 5, 4, 4, 4, 3, 3, 1, 1 for the fitnesses.

The pie is then broken into 35 slices, the first three individuals getting 5 slices each, the next three getting 4 each, etc.
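The arithmetic of this example can be checked with a short Python sketch. The variable names are illustrative; the mapping f(x) = 6 - x is the one used in the example.

```python
ranks = [1, 2, 3, 1, 1, 2, 5, 3, 2, 5]


def fitness_from_rank(rank: int) -> int:
    """Linear mapping from the example: lower rank gives higher fitness."""
    return 6 - rank


sorted_ranks = sorted(ranks)
fitnesses = [fitness_from_rank(r) for r in sorted_ranks]
total_slices = sum(fitnesses)

# Share of the roulette wheel for each individual.
wheel = [f / total_slices for f in fitnesses]

print(sorted_ranks)
print(fitnesses)
print(total_slices)
```

Each rank-1 individual owns 5/35 of the wheel and each rank-5 individual only 1/35. Individuals with the same rank already share the same fitness here, so the averaging step leaves the values unchanged.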


Efficient and relatively easy to implement
Its performance depends on the appropriate selection of the sharing factor.
MOGA was the most popular first-generation MOEA and it normally outperformed all of its contemporary competitors.

MOGA Applications

Fault diagnosis (Marcu, 1997)
Control systems design (Chipperfield, 1995)
Design of antennas (Thompson, 2001)
System-level synthesis (Dick, 1998)

Niched-Pareto Genetic Algorithm (NPGA)

Proposed by Horn et al. (1993, 1994).

It uses a tournament selection scheme based on Pareto dominance. Two randomly chosen individuals are compared against a subset of the entire population (10% or so). When both competitors are either dominated or non-dominated (i.e., a tie), the result of the tournament is decided through fitness sharing in the objective domain.
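A single NPGA-style tournament might be sketched like this. It is a simplified illustration under several assumptions: objectives are minimized, ties are broken by a niche count computed with a triangular sharing function in objective space, and the names `npga_tournament`, `niche_count`, `comparison_idx`, and `sigma_share` are hypothetical.

```python
import math
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def niche_count(point: Sequence[float], objs: List[Sequence[float]],
                sigma_share: float) -> float:
    """Sum of triangular sharing values over neighbours closer than sigma_share."""
    count = 0.0
    for other in objs:
        d = math.dist(point, other)
        if d < sigma_share:
            count += 1.0 - d / sigma_share
    return count


def npga_tournament(objs: List[Sequence[float]], i: int, j: int,
                    comparison_idx: List[int], sigma_share: float) -> int:
    """Return the index (i or j) of the tournament winner."""
    comparison = [objs[c] for c in comparison_idx]
    i_dominated = any(dominates(c, objs[i]) for c in comparison)
    j_dominated = any(dominates(c, objs[j]) for c in comparison)
    if i_dominated and not j_dominated:
        return j
    if j_dominated and not i_dominated:
        return i
    # Tie: prefer the competitor sitting in the less crowded niche.
    ni = niche_count(objs[i], objs, sigma_share)
    nj = niche_count(objs[j], objs, sigma_share)
    return i if ni <= nj else j


objs = [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0), (3.0, 4.0), (5.0, 5.0)]
winner = npga_tournament(objs, i=1, j=3, comparison_idx=[0, 1, 2], sigma_share=2.0)
print(winner)
```

In this call, individual 3 is dominated by a member of the comparison set while individual 1 is not, so individual 1 wins outright. In a real run the comparison set would be drawn at random for each tournament.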


Easy to implement
Efficient because it does not apply Pareto ranking to the entire population.
It seems to have a good overall performance.
Besides requiring a sharing factor, it requires another parameter (tournament size).

Sample applications

Analysis of experimental spectra (Golovkin, 2000)
Feature selection (Emmanouilidis, 2000)
Fault-tolerant systems design (Schott, 1995)
Road systems design (Haastrup and Pereira, 1997)

Non-dominated Sorting Genetic Algorithm

Proposed by Srinivas and Deb (1994).

Uses classification layers:
  layer 1 is the set of non-dominated individuals
  layer 2 is the set of non-dominated individuals that occur when layer 1 is removed
  etc.

Sharing is performed at each layer using dummy fitnesses for that layer.

Sharing spreads out the search over each classification layer.

High fitness of the upper levels implies that the Pareto front is heavily searched.
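The layering step can be sketched by repeatedly peeling off the non-dominated individuals. This is a straightforward, unoptimized illustration assuming minimization; the function name `nondominated_layers` and the sample data are made up, and the sharing step is not shown.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def nondominated_layers(objs: List[Sequence[float]]) -> List[List[int]]:
    """Return lists of indices: layer 1 first, then layer 2, and so on."""
    remaining = list(range(len(objs)))
    layers: List[List[int]] = []
    while remaining:
        # Individuals not dominated by anything still remaining form the next layer.
        layer = [i for i in remaining
                 if not any(dominates(objs[j], objs[i]) for j in remaining)]
        layers.append(layer)
        remaining = [i for i in remaining if i not in layer]
    return layers


objs = [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0), (3.0, 4.0), (5.0, 5.0)]
layers = nondominated_layers(objs)
print(layers)
```

Each layer would then receive its own dummy fitness, with sharing applied inside the layer, so that earlier layers are sampled more heavily than later ones.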

Research Questions at this time were:

Are aggregating functions really doomed to fail when the Pareto front is non-convex?

Can we find ways to maintain diversity in the population without using niches, which require O(M^2) work, where M refers to the population size?

If we assume that there is no way to reduce the O(kM^2) complexity required to perform Pareto ranking, how can we design a more efficient MOEA?

Do we have appropriate test functions and metrics to evaluate an MOEA quantitatively?

Will somebody develop theoretical foundations for MOEAs?

(from Carlos A. Coello Coello)

Generation 2 (Elitism)

A new generation of algorithms came about with the introduction of the notion of elitism.

Elitism (in this context) refers to the use of an external population to retain the non-dominated individuals.

Design issues include:
  How does the external file interact with the main population?
  What do we do when the external file is full?
  Do we impose additional criteria to enter the file instead of just using Pareto dominance?

(from Carlos A. Coello Coello)

Second Generation Algorithms include

Strength Pareto Evolutionary Algorithm (SPEA), Zitzler and Thiele (1999)

Strength Pareto Evolutionary Algorithm 2 (SPEA 2), Zitzler, Laumanns and Thiele (2001)

Pareto Archived Evolution Strategy (PAES), Knowles and Corne (2000)

Nondominated Sorting Genetic Algorithm II (NSGA-II), Deb et al. (2002)

Niched Pareto Genetic Algorithm 2 (NPGA 2), Erickson et al. (2001)

A quick look at the Pareto Archived Evolution Strategy (PAES)

(1+1) PAES is made up of 3 parts:

The candidate solution generator
  this is basically simple random-mutation hill climbing
  it maintains a single current solution and, at each iteration, produces a single new candidate via random mutation

The candidate solution acceptance function

The Nondominated-Solutions (NDS) archive

PAES(1+1) Pseudocode

1  Generate initial random solution c and add it to the archive
2  Mutate c to produce m and evaluate m
3  if (c dominates m) discard m
4  else if (m dominates c) replace c with m, and add m to the archive
5  else if (m is dominated by any member of the archive) discard m
6  else apply test(c, m, archive) to determine which becomes the new current solution and whether to add m to the archive
7  until a termination criterion has been reached, return to line 2

Test(c, m, archive)

if the archive is not full
    add m to the archive
    if (m is in a less crowded region of the archive than c)
        accept m as the new current solution
    else maintain c as the current solution
else
    if (m is in a less crowded region of the archive than x for some member x on the archive)
        add m to the archive, and remove a member of the archive from the most crowded region
    if (m is in a less crowded region of the archive than c)
        accept m as the new current solution
    else maintain c as the current solution
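The PAES loop and test can be sketched in Python as below. This is a simplified illustration rather than the published algorithm: crowding is measured on a plain fixed grid with 2**k divisions per objective recomputed at each test, when m dominates c but the archive is full m may become the current solution without entering the archive, and the names `paes`, `archive_test`, `grid_cells`, the toy `evaluate` function, and all parameter values are assumptions.

```python
import random
from collections import Counter


def dominates(a, b):
    """True if a dominates b (every objective minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def grid_cells(points, k):
    """Map each objective vector to a grid cell, 2**k divisions per objective."""
    dims = range(len(points[0]))
    lows = [min(p[d] for p in points) for d in dims]
    highs = [max(p[d] for p in points) for d in dims]
    cells = []
    for p in points:
        idx = []
        for v, lo, hi in zip(p, lows, highs):
            width = (hi - lo) / 2 ** k
            idx.append(min(int((v - lo) / width), 2 ** k - 1) if width > 0 else 0)
        cells.append(tuple(idx))
    return cells


def archive_test(fc, m, fm, archive, max_archive, k):
    """Simplified test(c, m, archive): returns (archive, m is less crowded than c)."""
    archive = [(x, f) for x, f in archive if not dominates(fm, f)]
    cells = grid_cells([f for _, f in archive] + [fm, fc], k)
    crowd = Counter(cells[:-2])            # archive members per occupied cell
    m_cell, c_cell = cells[-2], cells[-1]
    if len(archive) < max_archive:
        archive.append((m, fm))
    else:
        worst = max(range(len(archive)), key=lambda i: crowd[cells[i]])
        if crowd[m_cell] < crowd[cells[worst]]:
            archive[worst] = (m, fm)       # replace a member of the most crowded cell
    return archive, crowd[m_cell] < crowd[c_cell]


def paes(evaluate, mutate, start, iterations, max_archive=20, k=3, seed=0):
    rng = random.Random(seed)
    c, fc = start, evaluate(start)
    archive = [(c, fc)]
    for _ in range(iterations):
        m = mutate(c, rng)
        fm = evaluate(m)
        if fm == fc or dominates(fc, fm):
            continue                       # discard m
        if dominates(fm, fc):
            archive, _ = archive_test(fc, m, fm, archive, max_archive, k)
            c, fc = m, fm                  # m always replaces a dominated c
        elif any(dominates(f, fm) for _, f in archive):
            continue                       # discard m
        else:
            archive, m_wins = archive_test(fc, m, fm, archive, max_archive, k)
            if m_wins:
                c, fc = m, fm
    return archive


def evaluate(x):
    return (x * x, (x - 2.0) ** 2)         # toy two-objective problem


def mutate(x, rng):
    return x + rng.gauss(0.0, 0.3)         # simple random mutation


archive = paes(evaluate, mutate, start=5.0, iterations=2000, max_archive=20)
print(len(archive))
```

The returned archive holds (solution, objective vector) pairs that are mutually non-dominated and never exceeds `max_archive` entries. For this toy problem the trade-off solutions lie between x = 0 and x = 2.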

The Adaptive grid

PAES uses a new crowding procedure based on recursively dividing up the d-dimensional objective space. This is done to minimize cost and to avoid niche-size parameter setting.

Phenotype space is divided into hypercubes, which have a width of d_r / 2^k in each dimension, where d_r is the range (maximum minus minimum) of values in objective d of the solutions currently in the archive, and k is the subdivision parameter.

Example grid for d=2 objectives

If we use 5 levels with 2 objectives we basically have a quad-tree structure.

Each level has 4 times the number of cells the previous level has: 1, 4, 16, 64, 256, 1024.

Hence we have 1024 regions of size (max - min)/2^5 in each dimension.

For the simple case of k = 3 the indicated cell has grid location 101-100, or in binary 101100.

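[Figure: example grid for d = 2 objectives, with each axis split into halves labeled 0 and 1 and the indicated grid cell marked]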

So how do we find the grid location of X?

Recursively (for each dimension) go down the tree left (0) or right (1), creating a binary number. This requires k comparisons per dimension.

Then concatenate the binary strings, creating a single binary number.

Note that the grid location of each of the previous 1024 cells is just a 10-bit string.

Converting this 10-bit string to an integer gives an index into an array Count[1024] that can be used to store the crowding number.
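The bisection procedure can be sketched in Python as follows. The function name `grid_location`, the unit-square bounds, and the sample point are assumptions chosen so that the k = 3 case reproduces the grid location 101100 mentioned earlier.

```python
from typing import List, Sequence


def grid_location(point: Sequence[float], lows: Sequence[float],
                  highs: Sequence[float], k: int) -> str:
    """Bisect each objective's range k times; 0 = lower half, 1 = upper half."""
    bits = ""
    for v, lo, hi in zip(point, lows, highs):
        for _ in range(k):               # k comparisons per dimension
            mid = (lo + hi) / 2
            if v < mid:
                bits += "0"
                hi = mid
            else:
                bits += "1"
                lo = mid
    return bits


# k = 3, two objectives, ranges assumed to be [0, 1] in each dimension.
bits = grid_location((0.7, 0.55), (0.0, 0.0), (1.0, 1.0), k=3)
index = int(bits, 2)
print(bits, index)

# k = 5 gives a 10-bit string, i.e. an index into a 1024-entry crowding table.
count: List[int] = [0] * 1024
cell = int(grid_location((0.7, 0.55), (0.0, 0.0), (1.0, 1.0), k=5), 2)
count[cell] += 1
```

Storing the crowding numbers in a flat array indexed by the concatenated bits means that updating or querying the crowding of a solution costs only the k comparisons per objective needed to locate its cell.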
