Post on 30-Oct-2014
A
SEMINAR REPORT
ON
GENETIC ALGORITHM
Submitted to: Submitted by:
Er. Richa Dutta Neh yadav(4108030)
Lecturer Kamini (4108021)
CSE-8TH Sem
YAMUNA INSTITUTE OF ENGINEERING AND TECHNOLOGY
GADHOLI
Abstract
Genetic algorithms provide heuristic solutions for combinatorial-optimization problems
and have found applications in many areas with outstanding success. A genetic algorithm
(GA) is an optimization technique for searching very large spaces that models the role of
the genetic material in living organisms. It is a search technique used
in computing to find exact or approximate solutions to optimization and search problems.
Genetic algorithms are categorized as global search heuristics. They are a
part of evolutionary computing, which is a rapidly growing area of artificial intelligence,
and they use techniques inspired by evolutionary biology such as inheritance, mutation,
selection, and crossover. A small population of individual exemplars can effectively
search a large space because they contain schemata, useful substructures that can be
potentially combined to make fitter individuals. Formal studies of competing schemata
show that the best policy for replicating them is to increase them exponentially according
to their relative fitness. This turns out to be the policy used by genetic algorithms. Fitness
is determined by examining a large number of individual fitness cases. This process can
be very efficient if the fitness cases also evolve by their own GAs.
1. Introduction
1.1 A Biology Lesson
Every organism has a set of rules, a blueprint so to speak, describing how that organism
is built up from the tiny building blocks of life. These rules are encoded in the genes of
an organism, which in turn are connected together into long strings called chromosomes.
Each gene represents a specific trait of the organism, like eye colour or hair colour, and
has several different settings. For example, the settings for a hair colour gene may be
blonde, black or auburn. These genes and their settings are usually referred to as an
organism's genotype. The physical expression of the genotype - the organism itself - is
called the phenotype. When two organisms mate they share their genes. The resultant
offspring may end up having half the genes from one parent and half from the other. This
process is called recombination. Very occasionally a gene may be mutated. Normally this
mutated gene will not affect the development of the phenotype but very occasionally it
will be expressed in the organism as a completely new trait.
1.2 About Genetic Algorithms
Genetic Algorithms are adaptive heuristic search algorithms premised on the
evolutionary ideas of natural selection and genetics. The basic concept of Genetic
Algorithms is to simulate the processes in natural systems necessary for
evolution, specifically those that follow the principle of survival of the fittest
first laid down by Charles Darwin. As such they represent an intelligent exploitation
of a random search within a defined search space to solve a problem. First
pioneered by John Holland in the 1960s, Genetic Algorithms have been widely
studied, experimented with and applied in many fields of engineering. Not only
do Genetic Algorithms provide an alternative method of solving problems, they
can also outperform traditional methods on many of those problems.
Many real-world problems involve finding optimal parameters, which
may prove difficult for traditional methods but is well suited to Genetic Algorithms.
However, because of their outstanding performance in optimisation, Genetic
Algorithms have been wrongly regarded as mere function optimisers. In fact, there are
many ways to view genetic algorithms. Perhaps most users come to Genetic
Algorithms looking for a problem solver, but this is a restrictive view.
1.3 Brief Overview
Genetic algorithms are inspired by Darwin's theory of evolution. The solution to a
problem solved by a genetic algorithm is evolved.
The algorithm starts with a set of solutions (represented by chromosomes) called the
population. Solutions from one population are taken and used to form a new
population. This is motivated by the hope that the new population will be better
than the old one. Solutions that are selected to form new solutions (offspring)
are selected according to their fitness: the more suitable they are, the more
chances they have to reproduce.
This is repeated until some condition (for example, a maximum number of generations or
sufficient improvement of the best solution) is satisfied.
Example
Problem solving can often be expressed as looking for an extreme of a function. This
is exactly what the problem shown here is. Some function is given and the Genetic
Algorithm tries to find the minimum of the function.
Fig. 1.1 The graph represents a search space; the goal is to travel from the gray cell to
the green cell in the fewest steps.
2. Genetic Algorithm
1. [Start] Generate a random population of n chromosomes (suitable solutions for the
problem).
2. [Fitness] Evaluate the fitness f(x) of each chromosome x in the population.
3. [New population] Create a new population by repeating the following steps until the
new population is complete.
[Selection] Select two parent chromosomes from the population according
to their fitness (the better the fitness, the bigger the chance of being selected).
[Crossover] Crossover is a genetic operator used to
vary the programming of a chromosome or chromosomes from one
generation to the next. It is analogous to reproduction and biological
crossover, upon which genetic algorithms are based. Crossover is the
process of taking more than one parent solution and producing a child
solution from them. There are several methods for selecting the chromosomes.
With a crossover probability, cross over the parents to form new
offspring (children). If no crossover is performed, the offspring are exact
copies of the parents.
Fig 2.1
Figure 2.1 shows the crossover between parent 1 and parent 2. As we can see, the
children take one section of the chromosome from each parent. The point at which the
chromosome is broken depends on the randomly selected crossover point. This particular
method is called single-point crossover because only one crossover point exists.
[Mutation] Mutation is a genetic operator
used to maintain genetic diversity from one generation of a population of
chromosomes to the next. It is analogous to biological mutation.
Mutation alters one or more gene values in a chromosome from its initial state. In
mutation, the solution may change entirely from the previous solution. Hence
Genetic Algorithms can come to a better solution by using mutation. With a
mutation probability, mutate the new offspring at each locus (position in the chromosome).
Fig 2.2
After selection and crossover, you now have a new population full of individuals. Some
are directly copied, and others are produced by crossover. In order to ensure that the
individuals are not all exactly the same, you allow for a small chance of mutation.
[Accepting] Place the new offspring in the new population.
4. [Replace] Use the newly generated population for a further run of the algorithm.
5. [Test] If the end condition is satisfied, stop and return the best solution in the current
population.
6. [Loop] Go to step 2.
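The steps above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: it assumes chromosomes are fixed-length bit strings of at least two genes, a non-negative fitness function, and a fixed number of generations as the end condition. The function name run_ga and all parameter names and default values are assumptions made for this sketch.

```python
import random


def run_ga(fitness, n_genes, pop_size=20, p_crossover=0.7, p_mutation=0.01,
           generations=50, seed=0):
    """Evolve bit-string chromosomes and return the best one in the final population."""
    rng = random.Random(seed)
    # [Start] random population of bit-string chromosomes
    population = [[rng.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    # [Test]/[Loop] assumed end condition: a fixed number of generations
    for _ in range(generations):
        # [Fitness] evaluate each chromosome; +1 keeps every selection weight positive
        weights = [fitness(c) + 1 for c in population]
        new_population = []
        while len(new_population) < pop_size:          # [New population]
            # [Selection] fitness-proportionate choice of two parents
            mum, dad = rng.choices(population, weights=weights, k=2)
            # [Crossover] single-point, applied with probability p_crossover
            if rng.random() < p_crossover:
                point = rng.randint(1, n_genes - 1)
                child = mum[:point] + dad[point:]
            else:
                child = mum[:]                          # exact copy of a parent
            # [Mutation] flip each locus with probability p_mutation
            child = [1 - g if rng.random() < p_mutation else g for g in child]
            new_population.append(child)                # [Accepting]
        population = new_population                     # [Replace]
    return max(population, key=fitness)
```

For instance, run_ga(sum, 16) searches for a 16-bit string containing as many 1s as possible, using the built-in sum as the fitness function.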
3. Who can benefit from Genetic Algorithms
Nearly everyone can benefit from Genetic Algorithms, provided they can encode
solutions of a given problem as chromosomes and
compare the relative performance (fitness) of solutions.
An effective representation and a meaningful fitness
evaluation are the keys to success in Genetic Algorithm applications.
The appeal of Genetic Algorithms comes from their simplicity and elegance as
robust search algorithms as well as from their power to discover good solutions
rapidly for difficult high-dimensional problems.
Genetic Algorithms are particularly useful when:
The search space is large, complex or poorly understood.
Domain knowledge is scarce or expert knowledge is difficult to encode to narrow
the search space.
No mathematical analysis is available.
Traditional search methods fail.
Genetic Algorithms have been used for problem-solving and for modeling.
They are applied to many scientific and engineering problems, as well as in
business and entertainment, including the traveling salesman problem.
4. Applications
4.1 Automotive Design
Using Genetic Algorithms to design both composite materials and aerodynamic
shapes for race cars and regular means of transportation (including aviation) can
return combinations of the best materials and the best engineering to provide faster,
lighter, more fuel-efficient and safer vehicles for all the things we use vehicles for.
Rather than spending years in laboratories working with polymers, wind tunnels
and balsa wood shapes, the processes can be done much more quickly and
efficiently by computer modeling, using Genetic Algorithm searches to return a
range of options that human designers can then put together however they please.
4.2 Engineering Design
Getting the most out of a range of materials to optimize the structural and
operational design of buildings, factories, machines, etc. is a rapidly expanding
application of Genetic Algorithms. These are being created for such uses as
optimizing the design of heat exchangers, robot gripping arms, satellite booms,
building trusses, flywheels, turbines, and just about any other computer-assisted
engineering design application.
There is work to combine Genetic Algorithms optimizing particular aspects of
engineering problems to work together, and some of these can not only solve
design problems, but also project them forward to analyze weaknesses and
possible point failures in the future so these can be avoided.
4.3 Robotics
Robotics involves human designers and engineers trying out all sorts of things in
order to create useful machines that can do work for humans. Each robot's design
is dependent on the job or jobs it is intended to do, so there are many different
designs out there.
Genetic Algorithms can be programmed to search for a range of optimal designs
and components for each specific use, or to return results for entirely new types of
robots that can perform multiple tasks and have more general application.
Robotics designed with Genetic Algorithms just might get us those nifty multi-purpose,
learning robots we've been expecting any year now since we watched the Jetsons
as kids, who will cook our meals, do our laundry and even clean the bathroom for
us!
4.4 Optimized Telecommunications Routing
Do you find yourself frustrated by slow LAN performance, inconsistent internet
access, a FAX machine that only sends faxes sometimes, your land line's number
of 'ghost' phone calls every month? Well, Genetic Algorithms are being
developed that will allow for dynamic and anticipatory routing of circuits for
telecommunications networks.
These could take notice of your system's instability and anticipate your re-routing
needs. Using more than one Genetic Algorithm circuit search at a time, soon
your interpersonal communications problems may really be all in your head rather
than in your telecommunications system.
Other Genetic Algorithms are being developed to optimize the placement and
routing of cell towers for best coverage and ease of switching, so your cell phone
and BlackBerry will be thankful for Genetic Algorithms too.
4.5 Trip, Traffic and Shipment Routing
Genetic Algorithms applied to the "Traveling Salesman
Problem" (TSP) can be used to plan the most efficient routes and schedules for
travel planners, traffic routers and even shipping companies: the shortest routes
for traveling, the timing to avoid traffic tie-ups and rush hours, and the
most efficient use of transport for shipping, even including pickup loads and
deliveries along the way. The program can model all this in the background
while the human agents do other things, improving productivity as well! Chances
are increasing steadily that when you get that trip plan packet from the travel
agency, a Genetic Algorithm contributed more to it than the agent did.
6. Cancer gene search with data mining and genetic algorithms
6.1 Introduction
Cancer leads to approximately 25% of all mortalities, making it the second
leading cause of death in the United States. Early and accurate detection of cancer
is critical to the well being of patients. Analysis of gene expression data leads to
cancer identification and classification which will facilitate proper treatment
selection and drug development. Gene expression data sets for ovarian, prostate,
and lung cancer were analyzed in this research. An integrated gene-search
algorithm for genetic expression data analysis was proposed. This integrated
algorithm involves a genetic algorithm and correlation-based heuristics for data
preprocessing (on partitioned data sets) and data mining (decision tree and
support vector machines algorithms) for making predictions. Knowledge derived
by the proposed algorithm has high classification accuracy with the ability to
identify the most significant genes.
Cancer develops mainly in epithelial cells, connective/muscle tissue (sarcomas),
and white blood cells. It arises from successive mutations in a normal cell that damage the
DNA and impair the cell replication mechanism. There are a number of
carcinogens, such as tobacco smoke, radiation, certain microbes, synthetic
chemicals, polluted water, and air, that may accelerate the mutations. Thus, there
is a need to identify the mutated genes that contribute to a cancerous state. One of
the methods for cancer identification is through the analysis of genetic data. The
human genome contains approximately 10 million single nucleotide
polymorphisms. These single nucleotide polymorphisms are responsible for the
variation that exists between human beings. Due to the high cost, genetic data
(containing as many as 15,000 genes per patient) is normally collected on a
limited number of patients (100–300 patients). There is a need to select the most
informative genes from such wide data sets. Removal of uninformative genes
decreases noise, confusion, and complexity, and increases the chances for
identification of the most important genes, classification of diseases, and
prediction of various outcomes, e.g., cancer type.
A genetic algorithm is a search algorithm based on the concept of natural
genetics. A genetic algorithm is initiated with a set of solutions (chromosomes)
called the population. Each solution in the population is evaluated based on its
fitness. Solutions chosen to form new chromosomes (offspring) are selected
according to their fitness, i.e., the more suitable the solution, the higher the
likelihood it will reproduce. This is repeated until some condition (for example,
the number of generations or the quality of the best solution) is satisfied. A genetic
algorithm searches the solution space without following crisp constraints and
takes into account potentially all feasible solution regions. This provides a chance
of searching previously unexplored regions, and there is a high possibility of
achieving an overall optimal or near-optimal solution, making the genetic algorithm
a global search algorithm.
6.2 Integrated algorithm
The integrated gene-search algorithm consists of two phases. The iterative Phase I
includes data partitioning, application of the Decision Tree algorithm (or other
data-mining algorithms) to the partitioned data set, the genetic algorithm, and the
correlation-based heuristics for gene reduction. The set of significant genes is
utilized in Phase II for validation of the quality of the genes. A data-mining (i.e.,
classification) algorithm takes a training expression data set as input and predicts
whether a test sample is normal or cancerous. Thus, data-mining algorithms are applied
to the training and testing data sets and their results are evaluated to determine the
most significant gene set.
In Phase I, the cancer training gene data set is initially partitioned into several
subsets with approximately 1000 genes in each subset (Fig. 6.1). The partitioning
of the data sets can be performed arbitrarily or randomly. The Decision Tree
algorithm is applied to each partitioned data set to determine the classification
accuracy. The total number of genes selected (most significant as well as medium
significant genes) from all the partitioned data sets is an overestimate of the actual
number of significant genes. The genes selected from all the partitioned data
sets are merged to form a single gene set (Fig. 6.1). If the current gene set is
larger than the user-defined threshold (e.g., 1000 genes), then the gene set is
re-partitioned for the next iteration of the data-mining and GA-CFS (Genetic
Algorithm-Correlation Based Feature Selection) algorithms. Phase I is repeated
until the number of significant genes is less than the threshold. To further reduce
the number of genes, the GA-CFS algorithm can be re-applied to the reduced gene
data sets.
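The control flow of Phase I can be sketched in Python as below. This is only an outline under stated assumptions: select_genes is a hypothetical callable standing in for the data-mining and GA-CFS step applied to one subset, the partitioning is done by random shuffling, and the guard that stops when a pass removes no genes is an addition for safety rather than part of the described method. The function and parameter names are illustrative.

```python
import random


def phase_one(genes, select_genes, subset_size=1000, threshold=1000, seed=0):
    """Repeatedly partition, select and merge until the gene set fits the threshold."""
    rng = random.Random(seed)
    current = list(genes)
    while True:
        rng.shuffle(current)                        # random partitioning
        subsets = [current[i:i + subset_size]
                   for i in range(0, len(current), subset_size)]
        # Apply the (hypothetical) selection step to each subset and merge the results.
        merged = [g for subset in subsets for g in select_genes(subset)]
        stalled = len(merged) >= len(current)       # added guard: no reduction achieved
        current = merged
        if len(current) <= threshold or stalled:
            return current
```

With 15,000 genes and a stand-in selector that keeps a quarter of each subset, the first pass leaves 3,750 genes and the second pass brings the set under the 1,000-gene threshold.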
In Phase II, data-mining algorithms such as the Decision Tree and Support Vector
Machine algorithms are then applied to the training data set for only the significant
genes (Fig. 6.1). The classification accuracy obtained from this reduced gene data
set is not smaller than the maximum classification accuracy from the previous
partitioned data sets. This step validates the fact that the proposed gene selection
algorithm preserves the information/knowledge.
Fig. 6.1 Integrated gene-search algorithm. Phase I: the complete data set for cancer is
split into partitions of 1000 genes; data mining and GA-CFS are applied to each partition,
and the resulting gene set is identified. If it contains more than 1000 genes (YES), it is
partitioned again; if not (NO), it passes to Phase II. Phase II: data mining produces
training and testing results, which give the most significant genes.
6.3 Conclusion
The integrated gene-search algorithm (Genetic Algorithm-Correlation Based Feature
Selection algorithm with data mining) was proposed and successfully applied to the
training and test genetic expression data sets of ovarian, prostate, and lung cancers. This
uniformly applicable algorithm not only provided high classification accuracy but also
determined a set of the most significant genes for each of the three cancers. These gene
sets require further investigation for their medical relevance, as the prediction power
attained from these gene sets is statistically equivalent to that reported in the literature.
The integrated gene-search algorithm is capable of identifying significant genes by
partitioning the data set with a correlation-based heuristic. The overestimate of the actual
significant gene set using this algorithm allows the investigation of potentially useful
genes or their combinations. This leads to multiple models and supports the underlying
hypothesis that genetic expression data sets can be used in diagnosis of various cancers.
5. Genetic Algorithm Problems
5.1 The Algorithm
A genetic algorithm can be thought of as a search. Given some initial state, the algorithm
is searching for an optimal state. It does this in a way that mimics nature (hence the
name). Say you have a population of a certain species. The first generation of these
creatures may not be optimally suited for their environment. Over time the individuals
who are less suited die off while those that are well suited reproduce and dominate the
others. In addition to reproduction between well suited individuals (cross-over in the
context of a Genetic Algorithm) the offspring of those individuals experience mutation,
meaning that the child of individuals A and B is not purely a cross between the two, but
has its own unique traits. Generally mutation occurs at a low probability.
In the context of programming, a Genetic Algorithm can be expressed as a function that
takes as input a population and a fitness function. The population is a collection of
individuals and the fitness function is a means of determining how fit an individual is. At
generation zero the population is usually randomly generated. In order to get from
generation zero to generation 1, the algorithm uses the fitness function to determine
which individuals to include in the cross-over (reproduce), leaving out the rest. The
children of those individuals are then passed to a mutate function, that alters them in
some way, usually at a very low probability. Here’s some pseudo code that might help to
understand how this might be implemented:
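One way such a function might look, written as runnable Python rather than pseudocode. It is a sketch under assumptions: the fitter half of the population is chosen to reproduce, the crossover and mutate operators are supplied by the caller, and the population has at least two individuals. The names genetic_algorithm, next_generation and is_solution are illustrative, not taken from any library.

```python
import random


def next_generation(population, fitness_fn, crossover, mutate, rng):
    """Breed a new population of the same size from the fitter half of the old one."""
    ranked = sorted(population, key=fitness_fn, reverse=True)
    parents = ranked[: max(2, len(ranked) // 2)]    # the rest are left out
    children = []
    while len(children) < len(population):
        a, b = rng.sample(parents, 2)               # two distinct parents
        children.append(mutate(crossover(a, b, rng), rng))
    return children


def genetic_algorithm(population, fitness_fn, crossover, mutate, is_solution,
                      max_generations=100, seed=0):
    """Return (best individual, generation at which the search stopped)."""
    rng = random.Random(seed)
    for generation in range(max_generations):
        best = max(population, key=fitness_fn)
        if is_solution(best):
            return best, generation
        population = next_generation(population, fitness_fn, crossover, mutate, rng)
    return max(population, key=fitness_fn), max_generations
```

Passing the operators in as arguments keeps the loop independent of how individuals are represented, so the same function could drive bit strings, lists of booleans or any other encoding.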
5.2 How They Are Used
There are a variety of problems that can be solved with genetic algorithms.
Genetic Algorithms are particularly well suited to optimization problems.
K-SATISFIABILITY problems, for example, can be solved with a genetic algorithm
(though other means exist).
For anyone not familiar with K-SATISFIABILITY problems I’ll give a short
explanation. SATISFIABILITY (or satisfaction) problems attempt to assign values
to the variables of a boolean formula in such a way that it evaluates to true. So if my
SATISFIABILITY problem consists of two variables, A and B, and one clause, A
OR B, then one solution would be A = true, B = true.
Clauses are the components of the boolean formula; in the example I gave, the
formula consists of only one clause. A larger SATISFIABILITY problem may
consist of hundreds of variables and thousands of clauses and cannot be solved on
paper in a reasonable amount of time. Here is an example of a slightly larger SAT problem:
(A OR B OR C) AND (A OR !B OR !C) AND (!A OR B OR !C)
This formula consists of three variables (A, B, C) and three clauses. A solution to
this problem would be A = true, B = true, C = false. Notice that there are many
different assignments of these variables that satisfy the formula. If there were
more clauses this might not be the case.
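The three-clause formula is small enough to check exhaustively. The Python sketch below encodes it and enumerates all eight assignments; the encoding of a clause as a list of (variable, wanted value) pairs and the names FORMULA and satisfies are assumptions chosen for this illustration.

```python
from itertools import product

# Each clause is a list of (variable, wanted_value) literals; ("B", False) means !B.
FORMULA = [
    [("A", True), ("B", True), ("C", True)],      # (A OR B OR C)
    [("A", True), ("B", False), ("C", False)],    # (A OR !B OR !C)
    [("A", False), ("B", True), ("C", False)],    # (!A OR B OR !C)
]


def satisfies(assignment, formula=FORMULA):
    """True if every clause contains at least one literal that holds."""
    return all(any(assignment[var] == want for var, want in clause)
               for clause in formula)


# Try every combination of truth values for A, B and C.
all_assignments = [dict(zip("ABC", values))
                   for values in product([True, False], repeat=3)]
solutions = [a for a in all_assignments if satisfies(a)]
```

Five of the eight assignments satisfy the formula, including A = true, B = true, C = false; each clause rules out exactly one assignment.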
To solve a SATISFIABILITY problem with a genetic algorithm you start off with a
population of randomly generated “solutions”, each solution consisting of a
random assignment of true or false to each variable. This population is generation
zero. In this context the fitness function is defined as the number of satisfied
clauses in the boolean formula (or, equivalently, the number of unsatisfied clauses,
which is then minimized).
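Generation zero and the fitness function might be written as follows for the three-clause example. This is a sketch: individuals are assumed to be dictionaries mapping variable names to booleans, fitness counts satisfied clauses, and the names fitness and random_population are illustrative.

```python
import random

VARIABLES = ["A", "B", "C"]
# Same encoding of (A OR B OR C) AND (A OR !B OR !C) AND (!A OR B OR !C):
# each literal is (variable, wanted_value).
FORMULA = [
    [("A", True), ("B", True), ("C", True)],
    [("A", True), ("B", False), ("C", False)],
    [("A", False), ("B", True), ("C", False)],
]


def fitness(individual, formula=FORMULA):
    """Number of clauses that contain at least one satisfied literal."""
    return sum(any(individual[var] == want for var, want in clause)
               for clause in formula)


def random_population(size, rng):
    """Generation zero: each variable gets true or false with equal chance."""
    return [{var: rng.random() < 0.5 for var in VARIABLES} for _ in range(size)]


generation_zero = random_population(10, random.Random(0))
scores = [fitness(individual) for individual in generation_zero]
```

An individual whose fitness equals the number of clauses (here 3) satisfies the whole formula.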
Using the fitness function, for each individual in generation zero, a fitness value is
determined. It might be the case that one of these individuals satisfies the formula,
in which case you’re done. Otherwise, in order to get from generation zero to
generation one, we must choose a portion of the population to “reproduce”, for
example, those having a fitness above the average.
Once we’ve made our selection we perform the crossover by producing a new
individual with a portion of its assignments coming from each parent (the size of
the portion may be determined randomly). For example, for individuals X and Y
with X(A,B,C) = {True, False, False} and Y(A,B,C) = {False, True, True}, a
possible child would be Child(A,B,C) = {False, False, False}, taking A from Y and
B and C from X.
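The crossover in this example can be reproduced with a few lines of Python. The sketch assumes individuals are lists ordered as (A, B, C) and that the split point is passed in explicitly; in a real run it would typically be drawn at random. The function name crossover is illustrative.

```python
def crossover(parent_x, parent_y, point):
    """Child takes the first `point` assignments from parent_y, the rest from parent_x."""
    return parent_y[:point] + parent_x[point:]


X = [True, False, False]    # X(A,B,C)
Y = [False, True, True]     # Y(A,B,C)

child = crossover(X, Y, 1)  # A from Y; B and C from X
```

With the split after the first variable the child is {False, False, False}, as in the example; splitting after the second variable would instead give {False, True, False}.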
After we’ve generated a new population we then randomly mutate each individual
at a very low probability. At probabilities above 5%, in many cases a solution will
not be found in a reasonable amount of time. A mutation takes an assignment and
flips it. So for the individual X(A,B,C) = {True, False, False}, if a mutation event
occurs on the variable B, it will become X(A,B,C) = {True, True, False}. Without
this mutation the algorithm can stall before reaching a solution, because crossover
alone cannot introduce assignments that are absent from the population.
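A mutation step of this kind might be sketched as follows, assuming individuals are lists of booleans and that each variable is flipped independently with the same probability. The names flip and mutate and the rate parameter are illustrative.

```python
import random


def flip(individual, index):
    """Return a copy of the individual with the assignment at `index` inverted."""
    flipped = list(individual)
    flipped[index] = not flipped[index]
    return flipped


def mutate(individual, rate, rng):
    """Flip each assignment independently with probability `rate`."""
    return [(not value) if rng.random() < rate else value for value in individual]


X = [True, False, False]          # X(A,B,C)
mutated_b = flip(X, 1)            # a mutation event on variable B
offspring = mutate(X, 0.05, random.Random(0))
```

Here flip reproduces the single mutation event from the example, while mutate is the form that would be applied to every individual in a new population.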
At Generation zero for a large problem, there is very little chance of a solution
existing. After each passing generation, however, the average fitness increases
and it becomes likely that an individual satisfies the formula.
5.3 Problems with Genetic Algorithms
After each generation the individuals of a population begin to approach the
solution. In the context of a SATISFIABILITY problem this means they satisfy
more and more clauses. There is, however, no guarantee that they will ever satisfy
all of them. This is because individuals that have a fitness near the maximum,
may actually be very different from the solution.
For example, say a SATISFIABILITY problem has the solution 000011000 where
each character in the bit string represents a variable and the 0s represent false, and
the 1s represent true. The string 111100111 might satisfy 90% of the clauses. If
this is the case, the children produced by this individual will look similar to it and
the likelihood of it being mutated into the solution is essentially zero. The
following graph illustrates this problem:
Local Max Problem
From the graph you can see that there are two peaks, one reaching 100, the other
75. The higher one represents the solution to the problem, while the other is called
a local maximum. A genetic algorithm may reach the peak of a local maximum
and become stuck because all similar solutions have a lower fitness, while the
actual solution is dissimilar to the current state.
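To put a number on "essentially zero" for the bit-string example, the short Python calculation below assumes that each bit mutates independently at a rate of 5%, the upper rate mentioned earlier. The function names hamming and jump_probability are illustrative.

```python
solution = "000011000"
stuck = "111100111"


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def jump_probability(current, target, rate):
    """Chance that one round of independent per-bit mutation turns current into target."""
    d = hamming(current, target)
    return rate ** d * (1 - rate) ** (len(target) - d)


distance = hamming(stuck, solution)               # every one of the 9 bits differs
chance = jump_probability(stuck, solution, 0.05)  # 0.05 ** 9
```

All nine bits would have to flip at once, which under this assumption happens with a probability of roughly 2 in a trillion per mutation round.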
5.4 Possible Solutions
A possible way to fix this problem would be to reset the search: generate a new
set of random solutions, as the algorithm did at generation zero, and proceed from
there. This is called a random-reset. Hopefully after the reset the search will
approach the solution rather than a local maximum.
Another similar solution would be to mutate each individual in the current
population at a much higher rate, possibly 100%. This would produce a
population that is very different from the one that existed at the local maximum.
These solutions would fix the problem in a case where there were only a few local
maxima, but for some problems it might be the case that there are numerous
local maxima. For these problems, genetic algorithms with random-reset might
find solutions that have very high fitness, but never the solution.
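A random-reset can be wrapped around a simple genetic algorithm as in the Python sketch below. Everything specific here is an assumption made for illustration: individuals are bit lists of at least two bits, the population has at least four members, the fitter half survives each generation, and a reset is triggered when the best fitness has not improved for a set number of generations (the patience parameter).

```python
import random


def random_individual(n_bits, rng):
    return [rng.randint(0, 1) for _ in range(n_bits)]


def evolve(population, fitness, rng, rate=0.05):
    """One generation: keep the fitter half, refill with mutated crossover children."""
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[: len(ranked) // 2]
    children = []
    while len(parents) + len(children) < len(population):
        a, b = rng.sample(parents, 2)
        point = rng.randint(1, len(a) - 1)
        child = [1 - g if rng.random() < rate else g for g in a[:point] + b[point:]]
        children.append(child)
    return parents + children


def search_with_reset(fitness, target, n_bits, pop_size=20, patience=25,
                      max_generations=2000, seed=0):
    """Return (best individual found at the end, number of random resets performed)."""
    rng = random.Random(seed)
    population = [random_individual(n_bits, rng) for _ in range(pop_size)]
    best_seen, stalled, resets = None, 0, 0
    for _ in range(max_generations):
        best = max(population, key=fitness)
        if fitness(best) >= target:
            return best, resets
        if best_seen is not None and fitness(best) <= best_seen:
            stalled += 1                            # no improvement this generation
        else:
            best_seen, stalled = fitness(best), 0
        if stalled >= patience:
            # Random-reset: discard the population and start again from scratch.
            population = [random_individual(n_bits, rng) for _ in range(pop_size)]
            best_seen, stalled, resets = None, 0, resets + 1
        else:
            population = evolve(population, fitness, rng)
    return max(population, key=fitness), resets
```

As the text notes, this gives the search further chances to start near the global maximum, but it offers no guarantee when local maxima are numerous.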
6. CONCLUSION & FUTURE SCOPE
A genetic algorithm is a probabilistic method for solving optimization problems that is
modeled on the genetic evolution process in biology and is regarded as an effective way to
search for a global optimum in many types of problem. The algorithm is widely
applicable in different artificial intelligence approaches as well as in other
areas such as object-oriented design and robotics. In future we shall concentrate on the
development of hybrid approaches using genetic algorithms and object-oriented technology.
Genetic Algorithms are good at taking larger, potentially huge search spaces and
navigating them looking for optimal combinations of things and solutions which we
might never be able to find. The use of genetic algorithms to solve large and often
complex computational problems has given rise to many new applications in a variety of
disciplines. They have discovered powerful, high quality solutions to difficult practical
problems in a diverse variety of fields.