Coevolutionary Automated Software Correction
Josh Wilkerson
PhD Candidate in Computer Science
Missouri S&T
Page 2
Technical Background
Evolutionary Algorithms (EAs)
– Subfield of evolutionary computation (in artificial intelligence)
– Based on biological evolution
– Uses mutation, reproduction, and selection
– Population composed of candidate solutions
– Needed:
• Solution representation
• Fitness function
– Applicable to a wide variety of fields
– Makes no assumptions about the problem space (ideally)
Page 3
Technical Background
EA Operation
– Start with an initial population
– Each generation
• Create new individuals and evaluate them
• Population competition (survival of the fittest)
– Mutation and reproduction
• Explore the problem space
• Bring in new genetic material
– Selection
• Applies pressure to individuals
• More fit individuals are selected for mutation and reproduction more often
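The generational loop above can be sketched in a few lines of Python. This is a minimal illustration, not the CASC implementation: the toy objective, the list-of-floats representation, tournament selection, one-point crossover, and all parameter values are assumptions chosen for brevity.

```python
import random

def fitness(ind):
    # Toy objective (an assumption): higher is better, optimum at all zeros.
    return -sum(x * x for x in ind)

def tournament(pop, rng, k=2):
    # Selection pressure: the fitter of k random individuals becomes a parent.
    return max(rng.sample(pop, k), key=fitness)

def evolve(pop_size=20, genes=5, generations=50, seed=0):
    rng = random.Random(seed)
    # Start with a random initial population.
    pop = [[rng.uniform(-5, 5) for _ in range(genes)] for _ in range(pop_size)]
    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size:
            a, b = tournament(pop, rng), tournament(pop, rng)
            cut = rng.randrange(1, genes)
            child = a[:cut] + b[cut:]          # reproduction (one-point crossover)
            i = rng.randrange(genes)
            child[i] += rng.gauss(0, 0.5)      # mutation brings in new material
            offspring.append(child)
        # Competition: parents and offspring compete, the fittest survive.
        pop = sorted(pop + offspring, key=fitness, reverse=True)[:pop_size]
    return pop[0]

best = evolve()
```

Because survivors are drawn from parents and offspring together, the best fitness in this sketch never decreases from one generation to the next.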
Page 4
Technical Background
Genetic Programming
– Type of EA
– Evolves tree representations
– E.g., computer program parse trees
Coevolution
– Extension of standard EA
– Fitness dependency between individuals
– Dependency can be either cooperative or competitive
– CASC system uses competitive coevolution
– Evolutionary arms-race
Page 5
High Level View of CASC
Page 6
CASC Evolutionary Model
Page 7
CASC Evolutionary Model
Page 8
CASC Evolutionary Model
Page 9
CASC Evolutionary Model
Page 10
Reproduction Phase: Programs
Randomly select a genetic operation to perform
– Probability of operation selection is configurable
Perform operation, generate new program(s)
Add new individuals to population
Repeat until the specified number of individuals has been created
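The reproduction loop above might look roughly like this. The operator functions, the string "programs", and the probability values are hypothetical stand-ins; only the overall shape (weighted random operator choice, repeat until enough offspring exist, then add them to the population) comes from the slide.

```python
import random

def copy_op(population, rng):
    # Toy copy operator: returns one new individual.
    return [rng.choice(population)]

def crossover_op(population, rng):
    # Toy crossover on fixed-length strings: returns two new individuals.
    a, b = rng.choice(population), rng.choice(population)
    return [a[:2] + b[2:], b[:2] + a[2:]]

def reproduce(population, operators, probabilities, n_offspring, rng):
    names = list(operators)
    weights = [probabilities[name] for name in names]
    offspring = []
    while len(offspring) < n_offspring:
        # Randomly select an operation according to its configured probability.
        name = rng.choices(names, weights=weights, k=1)[0]
        offspring.extend(operators[name](population, rng))
    # Crossover can overshoot by one individual, so trim to the requested count.
    return offspring[:n_offspring]

rng = random.Random(1)
population = ["aaaa", "bbbb", "cccc"]
operators = {"copy": copy_op, "crossover": crossover_op}
probabilities = {"copy": 0.3, "crossover": 0.7}
offspring = reproduce(population, operators, probabilities, 5, rng)
population = population + offspring
```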
Page 11
Reproduction Phase: Programs
Genetic Operations
– Reset
– Copy
– Crossover
• Two individuals are randomly selected based on fitness
• Randomly select and exchange compatible sub-trees
• Generates two new programs
– Mutation
• Randomly select an individual based on fitness
• Randomly select and change a mutable node
• Generate a new sub-tree (if necessary)
– Architecture Altering Operations
Reselection is allowed for all operators
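Sub-tree crossover and mutation can be illustrated on small expression trees stored as nested lists. This is a simplified sketch under several assumptions: every sub-tree is treated as compatible with every other (a real system operating on C++ parse trees would need to check node types), mutation simply replaces a node with a random leaf, and parents are picked directly rather than by fitness.

```python
import copy
import random

def paths(tree, prefix=()):
    """Yield the path (tuple of child indices) to every node in the tree."""
    yield prefix
    if isinstance(tree, list):
        # Element 0 is the operator; the remaining elements are children.
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))

def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, subtree):
    """Return a copy of tree with the node at path replaced by subtree."""
    if not path:
        return copy.deepcopy(subtree)
    new = copy.deepcopy(tree)
    node = new
    for i in path[:-1]:
        node = node[i]
    node[path[-1]] = copy.deepcopy(subtree)
    return new

def crossover(a, b, rng):
    # Pick one node in each parent and exchange the sub-trees rooted there.
    pa = rng.choice(list(paths(a)))
    pb = rng.choice(list(paths(b)))
    return replace(a, pa, get(b, pb)), replace(b, pb, get(a, pa))

def mutate(tree, rng, leaves=("x", 1, 2)):
    # Simplified mutation: replace a random node with a random leaf.
    p = rng.choice(list(paths(tree)))
    return replace(tree, p, rng.choice(leaves))

def size(tree):
    return sum(1 for _ in paths(tree))

rng = random.Random(3)
a = ["+", ["*", "x", 2], 1]        # (x * 2) + 1
b = ["-", "x", ["+", "x", 1]]      # x - (x + 1)
child1, child2 = crossover(a, b, rng)
mutant = mutate(a, rng)
```

Both operators work on copies, so the parents remain available for reselection.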
Page 12
Reproduction Phase: Test Cases
Reproduction employs uniform crossover
Each offspring has a chance to mutate
Genes to mutate are selected at random
Mutated gene is randomly adjusted
– The amount adjusted is selected from a Gaussian distribution
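A minimal sketch of the test case operators described above, assuming test cases are fixed-length lists of numbers. The parameter names and default values (`mutation_chance`, `gene_rate`, `sigma`) are illustrative, not taken from CASC.

```python
import random

def uniform_crossover(parent1, parent2, rng):
    # Each gene is taken from either parent with equal probability.
    return [a if rng.random() < 0.5 else b for a, b in zip(parent1, parent2)]

def gaussian_mutate(genes, rng, mutation_chance=0.2, gene_rate=0.5, sigma=1.0):
    # Each offspring has a chance to mutate at all.
    if rng.random() >= mutation_chance:
        return list(genes)
    # Randomly chosen genes are shifted by an amount drawn from a Gaussian.
    return [g + rng.gauss(0, sigma) if rng.random() < gene_rate else g
            for g in genes]

rng = random.Random(7)
p1 = [1.0, 2.0, 3.0, 4.0]
p2 = [10.0, 20.0, 30.0, 40.0]
child = uniform_crossover(p1, p2, rng)
mutated = gaussian_mutate(child, rng, mutation_chance=1.0)
unchanged = gaussian_mutate(child, rng, mutation_chance=0.0)
```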
Page 13
CASC Evolutionary Model
Page 14
CASC Evolutionary Model
Page 15
CASC Evolutionary Model
Page 16
CASC Evolutionary Model
Page 17
CASC Implementation Details
Adaptive parameter control
– EAs typically have many control parameters
– Difficult to find optimal settings for these parameters
– In CASC, the genetic operator probabilities are adaptive parameters
– Operators are rewarded or punished based on performance
• If one operator generates improved individuals more often than the others, it becomes more likely to be used
– Allows the system to adapt to the different phases in the search
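One simple way to realize this reward/punish scheme is sketched below. The additive update, the `step` and `floor` values, and the renormalization are assumptions for illustration; the slides do not specify the rule CASC uses.

```python
def adapt(probabilities, op, improved, step=0.05, floor=0.05):
    """Return updated operator probabilities after one use of operator `op`."""
    updated = dict(probabilities)
    # Reward the operator if its offspring improved on the parent, else punish it.
    # The floor (applied before renormalizing) keeps any operator from dying out.
    updated[op] = max(floor, updated[op] + (step if improved else -step))
    # Renormalize so the probabilities sum to 1 again.
    total = sum(updated.values())
    return {name: p / total for name, p in updated.items()}

probs = {"crossover": 0.5, "mutation": 0.5}
rewarded = adapt(probs, "mutation", improved=True)
punished = adapt(probs, "mutation", improved=False)
```

Feeding the updated dictionary back into operator selection lets the mix of operators shift as the search moves between phases.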
Page 18
CASC Implementation Details
Parallel Computation
– Computational complexity is generally a problem for EAs
– CASC writes, compiles, and executes hundreds (or even thousands) of C++ programs in a given run
– To reduce run times this is done in parallel (on the NIC cluster here on campus)
– Main node: responsible for generating and writing programs
– Worker nodes: responsible for compiling and executing programs
– Dramatically speeds up execution
– Investigating new options for this (discussed later)
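The main/worker split can be sketched on a single machine with a thread pool standing in for the cluster's worker nodes. Everything here is a stand-in: `generate_programs` and `compile_and_run` are hypothetical names, and the "evaluation" just measures string length where a real worker would write the source to disk, invoke a C++ compiler, and execute the result.

```python
from concurrent.futures import ThreadPoolExecutor

def generate_programs(n):
    # Main-node role: generate program sources (placeholder strings here).
    return [f"program_{i}" for i in range(n)]

def compile_and_run(source):
    # Worker-node role: stand-in for compiling and executing one program.
    return len(source)

def evaluate_in_parallel(sources, workers=4):
    # Hand each program to a worker; map() returns results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compile_and_run, sources))

sources = generate_programs(12)
results = evaluate_in_parallel(sources)
```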
Page 19
Current and Future Work
Fitness Function Design
– For each new problem CASC needs a new fitness function
– Fitness function design can often be difficult
– Developing a guide for fitness function design
– Starts from the program specifications
– Walks through the thought process for designing a fitness function for the problem
– Long term goal: automate fitness function creation
Page 20
Current and Future Work
File system slowdown
– CASC writes and compiles a very large number of programs each run
– I.e., a very large number of files pass through the file system each run
– File system access is bottlenecking the speed of the CASC system
– Currently reworking the system to store program files and executables in RAM
– Uses a virtual disk (RAM disk) that is mounted like a hard disk but stores its data in RAM
– Expecting a dramatic speed up (fingers crossed…)
– Other option: distributed computing (like BOINC, Folding@home, etc.)
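As a rough sketch of the RAM-backed storage idea: on most Linux systems `/dev/shm` is a memory-backed (tmpfs) mount, so a scratch directory created there keeps generated files out of the disk-backed file system. The path, the helper name `make_scratch_dir`, and the fallback to the default temporary directory are assumptions; the slides do not say how CASC mounts its RAM disk.

```python
import os
import shutil
import tempfile

def make_scratch_dir(ram_root="/dev/shm"):
    # Prefer a RAM-backed location when one is available and writable;
    # otherwise fall back to the platform's default temporary directory.
    usable = os.path.isdir(ram_root) and os.access(ram_root, os.W_OK)
    return tempfile.mkdtemp(prefix="casc_", dir=ram_root if usable else None)

scratch = make_scratch_dir()
source_path = os.path.join(scratch, "candidate.cpp")

# Write a generated program into the scratch directory and read it back.
with open(source_path, "w") as f:
    f.write("int main() { return 0; }\n")
with open(source_path) as f:
    contents = f.read()

# Clean up once the individual has been evaluated.
shutil.rmtree(scratch)
```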
Page 21
Current and Future Work
Scalability
– As program size increases so does the problem space
• Many more modifications possible
• More genetic material
– Investigating options to allow CASC to scale with problem size
– Current idea: break the program up into pieces
• Multiple program populations
• Each population is based on a piece of the original program
• Each population has its own objective
• Cooperative coevolution
Page 22
Current and Future Work
Page 23
Questions?