Coevolutionary Automated Software Correction


Page 1: Coevolutionary Automated Software Correction

Josh Wilkerson

PhD Candidate in Computer Science

Missouri S&T

Page 2: High Level View of CASC

Page 3: CASC Evolutionary Model

Page 4: CASC Evolutionary Model

Page 5: CASC Evolutionary Model

Page 6: CASC Evolutionary Model

Page 7: Reproduction Phase: Programs

Randomly select a genetic operation to perform

– Probability of operation selection is configurable and/or adaptive

Select individual(s) to use

– First select sub-set of individuals (i.e., tournament)

– Then perform fitness proportional selection in sub-set (i.e., roulette)

– Reselection allowed

Perform operation, generate new program(s)

Add new individuals to population

Repeat until specified number of individuals has been created
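The loop on this slide can be sketched in Python roughly as follows. This is a minimal illustration, not CASC's actual code: the names `select_parent`, `reproduce`, `copy_op`, and `average_op` are hypothetical, individuals are plain numbers standing in for programs, fitness values are assumed to be non-negative, and "reselection allowed" is read here as sampling with replacement.

```python
import random

def select_parent(population, fitness, tournament_size, rng):
    """Pick one individual: random sub-set first, then roulette inside it."""
    # Assumption: the sub-set is drawn with replacement (reselection allowed).
    subset = [rng.choice(population) for _ in range(tournament_size)]
    weights = [fitness(ind) for ind in subset]
    if sum(weights) <= 0:
        # Roulette is undefined when all weights are zero; fall back to uniform.
        return rng.choice(subset)
    return rng.choices(subset, weights=weights, k=1)[0]

def reproduce(population, fitness, operations, op_weights, n_offspring,
              tournament_size=4, rng=None):
    """Apply randomly chosen operations until n_offspring individuals exist."""
    rng = rng or random.Random()
    offspring = []
    while len(offspring) < n_offspring:
        # Each entry of `operations` is (function, number_of_parents).
        op, arity = rng.choices(operations, weights=op_weights, k=1)[0]
        parents = [select_parent(population, fitness, tournament_size, rng)
                   for _ in range(arity)]
        offspring.extend(op(parents, rng))
    # New individuals are added to the existing population.
    return population + offspring[:n_offspring]

# Hypothetical stand-in operations on numeric "programs".
def copy_op(parents, rng):
    return [parents[0]]

def average_op(parents, rng):
    return [(parents[0] + parents[1]) / 2]

population = [1.0, 2.0, 3.0, 4.0]
operations = [(copy_op, 1), (average_op, 2)]
new_population = reproduce(population, lambda x: x, operations, [0.5, 0.5], 6,
                           rng=random.Random(0))
```

The operation weights are passed in as a plain list so that they can be fixed by configuration or updated between generations.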

Page 8: Reproduction Phase: Programs

Genetic Operations

– Reset

– Copy

– Crossover

• Two individuals are randomly selected based on fitness

• Randomly select and exchange compatible sub-trees

• Generates two new programs

– Mutation

• Off-by-one mutation bias

• Randomly select an individual based on fitness

• Randomly select and change mutable node

• Generate a new sub-tree (if necessary)

– Architecture Altering Operations

• Delete a line, add assignment, add flow control
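As one illustration of the mutation operation, the sketch below changes a single mutable node of a small expression tree and favours off-by-one edits. The `Node` class, the node kinds, the `off_by_one_bias` parameter, and the set of comparison operators are all assumptions made for this example; the slides do not say how CASC represents programs or weights the bias.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str       # hypothetical kinds: "const", "cmp", "var"
    value: object
    children: list = field(default_factory=list)

# Comparison operators that differ by an off-by-one boundary.
OFF_BY_ONE_CMP = {"<": "<=", "<=": "<", ">": ">=", ">=": ">"}
ALL_CMP = ["<", "<=", ">", ">=", "==", "!="]

def walk(node):
    """Yield every node in the tree, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)

def mutate(root, rng, off_by_one_bias=0.8):
    """Change one mutable node in place, favouring off-by-one edits."""
    mutable = [n for n in walk(root) if n.kind in ("const", "cmp")]
    if not mutable:
        return root
    node = rng.choice(mutable)
    biased = rng.random() < off_by_one_bias
    if node.kind == "const":
        if biased:
            node.value += rng.choice([-1, 1])
        else:
            node.value += rng.choice([d for d in range(-10, 11) if d != 0])
    elif biased and node.value in OFF_BY_ONE_CMP:
        node.value = OFF_BY_ONE_CMP[node.value]
    else:
        node.value = rng.choice([c for c in ALL_CMP if c != node.value])
    return root

# The condition `i < 10`, a typical site for an off-by-one bug.
condition = Node("cmp", "<", [Node("var", "i"), Node("const", 10)])
mutate(condition, random.Random(1))
```

This example never has to generate a new sub-tree, because both mutable node kinds are replaced by a value of the same type.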

Page 9: Reproduction Phase: Test Cases

Reproduction employs uniform crossover

Same selection method as programs

Each offspring has a chance to mutate

Genes to mutate are selected randomly

Mutated gene is randomly adjusted

– The amount adjusted is selected from a Gaussian distribution
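A minimal sketch of these two steps, assuming a test case is a fixed-length list of numeric genes. The function names and the parameters `mutation_chance`, `gene_rate`, and `sigma` are illustrative; the slides do not give their values.

```python
import random

def uniform_crossover(a, b, rng):
    """Each gene position goes to one child or the other with equal chance."""
    child1, child2 = [], []
    for x, y in zip(a, b):
        if rng.random() < 0.5:
            x, y = y, x
        child1.append(x)
        child2.append(y)
    return child1, child2

def gaussian_mutate(genes, rng, mutation_chance=0.2, gene_rate=0.5, sigma=1.0):
    """With some chance, nudge randomly chosen genes by a Gaussian amount."""
    genes = list(genes)  # work on a copy so the parent is left unchanged
    if rng.random() >= mutation_chance:
        return genes
    for i in range(len(genes)):
        if rng.random() < gene_rate:
            genes[i] += rng.gauss(0.0, sigma)
    return genes

rng = random.Random(42)
parent_a = [1.0, 2.0, 3.0, 4.0]
parent_b = [10.0, 20.0, 30.0, 40.0]
child_a, child_b = uniform_crossover(parent_a, parent_b, rng)
child_a = gaussian_mutate(child_a, rng)
```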

Page 10: CASC Evolutionary Model

Page 11: CASC Evolutionary Model

Page 12: Evaluation Phase

All programs run against all test cases

– Full population exposure vs. population sampling

– Hash table used to avoid repeat evaluations

Executions scored based on input and output of the program

– Black box style

– Run-time exceptions and time-outs monitored

Fitness for program is average of all execution scores

– Test case scores are directly related to this value
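The program side of this phase might look like the sketch below. Programs are modelled as Python callables keyed by an identifier, test cases as hashable tuples, and the hash table as a `dict`. Scoring an exception as 0.0 is an assumption, time-outs are not handled, and the test case scores are left out because the slide does not say how they are derived. The test case list is assumed to be non-empty.

```python
def evaluate(programs, test_cases, score, cache):
    """Run every program against every test case and average the scores."""
    fitness = {}
    for prog_id, run in programs.items():
        scores = []
        for case in test_cases:
            key = (prog_id, case)
            if key not in cache:  # skip pairs that were evaluated before
                try:
                    # Black box: only the input and the output are inspected.
                    cache[key] = score(case, run(case))
                except Exception:
                    cache[key] = 0.0  # assumed penalty for a run-time exception
            scores.append(cache[key])
        fitness[prog_id] = sum(scores) / len(scores)
    return fitness

def sorted_score(case, output):
    """Hypothetical fitness function for a sorting program."""
    return 1.0 if output == sorted(case) else 0.0

programs = {
    "good": lambda xs: sorted(xs),
    "bad": lambda xs: list(xs),
    "crash": lambda xs: xs[99],
}
cases = [(3, 1, 2), (1, 2, 3)]
fitness = evaluate(programs, cases, sorted_score, cache={})
```

Keying the cache on the program and test case pair means that individuals surviving into the next generation are not re-run against test cases they have already seen.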

Page 13: CASC Evolutionary Model

Page 14: CASC Evolutionary Model

Page 15: CASC Evolutionary Model

Page 16: CASC Implementation Details

Adaptive parameter control

– EAs typically have many control parameters

– Difficult to find optimal settings for these parameters

– In CASC genetic operator probabilities are adaptive parameters

– Rewarded/punished based on performance

• If one operator generates improved individuals more often than the others, make it more likely to be used

– Allows the system to adapt to the different phases in the search
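One possible reward/punish scheme is sketched below. It is an assumption, not the rule CASC uses: each operator's probability moves a step toward its share of the observed success rates, a floor keeps any operator from dying out, and the result is renormalised. The names `learning_rate` and `floor` are illustrative.

```python
def update_operator_probs(probs, improved, used, learning_rate=0.1, floor=0.05):
    """Shift probability toward operators that produced more improvements."""
    rates = {op: (improved[op] / used[op] if used[op] else 0.0) for op in probs}
    total = sum(rates.values())
    if total == 0:
        return dict(probs)  # no operator improved anything; keep settings
    new = {}
    for op, p in probs.items():
        target = rates[op] / total
        # Blend the old probability with the target, but never drop below floor.
        new[op] = max(floor, (1 - learning_rate) * p + learning_rate * target)
    norm = sum(new.values())
    return {op: p / norm for op, p in new.items()}

probs = {"crossover": 0.5, "mutation": 0.5}
improved = {"crossover": 8, "mutation": 2}   # offspring better than parents
used = {"crossover": 10, "mutation": 10}     # times each operator was applied
probs = update_operator_probs(probs, improved, used)
```

With these counts crossover moves from 0.5 to 0.53 and mutation to 0.47, so a run of successes changes the mix gradually rather than all at once.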

Page 17: CASC Implementation Details

Parallel Computation

– Computational complexity is generally a problem for EAs

– CASC typically writes and compiles thousands of programs on a given run

• Typically executes millions of evaluations (literally)

– To reduce run times executions are done in parallel (NIC cluster)

• All other evolutionary phases are done in serial

– Main node: responsible for generating and writing programs

– Worker nodes: responsible for compiling and executing programs

– Dramatically speeds up execution
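The main/worker split can be imitated on a single machine as sketched below. This is only a local stand-in: a thread pool replaces the cluster's worker nodes, Python's built-in `compile` and `exec` replace an external compiler, and the candidate programs are hypothetical one-line functions named `f`. How CASC distributes work across the cluster is not described in the slides.

```python
from concurrent.futures import ThreadPoolExecutor

def compile_and_run(job):
    """Worker role: compile one candidate program and execute it on one input."""
    source, test_input = job
    namespace = {}
    try:
        exec(compile(source, "<candidate>", "exec"), namespace)
        return namespace["f"](test_input)
    except Exception as exc:
        # Compile errors and run-time errors are reported back, not raised.
        return exc

# Main role: generate (here, simply list) the programs to be evaluated.
jobs = [
    ("def f(x): return x + 1", 1),
    ("def f(x): return x * 2", 5),
    ("def f(x) return x", 0),      # deliberately malformed candidate
]

with ThreadPoolExecutor(max_workers=4) as pool:
    # map() returns results in the same order as the submitted jobs.
    results = list(pool.map(compile_and_run, jobs))
```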

Page 18: CASC Criticisms

Scalability

– The problem space is infinite for even simple programs

– Must correct software in reasonable time, regardless of program size

Fitness Function Design

– Each new problem for CASC requires a new fitness function

– Infinite possible fitness functions

– Limited number of high quality fitness functions

– Design of high quality fitness functions is extremely difficult

Page 19: Scalability: ARCD

Automated Relevant Code Discovery (ARCD) System

– Preprocessor for CASC

– Uses bug localization techniques to remove irrelevant lines of code from consideration

– Ensemble of analysis methods

• Each method generates a set of suspect lines of code

• Results are combined together and a relevant code set is generated

– Voting system

– Confidence levels

• Employ state of the art bug localization techniques

• Exploit the availability of fitness function

– Prototype is under development

– Three techniques currently implemented

• Positive/negative trace comparison

• Line suspicion based on fitness

• Fitness run-time plot
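The way the ensemble's results are combined could be sketched as a confidence-weighted vote. The slide names a voting system and confidence levels but gives no details, so the input format, the `threshold` parameter, and the example numbers below are assumptions. The confidences are assumed to sum to a positive value.

```python
from collections import defaultdict

def relevant_lines(method_results, threshold=0.5):
    """Combine suspect-line sets by confidence-weighted voting.

    method_results: list of (confidence, set_of_suspect_line_numbers).
    A line is kept if its share of the total confidence reaches threshold.
    """
    total_conf = sum(conf for conf, _ in method_results)
    votes = defaultdict(float)
    for conf, lines in method_results:
        for line in lines:
            votes[line] += conf
    return {line for line, v in votes.items() if v / total_conf >= threshold}

# Hypothetical output of three analysis methods.
results = [
    (0.9, {4, 5, 8}),    # e.g. trace comparison
    (0.6, {5, 8, 9}),    # e.g. line suspicion based on fitness
    (0.5, {8, 12}),      # e.g. fitness run-time plot
]
suspects = relevant_lines(results)   # {5, 8}
```

Lines outside the returned set would then be excluded from the code CASC is allowed to modify.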

Page 20: ARCD: Pos./Neg. Trace Comparison

1 2 3 4 5 6 7 4 5 6 7 8 9 4 5 8 9 10 11 121 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 02 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 03 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 04 0 0 0 4 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 05 0 0 0 0 5 0 0 0 2 0 0 0 0 0 2 0 0 0 0 04 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 05 0 0 0 0 2 0 0 0 2 0 0 0 0 0 2 0 0 0 0 04 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 05 0 0 0 0 2 0 0 0 2 0 0 0 0 0 2 0 0 0 0 0

10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 011 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 012 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3

Positive Trace

Negative Trace
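The numbers in the matrix above are consistent with a common-run table: a cell holds the length of the run of matching line numbers that ends at that row and column, and is zero where the two traces differ. That reading is inferred from the values, not stated on the slide. The sketch below rebuilds such a table; the function name is hypothetical, and the row trace is taken to be the row labels read top to bottom as one sequence.

```python
def common_run_matrix(rows, cols):
    """m[i][j] = length of the matching run ending at rows[i] and cols[j]."""
    m = [[0] * len(cols) for _ in rows]
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            if r == c:
                # Extend the run from the diagonal neighbour, if there is one.
                prev = m[i - 1][j - 1] if i > 0 and j > 0 else 0
                m[i][j] = prev + 1
    return m

# Line numbers executed in each trace, taken from the matrix headings.
col_trace = [1, 2, 3, 4, 5, 6, 7, 4, 5, 6, 7, 8, 9, 4, 5, 8, 9, 10, 11, 12]
row_trace = [1, 2, 3, 4, 5, 4, 5, 4, 5, 10, 11, 12]
matrix = common_run_matrix(row_trace, col_trace)

# Lines that appear in only one of the two traces.
only_in_cols = sorted(set(col_trace) - set(row_trace))   # [6, 7, 8, 9]
```

Long diagonal runs mark stretches where both executions followed the same path, while lines reached in only one trace are natural candidates for the suspect set.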

Page 21: ARCD: Fitness Plots

[Figure: Fitness Plots. Fitness (axis ticks 0 to 1.2) plotted against Lines Executed (axis ticks 0 to 70), with one series for the Incorrect Program and one for the Correct Program.]

Page 22: Scalability: CC-CoEA

Cooperative-Competitive Coevolution (CC-CoEA)

– Multiple program populations

– Cooperative coevolution of program components

– Each sub-population is focused on a specific portion of the program

– Components are selected from each population and a program is assembled

– Fitness indicates how well each component operated

– Divide the problem space into smaller, more manageable pieces

– Allow CASC to “freeze” sub-populations that are suspected to have converged
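The cooperative part of this scheme might be sketched as below. Everything here is an assumption for illustration: components are strings, each component is credited with the best score of any assembled program it took part in, and a frozen sub-population always contributes its first member. The slides do not say how CASC assigns credit or decides when to freeze.

```python
import random

def cooperative_round(subpops, evaluate, trials, rng, frozen=()):
    """Assemble programs from one component per sub-population and
    credit each component with the best score it took part in."""
    credit = {}
    for _ in range(trials):
        picks = [pop[0] if i in frozen else rng.choice(pop)
                 for i, pop in enumerate(subpops)]
        score = evaluate(picks)
        for i, component in enumerate(picks):
            key = (i, component)
            credit[key] = max(credit.get(key, float("-inf")), score)
    return credit

# Two sub-populations, each covering one portion of the program.
subpops = [["a", "b"], ["x", "y"]]

def score_program(picks):
    """Hypothetical fitness: only the assembly a + y is correct."""
    return 1.0 if picks == ["a", "y"] else 0.0

# Sub-population 0 is treated as converged and frozen on its first member.
credit = cooperative_round(subpops, score_program, trials=50,
                           rng=random.Random(3), frozen={0})
```

Freezing a sub-population removes its components from the search, so the remaining trials are spent only on the portions still suspected of containing the error.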

Page 23: Scalability: CC-CoEA

Page 24: Fitness Function Design

Current approach: guide for fitness function generation

– Formalize the thought process for fitness function design

– Incorporate quality measures to assure quality fitness functions

– Incorporate advanced fitness function techniques, mapped to problem characteristics (indicate when techniques will be useful)

– Extend to be useful for black box search algorithms that use fitness functions

– Implement as semi-automated tool for fitness function design

Alternative approach

– Exploit formal specifications

• Information about expected program operation

• Possibly generate new, correct code from scratch

– No evidence this approach will be superior

• Many open problems

• One-to-many relationships

Page 25: Questions?