Post on 23-Feb-2016
description
AUTOMATIC PROGRAM REPAIR USING GENETIC PROGRAMMING
1http://www.clairelegoues.com
CLAIRE LE GOUESFEBRUARY 12, 2013
Claire Le Goues 2
HOW DO HUMANS FIX BUGS?
http://www.clairelegoues.com
Claire Le Goues 3http://www.clairelegoues.com
Claire Le Goues 4
Mike
http://www.clairelegoues.com
Claire Le Goues 5http://www.clairelegoues.com
Mike(recent undergrad)
Claire Le Goues 6http://www.clairelegoues.com
(Mike’s project)
Claire Le Goues 7http://www.clairelegoues.com
Claire Le Goues 8http://www.clairelegoues.com
regression test cases
Claire Le Goues 9http://www.clairelegoues.com
Claire Le Goues 10http://www.clairelegoues.com
Claire Le Goues 11http://www.clairelegoues.com
(modified code)
Claire Le Goues 12http://www.clairelegoues.com
Claire Le Goues 13http://www.clairelegoues.com
Claire Le Goues 14http://www.clairelegoues.com
??!
Claire Le Goues 15http://www.clairelegoues.com
NOW WHAT?
Claire Le Goues 16http://www.clairelegoues.com
HOW DO HUMANS FIX BUGS?
Claire Le Goues 17http://www.clairelegoues.com
Claire Le Goues 18http://www.clairelegoues.com
printf transformer
Claire Le Goues 19http://www.clairelegoues.com
printf transformer
Claire Le Goues 20http://www.clairelegoues.com
Input:
2
5 6
1
3 4
8
7
9
11
10
12
Claire Le Goues 21http://www.clairelegoues.com
Input:
2
5 6
1
3 4
8
7
9
11
10
12
Legend:Likely faulty. probabilityMaybe faulty. probabilityNot faulty.
Claire Le Goues 22http://www.clairelegoues.com
PROBLEM: BUGGY SOFTWARE
“Everyday, almost 300 bugs appear […] far too many for only the Mozilla programmers to handle.”
– Mozilla Developer, 2005
Annual cost of software errors in the
US: $59.5 billion (0.6% of GDP).
90%: Maintenance
10%: Everything Else
Average time to fix a security-critical error:
28 days.
Claire Le Goues 23http://www.clairelegoues.com
HOW BAD IS IT?
Claire Le Goues 24http://www.clairelegoues.com
Claire Le Goues 25http://www.clairelegoues.com
Claire Le Goues 26http://www.clairelegoues.com
Tarsnap: 125 spelling/style 63 harmless 11 minor+ 1 major
75/200 = 38% TP rate$17 + 40 hours per TP
…REALLY?
Claire Le Goues 27http://www.clairelegoues.com
Tarsnap: 125 spelling/style 63 harmless 11 minor+ 1 major
75/200 = 38% TP rate$17 + 40 hours per TP
…REALLY?
Claire Le Goues 28http://www.clairelegoues.com
Tarsnap: 125 spelling/style 63 harmless 11 minor+ 1 major
75/200 = 38% TP rate$17 + 40 hours per TP
…REALLY?
Claire Le Goues 29http://www.clairelegoues.com
SOLUTION: PAY STRANGERS
Claire Le Goues 30http://www.clairelegoues.com
SOLUTION: PAY STRANGERS
Claire Le Goues 31http://www.clairelegoues.com
SOLUTION: AUTOMATE
Claire Le Goues 32http://www.clairelegoues.com
GENPROG: AUTOMATIC, SCALABLE, COMPETITIVE BUG REPAIR.
Claire Le Goues 33http://www.clairelegoues.com
GENPROG: AUTOMATIC, SCALABLE, COMPETITIVE BUG REPAIR.
Claire Le Goues 34http://www.clairelegoues.com
Input: program, test cases encoding required functionality, at least one (failing!) test case encoding the bug.Approach: use genetic programming to conduct a biased, random search for a set of edits to a program that fixes a given bug.
THE PLAN
Claire Le Goues http://www.clairelegoues.com 35
GENETIC PROGRAMMING: the application of evolutionary or
genetic algorithms to program source code.
Claire Le Goues 36
Population of variants.Fitness function evaluates desirability.Desirable individuals are more likely to be selected for iteration and reproduction.New variants created via:
GENETIC PROGRAMMING
http://www.clairelegoues.com
• Mutation • Crossover
Claire Le Goues 37
Population of variants.Fitness function evaluates desirability.Desirable individuals are more likely to be selected for iteration and reproduction.New variants created via:
GENETIC PROGRAMMING
http://www.clairelegoues.com
• Mutation • Crossover ABCDEF ABADEF
Claire Le Goues 38
Population of variants.Fitness function evaluates desirability.Desirable individuals are more likely to be selected for iteration and reproduction.New variants created via:
GENETIC PROGRAMMING
http://www.clairelegoues.com
• Mutation • Crossover ABCDEF ABCWVU
ZYXWVU ZYXDEF
Claire Le Goues 39
INPUT
OUTPUT
EVALUATE FITNESS
DISCARD
ACCEPT
MUTATE
Claire Le Goues 40http://www.clairelegoues.com
There are many ways to fix any bug• Upshot: coarse-grained edits.
Not every line of code is equally likely to contribute to the bug.
• Upshot: use test cases to refine mutation probabilities.
Programs contain the seeds of bug repair; programmers know what they’re doing.
• Upshot: do not invent new code.
SECRET SAUCES
Claire Le Goues 41
EVALUATE FITNESS
MUTATE
INPUT
OUTPUT
ACCEPT
DISCARD
Claire Le Goues 42http://www.clairelegoues.com
Compile and run each candidate:• Fitness = weighted count of passed test cases.
• Heuristic: passing the bug test is worth more.• If it fails to compile, fitness = 0.
Selection and Crossover: higher fitness variants are (randomly) retained and combined into the next generation.
FITNESS
Claire Le Goues 43MUTATE
DISCARD
INPUT EVALUATE FITNESS
ACCEPT
OUTPUT
Claire Le Goues 44http://www.cs.virginia.edu/~csl9q
Search: random (GP) search through population of patches.Approach: compose small random edits.
•Where to change?•How to change it?
MUTATION: BIRD’S EYE VIEW
Claire Le Goues 45http://www.clairelegoues.com
Use the test cases to associate bug with code:1. Instrument program.2. Record which statements are executed on
failing vs. passing test cases. 3. Weight statements accordingly.
Initial weighting heuristic:• High (1.0) if S is only visited on a failed test • Low (0.1) if S is visited on both failed and passed tests
• Zero (0.0) if S is not visited on any failed test.
MUTATION: WHERE
Claire Le Goues 46http://www.clairelegoues.com
Three statement-level edits: • delete X• replace X with Y• insert Y after X.
To mutate a variant:1. Choose a random program
statement S.2. Choose a random edit to apply
to S.3. Append edit to the variant.
Replace/insert: pick Y from somewhere else in the program.
MUTATION: HOW
Claire Le Goues 47http://www.clairelegoues.com
Three statement-level edits: • delete X• replace X with Y• insert Y after X.
To mutate a variant:1. Choose a random program
statement S.2. Choose a random edit to apply
to S.3. Append edit to the variant.
Replace/insert: pick Y from somewhere else in the program.
MUTATION: HOW Coarse! Reduces search space by at
least 2—10x.
Claire Le Goues 48http://www.clairelegoues.com
Three statement-level edits: • delete X• replace X with Y• insert Y after X.
To mutate a variant:1. Choose a random program
statement S.2. Choose a random edit to apply
to S.3. Append edit to the variant.
Replace/insert: pick Y from somewhere else in the program.
MUTATION: HOW Coarse! Reduces search space by at
least 2—10x.
Claire Le Goues 49http://www.clairelegoues.com
Three statement-level edits: • delete X• replace X with Y• insert Y after X.
To mutate a variant:1. Choose a random program
statement S.2. Choose a random edit to apply
to S.3. Append edit to the variant.
Replace/insert: pick Y from somewhere else in the program.
MUTATION: HOW Coarse! Reduces search space by at
least 2—10x.
Biased by fault localization.
Claire Le Goues 50http://www.clairelegoues.com
Three statement-level edits: • delete X• replace X with Y• insert Y after X.
To mutate a variant:1. Choose a random program
statement S.2. Choose a random edit to apply
to S.3. Append edit to the variant.
Replace/insert: pick Y from somewhere else in the program.
MUTATION: HOW Coarse! Reduces search space by at
least 2—10x.
Biased by fault localization.
Selection probabilities set
heuristically.
Claire Le Goues 51http://www.clairelegoues.com
Three statement-level edits: • delete X• replace X with Y• insert Y after X.
To mutate a variant:1. Choose a random program
statement S.2. Choose a random edit to apply
to S.3. Append edit to the variant.
Replace/insert: pick Y from somewhere else in the program.
MUTATION: HOW Coarse! Reduces search space by at
least 2—10x.
Biased by fault localization.
Selection probabilities set
heuristically.
Claire Le Goues 52http://www.clairelegoues.com
Three statement-level edits: • delete X• replace X with Y• insert Y after X.
To mutate a variant:1. Choose a random program
statement S.2. Choose a random edit to apply
to S.3. Append edit to the variant.
Replace/insert: pick Y from somewhere else in the program.
MUTATION: HOW Coarse! Reduces search space by at
least 2—10x.
Biased by fault localization.
Selection probabilities set
heuristically.
Code/expertise reuse.
Claire Le Goues 53http://www.clairelegoues.com
1void gcd(int a, int b) {2 if (a == 0) {3 printf(“%d”, b);4 }5 while (b > 0) {6 if (a > b) 7 a = a – b;8 else9 b = b – a;10 }11 printf(“%d”, a);12 return;13}
>
Claire Le Goues 54http://www.clairelegoues.com
1void gcd(int a, int b) {2 if (a == 0) {3 printf(“%d”, b);4 }5 while (b > 0) {6 if (a > b) 7 a = a – b;8 else9 b = b – a;10 }11 printf(“%d”, a);12 return;13}
> gcd(4,2)
Claire Le Goues 55http://www.clairelegoues.com
1void gcd(int a, int b) {2 if (a == 0) {3 printf(“%d”, b);4 }5 while (b > 0) {6 if (a > b) 7 a = a – b;8 else9 b = b – a;10 }11 printf(“%d”, a);12 return;13}
> gcd(4,2)> 2>
Claire Le Goues 56http://www.clairelegoues.com
1void gcd(int a, int b) {2 if (a == 0) {3 printf(“%d”, b);4 }5 while (b > 0) {6 if (a > b) 7 a = a – b;8 else9 b = b – a;10 }11 printf(“%d”, a);12 return;13}
> gcd(4,2)> 2> > gcd(1071,1029)
Claire Le Goues 57http://www.clairelegoues.com
1void gcd(int a, int b) {2 if (a == 0) {3 printf(“%d”, b);4 }5 while (b > 0) {6 if (a > b) 7 a = a – b;8 else9 b = b – a;10 }11 printf(“%d”, a);12 return;13}
> gcd(4,2)> 2> > gcd(1071,1029)> 21>
Claire Le Goues 58http://www.clairelegoues.com
1void gcd(int a, int b) {2 if (a == 0) {3 printf(“%d”, b);4 }5 while (b > 0) {6 if (a > b) 7 a = a – b;8 else9 b = b – a;10 }11 printf(“%d”, a);12 return;13}
> gcd(4,2)> 2> > gcd(1071,1029)> 21> > gcd(0,55)
Claire Le Goues 59http://www.clairelegoues.com
1void gcd(int a, int b) {2 if (a == 0) {3 printf(“%d”, b);4 }5 while (b > 0) {6 if (a > b) 7 a = a – b;8 else9 b = b – a;10 }11 printf(“%d”, a);12 return;13}
> gcd(4,2)> 2> > gcd(1071,1029)> 21> > gcd(0,55)> 55
(looping forever)
Claire Le Goues 60http://www.clairelegoues.com
1void gcd(int a, int b) {2 if (a == 0) {3 printf(“%d”, b);4 }5 while (b > 0) {6 if (a > b) 7 a = a – b;8 else9 b = b – a;10 }11 printf(“%d”, a);12 return;13}
Claire Le Goues 61http://www.clairelegoues.com
1void gcd(int a, int b) {2 if (a == 0) {3 printf(“%d”, b);4 }5 while (b > 0) {6 if (a > b) 7 a = a – b;8 else9 b = b – a;10 }11 printf(“%d”, a);12 return;13}
(a=0; b=55)
Claire Le Goues 62http://www.clairelegoues.com
1void gcd(int a, int b) {2 if (a == 0) {3 printf(“%d”, b);4 }5 while (b > 0) {6 if (a > b) 7 a = a – b;8 else9 b = b – a;10 }11 printf(“%d”, a);12 return;13}
(a=0; b=55)true
Claire Le Goues 63http://www.clairelegoues.com
1void gcd(int a, int b) {2 if (a == 0) {3 printf(“%d”, b);4 }5 while (b > 0) {6 if (a > b) 7 a = a – b;8 else9 b = b – a;10 }11 printf(“%d”, a);12 return;13}
(a=0; b=55)true> 55
Claire Le Goues 64http://www.clairelegoues.com
1void gcd(int a, int b) {2 if (a == 0) {3 printf(“%d”, b);4 }5 while (b > 0) {6 if (a > b) 7 a = a – b;8 else9 b = b – a;10 }11 printf(“%d”, a);12 return;13}
(a=0; b=55)true> 55
(a=0; b=55) true
Claire Le Goues 65http://www.clairelegoues.com
1void gcd(int a, int b) {2 if (a == 0) {3 printf(“%d”, b);4 }5 while (b > 0) {6 if (a > b) 7 a = a – b;8 else9 b = b – a;10 }11 printf(“%d”, a);12 return;13}
(a=0; b=55)true> 55
(a=0; b=55) truefalse
Claire Le Goues 66http://www.clairelegoues.com
1void gcd(int a, int b) {2 if (a == 0) {3 printf(“%d”, b);4 }5 while (b > 0) {6 if (a > b) 7 a = a – b;8 else9 b = b – a;10 }11 printf(“%d”, a);12 return;13}
(a=0; b=55)true> 55
(a=0; b=55) truefalse
b = 55 - 0
Claire Le Goues 67http://www.clairelegoues.com
1void gcd(int a, int b) {2 if (a == 0) {3 printf(“%d”, b);4 }5 while (b > 0) {6 if (a > b) 7 a = a – b;8 else9 b = b – a;10 }11 printf(“%d”, a);12 return;13}
(a=0; b=55)true> 55
(a=0; b=55) truefalse
b = 55 - 0
Claire Le Goues 68http://www.clairelegoues.com
1void gcd(int a, int b) {2 if (a == 0) {3 printf(“%d”, b);4 }5 while (b > 0) {6 if (a > b) 7 a = a – b;8 else9 b = b – a;10 }11 printf(“%d”, a);12 return;13}
(a=0; b=55)true> 55
(a=0; b=55) truefalse
b = 55 - 0
!
Claire Le Goues 69http://www.clairelegoues.com
1void gcd(int a, int b){2 if (a == 0) {3 printf(“%d”, b);4 }5 while (b > 0) {6 if (a > b) 7 a = a – b;8 else9 b = b – a;10 }11 printf(“%d”, a);12 return;13}
Claire Le Goues 70http://www.clairelegoues.com
1void gcd(int a, int b){2 if (a == 0) {3 printf(“%d”, b);4 }5 while (b > 0) {6 if (a > b) 7 a = a – b;8 else9 b = b – a;10 }11 printf(“%d”, a);12 return;13}
Claire Le Goues 71http://www.clairelegoues.com
1void gcd(int a, int b){2 if (a == 0) {3 printf(“%d”, b);4 }5 while (b > 0) {6 if (a > b) 7 a = a – b;8 else9 b = b – a;10 }11 printf(“%d”, a);12 return;13}
Claire Le Goues 72http://www.clairelegoues.com
1void gcd(int a, int b){2 if (a == 0) {3 printf(“%d”, b);4 }5 while (b > 0) {6 if (a > b) 7 a = a – b;8 else9 b = b – a;10 }11 printf(“%d”, a);12 return;13}
1void gcd(int a, int b){2 if (a == 0) {3 printf(“%d”, b);4 return;5 }6 while (b > 0) {7 if (a > b) 8 a = a – b;9 else10 b = b – a;11 }12 printf(“%d”, a);13 return;14}
Claire Le Goues 73http://www.clairelegoues.com
1void gcd(int a, int b){2 if (a == 0) {3 printf(“%d”, b);4 }5 while (b > 0) {6 if (a > b) 7 a = a – b;8 else9 b = b – a;10 }11 printf(“%d”, a);12 return;13}
1void gcd(int a, int b){2 if (a == 0) {3 printf(“%d”, b);4 return;5 }6 while (b > 0) {7 if (a > b) 8 a = a – b;9 else10 b = b – a;11 }12 printf(“%d”, a);13 return;14}
Claire Le Goues 74http://www.clairelegoues.com
1void gcd(int a, int b){2 if (a == 0) {3 printf(“%d”, b);4 }5 while (b > 0) {6 if (a > b) 7 a = a – b;8 else9 b = b – a;10 }11 printf(“%d”, a);12 return;13}
Claire Le Goues 75http://www.clairelegoues.com
printf(b)
{block}
while (b>0)
{block}{block} {block}
if(a==0)
if(a>b)
a = a – b
{block}{block}
printf(a) return
b = b – a
Input:
Claire Le Goues 76http://www.clairelegoues.com
printf(b)
{block}
while (b>0)
{block}{block} {block}
if(a==0)
if(a>b)
a = a – b
{block}{block}
printf(a) return
b = b – a
Input:
Legend:High change probability.Low change probability.Not changed.
Claire Le Goues 77http://www.clairelegoues.com
printf(b)
{block}
while (b>0)
{block}{block} {block}
if(a==0)
if(a>b)
a = a – b
{block}{block}
printf(a) return
b = b – a
Input:
Legend:High change probability.Low change probability.Not changed.
Claire Le Goues 78http://www.clairelegoues.com
printf(b)
{block}
while (b>0)
{block}{block} {block}
if(a==0)
if(a>b)
a = a – b
{block}{block}
printf(a) return
b = b – a
Input:
An edit is: • Replace statement
X with statement Y • Insert statement X
after statement Y • Delete statement X
Claire Le Goues 79http://www.clairelegoues.com
printf(b)
{block}
while (b>0)
{block}{block} {block}
if(a==0)
if(a>b)
a = a – b
{block}{block}
printf(a) return
b = b – a
Input:
An edit is: • Replace statement
X with statement Y • Insert statement X
after statement Y • Delete statement X
Claire Le Goues 80http://www.clairelegoues.com
printf(b)
{block}
while (b>0)
{block}{block} {block}
if(a==0)
if(a>b)
a = a – b
{block}{block}
printf(a) return
b = b – a
Input:
An edit is: • Replace statement
X with statement Y • Insert statement X
after statement Y • Delete statement X
Claire Le Goues 81http://www.clairelegoues.com
printf(b)
{block}
while (b>0)
{block}{block} {block}
if(a==0)
if(a>b)
a = a – b
{block}{block}
printf(a) return
b = b – a
Input:
An edit is: • Replace statement
X with statement Y • Insert statement X
after statement Y • Delete statement X
Claire Le Goues 82http://www.clairelegoues.com
printf(b)
{block}
while (b>0)
{block}{block} {block}
if(a==0)
if(a>b)
a = a – b
{block}{block}
printf(a) return
b = b – a
Input:
An edit is: • Replace statement
X with statement Y • Insert statement X
after statement Y • Delete statement X
Claire Le Goues 83http://www.clairelegoues.com
return
{block}
while (b>0)
{block}{block} {block}
if(a==0)
if(a>b)
a = a – b
{block}{block}
printf(a) return
b = b – a
Input:
An edit is: • Replace statement
X with statement Y • Insert statement X
after statement Y • Delete statement X
Claire Le Goues 84http://www.clairelegoues.com
return
{block}
while (b>0)
{block}{block} {block}
if(a==0)
if(a>b)
a = a – b
{block}{block}
printf(a) return
b = b – a
Input:
An edit is: • Replace statement
X with statement Y • Insert statement X
after statement Y • Delete statement X
Claire Le Goues 85
INPUT
OUTPUT
EVALUATE FITNESS
DISCARD
ACCEPT
MUTATE
Claire Le Goues 86http://www.clairelegoues.com
What is it?• GenProg: automatic program repair.• Approach: search for a patch to a program with a bug.
Does it work?
WHERE ARE WE?
Claire Le Goues
Program Description Size Bug Type Time
Claire Le Goues
Program Description Size Bug Type Timegcd example 22 infinite loop 153
Claire Le Goues
Program Description Size Bug Type Timegcd example 22 infinite loop 153zune example 28 infinite loop 42
Claire Le Goues
Program Description Size Bug Type Timegcd example 22 infinite loop 153zune example 28 infinite loop 42uniq text processing 1146 segmentation fault 34look-u dictionary lookup 1169 segmentation fault 45look-s dictionary lookup 1363 infinite loop 55units metric conversion 1504 segmentation fault 109deroff document processing 2236 segmentation fault 131indent code processing 9906 infinite loop 546flex lexical analyzer generator 18774 segmentation fault 230
Claire Le Goues
Program Description Size Bug Type Timegcd example 22 infinite loop 153zune example 28 infinite loop 42uniq text processing 1146 segmentation fault 34look-u dictionary lookup 1169 segmentation fault 45look-s dictionary lookup 1363 infinite loop 55units metric conversion 1504 segmentation fault 109deroff document processing 2236 segmentation fault 131indent code processing 9906 infinite loop 546flex lexical analyzer generator 18774 segmentation fault 230nullhttpd webserver 5575 heap buffer overflow (code) 578openldap directory protocol 292598 non-overflow denial of service 665ccrypt encryption utility 7515 segmentation fault 330lighttpd webserver 51895 heap buffer overflow (vars) 394atris graphical game 21553 local stack buffer exploit 80php scripting language 764489 integer overflow 56wu-ftpd FTP server 67029 format string vulnerability 2256leukocyte computational biology 6718 segmentation fault 360
tiff image processing 84067 segmentation fault 108imagemagick image processing 450516 wrong output 2160
Claire Le Goues
Program Description Size Bug Type Timegcd example 22 infinite loop 153zune example 28 infinite loop 42uniq text processing 1146 segmentation fault 34look-u dictionary lookup 1169 segmentation fault 45look-s dictionary lookup 1363 infinite loop 55units metric conversion 1504 segmentation fault 109deroff document processing 2236 segmentation fault 131indent code processing 9906 infinite loop 546flex lexical analyzer generator 18774 segmentation fault 230nullhttpd webserver 5575 heap buffer overflow (code) 578openldap directory protocol 292598 non-overflow denial of service 665ccrypt encryption utility 7515 segmentation fault 330lighttpd webserver 51895 heap buffer overflow (vars) 394atris graphical game 21553 local stack buffer exploit 80php scripting language 764489 integer overflow 56wu-ftpd FTP server 67029 format string vulnerability 2256leukocyte computational biology 6718 segmentation fault 360
tiff image processing 84067 segmentation fault 108imagemagick image processing 450516 wrong output 2160
Claire Le Goues
Program Description Size Bug Type Timegcd example 22 infinite loop 153zune example 28 infinite loop 42uniq text processing 1146 segmentation fault 34look-u dictionary lookup 1169 segmentation fault 45look-s dictionary lookup 1363 infinite loop 55units metric conversion 1504 segmentation fault 109deroff document processing 2236 segmentation fault 131indent code processing 9906 infinite loop 546flex lexical analyzer generator 18774 segmentation fault 230nullhttpd webserver 5575 heap buffer overflow (code) 578openldap directory protocol 292598 non-overflow denial of service 665ccrypt encryption utility 7515 segmentation fault 330lighttpd webserver 51895 heap buffer overflow (vars) 394atris graphical game 21553 local stack buffer exploit 80php scripting language 764489 integer overflow 56wu-ftpd FTP server 67029 format string vulnerability 2256leukocyte computational biology 6718 segmentation fault 360
tiff image processing 84067 segmentation fault 108imagemagick image processing 450516 wrong output 2160
Claire Le Goues
Program Description Size Bug Type Timegcd example 22 infinite loop 153zune example 28 infinite loop 42uniq text processing 1146 segmentation fault 34look-u dictionary lookup 1169 segmentation fault 45look-s dictionary lookup 1363 infinite loop 55units metric conversion 1504 segmentation fault 109deroff document processing 2236 segmentation fault 131indent code processing 9906 infinite loop 546flex lexical analyzer generator 18774 segmentation fault 230nullhttpd webserver 5575 heap buffer overflow (code) 578openldap directory protocol 292598 non-overflow denial of service 665ccrypt encryption utility 7515 segmentation fault 330lighttpd webserver 51895 heap buffer overflow (vars) 394atris graphical game 21553 local stack buffer exploit 80php scripting language 764489 integer overflow 56wu-ftpd FTP server 67029 format string vulnerability 2256leukocyte computational biology 6718 segmentation fault 360
tiff image processing 84067 segmentation fault 108imagemagick image processing 450516 wrong output 2160
Claire Le Goues 95http://www.clairelegoues.com
What is it?• GenProg: automatic program repair.• Approach: search for a patch to a program with a
bug.Does it work?
• Yes: repairs a variety of bugs in a variety of programs.
Cool! But: is it really usable?• Are the repairs trustworthy?• Is it human-competitive?• Can we make it better?
WHERE ARE WE?
Claire Le Goues 96http://www.clairelegoues.com
What is it?• GenProg: automatic program repair.• Approach: search for a patch to a program with a
bug.Does it work?
• Yes: repairs a variety of bugs in a variety of programs.
Cool! But: is it really usable?• Are the repairs trustworthy?• Is it human-competitive?• Can we make it better?
WHERE ARE WE?
Claire Le Goues 97http://www.clairelegoues.com
What is it?• GenProg: automatic program repair.• Approach: search for a patch to a program with a
bug.Does it work?
• Yes: repairs a variety of bugs in a variety of programs.
Cool! But: is it really usable?• Are the repairs trustworthy?• Is it human-competitive?• Can we make it better?
WHERE ARE WE?
Claire Le Goues 98http://www.clairelegoues.com
What is it?• GenProg: automatic program repair.• Approach: search for a patch to a program with a
bug.Does it work?
• Yes: repairs a variety of bugs in a variety of programs.
Cool! But: is it really usable?• Are the repairs trustworthy?• Is it human-competitive?• Can we make it better?
WHERE ARE WE?
Claire Le Goues 99http://www.clairelegoues.com
What is it?• GenProg: automatic program repair.• Approach: search for a patch to a program with a
bug.Does it work?
• Yes: repairs a variety of bugs in a variety of programs.
Cool! But: is it really usable?• Are the repairs trustworthy?• Is it human-competitive?• Can we make it better?
WHERE ARE WE?
Claire Le Goues 100http://www.clairelegoues.com
Recall: any proposed repair must pass all regression test cases, ensuring it maintains required functionality.
REPAIR QUALITY
However, repairs are not always what a human would have done.
• Example: Adds a bounds check to a read, rather than refactoring to use a safe abstract string class.
Are they still acceptable?
Claire Le Goues 101http://www.clairelegoues.com
What makes a high-quality repair?• Retains required functionality.• Does not introduce new bugs.• Addresses the cause, not just the symptom.
Experimental scenario: a long-running server with an intrusion detection system that will generate/deploy repairs for detected anomalies.
• Worst-case: no humans around to vet the repairs!
REPAIR QUALITY EXPERIMENT
Claire Le Goues 102http://www.clairelegoues.com
One web application language interpreter with an integer overflow vulnerability:
• php Workload: 15kloc secure reservation system web app, 12,375 requests.
REPAIR QUALITY BENCHMARKS Two webservers with buffer overflows:
• nullhttpd (simple, multithreaded)
• lighttpd (used by Wikimedia, etc.)
Workload: 138,226 requests from 12,743 distinct client IP addresses.Workload held out in both cases; one day of data.
Claire Le Goues 103http://www.clairelegoues.com
Apply indicative workloads to vanilla servers.• Record results and times.
Send attack input. • Caught by intrusion detection system.
Generate, deploy repair using attack input and regression test cases.Apply indicative workload to patched server. Compare requests processed pre- and post- repair.
• Each request must yield exactly the same output (bit-per-bit) in the same time or less!
EXPERIMENTAL SETUP
Claire Le Goues 104http://www.clairelegoues.com
Program Post-patch requests lostnullhttpd 0.00 % ± 0.25%lighttpd 0.03% ± 1.53%php 0.02% ± 0.02%
REPAIR QUALITY RESULTS
Claire Le Goues 105http://www.clairelegoues.com
Program Post-patch requests lostFuzz Tests FailedGeneral
nullhttpd 0.00 % ± 0.25% 0 0 lighttpd 0.03% ± 1.53% 1410 1410 php 0.02% ± 0.02% 3 3
REPAIR QUALITY RESULTS
Claire Le Goues 106http://www.clairelegoues.com
Program Post-patch requests lostFuzz Tests FailedGeneral Exploit
nullhttpd 0.00 % ± 0.25% 0 0 lighttpd 0.03% ± 1.53% 1410 1410 php 0.02% ± 0.02% 3 3
REPAIR QUALITY RESULTS
Claire Le Goues 107http://www.clairelegoues.com
Program Post-patch requests lostFuzz Tests FailedGeneral Exploit
nullhttpd 0.00 % ± 0.25% 0 0 10 0lighttpd 0.03% ± 1.53% 1410 1410 9 0php 0.02% ± 0.02% 3 3 5 0
REPAIR QUALITY RESULTS
Claire Le Goues 108http://www.clairelegoues.com
What is it?• GenProg: automatic program repair.• Approach: search for a patch to a program with a
bug.Does it work?
• Yes: repairs a variety of bugs in a variety of programs.
Cool! But: is it really usable?• Are the repairs trustworthy?• Is it human-competitive?• Can we make it better?
WHERE ARE WE?
Claire Le Goues http://www.clairelegoues.com
COMPETITIVE
109
How many bugs can GenProg fix?
How much does it cost?
Claire Le Goues 110http://www.clairelegoues.com
Goal: systematically evaluate GenProg on a general, indicative bug set.General approach:
• Avoid overfitting: fix the algorithm. • Systematically create a generalizable benchmark set.
• Try to repair every bug in the benchmark set, establish grounded cost measurements.
SETUP
Claire Le Goues 111http://www.clairelegoues.com
Goal: systematically evaluate GenProg on a general, indicative bug set.General approach:
• Avoid overfitting: fix the algorithm. • Systematically create a generalizable benchmark set.
• Try to repair every bug in the benchmark set, establish grounded cost measurements.
SETUP
Claire Le Goues 112http://www.clairelegoues.com
CHALLENGE: INDICATIVE BUG SET
Claire Le Goues 113http://www.clairelegoues.com
Goal: a large set of important, reproducible bugs in non-trivial programs. Approach: use historical data to approximate discovery and repair of bugs in the wild.
SYSTEMATIC BENCHMARK SELECTION
Claire Le Goues 114http://www.clairelegoues.com
Consider top programs from SourceForge, Google Code, Fedora SRPM, etc:
• Find pairs of viable versions where test case behavior changes.
• Take all tests from most recent version.• Go back in time through the source control.
Corresponds to a human-written repair for the bug tested by the failing test case(s).
SYSTEMATIC BENCHMARK SELECTION
Claire Le Goues 115http://www.clairelegoues.com
BENCHMARKSProgram LOC Tests Bugs Descriptionfbc 97,000 773 3 Language (legacy)gmp 145,000 146 2 Multiple precision mathgzip 491,000 12 5 Data compressionlibtiff 77,000 78 24 Image manipulationlighttpd 62,000 295 9 Web serverphp 1,046,000 8,471 44 Language (web)python 407,000 355 11 Language (general)wireshark 2,814,000 63 7 Network packet analyzerTotal 5,139,000 10,193 105
Claire Le Goues 116http://www.clairelegoues.com
BENCHMARKSProgram LOC Tests Bugs Descriptionfbc 97,000 773 3 Language (legacy)gmp 145,000 146 2 Multiple precision mathgzip 491,000 12 5 Data compressionlibtiff 77,000 78 24 Image manipulationlighttpd 62,000 295 9 Web serverphp 1,046,000 8,471 44 Language (web)python 407,000 355 11 Language (general)wireshark 2,814,000 63 7 Network packet analyzerTotal 5,139,000 10,193 105
Claire Le Goues 117http://www.clairelegoues.com
BENCHMARKSProgram LOC Tests Bugs Descriptionfbc 97,000 773 3 Language (legacy)gmp 145,000 146 2 Multiple precision mathgzip 491,000 12 5 Data compressionlibtiff 77,000 78 24 Image manipulationlighttpd 62,000 295 9 Web serverphp 1,046,000 8,471 44 Language (web)python 407,000 355 11 Language (general)wireshark 2,814,000 63 7 Network packet analyzerTotal 5,139,000 10,193 105
Claire Le Goues 118http://www.clairelegoues.com
BENCHMARKSProgram LOC Tests Bugs Descriptionfbc 97,000 773 3 Language (legacy)gmp 145,000 146 2 Multiple precision mathgzip 491,000 12 5 Data compressionlibtiff 77,000 78 24 Image manipulationlighttpd 62,000 295 9 Web serverphp 1,046,000 8,471 44 Language (web)python 407,000 355 11 Language (general)wireshark 2,814,000 63 7 Network packet analyzerTotal 5,139,000 10,193 105
Claire Le Goues 119http://www.clairelegoues.com
CHALLENGE: GROUNDED COST MEASUREMENTS
Claire Le Goues http://www.clairelegoues.com 120
Claire Le Goues http://www.clairelegoues.com 121
Claire Le Goues 122http://www.clairelegoues.com
READY
Claire Le Goues 123http://www.clairelegoues.com
GO
Claire Le Goues 124http://www.clairelegoues.com
13 HOURS LATER
Claire Le Goues 125http://www.clairelegoues.com
SUCCESS/COST
Program
Defects Repaired
Cost per non-repair
Cost per repair
Hours US$ Hours US$
fbc 1/3 8.52 5.56 6.52 4.08gmp 1/2 9.93 6.61 1.60 0.44gzip 1/5 5.11 3.04 1.41 0.30libtiff 17/24 7.81 5.04 1.05 0.04lighttpd 5/9 10.79 7.25 1.34 0.25php 28/44 13.00 8.80 1.84 0.62python 1/11 13.00 8.80 1.22 0.16wireshark 1/7 13.00 8.80 1.23 0.17Total 55/105 11.22h 1.60h$403 for all 105 trials, leading to 55 repairs; $7.32 per bug repaired.
Claire Le Goues 126http://www.clairelegoues.com
SUCCESS/COST
Program
Defects Repaired
Cost per non-repair
Cost per repair
Hours US$ Hours US$
fbc 1/3 8.52 5.56 6.52 4.08gmp 1/2 9.93 6.61 1.60 0.44gzip 1/5 5.11 3.04 1.41 0.30libtiff 17/24 7.81 5.04 1.05 0.04lighttpd 5/9 10.79 7.25 1.34 0.25php 28/44 13.00 8.80 1.84 0.62python 1/11 13.00 8.80 1.22 0.16wireshark 1/7 13.00 8.80 1.23 0.17Total 55/105 11.22h 1.60h$403 for all 105 trials, leading to 55 repairs; $7.32 per bug repaired.
Claire Le Goues 127http://www.clairelegoues.com
SUCCESS/COST
Program
Defects Repaired
Cost per non-repair
Cost per repair
Hours US$ Hours US$
fbc 1/3 8.52 5.56 6.52 4.08gmp 1/2 9.93 6.61 1.60 0.44gzip 1/5 5.11 3.04 1.41 0.30libtiff 17/24 7.81 5.04 1.05 0.04lighttpd 5/9 10.79 7.25 1.34 0.25php 28/44 13.00 8.80 1.84 0.62python 1/11 13.00 8.80 1.22 0.16wireshark 1/7 13.00 8.80 1.23 0.17Total 55/105 11.22h 1.60h$403 for all 105 trials, leading to 55 repairs; $7.32 per bug repaired.
Claire Le Goues 128http://www.clairelegoues.com
SUCCESS/COST
Program
Defects Repaired
Cost per non-repair
Cost per repair
Hours US$ Hours US$
fbc 1/3 8.52 5.56 6.52 4.08gmp 1/2 9.93 6.61 1.60 0.44gzip 1/5 5.11 3.04 1.41 0.30libtiff 17/24 7.81 5.04 1.05 0.04lighttpd 5/9 10.79 7.25 1.34 0.25php 28/44 13.00 8.80 1.84 0.62python 1/11 13.00 8.80 1.22 0.16wireshark 1/7 13.00 8.80 1.23 0.17Total 55/105 11.22h 1.60h$403 for all 105 trials, leading to 55 repairs; $7.32 per bug repaired.
Claire Le Goues 129http://www.clairelegoues.com
JBoss issue tracking: median 5.0, mean 15.3 hours.1
IBM: $25 per defect during coding, rising at build, Q&A, post-release, etc.2
Tarsnap.com: $17, 40 hours per non-trivial repair.3
Bug bounty programs in general:• At least $500 for security-critical bugs.• One of the php bugs that GenProg fixed has an
associated NIST security certification.1C. Weiß, R. Premraj, T. Zimmermann, and A. Zeller, “How long will it take to fix this bug?” in Workshop on Mining Software Repositories, May 2007. 2 L. Williamson, “IBM Rational software analyzer: Beyond source code,” in Rational Software Developer Conference, Jun. 2008. 3http://www.tarsnap.com/bugbounty.html
PUBLIC COMPARISON
Claire Le Goues 130http://www.clairelegoues.com
What is it?• GenProg: automatic program repair.• Approach: search for a patch to a program with a
bug.Does it work?
• Yes: repairs a variety of bugs in a variety of programs.
Cool! But: is it really usable?• Are the repairs trustworthy?• Is it human-competitive?• Can we make it better?
WHERE ARE WE?
Claire Le Goues 131http://www.clairelegoues.com
ATYPICAL SPACE
Claire Le Goues http://www.clairelegoues.com 132
OTHER GP PROBLEMSGECCO GP-TRACK
BEST PAPERSLearning: expression trees or listsPopulation: 64 – 2500Iterations: 50 – 10000Max variant size: 48 operations, 17 levels
PROGRAM REPAIR
Claire Le Goues http://www.clairelegoues.com 133
OTHER GP PROBLEMSGECCO GP-TRACK
BEST PAPERSLearning: expression trees or listsPopulation: 64 – 2500Iterations: 50 – 10000Max variant size: 48 operations, 17 levels
PROGRAM REPAIRLearning: patches or repaired programsPopulation: 40Iterations: 10 Max variant size: unboundedLargest benchmark program: 2.8 million lines of C code.
Claire Le Goues http://www.clairelegoues.com 134
OTHER GP PROBLEMSGECCO GP-TRACK
BEST PAPERSLearning: expression trees or listsPopulation: 64 – 2500Iterations: 50 – 10000Max variant size: 48 operations, 17 levels
PROGRAM REPAIRLearning: patches or repaired programsPopulation: 40Iterations: 10 Max variant size: unboundedLargest benchmark program: 2.8 million lines of C code
Claire Le Goues 135http://www.clairelegoues.com
Representation: • Which representation choice gives better results? • Which representation features contribute most to
success? Crossover: Which crossover operator is best? Operators:
• Which operators contribute the most to success? • How should they be selected?
Search space: How should the representation weight program statements to best define the search space?
UNDER THE HOOD
Claire Le Goues 136http://www.clairelegoues.com
Representation: • Which representation choice gives better results? • Which representation features contribute most to
success? Crossover: Which crossover operator is best? Operators:
• Which operators contribute the most to success? • How should they be selected?
Search space: How should the representation weight program statements to best define the search space?
UNDER THE HOOD
Claire Le Goues 137http://www.clairelegoues.com
Hypothesis: statements executed only by the failing test case(s) should be weighted more heavily than those also executed by the passing test cases.Procedure: examine that ratio in actual repairs.Result:
SEARCH SPACE: SETUP
Claire Le Goues 138http://www.clairelegoues.com
Hypothesis: statements executed only by the failing test case(s) should be weighted more heavily than those also executed by the passing test cases.Procedure: examine that ratio in actual repairs.Result:
Expected: 10 : 1 vs.
Actual: 1 : 1.85
SEARCH SPACE: SETUP
Claire Le Goues 139http://www.clairelegoues.com
HUH.
Claire Le Goues 140http://www.clairelegoues.com
Dataset: the 105 bugs from the earlier dataset.Rerun that experiment, varying the statement weighting scheme:
• Default: the original experiment• Realistic: 1 : 1.85• Uniform: 1 : 1
Metrics: time to repair, success rate.
SEARCH SPACE EXPERIMENT
Claire Le Goues 141http://www.clairelegoues.com
Easy Medium Hard All0
102030405060708090
100110
34.3
93.6103.1
66.0
49.1
75.7
27.1
59.5
36.3
67.057.7 58.6
DefaultRealisticEqual
Search difficulty
# fit
ness
eva
luat
ions
to re
pair
SEARCH SPACE: REPAIR TIME10 : 1
1 : 1.85
1 : 1
Claire Le Goues 142http://www.clairelegoues.com
Easy Medium Hard All0
102030405060708090
100110
34.3
93.6103.1
66.0
49.1
75.7
27.1
59.5
36.3
67.057.7 58.6
DefaultRealisticEqual
Search difficulty
# fit
ness
eva
luat
ions
to re
pair
SEARCH SPACE: REPAIR TIME10 : 1
1 : 1.85
1 : 1
Claire Le Goues 143http://www.clairelegoues.com
Easy Medium Hard 0% All0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%1.00
0.78
0.24
0.00
0.62
0.930.85
0.23 0.23
0.69
0.92 0.90
0.33
0.19
0.70
DefaultRealisticEqual
Search difficulty
GP
Succ
ess
Rat
eSEARCH SPACE: SUCCESS RATE
10 : 1
1 : 1.85
1 : 1
Claire Le Goues 144http://www.clairelegoues.com
Easy Medium Hard 0% All0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%1.00
0.78
0.24
0.00
0.62
0.930.85
0.23 0.23
0.69
0.92 0.90
0.33
0.19
0.70
DefaultRealisticEqual
Search difficulty
GP
Succ
ess
Rat
eSEARCH SPACE: SUCCESS RATE
10 : 1
1 : 1.85
1 : 1
Claire Le Goues 145http://www.clairelegoues.com
Investigated 3 other sets of choices related to representation and operators.Combination of modifications:
• Repairs an additional 5 bugs; 60 out of 105. • Reduced time to repair 17 – 43% for difficult bug scenarios.
Have similarly studied fitness function improvements.Investigation continues!
MORE UNDER THE HOOD
Claire Le Goues 146http://www.clairelegoues.com
Repair templates for more expressive mutation operators.
• Mine common fixes from existing source code and source code repositories.
Provably secure patching.• Apply cryptographic techniques to make patches difficult to reverse-engineer.
Medical device system safety.• Conduct a whole-environment safety analysis for medical device users.
WHAT’S NEXT?
Claire Le Goues 147http://www.clairelegoues.com
Claire Le Goues 148http://www.clairelegoues.com
JournalAutomatic program repairCode quality metrics for specification miningTool support for formal program verification
TSE ’12
(featured article)ConferenceICSE ’091
(best paper)ICSE ’122 GECCO ’122
1gold, Humies2bronze, Humies
Claire Le Goues 149http://www.clairelegoues.com
JournalAutomatic program repairCode quality metrics for specification miningTool support for formal program verification
Workshop/Tutorial
FOSER ’10 SBST ’09
GECCO ’12
TSE ’12
(featured article)
CACM ’10
(invited article)ConferenceGECCO ’10ICSE ’091
(best paper)
GECCO ’091
(best paper)ICSE ’122 GECCO ’122
1gold, Humies2bronze, Humies
Claire Le Goues 150http://www.clairelegoues.com
JournalAutomatic program repairCode quality metrics for specification miningTool support for formal program verification
Workshop/Tutorial
FOSER ’10 SBST ’09
GECCO ’12
TSE ’12
(featured article)
CACM ’10
(invited article)ConferenceGECCO ’10TACAS ’09 ICSE ’091
(best paper)
GECCO ’091
(best paper)ICSE ’122 GECCO ’122
1gold, Humies2bronze, Humies
TSE ’12
Claire Le Goues 151http://www.clairelegoues.com
JournalAutomatic program repairCode quality metrics for specification miningTool support for formal program verification
Workshop/Tutorial
FOSER ’10 SBST ’09
GECCO ’12
TSE ’12 TSE ’12
(featured article)
CACM ’10
(invited article)ConferenceGECCO ’10TACAS ’09 ICSE ’091
(best paper)
GECCO ’091
(best paper)SEFM ’11 ICSE ’122 GECCO ’122
1gold, Humies2bronze, Humies
Claire Le Goues 152http://www.clairelegoues.com
GenProg: scalable, automatic bug repair.• Genetic programming search for a patch that addresses
a given bug.• Render the search tractable by restricting the search
space intelligently.It works!
• Fixes a variety of bugs in a variety of programs.• Repaired 60 of 105 bugs for < $8 each, on average.
Benchmarks/results/source code/VM images available:• http://genprog.cs.virginia.edu
CONCLUSIONS
Claire Le Goues 153http://www.clairelegoues.com
I LOVE QUESTIONS.
Claire Le Goues 154http://www.clairelegoues.com
WHICH BUGS…?Slightly more likely to fix bugs where the human:
• restricts the repair to statements.• touched fewer files.
As fault space decreases, success increases, repair time decreases.As fix space increases, repair time decreases.
Claire Le Goues 155http://www.clairelegoues.com
SEARCH SPACE
Y = 0.8x + 0.02R2 = 0.63
Claire Le Goues 156http://www.clairelegoues.com
Minimization step: try removing each line in the patch, check if the result still passes all tests Delta Debugging finds a 1-minimal subset of the diff in O(n2) time We use a tree-structured diff algorithm (diffX)
• Avoids problems with balanced curly braces, etc.
Takes significantly less time than finding the initial repair repair.
CODE BLOAT
Claire Le Goues 157http://www.clairelegoues.com
1. class test_class { 2. public function __get($n) 3. { return $this; %$ }4. public function b()5. { return; }6. }7. global $test3; 8. $test3 = new test_class(); 9. $test3->a->b();
EXAMPLE: PHP BUG #54372 Relevant code: function zend_std_read_property in zend_object_handlers.cNote: memory management uses reference counting.Problem: this line:449.zval_ptr_dtor(object)If object points to $this and $this is global, its memory is completely freed, even though we could access $this later.
Expected output: nothingBuggy output: crash on line 9.
Claire Le Goues 158http://www.clairelegoues.com
GenProg :% 448c448,451> Z_ADDROF_P(object);> if (PZVAL_IS_REF(object)) > {> SEPARATE_ZVAL(&object);> } zval_ptr_dtor(&object)
EXAMPLE: PHP BUG #54372Human : % 449c449,453 < zval_ptr_dtor(&object);> if (*retval != object)> { // expected> zval_ptr_dtor(&object);> } else {> Z_DELREF_P(object);> }
Claire Le Goues 159http://www.clairelegoues.com
SCALABLE: REPRESENTATION
1
2
54
Naïve:
1
2
4 5
5’
1
32
54
Input:
New:
Delete(3)
Replace(3,5)
Claire Le Goues 160http://www.clairelegoues.com
SCALABLE: FITNESSFitness:
• Subsample test cases.
• Evaluate in parallel.Random runs:
• Multiple simultaneous runs on different seeds.
Claire Le Goues 161http://www.clairelegoues.com