“A System and Language for Building System-Specific, Static Analyses”

CMSC 631 – Fall 2003

Seth Hallem, Benjamin Chelf, Yichen Xie, and Dawson Engler

(presented by Mujtaba Ali)

2

Motivation

• Goal: Find as many bugs as possible

• Applications:
  – Free checker
    • Detect double frees and dereferences of freed pointers
  – Lock checker
    • Warn if locks are released without being acquired, double acquired, or not released at all
  – Statistical analysis to infer checking rules
    • Infer whether routines a and b must be paired

3

State Machine Transitions

• Analyses are modeled as state machine transitions

• State machines are:
  – Simple enough for programmers to understand
  – Expressive enough to specify lots of analyses

[State machine diagram: unknown → freed on kfree(v); freed → stop on kfree(v) or *v]

Note: the stop state does not always imply an error
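The diagram can be read as a transition table. A minimal sketch, assuming a hypothetical encoding in which kfree(v) and *v are reduced to the event names "kfree" and "deref"; the names TRANSITIONS, step and run are illustrative and not part of metal or xgcc.

```python
# Hypothetical encoding of the slide's diagram: (state, event) -> next state.
TRANSITIONS = {
    ("unknown", "kfree"): "freed",
    ("freed", "kfree"): "stop",   # double free
    ("freed", "deref"): "stop",   # use after free
}


def step(state, event):
    """Return the next state; events with no matching edge leave the state alone."""
    return TRANSITIONS.get((state, event), state)


def run(events, state="unknown"):
    """Feed a sequence of events through the state machine for one pointer."""
    for event in events:
        state = step(state, event)
    return state
```

For example, run(["kfree", "deref"]) ends in "stop", while run(["deref"]) stays in "unknown" because dereferencing an untracked pointer matches no edge.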

4

Free Checker Example

    int contrived(int *p, int *w, int x) {
        int *q;
        if (x) {
            kfree(w);
            q = p;
            p = 0;
        }
        if (!x)
            return *w;  // safe
        return *q;      // using 'q' after free!
    }

    int contrived_caller(int *w, int x, int *p) {
        kfree(p);
        contrived(p, w, x);
        return *w;      // using 'w' after free!
    }

State annotations, first path (assume x != 0):

• After kfree(p) in contrived_caller, and on entry to contrived: (p→freed)
• After kfree(w): (p,w→freed)
• After q = p: (p,w,q→freed)
• After p = 0: (w,q→freed, p→stop)
• At if(!x): prune true branch
• At return *q: (w→freed, q→stop)

5

Free Checker Example

(Same code as on the previous slide.)

State annotations, second path (assume x == 0):

• On entry to contrived: (p→freed)
• The body of if(x) is skipped
• At if(!x): prune false branch; return *w is safe, and contrived exits in state (p→freed)
• The first path's exit state (w→freed, q→stop) is retained

6

Free Checker Example

(Same code as on the previous slides.)

• Exit states of contrived: (p→freed) and (w→freed, q→stop); the result of the call is their union
• In contrived_caller: (p→freed) after kfree(p), (p,w→freed) after the call, and (p→freed, w→stop) at return *w
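The two paths through contrived can be replayed by hand. A minimal sketch, assuming each statement is abstracted to a hypothetical operation ("kfree", "deref", "copy" for q = p, "kill" for p = 0) and that the function is entered with p already freed by the caller. This illustrates the bookkeeping only and is not xgcc's implementation; unlike the slide, it keeps p→stop in the exit state.

```python
def apply(state, op):
    """Apply one abstracted statement to a {variable: value} map; return (state, errors)."""
    kind, *args = op
    state, errors = dict(state), []
    if kind == "kfree":
        (v,) = args
        if state.get(v) == "freed":
            errors.append(f"double free of {v}!")
            state[v] = "stop"
        elif v not in state:
            state[v] = "freed"
    elif kind == "deref":
        (v,) = args
        if state.get(v) == "freed":
            errors.append(f"using {v} after free!")
            state[v] = "stop"
    elif kind == "copy":                 # dst = src: dst inherits src's state
        dst, src = args
        if src in state:
            state[dst] = state[src]
    elif kind == "kill":                 # v is overwritten: stop tracking it
        (v,) = args
        if v in state:
            state[v] = "stop"
    return state, errors


def run_path(entry, path):
    """Fold apply() over one path, collecting every error along the way."""
    state, errors = dict(entry), []
    for op in path:
        state, new_errors = apply(state, op)
        errors += new_errors
    return state, errors


ENTRY = {"p": "freed"}                   # contrived_caller already freed p
PATH_X_NONZERO = [("kfree", "w"), ("copy", "q", "p"), ("kill", "p"), ("deref", "q")]
PATH_X_ZERO = [("deref", "w")]

results = [run_path(ENTRY, path) for path in (PATH_X_NONZERO, PATH_X_ZERO)]
exit_states = [state for state, _ in results]        # both outcomes are kept
all_errors = [e for _, errors in results for e in errors]
```

Only the x != 0 path reports an error (the dereference of q); the x == 0 path dereferences w before anything has freed it.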

7

A Unified Framework

• Two components:
  – metal
    • Language used for expressing custom analyses
    • I.e., for expressing state machines
  – xgcc
    • Analysis engine that executes metal specifications

8

metal

• Language for specifying state machines

• A metal specification is called an “extension”

• For programmers, not compiler writers
  – Many rules are known only to programmers

• Flexibility allows for different kinds of analyses, e.g.:
  – Find violations of known correctness rules
  – Automatically infer such rules from source

9

Example Extension: Free Checker

– Extensions feature ML-like pattern matching

    state decl any_pointer v;

    start:
        { kfree(v) } ==> v.freed;

    v.freed:
        { *v } ==> v.stop,
            { err("using %s after free!", mc_identifier(v)); }
      | { kfree(v) } ==> v.stop,
            { err("double free of %s!", mc_identifier(v)); }
      ;
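The patterns { kfree(v) } and { *v } bind the hole variable v to whatever expression appears in the matched code. A rough sketch of that binding step, using regular expressions over source text as a hypothetical stand-in; the names PATTERNS and match are illustrative, and because the matching here is purely textual it also fires on declarations such as int *q;.

```python
import re

# Hypothetical textual stand-ins for the metal patterns { kfree(v) } and { *v }.
PATTERNS = {
    "kfree": re.compile(r"\bkfree\s*\(\s*(\w+)\s*\)"),
    "deref": re.compile(r"\*\s*(\w+)"),
}


def match(statement):
    """Return (event, bound_variable) for the first pattern that matches, else None."""
    for event, pattern in PATTERNS.items():
        found = pattern.search(statement)
        if found:
            return event, found.group(1)
    return None
```

For instance, match("kfree (p);") binds v to "p", and match("return *q;") binds v to "q".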

10

metal Extension Terminology

– Global state variable (with exactly one instance) is implied
– Instances of variable-specific state variables come and go

    state decl any_pointer v;

    start:
        { kfree(v) } ==> v.freed;

    v.freed:
        { *v } ==> v.stop,
            { err(...); }
      | { kfree(v) } ==> v.stop,
            { err(...); }
      ;

• v: variable-specific state variable
• v.freed, v.stop: variable-specific state values
• start: global state value

11

metal Extensions and SMs

• An extension is composed of one or more SMs
  – Extension state = the state of these SMs

• State machine state is a state tuple:
  – Value of the global instance
  – Value of one of the variable-specific instances

• State tuple notation: (start,v:p→freed)

• So, extension state = set of state tuples, e.g. {(start,v:p→freed), (start,v:w→freed)}
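One way to picture this is as a plain data structure. A minimal sketch, assuming a hypothetical StateTuple record with one field per component of the slide's notation; the field names are illustrative, not xgcc's.

```python
from typing import NamedTuple


class StateTuple(NamedTuple):
    global_value: str   # value of the single global instance, e.g. "start"
    variable: str       # state variable declared in the extension, e.g. "v"
    bound_to: str       # program expression this instance tracks, e.g. "p"
    value: str          # variable-specific value, e.g. "freed"

    def __str__(self):
        return f"({self.global_value},{self.variable}:{self.bound_to}→{self.value})"


# Extension state = a set of state tuples, as in the slide's example.
extension_state = frozenset({
    StateTuple("start", "v", "p", "freed"),
    StateTuple("start", "v", "w", "freed"),
})
```

Because the extension state is a set, adding a tuple that is already present changes nothing, which is what makes subset checks between states meaningful.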

12

xgcc

• Executes metal extensions
  – Context-sensitive, interprocedural analysis

• Does not restrict metal extensions
  – Beyond requiring determinism

• Scalability is a primary design requirement
  – More rules + more code = more bugs found

13

xgcc Algorithm Overview

• Applies the extension to the CFG of a function in depth-first order

• At each program point, looks for an executable transition in all state machines

• Provides additional enhancements:
  – Prunes non-executable paths
  – Follows simple value flow
  – Deletes state attached to redefined expressions
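Pruning non-executable paths can be illustrated on the two branches of contrived, if (x) and if (!x). A minimal sketch, assuming it is enough to test the branch conditions against the sample values x = 0 and x = 1 (true for these two conditions, not in general); this brute-force check is an illustration and not the technique xgcc uses.

```python
from itertools import product

# Branch conditions of contrived(), written as predicates over the value of x.
BRANCHES = [
    ("if (x)", lambda x: x != 0),
    ("if (!x)", lambda x: x == 0),
]


def feasible_paths(sample_values=(0, 1)):
    """Keep only the branch-outcome combinations that some value of x can produce."""
    feasible = []
    for outcomes in product([True, False], repeat=len(BRANCHES)):
        if any(
            all(cond(x) == taken for (_, cond), taken in zip(BRANCHES, outcomes))
            for x in sample_values
        ):
            feasible.append(outcomes)
    return feasible
```

Of the four syntactic paths, only two survive: (True, False) for x != 0 and (False, True) for x == 0. The pruned combination (True, True) is the one that would otherwise report a spurious use of w after free at return *w.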

14

Intraprocedural Heuristics

• Basic block-level state caching

• Motivation: exploit the determinism of extensions
  – Applying an extension to the same program point in the same state always gives the same result

• Algorithm:
  – Before traversal, record the extension state in each basic block (a “block summary”)
  – Subsequent traversals abort if their extension state is a subset of the block summary
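The caching rule can be sketched as a depth-first traversal that stops whenever a block has already seen every tuple in the incoming state. This is a simplified illustration under stated assumptions: the CFG is a hypothetical dict of successor lists, state tuples are plain strings, and transfer is a caller-supplied function standing in for applying the extension to one block.

```python
def traverse(cfg, transfer, entry, entry_state):
    """Depth-first traversal with block summaries used as a cache."""
    summaries = {block: set() for block in cfg}   # block summary per basic block
    visited = set()
    visits = []                                   # blocks actually analyzed, in order

    def visit(block, state):
        # Cache hit: this block was already analyzed with every tuple in `state`.
        if block in visited and state <= summaries[block]:
            return
        visited.add(block)
        summaries[block] |= state                 # record the state before analyzing
        visits.append(block)
        out_state = transfer(block, state)
        for successor in cfg[block]:
            visit(successor, out_state)

    visit(entry, frozenset(entry_state))
    return summaries, visits


# Hypothetical diamond-shaped CFG: A branches to B and C, which rejoin at D.
DIAMOND = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
```

With an identity transfer function, D is analyzed once: the second arrival carries the same state and aborts. If block C adds a new tuple, D is analyzed a second time because the incoming state is no longer a subset of its summary.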

15

Block Summary

[Figure: the Free Checker Example code with the first-path annotations (assume x != 0), a block summary containing (start,v:w→freed) and (start,v:q→freed), and the label “multi-line basic blocks”]

16

Interprocedural Heuristics

• Require additional cache information

• Block summary is now a union of:
  – Transition edges: (s,v:t→vs) → (s’,v:t→vs’)
  – Add edges: (s,v:t→unknown) → (s’,v:t→vs’)
    • When new instances are created inside the basic block

• Suffix summary
  – Edges starting at a basic block and ending at the function’s exit point
  – Function summary = entry block’s suffix summary
  – Built backwards (in contrast to block summaries)
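Building suffix summaries backwards amounts to composing each block's edges with the suffix summaries of its successors. A minimal sketch for an acyclic CFG, assuming edges are (source value, destination value) pairs for a single tracked pointer t; the function g described in the comments is hypothetical, and this is not xgcc's actual data layout.

```python
def compose(first, second):
    """Relational composition: follow an edge in `first`, then one in `second`."""
    return {(a, c) for (a, b) in first for (b2, c) in second if b == b2}


def suffix_summaries(cfg, block_edges, order):
    """Compute suffix summaries for an acyclic CFG.

    `order` lists blocks so that every successor appears before its predecessors,
    which is what makes the construction run backwards from the exit.
    """
    suffix = {}
    for block in order:
        successors = cfg[block]
        if not successors:                               # exit block
            suffix[block] = set(block_edges[block])
        else:
            rest = set().union(*(suffix[s] for s in successors))
            suffix[block] = compose(block_edges[block], rest)
    return suffix


# Hypothetical function g(t, c): entry tests c, "free" runs kfree(t), "exit" runs *t.
CFG = {"entry": ["free", "exit"], "free": ["exit"], "exit": []}
BLOCK_EDGES = {
    "entry": {("unknown", "unknown"), ("freed", "freed"), ("stop", "stop")},
    "free": {("unknown", "freed"), ("freed", "stop")},
    "exit": {("unknown", "unknown"), ("freed", "stop"), ("stop", "stop")},
}

suffix = suffix_summaries(CFG, BLOCK_EDGES, order=["exit", "free", "entry"])
function_summary = suffix["entry"]
```

The resulting function summary says that a caller entering with t unknown may leave with t unknown or stopped, and a caller entering with t freed always leaves with t stopped.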

17

Block and Suffix Summaries

[Figure: the Free Checker Example code with the first-path annotations (assume x != 0), overlaid with summary edges:
  (start,v:w→freed) → (start,v:w→freed)
  (start,v:q→freed) → (start,v:q→stop)
  (start,v:w→freed) → (start,v:w→freed)
  (start,v:p→freed) → (start,v:p→freed)
  (start,v:w→unknown) → (start,v:w→freed)
  (start,v:p→freed) → (start,v:p→freed)]

18

Unsoundness

• xgcc’s interprocedural analysis is unsound
  – But that’s OK (Jim Larus agrees)
  – If it can catch some errors, it’s still useful

• Unsound analyses can catch some errors that sound analyses can’t
  – Some analyses (e.g., inferring which routines must be paired) cannot be expressed soundly

• Focus is on executing extensions efficiently

19

Reducing False Positives

• Killing variables and expressions
  – Remove the state machine when its variable is defined

• Synonyms

• False path pruning

• Targeted suppression
  – i.e., xgcc hacks

    p = q = kmalloc(...);
    if (!p) return 0;
    *q;  /* safe dereference: q = p = not null */
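The kmalloc snippet works because p and q are synonyms: a fact proved about one holds for the other. A minimal sketch of that idea, assuming a hypothetical SynonymTracker that stores one shared fact per group of variables; the class and the fact names "maybe_null" and "not_null" are illustrative and not part of xgcc.

```python
class SynonymTracker:
    """Track one shared fact per group of variables known to hold the same value."""

    def __init__(self):
        self.group_of = {}   # variable -> group id
        self.fact = {}       # group id -> "maybe_null" or "not_null"
        self._next_id = 0

    def assign_fresh(self, variables, fact):
        """Model `a = b = f(...)`: all listed variables become synonyms."""
        gid = self._next_id
        self._next_id += 1
        for v in variables:
            self.group_of[v] = gid   # rebinding also drops any old association
        self.fact[gid] = fact

    def refine(self, variable, fact):
        """Record what a branch condition proved about one variable's value."""
        self.fact[self.group_of[variable]] = fact

    def deref_is_safe(self, variable):
        return self.fact[self.group_of[variable]] == "not_null"


tracker = SynonymTracker()
tracker.assign_fresh(["p", "q"], "maybe_null")   # p = q = kmalloc(...);
tracker.refine("p", "not_null")                  # survived: if (!p) return 0;
```

After the refinement, tracker.deref_is_safe("q") is true even though only p was tested. Reassigning p afterwards moves it to a fresh group, which is the "killing" case: q keeps its fact and p loses it.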

20

Free Checker Example

(Same code and first-path annotations as in the first Free Checker Example.)

On a write, if there is a state machine for p, we “kill” it: after p = 0, p moves to the stop state without an error.

22

Ranking of Errors

• Impossible to eliminate all false positives

• xgcc ranks errors
  – Generic ranking: distance
  – Path-specific ranking by annotating extensions
  – Statistical ranking (z-ranking)

• Ranking can distinguish different uses
  – Linux semaphore routines up and down are used as both counters and locks
  – Interprocedural analysis cannot handle this case
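The slide names z-ranking without giving a formula. A minimal sketch, assuming the standard z-test statistic for proportions, where checks is how often a candidate rule was examined and successes is how often it held; the expected rate p0 = 0.9 is an illustrative assumption, not a value taken from the slides.

```python
import math


def z_statistic(checks, successes, p0=0.9):
    """z-test statistic for the observed success ratio against an assumed rate p0."""
    if checks <= 0:
        raise ValueError("checks must be positive")
    observed = successes / checks
    return (observed - p0) / math.sqrt(p0 * (1 - p0) / checks)
```

Under these assumptions, a rule that holds in 99 of 100 checks scores 3.0, well above one that holds in 10 of 10 (about 1.05), so violations of the first rule would be ranked higher. The same statistic could be applied when inferring whether routines a and b must be paired.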

23

Extending metal Extensions

• Extend the state space using general-purpose code

• Path-specific transitions
  – Different destination states depending on whether the analysis follows the true branch or the false branch

• C code actions
  – Can manipulate the extension’s state using xgcc’s interface

    start:
        {trylock(l) != 0} ==> true=l.locked, false=l.stop
      | {trylock(l) == 0} ==> true=l.stop, false=l.locked
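The trylock rule assigns a different state to l on each outgoing edge of the branch. A minimal sketch, assuming a hypothetical lookup table keyed by the literal condition text; the names PATH_SPECIFIC and branch_states are illustrative.

```python
# Hypothetical table: condition pattern -> (state on true branch, state on false branch).
PATH_SPECIFIC = {
    "trylock(l) != 0": ("locked", "stop"),
    "trylock(l) == 0": ("stop", "locked"),
}


def branch_states(condition):
    """Return {branch_taken: state of l} for a matched condition, or None."""
    if condition not in PATH_SPECIFIC:
        return None
    on_true, on_false = PATH_SPECIFIC[condition]
    return {True: on_true, False: on_false}
```

For if (trylock(l) != 0), the analysis would continue down the true branch with l in the locked state and down the false branch with l stopped; the == 0 form swaps the two.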

24

Example Extension: Free Checker

    state decl any_pointer v;

    start:
        { kfree(v) } ==> v.freed;

    v.freed:
        { *v } ==> v.stop,
            { err("using %s after free!", mc_identifier(v)); }
      | { kfree(v) } ==> v.stop,
            { err("double free of %s!", mc_identifier(v)); }
      ;

• The { err(...); } blocks are C code actions

26

The Good

• Unsoundness presents new opportunities

• Designed for use by “everyday” programmers

• Heuristics to speed up execution

• Heuristics to reduce false positives

• Ranking to help sift through false positives

• Tested on systems code (Linux, OpenBSD)

• Paper is very clearly written!

27

The Bad

• Unsoundness is unsound
  – Jim Larus says eventually programmers will want to move to sound tools

• Designed for use by “everyday” programmers
  – Advanced features require analysis knowledge
    • Path-specific state machine transitions
    • Path-specific error ranking

• xgcc/metal is now commercial
  – Boooo!

28

Related Work

• ESP
  – Sound
  – Uses a state machine language like metal
  – More likely to scale in the interprocedural case

• SLAM
  – Model-checking approach
  – Verification tool intended for smaller code bases

• PREfix
  – Unsound, more expensive analysis
  – Fixed set of error types and analyses

29

Related Work (cont.)

• ESC/Java
  – Uses a theorem prover
  – High annotation burden (1 annotation per 3 lines of code)
    • Recent efforts to infer annotations

• Cqual
  – Interprocedural, sound analysis
  – Annotations to express program properties and to suppress false positives

30

Singular Key Idea

• Unsound, incomplete analysis based on clever heuristics can be an effective bug-fighting tool
  – Such analyses can allow techniques not possible with sound and complete analyses