CIL: Infrastructure for C Program Analysis and Transformation

George C. Necula, Scott McPeak, S. P. Rahul, Westley Weimer

http://www.cs.berkeley.edu/~necula/cil

ETAPS – CC ’02 Friday, April 12

What is CIL?

Distills C language into a few key forms with precise semantics

Parser + IR + Program Merger for C
Maintains types, close ties to source
Highly structured, clean subset of C
Handles ANSI/GCC/MSVC

Why CIL?

Analyses and Transformations

Easy to use
  impersonates compiler & linker
  $ make project CC=cil

Easy to work with
  converts away tricky syntax
  leaves just the heart of the language
  separates concepts

C Feature Separation

CIL separates language components
  pure expressions
  statements with side-effects
  control-flow
  embedded CFG

Keeps all programmer names
  temps serialize side-effects
  simplified scoping

Example: C Lvalues

An exp referring to a region of storage
Example: rec[1].fld[2]
May involve 1, 2, or 3 memory accesses
  1 if rec and fld are both arrays
  2 if either one is a pointer
  3 if rec and fld are both pointers

Syntax (AST) is insufficient

CIL Lvalues

An exp referring to a region of storage

lval   ::= <base, offset>
base   ::= Var(varinfo) | Mem(exp)
offset ::= None | Field(f, offset) | Index(exp, offset)

CIL Lvalues

Example: rec[1].fld[2] becomes either:
  <Var(rec), Index(1, Field(fld, Index(2, None)))>
or:
  <Mem(2 + Lvalue(<Mem(1 + Lvalue(<Var(rec), None>)), Field(fld, None)>)), None>

Full static and operational semantics
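As a rough illustration of why this form helps, the lvalue grammar can be rendered as C data structures and the memory accesses counted by walking the tree. This is a minimal sketch, not CIL's own representation: every type and function name below (struct lval, memory_accesses, and so on) is hypothetical, a NULL offset stands in for None, and only constants, additions and lvalue reads are modeled as expressions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C rendering of the lvalue grammar (not CIL's own types). */
struct exp;

enum base_kind { BASE_VAR, BASE_MEM };
struct base {
    enum base_kind kind;
    const char *var;          /* BASE_VAR: name, standing in for varinfo */
    const struct exp *addr;   /* BASE_MEM: expression yielding an address */
};

enum offset_kind { OFF_FIELD, OFF_INDEX };
struct offset {               /* a NULL pointer plays the role of None */
    enum offset_kind kind;
    const char *field;          /* OFF_FIELD */
    const struct exp *index;    /* OFF_INDEX */
    const struct offset *rest;
};

struct lval { struct base base; const struct offset *off; };

enum exp_kind { EXP_CONST, EXP_LVALUE, EXP_ADD };
struct exp {
    enum exp_kind kind;
    int value;                    /* EXP_CONST */
    const struct lval *lv;        /* EXP_LVALUE: read of an lvalue */
    const struct exp *lhs, *rhs;  /* EXP_ADD */
};

static int reads_in_exp(const struct exp *e);

/* Memory reads needed just to compute the address of lv. */
static int reads_in_lval_addr(const struct lval *lv) {
    int n = 0;
    const struct offset *o;
    for (o = lv->off; o != NULL; o = o->rest)
        if (o->kind == OFF_INDEX) n += reads_in_exp(o->index);
    if (lv->base.kind == BASE_MEM) n += reads_in_exp(lv->base.addr);
    return n;
}

static int reads_in_exp(const struct exp *e) {
    switch (e->kind) {
    case EXP_CONST:  return 0;
    case EXP_LVALUE: return reads_in_lval_addr(e->lv) + 1;
    case EXP_ADD:    return reads_in_exp(e->lhs) + reads_in_exp(e->rhs);
    }
    assert(0 && "unknown expression kind");
    return 0;
}

/* Address computation plus the final access to the storage itself. */
int memory_accesses(const struct lval *lv) { return reads_in_lval_addr(lv) + 1; }

/* <Var(rec), Index(1, Field(fld, Index(2, None)))> */
int accesses_array_form(void) {
    struct exp one = { EXP_CONST, 1, NULL, NULL, NULL };
    struct exp two = { EXP_CONST, 2, NULL, NULL, NULL };
    struct offset idx2 = { OFF_INDEX, NULL, &two, NULL };
    struct offset fld  = { OFF_FIELD, "fld", NULL, &idx2 };
    struct offset idx1 = { OFF_INDEX, NULL, &one, &fld };
    struct lval lv = { { BASE_VAR, "rec", NULL }, &idx1 };
    return memory_accesses(&lv);
}

/* <Mem(2 + Lvalue(<Mem(1 + Lvalue(<Var(rec), None>)), Field(fld, None)>)), None> */
int accesses_pointer_form(void) {
    struct exp one = { EXP_CONST, 1, NULL, NULL, NULL };
    struct exp two = { EXP_CONST, 2, NULL, NULL, NULL };
    struct lval rec = { { BASE_VAR, "rec", NULL }, NULL };
    struct exp read_rec = { EXP_LVALUE, 0, &rec, NULL, NULL };
    struct exp rec_plus_1 = { EXP_ADD, 0, NULL, &one, &read_rec };
    struct offset fld = { OFF_FIELD, "fld", NULL, NULL };
    struct lval rec1_fld = { { BASE_MEM, NULL, &rec_plus_1 }, &fld };
    struct exp read_fld = { EXP_LVALUE, 0, &rec1_fld, NULL, NULL };
    struct exp fld_plus_2 = { EXP_ADD, 0, NULL, &two, &read_fld };
    struct lval lv = { { BASE_MEM, NULL, &fld_plus_2 }, NULL };
    return memory_accesses(&lv);
}
```

Under these assumptions the all-arrays form yields one access and the all-pointers form yields three, matching the counts on the previous slide; the number falls out of the structure rather than from type lookups on a raw AST.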

Semantics

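CIL gives syntax-directed semantics
Example judgment:

$$\frac{\Gamma(x) = \tau}{\Gamma \vdash \mathrm{Var}(x) \rightarrow (\&x, \tau)}$$

$\Gamma$: environment; $\mathrm{Var}(x)$: lvalue form; $(\&x, \tau)$: meaning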

CIL Lvalue Semantics

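$$\frac{\Gamma(x) = \tau}{\Gamma \vdash \mathrm{Var}(x) \rightarrow (\&x, \tau)}
\qquad
\frac{\Gamma \vdash e : \mathrm{Ptr}(\tau)}{\Gamma \vdash \mathrm{Mem}(e) \rightarrow (e, \tau)}$$

$$\frac{\Gamma \vdash b \rightarrow (a, \tau)}{\Gamma \vdash \mathrm{None}@b \rightarrow (a, \tau)}$$

$$\frac{\Gamma \vdash b \rightarrow (a_1, \mathrm{Arr}(\tau_1)) \qquad \Gamma \vdash o@(a_1 + e \cdot |\tau_1|, \tau_1) \rightarrow (a_2, \tau_2)}{\Gamma \vdash \mathrm{Index}(e, o)@b \rightarrow (a_2, \tau_2)}$$

$$\frac{\Gamma \vdash o@b \rightarrow (a, \tau)}{\Gamma \vdash \langle b, o \rangle \rightarrow (a, \tau)}$$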
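The address arithmetic in the Index rule (base address plus index times element size) can be checked against what a C compiler computes. The sketch below is only an illustration: struct rec_t and its layout are made up, and the Field step uses offsetof as an assumed reading of how a field offset contributes to the address, since no Field rule appears on the slide.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record layout, chosen only for this example. */
struct rec_t { int tag; int fld[3]; };

/* Address of rec[i].fld[j], built one offset at a time. */
char *lval_address(struct rec_t *rec, size_t i, size_t j) {
    char *a = (char *)rec;               /* Var(rec): start at the base address */
    assert(j < 3);                       /* stay inside fld */
    a += i * sizeof(struct rec_t);       /* Index(i, ...): a + i * |element type| */
    a += offsetof(struct rec_t, fld);    /* Field(fld, ...): assumed field step */
    a += j * sizeof(int);                /* Index(j, None) */
    return a;
}

/* Returns 1 when the step-by-step address equals the compiler's own. */
int matches_compiler(void) {
    struct rec_t rec[2];
    return lval_address(rec, 1, 2) == (char *)&rec[1].fld[2];
}
```

Because every step is a plain addition on an (address, type) pair, an analysis can follow the offset chain without reconstructing C's implicit array and pointer conversions.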

CIL Source Fidelity

C source:
  typedef struct { int fld[3]; } * Myptr;
  Myptr rec;
  rec[2].fld[1] = 'h';

CIL output:
  struct __anonstruct1 { int fld[3]; };
  typedef struct __anonstruct1 * Myptr;
  Myptr rec;
  (rec + 2)->fld[1] = (int)'h';

SUIF 2.2.0-4 output:
  typedef int __ar_1[3];
  struct type_1 { __ar_1 fld; };
  struct type_1 * rec;
  (((((int *)(((char *)&((((struct type_1 *) (rec))))[2])+0U))))[1]) = (104);

Corner Cases

Your analysis will not have to handle:
  return ({goto L; p;}) && ({L: 5;});
  return &(--x ? : z) - & (x++, x);

Full handling of GNU-isms, MSVC-isms
  attributes
  initializers

Corner Cases

Your analysis will not have to handle:
  return ({goto L; p;}) && ({L: 5;});

CIL turns it into:
  int tmp;
  goto L;
  if (p) { L: tmp = 1; }
  else { tmp = 0; }
  return tmp;

StackGuard Transform

Cowan et al., USENIX '98
Buffer overrun defense
  push return address on private stack
  pop before returning
  only change functions with local arrays

40 lines of commented code with CIL
Quite easy: uses visitors for tree replacement, explicit returns, etc.
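To make the effect of such a transform concrete, here is a hand-written guess at what an instrumented function might look like. It is a sketch under stated assumptions, not the output of the 40-line CIL transform: the shadow_* helpers and copy_len are hypothetical names, the code relies on the GCC/Clang builtin __builtin_return_address, and since C cannot portably overwrite the return address, the pop here only compares the saved value and aborts on a mismatch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical private stack of return addresses. */
#define SHADOW_MAX 1024
static void *shadow_stack[SHADOW_MAX];
static int shadow_top = 0;

static void shadow_push(void *ret) {
    assert(shadow_top < SHADOW_MAX);
    shadow_stack[shadow_top++] = ret;
}

/* Pop the saved address and abort if the live one no longer matches. */
static void shadow_check_pop(void *ret) {
    assert(shadow_top > 0);
    if (shadow_stack[--shadow_top] != ret) abort();
}

/* A function with a local array, so it is one the transform would change. */
size_t copy_len(const char *src) {
    char buf[16];
    size_t n;
    shadow_push(__builtin_return_address(0));       /* inserted on entry */
    strncpy(buf, src, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    n = strlen(buf);
    shadow_check_pop(__builtin_return_address(0));  /* inserted before the return */
    return n;
}

/* Depth of the private stack; 0 once every call has returned. */
int shadow_depth(void) { return shadow_top; }
```

Having a single explicit return in each function is what makes the "before returning" hook easy to place, which is presumably why the slide mentions explicit returns.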

Other Transforms

Instrument and log all calls: 150 lines
Eliminate break, continue, switch: 110
1 memory access per assignment: 100
Make each function have a single return statement: 90
Make all stack arrays heap-allocated: 75
Log all value/addr memory writes: 45
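As an example of what one of these transforms achieves, the pair of functions below shows a function before and after a single-return rewrite. The "after" version is written by hand as an approximation; the names sign_multi, sign_single, retval and done are hypothetical, and the code a CIL-based transform actually emits may be shaped differently.

```c
#include <assert.h>

/* Before: three separate return statements. */
int sign_multi(int x) {
    if (x > 0) return 1;
    if (x < 0) return -1;
    return 0;
}

/* After: each return becomes an assignment plus a jump to one exit point. */
int sign_single(int x) {
    int retval;
    if (x > 0) { retval = 1; goto done; }
    if (x < 0) { retval = -1; goto done; }
    retval = 0;
done:
    return retval;
}

/* Both versions should agree on every input; spot-check a few. */
void check_same_behavior(void) {
    int inputs[] = { -7, 0, 42 };
    for (int i = 0; i < 3; i++)
        assert(sign_multi(inputs[i]) == sign_single(inputs[i]));
}
```

With one exit point, later instrumentation (logging, cleanup, return-address checks) only has to be inserted in a single place per function.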

Whole-Program Merger

C has incremental linking, compilation coupled with a weak module system!

Example (vortex / gcc / c++2c):

/* foo.c */
struct list {
  int head;
  struct list * tail;
};
struct list * mylist;

/* bar.c */
struct chain {
  int head;
  struct chain * tail;
};
extern struct chain * mylist;

Merging a Project

Determine what files to merge
Merge the files
  handle file-scoped identifiers
  C uses name equivalence for types
  but modules need structural equivalence

Key: Each global identifier has 1 type!
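The structural comparison needed for the foo.c / bar.c example can be sketched as follows. This is an illustrative toy, not the merger's algorithm: the type descriptor, the choice to compare field names but ignore struct tags, and all identifiers (struct type, structurally_equal, list_matches_chain) are assumptions. Recursive types are handled by treating a pair of structs already under comparison as equal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum kind { T_INT, T_PTR, T_STRUCT };
#define MAX_FIELDS 4
#define MAX_PAIRS 32

/* Hypothetical type descriptor covering only int, pointers and structs. */
struct type {
    enum kind kind;
    const char *tag;                       /* T_STRUCT: ignored when comparing */
    const struct type *target;             /* T_PTR */
    size_t nfields;                        /* T_STRUCT */
    const char *fname[MAX_FIELDS];
    const struct type *ftype[MAX_FIELDS];
};

/* Struct pairs currently assumed equal, so recursive types terminate. */
struct pairs { const struct type *a[MAX_PAIRS], *b[MAX_PAIRS]; size_t n; };

static int equiv(const struct type *x, const struct type *y, struct pairs *seen) {
    size_t i;
    if (x->kind != y->kind) return 0;
    switch (x->kind) {
    case T_INT: return 1;
    case T_PTR: return equiv(x->target, y->target, seen);
    case T_STRUCT:
        for (i = 0; i < seen->n; i++)
            if (seen->a[i] == x && seen->b[i] == y) return 1;
        assert(seen->n < MAX_PAIRS);
        seen->a[seen->n] = x; seen->b[seen->n] = y; seen->n++;
        if (x->nfields != y->nfields) return 0;
        for (i = 0; i < x->nfields; i++)
            if (strcmp(x->fname[i], y->fname[i]) != 0 ||
                !equiv(x->ftype[i], y->ftype[i], seen)) return 0;
        return 1;
    }
    return 0;
}

int structurally_equal(const struct type *x, const struct type *y) {
    struct pairs seen;
    seen.n = 0;
    return equiv(x, y, &seen);
}

static const struct type int_t = { T_INT, NULL, NULL, 0, { NULL }, { NULL } };

/* Compare "struct list *" from foo.c with "struct chain *" from bar.c. */
int list_matches_chain(void) {
    struct type list  = { T_STRUCT, "list",  NULL, 2, { "head", "tail" }, { &int_t, NULL } };
    struct type chain = { T_STRUCT, "chain", NULL, 2, { "head", "tail" }, { &int_t, NULL } };
    struct type list_ptr  = { T_PTR, NULL, &list,  0, { NULL }, { NULL } };
    struct type chain_ptr = { T_PTR, NULL, &chain, 0, { NULL }, { NULL } };
    list.ftype[1] = &list_ptr;    /* tail points back at its own struct */
    chain.ftype[1] = &chain_ptr;
    return structurally_equal(&list_ptr, &chain_ptr);
}
```

Under name equivalence the two declarations of mylist disagree; under this structural check they match, so the merged file can give mylist a single type.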

Other Merger Details

Remove duplicate declarations
  every file includes <stdio.h>

Match struct pointer with no defined body in file A to defined body in file B

Be careful when picking representatives

How Does it Work?

Make project, pass all files through CIL
Run your transform and analysis
Emit simplified C
Compile simplified C with GCC/MSVC
… and it works!

Large Programs

Program        #LOC *.[ch]   Notes
SPECINT95      360K
GIMP-1.2.2     800K          large libraries
linux-2.4.5    2.5M          132% compile time
ACE (in C)     2M            2000 files

Used in the CCured and BLAST projects

Merged Kernel Stats

Stock monolithic Linux 2.4.5 kernel
http://manju.cs.berkeley.edu/cil/vmlinux.c
Statistics:
  Before                     | After
  324 files                  | One 12.5MB file
  11.3 M-words               | 1.5 M-words
  7.3 M-LOC (post-process)   | 470 K-LOC
$ make CC="cil --merge" HOSTCC="cil --merge" LD="cil --merge" AR="cil --mode=AR --merge"

Conclusion

CIL distills C to a precise, simple subset
  easy to analyze
  well-defined semantics
  close to the original source

Well-suited to complex analyses and source-to-source transforms
Parses ANSI/GCC/MSVC C
Rapidly merges large programs

Questions?

Try CIL out:

http://www.cs.berkeley.edu/~necula/cil

Complete source, documentation and test cases freely available