CIL: Infrastructure for C Program Analysis and Transformation
George C. Necula, Scott McPeak, S. P. Rahul, Westley Weimer
http://www.cs.berkeley.edu/~necula/cil
ETAPS – CC ’02 Friday, April 12
What is CIL?
Distills C language into a few key forms with precise semantics
- Parser + IR + Program Merger for C
- Maintains types, close ties to source
- Highly structured, clean subset of C
- Handles ANSI/GCC/MSVC
Why CIL?
- Analyses and Transformations
- Easy to use
  - impersonates compiler & linker
  - $ make project CC=cil
- Easy to work with
  - converts away tricky syntax
  - leaves just the heart of the language
  - separates concepts
C Feature Separation
- CIL separates language components
  - pure expressions
  - statements with side-effects
  - control-flow
  - embedded CFG
- Keeps all programmer names
  - temps serialize side-effects
  - simplified scoping
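To make "temps serialize side-effects" concrete, here is a small hand-written sketch. It is not actual CIL output; the function names and the temporary `tmp` are invented for illustration. The first function buries a side effect (`i++`) inside an expression; the second pulls it out into its own statement so the remaining expressions are pure.

```c
#include <assert.h>

/* Original style: the increment of i is hidden inside the index expression. */
int sum_original(const int *a, int n) {
    int i = 0, total = 0;
    while (i < n)
        total += a[i++] * 2;
    return total;
}

/* Separated style (hand-written illustration, not real CIL output):
   every side effect is its own statement, and a temporary fixes the
   order in which the old value of i is used. */
int sum_separated(const int *a, int n) {
    int i = 0, total = 0, tmp;
    while (i < n) {
        tmp = i;                      /* remember the old index */
        i = i + 1;                    /* the side effect, now explicit */
        total = total + a[tmp] * 2;   /* pure expression on the right */
    }
    return total;
}

/* Usage: both versions compute the same value. */
void demo_side_effects(void) {
    int a[] = {1, 2, 3};
    assert(sum_original(a, 3) == sum_separated(a, 3));
}
```

An analysis that walks the second form never has to ask whether evaluating an expression changes program state.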
Example: C Lvalues
- An exp referring to a region of storage
- Example: rec[1].fld[2]
- May involve 1, 2, 3 memory accesses
  - 1 if rec and fld are both arrays
  - 2 if either one is a pointer
  - 3 if rec and fld are both pointers
Syntax (AST) is insufficient
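The sketch below shows why the syntax alone is not enough: the expression `rec[1].fld[2]` is textually identical in all three functions, yet the declarations decide how many loads are needed. The struct and function names are made up for this example, and the access counts in the comments are the abstract counts from the slide, not what an optimizing compiler necessarily emits.

```c
#include <assert.h>

struct both_arrays { int fld[3]; };   /* fld is an array inside the struct */
struct ptr_field   { int *fld; };     /* fld is a pointer */

/* rec and fld are both arrays: one access (the element itself). */
int read_arrays(void) {
    struct both_arrays rec[2] = { {{0, 0, 0}}, {{10, 11, 12}} };
    return rec[1].fld[2];
}

/* rec is an array, fld is a pointer: load fld, then the element. */
int read_one_pointer(void) {
    int data[3] = {10, 11, 12};
    struct ptr_field rec[2] = { {0}, {data} };
    return rec[1].fld[2];
}

/* rec and fld are both pointers: load rec, load fld, then the element. */
int read_two_pointers(void) {
    int data[3] = {10, 11, 12};
    struct ptr_field store[2] = { {0}, {data} };
    struct ptr_field *rec = store;
    return rec[1].fld[2];
}

/* Usage: same syntax, same value, different access paths. */
void demo_lvalue_accesses(void) {
    assert(read_arrays() == read_one_pointer());
    assert(read_one_pointer() == read_two_pointers());
}
```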
CIL Lvalues
An exp referring to a region of storage
    lval   ::= <base, offset>
    base   ::= Var(varinfo) | Mem(exp)
    offset ::= None | Field(f, offset) | Index(exp, offset)
CIL Lvalues
Example: rec[1].fld[2] becomes either:
    <Var(rec), Index(1, Field(fld, Index(2, None)))>
or:
    <Mem(2 + Lvalue(<Mem(1 + Lvalue(<Var(rec), None>)), Field(fld, None)>)), None>
Full static and operational semantics
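A minimal sketch of how this representation makes the memory accesses explicit. CIL itself is written in OCaml; the C types and function names below are a hypothetical mirror of the grammar on the slide, with expressions cut down to constants, lvalue reads, and addition. Every `Lvalue(...)` node inside an address is one load, plus one final access to the storage itself.

```c
#include <assert.h>

/* Hypothetical C mirror of the slide's grammar (not CIL's real API). */
typedef struct Exp Exp;
typedef struct Offset Offset;

typedef enum { NO_OFFSET, FIELD, INDEX } OffsetKind;
struct Offset {
    OffsetKind kind;
    const char *field;    /* FIELD: field name */
    const Exp *index;     /* INDEX: index expression */
    const Offset *rest;   /* FIELD, INDEX: remaining offset */
};

typedef enum { VAR, MEM } BaseKind;
typedef struct {
    BaseKind kind;
    const char *var;      /* VAR: variable name */
    const Exp *addr;      /* MEM: address expression */
    const Offset *off;
} Lval;

typedef enum { CONST, LVALUE, ADD } ExpKind;
struct Exp {
    ExpKind kind;
    int value;            /* CONST */
    const Lval *lv;       /* LVALUE: read the contents of an lvalue */
    const Exp *lhs, *rhs; /* ADD */
};

static int reads_in_exp(const Exp *e);

/* Loads needed just to compute the address of lv. */
static int reads_in_lval_addr(const Lval *lv) {
    int n = (lv->kind == MEM) ? reads_in_exp(lv->addr) : 0;
    for (const Offset *o = lv->off; o->kind != NO_OFFSET; o = o->rest)
        if (o->kind == INDEX)
            n += reads_in_exp(o->index);
    return n;
}

static int reads_in_exp(const Exp *e) {
    switch (e->kind) {
    case CONST:  return 0;
    case LVALUE: return 1 + reads_in_lval_addr(e->lv);
    case ADD:    return reads_in_exp(e->lhs) + reads_in_exp(e->rhs);
    }
    return 0;
}

/* Address computation plus the final access to the storage. */
int accesses(const Lval *lv) { return 1 + reads_in_lval_addr(lv); }

static const Exp one = { .kind = CONST, .value = 1 };
static const Exp two = { .kind = CONST, .value = 2 };
static const Offset none = { .kind = NO_OFFSET };

/* <Var(rec), Index(1, Field(fld, Index(2, None)))> */
static const Offset idx2 = { .kind = INDEX, .index = &two, .rest = &none };
static const Offset fld_idx2 = { .kind = FIELD, .field = "fld", .rest = &idx2 };
static const Offset idx1 = { .kind = INDEX, .index = &one, .rest = &fld_idx2 };
static const Lval arrays_form = { .kind = VAR, .var = "rec", .off = &idx1 };

/* <Mem(2 + Lvalue(<Mem(1 + Lvalue(<Var(rec), None>)), Field(fld, None)>)), None> */
static const Lval rec_var = { .kind = VAR, .var = "rec", .off = &none };
static const Exp rec_val = { .kind = LVALUE, .lv = &rec_var };
static const Exp rec_plus_1 = { .kind = ADD, .lhs = &one, .rhs = &rec_val };
static const Offset fld_only = { .kind = FIELD, .field = "fld", .rest = &none };
static const Lval rec1_fld = { .kind = MEM, .addr = &rec_plus_1, .off = &fld_only };
static const Exp fld_val = { .kind = LVALUE, .lv = &rec1_fld };
static const Exp fld_plus_2 = { .kind = ADD, .lhs = &two, .rhs = &fld_val };
static const Lval pointers_form = { .kind = MEM, .addr = &fld_plus_2, .off = &none };

/* Usage: the counts match the "both arrays" and "both pointers" cases. */
void demo_lval_counts(void) {
    assert(accesses(&arrays_form) == 1);
    assert(accesses(&pointers_form) == 3);
}
```

The count falls out of a plain recursive walk over the structure, with no need to consult declarations, which is the point of moving from the AST to this form.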
Semantics
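- CIL gives syntax-directed semantics
- Example judgment:

$$
\frac{\Gamma(x) = \tau}{\Gamma \vdash \mathrm{Var}(x) \rightarrow (\&x, \tau)}
$$

Here $\Gamma$ is the environment, $\mathrm{Var}(x)$ is the lvalue form, and $(\&x, \tau)$ is its meaning.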
CIL Lvalue Semantics
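$$
\frac{\Gamma(x) = \tau}{\Gamma \vdash \mathrm{Var}(x) \rightarrow (\&x, \tau)}
\qquad
\frac{\Gamma \vdash e : \mathrm{Ptr}(\tau)}{\Gamma \vdash \mathrm{Mem}(e) \rightarrow (e, \tau)}
$$

$$
\frac{\Gamma \vdash b \rightarrow (a, \tau)}{\Gamma \vdash \mathrm{None}@b \rightarrow (a, \tau)}
$$

$$
\frac{\Gamma \vdash b \rightarrow (a_1, \mathrm{Arr}(\tau_1)) \qquad \Gamma \vdash o@(a_1 + e \cdot |\tau_1|, \tau_1) \rightarrow (a_2, \tau_2)}{\Gamma \vdash \mathrm{Index}(e, o)@b \rightarrow (a_2, \tau_2)}
$$

$$
\frac{\Gamma \vdash o@b \rightarrow (a, \tau)}{\Gamma \vdash \langle b, o \rangle \rightarrow (a, \tau)}
$$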
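As a worked instance of the Index rule, the sketch below computes the new address as the base address plus the index times the element size, and compares it with the address the C compiler computes for the same array element. The helper name `index_address` is made up for this illustration and is not part of CIL.

```c
#include <assert.h>
#include <stddef.h>

/* Index rule, written out: starting at address a1 of an array whose
   elements occupy elem_size bytes, element e lives at a1 + e * elem_size.
   index_address is a hypothetical helper, not a CIL function. */
char *index_address(char *a1, size_t e, size_t elem_size) {
    return a1 + e * elem_size;
}

/* Usage: the hand-computed address is the address of arr[2]. */
void demo_index_rule(void) {
    int arr[4] = {10, 20, 30, 40};
    char *a2 = index_address((char *)arr, 2, sizeof arr[0]);
    assert(a2 == (char *)&arr[2]);
    assert(*(int *)a2 == 30);
}
```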
CIL Source Fidelity
Original C source:
    typedef struct { int fld[3]; } * Myptr;
    Myptr rec;
    rec[2].fld[1] = 'h';

CIL output:
    struct __anonstruct1 { int fld[3]; };
    typedef struct __anonstruct1 * Myptr;
    Myptr rec;
    (rec + 2)->fld[1] = (int)'h';

SUIF 2.2.0-4 output:
    typedef int __ar_1[3];
    struct type_1 { __ar_1 fld; };
    struct type_1 * rec;
    (((((int *)(((char *)&((((struct type_1 *) (rec))))[2])+0U))))[1]) =(104);
Corner Cases
- Your analysis will not have to handle:
    return ({goto L; p;}) && ({L: 5;});
    return &(--x ? : z) - & (x++, x);
- Full handling of GNU-isms, MSVC-isms
  - attributes
  - initializers
Corner Cases
Your analysis will not have to handle: return ({goto L; p;}) && ({L: 5;});
int tmp;
goto L;
if (p) { L: tmp = 1; }
else { tmp = 0; }
return tmp;
StackGuard Transform
- Cowan et al., USENIX ’98
- Buffer overrun defense
  - push return address on private stack
  - pop before returning
  - only change functions with local arrays
- 40 lines of commented code with CIL
- Quite easy: uses visitors for tree replacement, explicit returns, etc.
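A rough sketch of what a function might look like after such a transform. This is a hand-written illustration under several assumptions, not the 40-line CIL transform itself: the helper names `shadow_push` and `shadow_check_pop` are invented, the private stack is a fixed-size array, and the code only checks the saved return address (aborting on mismatch) rather than restoring it. It relies on `__builtin_return_address`, a GCC/Clang builtin.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SHADOW_DEPTH 128

/* Private stack of saved return addresses (hypothetical layout). */
static void *shadow_stack[SHADOW_DEPTH];
static int shadow_top = 0;

static void shadow_push(void *ret) {
    if (shadow_top >= SHADOW_DEPTH)
        abort();
    shadow_stack[shadow_top++] = ret;
}

/* Pop the saved address and abort if the live one no longer matches. */
static void shadow_check_pop(void *ret) {
    if (shadow_top <= 0 || shadow_stack[--shadow_top] != ret)
        abort();
}

/* This function has a local array, so it is the kind of function the
   transform would instrument. The single explicit return makes it easy
   to place the check right before leaving the function. */
size_t copy_name(const char *src) {
    char buf[16];
    size_t len;

    shadow_push(__builtin_return_address(0));      /* inserted on entry */

    strncpy(buf, src, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    len = strlen(buf);

    shadow_check_pop(__builtin_return_address(0)); /* inserted before return */
    return len;
}

/* Usage: a normal call leaves the private stack balanced. */
void demo_stackguard(void) {
    assert(copy_name("cil") == 3);
    assert(shadow_top == 0);
}
```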
Other Transforms
- Instrument and log all calls: 150 lines
- Eliminate break, continue, switch: 110
- 1 memory access per assignment: 100
- Make each function have a single return statement: 90
- Make all stack arrays heap-allocated: 75
- Log all value/addr memory writes: 45
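To give a feel for one of these, here is a hand-written before/after pair for the single-return transform. It is an illustration of the idea, not output from the actual 90-line transform; the function names, the `result` variable, and the `done` label are invented.

```c
#include <assert.h>

/* Before: three separate return statements. */
int sign_multi(int x) {
    if (x < 0)
        return -1;
    if (x == 0)
        return 0;
    return 1;
}

/* After (hand-written illustration): each return becomes an assignment
   to a result variable followed by a jump to the one remaining return. */
int sign_single(int x) {
    int result;
    if (x < 0) { result = -1; goto done; }
    if (x == 0) { result = 0; goto done; }
    result = 1;
done:
    return result;
}

/* Usage: the two versions agree on every input tried. */
void demo_single_return(void) {
    for (int x = -2; x <= 2; x++)
        assert(sign_multi(x) == sign_single(x));
}
```

A single exit point is what lets a transform such as the StackGuard one insert its epilogue code in exactly one place.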
Whole-Program Merger
C has incremental linking, compilation coupled with a weak module system!
Example (vortex / gcc / c++2c):
    /* foo.c */
    struct list {
        int head;
        struct list * tail;
    };
    struct list * mylist;

    /* bar.c */
    struct chain {
        int head;
        struct chain * tail;
    };
    extern struct chain * mylist;
Merging a Project
- Determine what files to merge
- Merge the files
  - handle file-scoped identifiers
  - C uses name equivalence for types
  - but modules need structural equivalence
Key: Each global identifier has 1 type!
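A minimal sketch of a structural equivalence check, applied to the `struct list` / `struct chain` example. This is not CIL's merger: the descriptor types are invented for illustration, only `int` fields and pointers to structs are modeled, and the sketch assumes that field names and order must match while struct tags may differ. Because the types are recursive, a pair already being compared is assumed equivalent when it is met again.

```c
#include <assert.h>
#include <string.h>

#define MAX_FIELDS 4
#define MAX_PAIRS 32

/* Hypothetical type descriptors, just rich enough for the example. */
typedef struct StructDesc StructDesc;
typedef enum { T_INT, T_PTR_TO_STRUCT } FieldKind;
typedef struct {
    const char *name;
    FieldKind kind;
    const StructDesc *target;   /* only for T_PTR_TO_STRUCT */
} FieldDesc;
struct StructDesc {
    const char *tag;
    int nfields;
    FieldDesc fields[MAX_FIELDS];
};

typedef struct { const StructDesc *a, *b; } Pair;

static int equiv_rec(const StructDesc *a, const StructDesc *b,
                     Pair *assumed, int *n) {
    if (a == b)
        return 1;
    /* Already comparing this pair further up: assume it holds. */
    for (int i = 0; i < *n; i++)
        if (assumed[i].a == a && assumed[i].b == b)
            return 1;
    /* Out of bookkeeping space: answer "not equivalent" to stay safe. */
    if (a->nfields != b->nfields || *n >= MAX_PAIRS)
        return 0;
    assumed[(*n)++] = (Pair){ a, b };
    for (int i = 0; i < a->nfields; i++) {
        const FieldDesc *fa = &a->fields[i], *fb = &b->fields[i];
        if (strcmp(fa->name, fb->name) != 0 || fa->kind != fb->kind)
            return 0;
        if (fa->kind == T_PTR_TO_STRUCT &&
            !equiv_rec(fa->target, fb->target, assumed, n))
            return 0;
    }
    return 1;
}

/* Tags are deliberately ignored: only the shape is compared. */
int struct_equiv(const StructDesc *a, const StructDesc *b) {
    Pair assumed[MAX_PAIRS];
    int n = 0;
    return equiv_rec(a, b, assumed, &n);
}

static const StructDesc list_desc = { "list", 2,
    { { "head", T_INT, 0 }, { "tail", T_PTR_TO_STRUCT, &list_desc } } };
static const StructDesc chain_desc = { "chain", 2,
    { { "head", T_INT, 0 }, { "tail", T_PTR_TO_STRUCT, &chain_desc } } };
static const StructDesc other_desc = { "other", 2,
    { { "head", T_INT, 0 }, { "next", T_PTR_TO_STRUCT, &other_desc } } };

/* Usage: list and chain match structurally, so mylist can get one type. */
void demo_struct_equiv(void) {
    assert(struct_equiv(&list_desc, &chain_desc));
    assert(!struct_equiv(&list_desc, &other_desc));
}
```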
Other Merger Details
- Remove duplicate declarations
  - every file includes <stdio.h>
- Match struct pointer with no defined body in file A to defined body in file B
- Be careful when picking representatives
How Does it Work?
- Make project, pass all files through CIL
- Run your transform and analysis
- Emit simplified C
- Compile simplified C with GCC/MSVC
- … and it works!
Large Programs
| Program | #LOC *.[ch] | Notes |
|---|---|---|
| SPECINT95 | 360K | |
| GIMP-1.2.2 | 800K | large libraries |
| linux-2.4.5 | 2.5M | 132% compile time |
| ACE (in C) | 2M | 2000 files |
Used in the CCured and BLAST projects
Merged Kernel Stats
- Stock monolithic Linux 2.4.5 kernel
- http://manju.cs.berkeley.edu/cil/vmlinux.c
- Statistics:

| Before | After |
|---|---|
| 324 files | One 12.5MB file |
| 11.3 M-words | 1.5 M-words |
| 7.3 M-LOC (post-process) | 470 K-LOC |

    $ make CC="cil --merge" HOSTCC="cil --merge" LD="cil --merge" AR="cil --mode=AR --merge"
Conclusion
- CIL distills C to a precise, simple subset
  - easy to analyze
  - well-defined semantics
  - close to the original source
- Well-suited to complex analyses and source-to-source transforms
- Parses ANSI/GCC/MSVC C
- Rapidly merges large programs
Questions?
Try CIL out:
http://www.cs.berkeley.edu/~necula/cil
Complete source, documentation and test cases freely available