Pin2 Tutorial1 Pin Tutorial Kim Hazelwood Robert Muth VSSAD Group, Intel.
Pin2 Tutorial 1
Pin Tutorial
Kim Hazelwood, Robert Muth
VSSAD Group, Intel
Pin People
Robert Cohn
Kim Hazelwood
Artur Klauser
Geoff Lowney
CK Luk
Robert Muth
Harish Patil
Ramesh Peri
Vijay Janapa Reddi
Steven Wallace
Outline
Pin Overview
Instrumentation Basics
Advanced Topics
What is Pin?
• Pin Is Not a TLA
• Pin is a dynamic binary rewriting engine
• Derived from Spike: a static rewriter
• Two versions available:
  – Pin2 is the current version
  – Pin0 (IPF only) is not covered in this talk
Pin Features
• Rewritten program exists only in memory
• No tool chain dependence
  – No issues with code/data mixing, missing relocs, etc.
• Rewrites all user-level code including shared libs
• Multi-ISA: Itanium, IA32, EM64T, XScale
• Attach/detach to/from running process (like gdb)
• Transparent: unchanged program behavior
• Efficient: very good performance
Pin Applications
• Optimization
• Security (program shepherding)
• Debugging
• Instrumentation
Instrumentation is our current focus
Uses for Instrumentation
• Profiling for optimization
  – Basic block counts, edge counts
  – Value profiles, stride profiling, load latencies
• Micro-architectural studies
  – Branch predictor simulation
  – Cache simulation
  – Trace generation
• Bug checking
  – Find uninitialized or unallocated data references
Pin Instrumentation Features
• User programmable via plug-ins
  – many examples provided
  – plug-ins are typically ISA agnostic
• Can take advantage of symtab info
• Automatic register saving/restoring
• Various instrumentation granularities
  – Instruction, “Trace”, Routine
• ATOM compatibility mode (AOTI)
Other Dynamic Rewriting Engines (and what they focus on)
• Dynamo (PA-RISC HPUX)
  – Dynamic optimization
• DynamoRIO (IA32 Linux + Win32)
  – Originally: Dynamic optimization
  – Now: Sandboxing, some instrumentation
• Valgrind (IA32 Linux)
  – Originally: Special-purpose instrumentation
  – Now: General-purpose instrumentation
Static Instrumentation (“Atom Style”)
• (Way) Ahead-of-time
• Persistent
• Good but not perfect transparency
• Shared libraries can be a problem
[Diagram: Program → ATOM → Instrumented Program]
Dynamic Instrumentation (“Pin Style”)
• Execution driven
  – Occurs when code is executed
• Original program is NOT modified
  – Code is “copied” into code cache
  – Only code in code cache is executed
• Instrumentation is not persistent
• Can also instrument libraries
Dynamic Instrumentation
[Figure sequence: original code (basic blocks 1 to 7), Pin, and the code cache]
1. Pin has grabbed control before execution of block 1
2. Pin fetches trace and allows for instrumentation (code cache now holds 1’, 2’, 7’)
3. Pin transfers control into code cache (block 1)
4. Pin fetches new trace and ‘links’ it (code cache now also holds 3’, 5’, 6’)
5. Pin transfers control into code cache (block 3)
Running Pin
• Three program images are involved:
  1. pin
  2. pintool/plug-in
  3. Application
• “Shell mode”
  $ pin -t inscount -- xclock
• “Gdb mode”: attaching to existing process
  $ pin -pid 1067 -t inscount
  (can detach and re-attach with different plug-in)
Transparency
Program execution under Pin is transparent:
• Program state is unchanged
  – Code/data addresses, memory content
• Will not expose latent bugs
• Instrumentation sees the original program
  – Code/data addresses, memory content
• (But: intentional program state changes possible, e.g. fault injection)
Transparency (Example)
Original Code:
  0x1000 call 0x4000
  (Push 0x1006 on stack, then jump to 0x4000)
Code cache address mapping:
  0x1000 -> 0x7000 “caller”
  0x4000 -> 0x8000 “callee”
Translated Code:
  0x7000 Push 0x1006
  0x7006 Jmp 0x8000
Stack content remains unchanged
Transparency has a Price
Original Code:
  0x4400 ret
  (Pop 0x1006 from stack, then jump to 0x1006)
Translated Code:
  0x8400 Pop rx
  0x84… ry = Translate(rx)
  0x84… Jmp ry
• Pin needs to translate program address to code cache address.
• Main reason for slowdowns in dynamic instrumentation systems!
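The Translate(rx) step can be pictured as a lookup from original program addresses to code cache addresses. The following is only a minimal sketch, assuming a plain hash map. The names CodeCacheMap, Add, Translate and MakeExampleMap are hypothetical, and the 0x1006 -> 0x7100 entry is made up for illustration; none of this is Pin's actual data structure.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical model of the address mapping, not Pin's real implementation.
using Addr = std::uint64_t;

class CodeCacheMap {
public:
    // Record that the code at original address `orig` was copied to `cached`.
    void Add(Addr orig, Addr cached) { map_[orig] = cached; }

    // Translate an original program address into a code cache address.
    // Returns false when the target has not been copied yet; a real system
    // would then have to fetch and compile a new trace.
    bool Translate(Addr orig, Addr* cached) const {
        auto it = map_.find(orig);
        if (it == map_.end()) return false;
        *cached = it->second;
        return true;
    }

private:
    std::unordered_map<Addr, Addr> map_;
};

// Build the mapping from the call/ret example.
CodeCacheMap MakeExampleMap() {
    CodeCacheMap m;
    m.Add(0x1000, 0x7000);  // "caller" (from the slide)
    m.Add(0x4000, 0x8000);  // "callee" (from the slide)
    m.Add(0x1006, 0x7100);  // return point; this cache address is an assumption
    return m;
}
```

Because the stack holds the original return address 0x1006, every indirect transfer such as ret needs a lookup like this at run time, which is where the cost comes from.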
Portability Challenges
                     ARM                        IA-32/EM64T                 IPF
Type                 RISC                       CISC                        VLIW
Instruction          Fixed length               Variable length, prefixes   Bundled
Memory instruction   LD/ST                      Any, implicit               LD/ST
Memory op size       Fixed                      Variable length             Fixed
Addressing modes     Pre/post/iprel increment   Index/offset/scale/iprel    Post
Predication          Cond. codes                None                        Predicate regs
Parameters           Registers                  Stack/registers             Stacked registers
Pin Instrumentation Query API
• ISA independent part (usually sufficient)
  – INS_Address(), INS_Size(), INS_IsRet(), INS_IsCall(), INS_MemoryReadSize(), INS_Mnemonic(), etc.
• ISA dependent part (optional)
  – INS_GetPredicate(), INS_RegR(), INS_RegW(), etc.
Performance Comparison: No Instrumentation
[Bar chart: normalized execution time (%), axis from 0 to 1200, for Valgrind, DynamoRIO, and Pin/IA32; per-benchmark values not recoverable]
latest numbers are even better
Performance Comparison: Basic-Block Counting
[Bar chart: normalized execution time (%), axis from 0 to 1600, for Valgrind, DynamoRIO, and Pin/IA32; per-benchmark values not recoverable]
latest numbers are even better
Pin2 Status
• ISAs: IA32, IA32E, XScale, (IPF soon)
• Distros: Debian, Suse, Mandrake, Red Hat 7.2, 8.0, 9.0, EL3, FC3
• >2500 downloads
• Multithreading support in beta
• Windows support in preparation
Project Engineering
• Automatic nightly testing
  – >4 platforms
  – >7 Linux distributions
  – >8 compilers
  – >9000 binaries
• Automatically generated user manual, internal documentation using Doxygen
Outline
Pin Overview
Instrumentation Basics
Advanced Topics
Instrumentation vs. Analysis
Concepts borrowed from ATOM
• Instrumentation routines define where instrumentation is inserted
  – e.g. before instruction
  – Occurs at compile time (JIT time)
• Analysis routines define what to do when instrumentation is activated
  – e.g. increment counter
  – Occurs at runtime
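The split between JIT time and runtime can be illustrated with a toy model. This is only a sketch and does not use the Pin API: ToyInstruction, ToyEngine, ToyResult and RunToyLoop are hypothetical names. The instrumentation callback runs once per instruction, the first time that instruction is executed, while the analysis call it attaches runs on every execution.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Toy model, not the Pin API: every name here is hypothetical.
struct ToyInstruction {
    bool instrumented = false;                 // already placed in the "code cache"?
    std::vector<std::function<void()>> calls;  // analysis calls attached to it
};

struct ToyEngine {
    std::vector<ToyInstruction> code;
    // Instrumentation routine: decides where analysis calls are inserted.
    std::function<void(ToyInstruction&)> instrument;
    int instrumentation_runs = 0;

    // Execute instruction `i` once.
    void Execute(std::size_t i) {
        ToyInstruction& ins = code[i];
        if (!ins.instrumented) {  // first execution only: "JIT time"
            instrument(ins);
            ins.instrumented = true;
            ++instrumentation_runs;
        }
        for (auto& call : ins.calls) call();  // every execution: "runtime"
    }
};

struct ToyResult {
    long icount;
    int instrumentation_runs;
};

// Run a 3-instruction loop body `iterations` times and count instructions.
ToyResult RunToyLoop(int iterations) {
    long icount = 0;
    ToyEngine engine;
    engine.code.resize(3);
    engine.instrument = [&icount](ToyInstruction& ins) {
        ins.calls.push_back([&icount] { ++icount; });  // analysis routine
    };
    for (int n = 0; n < iterations; ++n)
        for (std::size_t i = 0; i < engine.code.size(); ++i)
            engine.Execute(i);
    return ToyResult{icount, engine.instrumentation_runs};
}
```

With 10 iterations the analysis routine runs 30 times, but the instrumentation routine runs only 3 times, once per distinct instruction.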
Instrumentation vs. Analysis (2)
In ATOM:
• Instrumentation and analysis occurred in separate phases
• Code was in separate files
In Pin:
• Difference is somewhat blurred
• Instrumentation and analysis are interleaved
• User plug-in provides code for both
These are difficult terms to remember!
Mental Bridge:
  Instrumentation → Insertion
  Analysis → Action
Instrumentation Routine
• Written in C++
• Invoked by Pin via Callback mechanism
• Invoked when Pin places new code in code cache (different granularities: instruction, trace, …)
• Instruments using the Pin API for
  – inserting calls to analysis routines
  – picking arguments for analysis routines
Analysis Routines
• Written in any language: C, C++, Asm, etc.
• Invoked when surrounding code executes
• Isolated from application by
  – separate memory areas
  – separate register state
• Automatically optimized by Pin (inlining, register allocation, etc.)
Example: Instruction Count
Instrumentation: Insert call to IncCounter() before every instruction
  IncCounter();
  mov r2 = 2
  IncCounter();
  add r3 = 4, r3
  IncCounter();
  beq L1
  IncCounter();
  add r4 = 8, r4
  IncCounter();
  beq L2
Analysis:
  VOID IncCounter() { icount++; }
Example: Instruction Count
Output of inscount plug-in
$ /bin/ls
Makefile atrace.o imageload.out
$ pin -t inscount -- /bin/ls
Makefile atrace.o imageload.out
Count 422838
$
inscount.C (ISA independent!)

#include <iostream>
#include "pin.H"

UINT64 icount = 0;

// analysis
VOID IncCounter() { icount++; }

// instrumentation
VOID Instruction(INS ins, VOID *v) {
    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)IncCounter, IARG_END);  // (2)
}

VOID Fini(INT32 code, VOID *v) {
    std::cerr << "Count " << icount << std::endl;
}

// driver
int main(int argc, char * argv[]) {
    PIN_Init(argc, argv);
    INS_AddInstrumentFunction(Instruction, 0);  // (1)
    PIN_AddFiniFunction(Fini, 0);               // (3)
    PIN_StartProgram();
    return 0;
}
Explanations
1. Register Instruction() to be called back for every instruction placed into the code cache
2. Insert call to IncCounter() before code cache instruction
3. Register Fini() to be called back at the end
Instrumentation Points
Relative to an instruction (“beq L2”):
1. Before (IPOINT_BEFORE)
2. After (IPOINT_AFTER)
3. On taken branch (IPOINT_BRANCH_TAKEN)
[Diagram: code fragment containing “mov r4 = 2”, “add r3=8,r9”, “beq L2”, “ret”, and “L2: mov r9 = 4”, with markers 1, 2, 3 at the points listed above]
Example: Instruction Trace
  traceInst(ip);
  mov r2 = 2
  traceInst(ip);
  add r3 = 4, r3
  traceInst(ip);
  beq L1
  traceInst(ip);
  add r4 = 8, r4
  traceInst(ip);
  beq L2
Example: Instruction Trace
$ pin -t itrace -- /bin/ls
Makefile atrace.o imageload.out
$ head itrace.out
0x40001e90
0x40001e91
0x40001ee4
0x40001ee5
0x40001ee7
0x40001ee8
…
$
itrace.C

#include <stdio.h>
#include "pin.H"

FILE * trace;

VOID traceInst(VOID *ip) { fprintf(trace, "%p\n", ip); }

VOID Instruction(INS ins, VOID *v) {
    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)traceInst,
                   IARG_INST_PTR, IARG_END);  // (1)
}

int main(int argc, char * argv[]) {
    trace = fopen("itrace.out", "w");
    PIN_Init(argc, argv);
    INS_AddInstrumentFunction(Instruction, 0);
    PIN_StartProgram();
    return 0;
}
Explanations
1. Insert traceInst() before code cache instruction; traceInst() takes an extra argument!
(Bad coding practice: we should have closed the file using a Fini function)
Analysis Routine Parameters
• IARG_UINT32 <number>
• IARG_REG_VALUE <register name> [*]
• IARG_INST_PTR
• IARG_BRANCH_TAKEN
• IARG_BRANCH_TARGET_ADDR
• IARG_G_ARG0_CALLER
• IARG_MEMORY_READ_EA
• IARG_SYSCALL_NUMBER
• …
[*] Will result in ISA dependent tool
Example: Fast Instruction Count
Instead of calling IncCounter(1) before each of the five instructions, call IncCounter once per basic block with the number of instructions in that block:
BBL1:
  IncCounter(3);
  mov r2 = 2
  add r3 = 4, r3
  beq L1
BBL2:
  IncCounter(2);
  add r4 = 8, r4
  beq L2
inscount.C

#include <stdio.h>
#include "pin.H"

UINT64 icount = 0;

VOID IncCounter(INT32 c) { icount += c; }

VOID Trace(TRACE trace, VOID *v) {
    for (BBL b = TRACE_BblHead(trace); BBL_Valid(b); b = BBL_Next(b)) {  // (2)
        BBL_InsertCall(b, IPOINT_BEFORE, (AFUNPTR)IncCounter,
                       IARG_UINT32, BBL_NumIns(b), IARG_END);
    }
}

VOID Fini(INT32 code, VOID *v) {
    fprintf(stderr, "Count %llu\n", (unsigned long long)icount);
}

int main(int argc, char * argv[]) {
    PIN_Init(argc, argv);
    TRACE_AddInstrumentFunction(Trace, 0);  // (1)
    PIN_AddFiniFunction(Fini, 0);
    PIN_StartProgram();
    return 0;
}
Explanations
1. Register Trace() to be called back for every trace placed in the code cache
   As a first approximation, a “trace” is a sequence of basic blocks (BBLs)
2. For each trace, walk the BBLs and insert IncCounter() with the appropriate integer parameter at the beginning
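Why the per-BBL version is faster can be shown without Pin at all. The sketch below is a toy model in which a basic block is reduced to its instruction count; CountStats, CountPerInstruction and CountPerBbl are hypothetical names. Both strategies arrive at the same instruction count, but the per-BBL strategy makes one analysis call per executed block instead of one per executed instruction.

```cpp
#include <cstdint>
#include <vector>

// Toy model, not the Pin API: a basic block is reduced to its instruction count.
struct CountStats {
    std::uint64_t icount;          // instructions counted
    std::uint64_t analysis_calls;  // how many times the counter routine ran
};

// `executed_bbls` lists the size of each basic block in execution order.
CountStats CountPerInstruction(const std::vector<std::uint32_t>& executed_bbls) {
    CountStats s = {0, 0};
    for (std::uint32_t num_ins : executed_bbls) {
        for (std::uint32_t i = 0; i < num_ins; ++i) {
            s.icount += 1;  // models IncCounter() before every instruction
            s.analysis_calls += 1;
        }
    }
    return s;
}

CountStats CountPerBbl(const std::vector<std::uint32_t>& executed_bbls) {
    CountStats s = {0, 0};
    for (std::uint32_t num_ins : executed_bbls) {
        s.icount += num_ins;  // models IncCounter(num_ins) once per block
        s.analysis_calls += 1;
    }
    return s;
}
```

For an execution that runs BBL1 (3 instructions) and BBL2 (2 instructions) twice each, both functions count 10 instructions, with 10 analysis calls in the first case and 4 in the second.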
Further Reading
The following material is also covered in the Pin user manual
Go to
http://rogue.colorado.edu/Pin/
Then follow the “manuals” link
Summary
• Pin instrumentation is:
  – Robust
  – Transparent
  – Easy-to-use
  – Efficient
  – Portable
• Try it: http://rogue.colorado.edu/Pin
Outline
Pin Overview
Instrumentation Basics
Advanced Topics
Trace vs. Instruction Instrumentation
VOID Instruction(INS ins, VOID *v) {
    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)Cnt, IARG_END);
}

Can be emulated by:

VOID Trace(TRACE trace, VOID *v) {
    for (BBL bbl = TRACE_BblHead(trace); BBL_Valid(bbl); bbl = BBL_Next(bbl)) {
        for (INS ins = BBL_InsHead(bbl); INS_Valid(ins); ins = INS_Next(ins)) {
            INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)Cnt, IARG_END);
        }
    }
}
Definition: Pin Trace (JITI)
• List of instructions that is only entered from top, but may have multiple exits
• No side entries (Pin duplicates code to ensure this!)
• Multiple copies of instruction in code cache
Program:
      mov r2 = 2
  L2: add r3 = 4, r3
      add r4 = 8, r4
      beq L2
      …
Trace 1:
      mov r2 = 2
      add r3 = 4, r3
      add r4 = 8, r4
      beq L2
      …
Trace 2:
      add r3 = 4, r3
      add r4 = 8, r4
      beq L2
      …
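The duplication in the two traces above can be reproduced with a toy trace builder. This is a sketch under simplifying assumptions, not Pin's actual trace selection: ToyIns, BuildTrace, SlideProgram and the ends_trace flag are hypothetical, and the elided part of the program (“…”) is simply left out.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy model of trace formation, not Pin's actual algorithm.
struct ToyIns {
    std::string text;
    bool ends_trace;  // hypothetical flag: e.g. an unconditional transfer
};

// Copy instructions starting at `entry` until one that ends the trace
// (inclusive) or the end of the program. Conditional branches such as
// "beq L2" do not end the trace; they are additional exits.
std::vector<std::string> BuildTrace(const std::vector<ToyIns>& program,
                                    std::size_t entry) {
    std::vector<std::string> trace;
    for (std::size_t i = entry; i < program.size(); ++i) {
        trace.push_back(program[i].text);
        if (program[i].ends_trace) break;
    }
    return trace;
}

// The four instructions listed on the slide; L2 is the instruction at index 1.
std::vector<ToyIns> SlideProgram() {
    return {
        {"mov r2 = 2", false},
        {"add r3 = 4, r3", false},  // L2
        {"add r4 = 8, r4", false},
        {"beq L2", false},
    };
}
```

Building one trace from index 0 and another from index 1 (the branch target L2) yields two traces that both contain “add r3 = 4, r3”: a trace is never entered from the side, so the second entry point gets its own copy.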
Instrumentation Modes
• Just-In-Time Instrumentation (JITI)
  – Per instruction, per trace
  – “basic block” notion
• Ahead-Of-Time Instrumentation (AOTI)
  – Per instruction, per function, per section/image
  – Emulated using JITI
  – Functionality similar to ATOM
  – Extra startup overhead
  – No “basic blocks” notion
Per Image Instrumentation (AOTI)
Hooking Image (Un)Loading
$ pin -t imageload -- /bin/ls
Makefile imageload.o inscount0.o
$ cat imageload.out
Loading /bin/ls
Loading /lib/ld-linux.so.2
…
Unloading /bin/ls
Unloading /lib/ld-linux.so.2
…
…
FILE * T;

VOID ImageLoad(IMG img, VOID *v) {
    fprintf(T, "Loading %s\n", IMG_Name(img).c_str());
}

VOID ImageUnload(IMG img, VOID *v) {
    fprintf(T, "Unloading %s\n", IMG_Name(img).c_str());
}

VOID Fini(INT32 code, VOID *v) { fclose(T); }

int main(int argc, char * argv[]) {
    T = fopen("imageload.out", "w");
    PIN_Init(argc, argv);
    IMG_AddInstrumentFunction(ImageLoad, 0);
    IMG_AddUnloadFunction(ImageUnload, 0);
    PIN_AddFiniFunction(Fini, 0);
    PIN_StartProgram();
    return 0;
}
“Walking” Images
VOID ImageLoad(IMG img, VOID *v) {
    for (SEC sec = IMG_SecHead(img); SEC_Valid(sec); sec = SEC_Next(sec)) {
        for (RTN rtn = SEC_RtnHead(sec); RTN_Valid(rtn); rtn = RTN_Next(rtn)) {
            RTN_Open(rtn);
            for (INS ins = RTN_InsHead(rtn); INS_Valid(ins); ins = INS_Next(ins))
                static_count++;
            RTN_Close(rtn);
        }
    }
}
Explanations
• Image->Section->Routine->Instruction
• We are essentially walking the symtab
• For each function symbol:
  – Disassemble function (RTN_Open)
  – Then walk instructions
  – NB: no basic blocks available!
“Walking” And Instrumenting
VOID ImageLoad(IMG img, VOID *v) {
    for (SEC sec = IMG_SecHead(img); SEC_Valid(sec); sec = SEC_Next(sec)) {
        for (RTN rtn = SEC_RtnHead(sec); RTN_Valid(rtn); rtn = RTN_Next(rtn)) {
            RTN_Open(rtn);
            for (INS ins = RTN_InsHead(rtn); INS_Valid(ins); ins = INS_Next(ins)) {
                INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)Cnt, IARG_END);
            }
            RTN_Close(rtn);
        }
    }
}
Explanations
• With AOTI, instrumentation requests are cached until the code is executed
• Effect like 1st instruction count example
• But:
  – worse (startup) performance
  – higher memory consumption
• Requires symbol table
→ Bad use of AOTI!
“Searching” And Instrumenting
SimpleExamples/malloctrace.C

VOID ImageLoad(IMG img, VOID *v) {
    RTN mallocRtn = RTN_FindByName(img, "malloc");
    if (RTN_Valid(mallocRtn)) {
        RTN_Open(mallocRtn);
        RTN_InsertCall(mallocRtn, IPOINT_BEFORE, (AFUNPTR)MBefore,
                       IARG_G_ARG0_CALLEE, IARG_END);
        RTN_InsertCall(mallocRtn, IPOINT_AFTER, (AFUNPTR)MAfter,
                       IARG_G_RESULT0, IARG_END);
        RTN_Close(mallocRtn);
    }
}
Explanations
• Instrument the prologue and epilogues of malloc() using RTN_InsertCall
• Instrumentation really happens at the instruction level, hence we must call RTN_Open
• Requires symbol table
• Good use of AOTI!
Performance Considerations
VOID count(ADDRINT s, ADDRINT d) {
    COUNTER *pedg = Lookup(s, d);  // expensive!
    pedg->_count++;
}

VOID Instruction(INS ins, void *v) {
    ...
    if ( [ins is a branch or a call instruction] )
        INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)count,
                       IARG_INST_PTR, IARG_BRANCH_TARGET_ADDR, IARG_END);
    ...
}
Improved Version
VOID count_fast(COUNTER *pedg) { pedg->_count++; }

VOID InstructionFast(INS ins, void *v) {
    …
    if (INS_IsDirectBranchOrCall(ins)) {
        COUNTER *pedg = Lookup(INS_Address(ins),
                               INS_DirectBranchOrCallTargetAddress(ins));
        INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)count_fast,
                       IARG_ADDRINT, pedg, IARG_END);
    } else {
        ...
    }
}
Remarks
• If possible, move work from analysis to instrumentation!
• Keep analysis routines small so that they get inlined!
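The effect of moving the lookup from analysis to instrumentation can be measured in a toy model. This sketch does not use the Pin API: EdgeTable, Counter, CountSlow, CountFast, LookupsSlow and LookupsFast are hypothetical names, and an ordered map stands in for whatever table a real tool would use. It counts how many lookups each variant performs when one direct branch executes n times.

```cpp
#include <cstdint>
#include <map>
#include <utility>

// Toy model, not the Pin API: names below are hypothetical.
using Addr = std::uint64_t;

struct Counter {
    std::uint64_t count = 0;
};

class EdgeTable {
public:
    // Find or create the counter for the edge src -> dst (the expensive step).
    Counter* Lookup(Addr src, Addr dst) {
        ++lookups_;
        return &table_[std::make_pair(src, dst)];
    }
    std::uint64_t lookups() const { return lookups_; }

private:
    std::map<std::pair<Addr, Addr>, Counter> table_;
    std::uint64_t lookups_ = 0;
};

// Slow variant: the analysis routine performs the lookup on every execution.
void CountSlow(EdgeTable& t, Addr src, Addr dst) { t.Lookup(src, dst)->count++; }

// Fast variant: the analysis routine only increments a pre-resolved counter.
void CountFast(Counter* edge) { edge->count++; }

// Execute one direct branch `n` times with each variant and report lookups.
std::uint64_t LookupsSlow(int n) {
    EdgeTable t;
    for (int i = 0; i < n; ++i) CountSlow(t, 0x1000, 0x4000);
    return t.lookups();
}

std::uint64_t LookupsFast(int n) {
    EdgeTable t;
    Counter* edge = t.Lookup(0x1000, 0x4000);  // once, at instrumentation time
    for (int i = 0; i < n; ++i) CountFast(edge);
    return t.lookups();
}
```

For n = 1000 the slow variant performs 1000 lookups and the fast variant performs one. The fast analysis routine is also a single increment, which makes it a better candidate for inlining.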
Plug-ins Shipped with Pin2
• Data cache simulation
• Malloc/Free tracer
• Syscall tracer
• Opcode mix profiler
• Register usage profiler
• …
Debugging Pin Plug-ins
Pause Pin for 7 sec to attach with gdb
$ pin -pause_tool 7 -t inscount -- /bin/ls
Pausing to attach to pid 28769
$ gdb
(gdb) attach 28769
…
(gdb) break main
...
(gdb) cont
Summary
• Pin instrumentation is:
  – Robust
  – Transparent
  – Easy-to-use
  – Efficient
  – Portable
• Try it: http://rogue.colorado.edu/Pin