Flexible Hardware Acceleration for Instruction-Grain Program Monitoring
Shimin Chen
Joint work with Michael Kozuch (1), Theodoros Strigkos (2), Babak Falsafi (3), Phillip B. Gibbons (1), Todd C. Mowry (1,2), Vijaya Ramachandran (4), Olatunji Ruwase (2), Michael Ryan (1), Evangelos Vlachos (2)
(1) Intel Research Pittsburgh, (2) CMU, (3) EPFL, (4) UT Austin
Instruction-Grain Monitoring
• Software often contains bugs
  – Memory corruptions, data races, …, crashes
  – Security attacks are often designed to exploit bugs
• Instruction-grain lifeguards can help
  – Dynamic monitoring: during application execution
  – Instruction-grain: e.g., memory access, data flow
• Enables a wide range of powerful lifeguards
[Figure: Application and Lifeguard]
Example Instruction-Grain Lifeguards
• AddrCheck [Nethercote’04]:
  – Monitor malloc/free and memory accesses
  – Check that all memory accesses visit allocated memory regions
• MemCheck [Nethercote & Seward ’03 ’07]: AddrCheck + check uninitialized values
  – Copying partially uninitialized structures is not an error
  – Lazy error detection to avoid many false positives
  – Track propagation of uninitialized values
• TaintCheck [Newsome & Song’05]: detect overwrite-based security exploits
  – Tainted data: data from network or disk
  – Track propagation of tainted data to detect violations
• LockSet [Savage et al.’97]: detect data races in parallel programs
Design Space of Support Platform
[Figure: design space. Horizontal axis: from specific lifeguard to general purpose (wide range of lifeguards). Vertical axis: performance, from poor to good.]
• Dynamic binary instrumentation (DBI): general purpose, 10-100X slowdowns [Bruening’04] [Luk et al’05] [Nethercote’04]
• General-purpose HW improving DBI: 3-8X slowdowns [Chen et al’06] [Corliss’03]
• Lifeguard-specific hardware: [Crandall & Chong’04], [Dalton et al’07], [Shetty et al’06], [Shi et al’06], [Suh et al’04], [Venkataramani’07], [Venkataramani’08], [Zhou et al’07]
• This paper: general purpose with good performance
Outline
• Introduction
• Background
• Three Hardware Acceleration Techniques
• Experimental Evaluation
• Conclusion
Example Lifeguard: TaintCheck [Newsome & Song’05]
Purpose: detect overwrite-based security exploits
  – Metadata kept for application memory and registers
  – Tainted data: data from network or disk
  – Track taint propagation
  – Detect violation: e.g., tainted jump target address
Application          TaintCheck Lifeguard
mov %eax ← A         taint(%eax) = taint(A)
mov B ← %eax         taint(B) = taint(%eax)
add %ebx ← D         taint(%ebx) |= taint(D)
jmp *(F)             if (taint(F)==1) error;
Detect the exploit before the attack code takes control
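The propagation rules above can be written as a few small event handlers. The following is a minimal sketch, not the lifeguard's actual code: the handler names, the toy memory size `MEM_BYTES`, and the use of one metadata byte (rather than one bit) per application byte are all assumptions made for readability.

```c
#include <stdint.h>
#include <string.h>

#define MEM_BYTES 256   /* toy application address space (assumption) */
#define NUM_REGS  8     /* toy register file (assumption) */

/* One metadata byte per application byte: 1 = tainted, 0 = clean. */
static uint8_t mem_taint[MEM_BYTES];
static uint8_t reg_taint[NUM_REGS];

/* Clear all metadata. */
void taint_reset(void) {
    memset(mem_taint, 0, sizeof mem_taint);
    memset(reg_taint, 0, sizeof reg_taint);
}

/* Mark data that arrived from the network or disk as tainted. */
void taint_input(uint32_t addr, uint32_t len) {
    for (uint32_t i = 0; i < len; i++)
        mem_taint[(addr + i) % MEM_BYTES] = 1;
}

/* mov %reg <- mem: taint(reg) = taint(mem) */
void on_mem_to_reg(int rd, uint32_t addr) {
    reg_taint[rd % NUM_REGS] = mem_taint[addr % MEM_BYTES];
}

/* mov mem <- %reg: taint(mem) = taint(reg) */
void on_reg_to_mem(uint32_t addr, int rs) {
    mem_taint[addr % MEM_BYTES] = reg_taint[rs % NUM_REGS];
}

/* add %reg <- mem: taint(reg) |= taint(mem) */
void on_dest_reg_op_mem(int rd, uint32_t addr) {
    reg_taint[rd % NUM_REGS] |= mem_taint[addr % MEM_BYTES];
}

/* jmp *(mem): returns 1 if the jump target is tainted (a violation). */
int on_indirect_jump(uint32_t addr) {
    return mem_taint[addr % MEM_BYTES] != 0;
}
```

Calling `taint_input` on a buffer, copying it through a register with `on_mem_to_reg` and `on_reg_to_mem`, and then calling `on_indirect_jump` on the destination reports a violation, which mirrors the A, %eax, B chain on the slide.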
TaintCheck w/ Detailed Tracking
• TaintCheck [Newsome & Song’05]:
  – Detect violation
  – 1 taint bit / application byte
• TaintCheck w/ detailed tracking:
  – Construct taint propagation trail
  – More detailed metadata per application location
    • PC of instruction that tainted this location
    • “Tainted from” address
• Not supported by previous lifeguard-specific HW
[Figure: taint propagation trail from Input to Violation]
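To make the richer metadata concrete, here is a minimal sketch of a per-location record holding the tainting PC and the "tainted from" location, plus a backward walk that reconstructs the trail. The struct layout, the word-indexed toy memory, and all function names are assumptions for illustration; the slides do not specify the actual format.

```c
#include <stdint.h>
#include <stddef.h>

#define MEM_WORDS 64            /* toy memory, indexed by word (assumption) */
#define NO_SOURCE UINT32_MAX    /* marks a location tainted directly by input */

typedef struct {
    uint8_t  tainted;   /* 1 if this location holds tainted data */
    uint32_t pc;        /* PC of the instruction that tainted this location */
    uint32_t from;      /* "tainted from" word index, or NO_SOURCE */
} taint_meta;

static taint_meta meta[MEM_WORDS];

/* Input data (network or disk) arrives at word w. */
void mark_input(uint32_t w, uint32_t pc) {
    if (w >= MEM_WORDS) return;
    meta[w] = (taint_meta){1, pc, NO_SOURCE};
}

/* A copy dst <- src performed by the instruction at pc. */
void copy_word(uint32_t dst, uint32_t src, uint32_t pc) {
    if (dst >= MEM_WORDS || src >= MEM_WORDS) return;
    if (meta[src].tainted)
        meta[dst] = (taint_meta){1, pc, src};
    else
        meta[dst] = (taint_meta){0, 0, NO_SOURCE};
}

/* Walk the trail backward from word w toward the input.
   Writes visited word indices into out and returns how many were written. */
size_t taint_trail(uint32_t w, uint32_t *out, size_t max) {
    size_t n = 0;
    while (n < max && w < MEM_WORDS && meta[w].tainted) {
        out[n++] = w;
        if (meta[w].from == NO_SOURCE) break;
        w = meta[w].from;
    }
    return n;
}
```

The sketch ignores the case where a source location is overwritten after the copy, so a trail may be stale; a real lifeguard would have to decide how to handle that.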
Instruction-Grain Lifeguard Metadata Characteristics
• Organization varies
  – Per application byte/word
  – Size, format, semantics vary greatly
• Frequently updated
  – e.g., propagation tracking
• Frequently checked
  – e.g., memory accesses
Lifeguard Support
[Figure: the unmodified application produces events; event capture and delivery passes them to the event handlers of the software lifeguard, which update and check metadata]
• Rare events: e.g., malloc/free, system calls
• Frequent events: e.g., memory access, data movement
• Event capture and delivery: General-Purpose HW improving DBI
• Performance bottlenecks: (1) metadata mapping, (2) metadata updates, (3) metadata checks
Our Contributions
[Figure: the lifeguard support diagram (application, event capture and delivery, lifeguard event handlers, metadata), annotated with M-TLB, IT, and IF]
• Metadata-TLB for metadata mapping
• Inheritance Tracking for metadata updates
• Idempotent Filters for metadata checks
Outline
• Introduction
• Background
• Three Hardware Acceleration Techniques
  – Metadata-TLB
  – Inheritance Tracking
  – Idempotent Filters
• Experimental Evaluation
• Conclusion
Metadata-TLB: Motivation
• Metadata per app byte/word
  – Element size may vary
• Two-level structure:
  – Robustness & space efficiency
• Mapping: application address → metadata address
  – Frequently used in almost every handler
  – Can be very costly
[Figure: two-level metadata structure, with a level-1 index pointing to level-2 chunks]
Example (TaintCheck)
void dest_reg_op_mem_4B (UINT32 src_addr /*%eax*/, UINT32 dest_reg /*%edx*/)
// app instruction type: dest_reg ← dest_reg op mem(src_addr)
// handler operation: reg_taint(dest_reg) |= mem_taint(src_addr)
Handler (C)                               Assembly
map *mp = level1_index[src_addr>>16];     mov %eax, %ecx
                                          shr $16, %ecx
                                          mov level1_index(,%ecx,4), %ecx
int idx = (src_addr & 0xffff)>>2;         and $0xffff, %eax
                                          shr $2, %eax
UChar mem_taint = mp[idx];                movzbl (%ecx,%eax,1), %eax
reg_taint[dest_reg] |= mem_taint;         or %al, reg_taint(%edx)
nlba ();                                  nlba
Metadata mapping takes 5 out of 8 instructions!
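For reference, the two-level mapping used by this handler can be isolated into a standalone function. The sketch below keeps the slide's shift and mask constants (a 16-bit level-1 index and one metadata byte per 4-byte application word), while the function name `metadata_addr` and the lazy `calloc` of level-2 chunks are assumptions; the slide does not show how chunks are allocated.

```c
#include <stdint.h>
#include <stdlib.h>

typedef uint8_t UChar;

#define L1_ENTRIES  (1u << 16)  /* indexed by the upper 16 address bits */
#define CHUNK_ELEMS (1u << 14)  /* 64 KB of addresses, 1 byte per 4-byte word */

/* Level-1 index: one pointer per 64 KB region; NULL until first touched. */
static UChar *level1_index[L1_ENTRIES];

/* Map an application address to the address of its metadata byte.
   Returns NULL only if a level-2 chunk cannot be allocated. */
UChar *metadata_addr(uint32_t app_addr) {
    uint32_t hi = app_addr >> 16;
    if (level1_index[hi] == NULL) {
        /* Lazy allocation keeps untouched regions free of metadata. */
        level1_index[hi] = calloc(CHUNK_ELEMS, sizeof(UChar));
        if (level1_index[hi] == NULL)
            return NULL;
    }
    uint32_t idx = (app_addr & 0xffff) >> 2;
    return &level1_index[hi][idx];
}
```

All four bytes of an aligned application word share one metadata byte here, and adjacent words map to adjacent metadata bytes within the same chunk.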
Our Solution: Metadata-TLB
• A TLB-like HW associative lookup table
• LMA (Load Metadata Address) instruction:
  – Application address → lifeguard metadata address
• Managed by (user-mode) lifeguard software
Example (TaintCheck) w/ M-TLB
void dest_reg_op_mem_4B (UINT32 src_addr /*%eax*/, UINT32 dest_reg /*%edx*/)
// app instruction type: dest_reg ← dest_reg op mem(src_addr)
// handler operation: reg_taint(dest_reg) |= mem_taint(src_addr)
Without M-TLB (8 instructions):
map *mp = level1_index[src_addr>>16];     mov %eax, %ecx
                                          shr $16, %ecx
                                          mov level1_index(,%ecx,4), %ecx
int idx = (src_addr & 0xffff)>>2;         and $0xffff, %eax
                                          shr $2, %eax
UChar mem_taint = mp[idx];                movzbl (%ecx,%eax,1), %eax
reg_taint[dest_reg] |= mem_taint;         or %al, reg_taint(%edx)
nlba ();                                  nlba
With M-TLB (4 instructions):
UChar *p = LMA_macro(src_addr);           LMA %eax, %ecx
UChar mem_taint = *p;                     mov (%ecx), %al
reg_taint[dest_reg] |= mem_taint;         or %al, reg_taint(%edx)
nlba ();                                  nlba
Reduces handler size by half!
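A software model can illustrate what the LMA lookup does. The sketch below is only a model under stated assumptions: the table is direct-mapped with `MTLB_ENTRIES` slots (the slides describe an associative hardware table), the miss handler walks a two-level table and fills the entry in software, and the names `lma`, `miss_handler`, and `mtlb_misses` are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

#define MTLB_ENTRIES 16   /* model size, an assumption */

typedef struct {
    int      valid;
    uint32_t app_chunk;   /* upper 16 bits of the application address */
    uint8_t *meta_base;   /* base of the matching level-2 metadata chunk */
} mtlb_entry;

static mtlb_entry mtlb[MTLB_ENTRIES];
static uint8_t   *level1_index[1u << 16];
static unsigned   mtlb_misses;   /* counts software fills */

/* Software miss path: find (or lazily create) the level-2 chunk. */
static uint8_t *miss_handler(uint32_t chunk) {
    if (level1_index[chunk] == NULL)
        level1_index[chunk] = calloc(1u << 14, 1);
    return level1_index[chunk];
}

/* Model of LMA: application address -> lifeguard metadata address. */
uint8_t *lma(uint32_t app_addr) {
    uint32_t chunk = app_addr >> 16;
    mtlb_entry *e = &mtlb[chunk % MTLB_ENTRIES];
    if (!e->valid || e->app_chunk != chunk) {
        uint8_t *base = miss_handler(chunk);
        if (base == NULL)
            return NULL;             /* allocation failed */
        mtlb_misses++;
        e->valid = 1;
        e->app_chunk = chunk;
        e->meta_base = base;
    }
    /* Hit path: a single add of the in-chunk offset. */
    return e->meta_base + ((app_addr & 0xffff) >> 2);
}
```

On a hit the mapping costs one table lookup and one add, which is the work the single `LMA` instruction replaces in the handler; only misses fall back to the multi-instruction table walk.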
Inheritance Tracking: Motivation
• Propagation tracking is expensive
  – Metadata updates for almost every app instruction
• Previous hardware solutions track propagation
  – Automatically update metadata in hardware
  – Problem: only support simple metadata semantics
    • e.g., do not support TaintCheck w/ detailed tracking
• Our goal: flexibility AND performance
• Idea: inheritance structure is common, so let’s track inheritance in hardware!
[Figure: inheritance trails from Input to Violation]
Problem with General Inheritance Tracking
Application          Propagation Tracking        Inheritance Tracking
mov %eax ← A         taint(%eax) = taint(A)      %eax inherits from A
mov B ← %eax         taint(B) = taint(%eax)      B inherits from %eax
add %ebx ← D         taint(%ebx) |= taint(D)     insert D into %ebx’s inherit-from list
Problem: state explosion for binary operations!
Unary Inheritance Tracking
• Many lifeguards can take advantage of unary IT:
  – MemCheck
  – TaintCheck
• Large performance improvements if used
  – Can be disabled if unary IT does not match the lifeguard
[Figure: labels "check", "check", "known"]
Tracking Register Inheritance
[Figure: original event → IT table for registers (IT(%rs), IT(%rd)) → state transition & event to deliver → deliver transformed event]
More details in the paper:
• IT table and state transition table details
• Conflict detection
Example
Application: mov %eax ← A; mov B ← %eax; mov %ebx ← C; add %ebx ← D; mov E ← %ebx
Events before: mem_to_reg, reg_to_mem, mem_to_reg, dest_reg_op_mem, reg_to_mem
Events with Inheritance Tracking: mem_to_mem, imm_to_mem
Can significantly reduce metadata update events!
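The event transformation can be sketched in software. The following is a deliberately simplified model of unary inheritance tracking for registers, not the state transition table from the paper: each register is either "metadata in register" or "inherits from one memory address", binary operations conservatively force the deferred load event to be delivered, and a simple conflict rule handles stores to an address a register still inherits from. All type and function names are assumptions.

```c
#include <stdint.h>
#include <stddef.h>

#define NUM_REGS 8
#define MAX_OUT  32

typedef enum { MEM_TO_REG, REG_TO_MEM, MEM_TO_MEM, DEST_REG_OP_MEM } ev_type;
typedef struct { ev_type type; uint32_t dst; uint32_t src; } event;

/* Per-register state: metadata held for the register, or inherited from memory. */
typedef enum { IN_REG = 0, FROM_ADDR } it_kind;
typedef struct { it_kind kind; uint32_t addr; } it_entry;

static it_entry it[NUM_REGS];   /* zero-initialized: every register IN_REG */
static event    out[MAX_OUT];   /* events actually delivered to the lifeguard */
static size_t   n_out;

static void deliver(ev_type t, uint32_t dst, uint32_t src) {
    if (n_out < MAX_OUT)
        out[n_out++] = (event){t, dst, src};
}

/* Deliver the deferred load event so the register's metadata becomes explicit. */
static void materialize(uint32_t r) {
    if (it[r].kind == FROM_ADDR) {
        deliver(MEM_TO_REG, r, it[r].addr);
        it[r].kind = IN_REG;
    }
}

/* Process one original event; register numbers must be below NUM_REGS. */
void it_process(event e) {
    switch (e.type) {
    case MEM_TO_REG:            /* reg <- mem: record inheritance, deliver nothing */
        it[e.dst] = (it_entry){FROM_ADDR, e.src};
        break;
    case REG_TO_MEM:            /* mem <- reg */
        /* Conflict: other registers inheriting from the overwritten address
           must capture its old metadata before the store's event. */
        for (uint32_t r = 0; r < NUM_REGS; r++)
            if (r != e.src && it[r].kind == FROM_ADDR && it[r].addr == e.dst)
                materialize(r);
        if (it[e.src].kind == FROM_ADDR) {
            if (it[e.src].addr != e.dst)     /* storing back to the source is a no-op */
                deliver(MEM_TO_MEM, e.dst, it[e.src].addr);
        } else {
            deliver(REG_TO_MEM, e.dst, e.src);
        }
        break;
    case DEST_REG_OP_MEM:       /* binary op: fall back to explicit tracking */
        materialize(e.dst);
        deliver(DEST_REG_OP_MEM, e.dst, e.src);
        break;
    default:
        deliver(e.type, e.dst, e.src);
        break;
    }
}
```

Feeding this model `mov %eax ← A` followed by `mov B ← %eax` delivers a single `mem_to_mem` event, matching the first sequence on the slide. For sequences containing a binary operation this sketch delivers every event; the paper's state transition table may handle that case differently.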
Idempotent Filters: Idea
• Typically, metadata checks give the same result if
  – Event parameters are the same and
  – Metadata are the same
• Idea: filter out idempotent (redundant) events
• For example:
  – AddrCheck:
    • After checking that a memory location is allocated
    • Subsequent loads/stores to the same location are safe
    • Until the next free() event
  – LockSet: (surprisingly)
    • In between synchronization events (e.g., lock/unlock)
    • Check first load to a location
    • Check first store to a location
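The filtering idea can be modeled as a small cache of recently seen check events. This is a sketch under assumptions, not the hardware design: the set count, the 4-way organization, round-robin replacement, and the names `if_check` and `if_invalidate_all` are all chosen for illustration. Entries are keyed on both the event kind and the address so that a first store is still delivered after a load to the same location.

```c
#include <stdint.h>
#include <string.h>

#define IF_SETS 16   /* model parameters, assumptions */
#define IF_WAYS 4

typedef enum { EV_LOAD, EV_STORE } ev_kind;
typedef struct { int valid; ev_kind kind; uint32_t addr; } if_entry;

static if_entry filt[IF_SETS][IF_WAYS];
static unsigned next_way[IF_SETS];   /* round-robin replacement pointer */

/* Returns 1 if the check event must be delivered to the lifeguard,
   0 if an identical event was already seen and can be filtered out. */
int if_check(ev_kind kind, uint32_t addr) {
    unsigned set = (addr >> 2) % IF_SETS;
    for (unsigned w = 0; w < IF_WAYS; w++) {
        const if_entry *e = &filt[set][w];
        if (e->valid && e->kind == kind && e->addr == addr)
            return 0;                       /* redundant: filter */
    }
    unsigned victim = next_way[set];
    next_way[set] = (victim + 1) % IF_WAYS;
    filt[set][victim] = (if_entry){1, kind, addr};
    return 1;                               /* first occurrence: deliver */
}

/* Call whenever metadata may change, e.g. on free() for AddrCheck
   or on lock/unlock for LockSet, so later events are checked again. */
void if_invalidate_all(void) {
    memset(filt, 0, sizeof filt);
    memset(next_way, 0, sizeof next_way);
}
```

Evicting an entry is always safe in this model: it only causes a redundant event to be delivered again, never a needed check to be skipped.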
Outline
• Introduction
• Background
• Three Hardware Acceleration Techniques
• Experimental Evaluation
  – Log-Based Architectures (LBA)
  – Simulation Study (w/ reduced input sets)
  – PIN-based Analysis (w/ full inputs)
• Conclusion
Log-Based Architectures
[Figure: the lifeguard support diagram (application, event capture and delivery, lifeguard event handlers, metadata), with event capture and delivery provided by the Log-Based Architecture (LBA)]
Idea: Exploiting Chip Multiprocessors
[Figure: a chip multiprocessor with many processor cores (P) and LBA components]
Simulation Setup: Dual-Core LBA System
[Figure: dual-core LBA system. Core 1 runs the application (capture, compress); log transport (e.g., L2 cache); Core 2 runs the lifeguard (decompress, dispatch, IT & IF, M-TLB)]
• Application and lifeguard are processes
• Application is stalled when log buffer is full
• Model a 2-level cache hierarchy
• Operating system: Fedora Core 5
• Simulator: extended Virtutech Simics
Overall Performance: TaintCheck
[Figure: bar chart of TaintCheck slowdowns (y-axis 0.0 to 5.0) for bzip2, crafty, eon, gap, gcc, gzip, mcf, parser, twolf, vortex, vpr, and the average, comparing LBA baseline with LBA optimized; labeled value: 1.36X]
Slowdown = (application execution time w/ lifeguard) / (application execution time w/o lifeguard)
Applying Our Techniques One by One
[Figure: bar chart of average slowdowns (y-axis 0.0 to 10.0) per lifeguard as the techniques are added]
AddrCheck: BASE 3.23, MTLB 1.90, MTLB+IF 1.02
MemCheck: BASE 7.80, MTLB 6.05, MTLB+IT 3.81, MTLB+IT+IF 3.27
TaintCheck: BASE 3.36, MTLB 2.29, MTLB+IT 1.36
TaintCheck w/ detailed tracking: BASE 4.21, MTLB 2.71, MTLB+IT 1.51
LockSet: BASE 4.25, MTLB 3.20, MTLB+IF 1.40
• IT, IF, and M-TLB are indeed complementary
• Achieve dramatically better performance
PIN-Based Analysis: IT
[Figure: bar chart of reduced update events (%), y-axis 0 to 100, for bzip2, crafty, eon, gap, gcc, gzip, mcf, parser, twolf, vortex, vpr]
• IT removes 35.8% to 82.0% of the propagation events
PIN-Based Analysis: IF
[Figure: two charts, AddrCheck and LockSet. X-axis: number of filter entries (8, 16, 32, 64, 128, 256). Y-axis: reduced check events (%), 0 to 80. Series: fully-assoc, 16-way, 8-way, 4-way, 2-way, 1-way]
• IF can effectively reduce check events
• 4-way works as well as fully-associative
Conclusion
• Our focus: Instruction-Grain Lifeguards
• Three complementary hardware techniques:
  – Metadata-TLB (M-TLB)
  – Inheritance Tracking (IT)
  – Idempotent Filters (IF)
• Flexible to support a wide range of lifeguards
  – Reducing overheads by 2-3X in our experiments
  – Achieving 2-51% overheads for all but MemCheck
Thank you!
People Working on LBA Project
Intel Research:
• Shimin Chen
• Phillip B. Gibbons
• Mike Kozuch
• Michael Ryan
University Faculty:
• Babak Falsafi (EPFL)
• Todd C. Mowry (CMU)
• Vijaya Ramachandran (UT Austin)
CMU Students:
• Michelle Goodstein
• Olatunji Ruwase
• Theodoros Strigkos
• Evangelos Vlachos
Previous Contributors:
• Limor Fix (IRP)
• Steve Schlosser (IRP)
• Anastasia Ailamaki (CMU)
• Greg Ganger (CMU)
• Bin Lin (Northwestern)
• Radu Teodorescu (UIUC)