Dynamic Binary Optimization – Part 1
![Page 1: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/1.jpg)
Dynamic Binary Optimization – Part 1
2006. 9.25
Nam, E Hyun
![Page 2: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/2.jpg)
Contents
- Overview
- Dynamic program behavior
- Profiling
- Optimizing translation blocks
![Page 3: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/3.jpg)
Overview : Optimization
- Optimization
  - The focus of VM design migrates from compatibility to performance
- Goal
  - To close the gap between a guest's emulated performance and native platform performance
- Types
  - Translation block chaining
  - Enlarging the translation block
  - Reordering translated instructions
  - Conventional compiler optimization techniques

Figure: source code, its translation, and the optimized translation
Source:
  addl %edx,4(%eax)
  movl 4(%eax),%edx
Translated:
  addi r16,r4,4
  lwzx r17,r2,r16
  add r7,r17,r7
  addi r16,r4,4
  stwx r7,r2,r16
Optimized (the redundant second addi is removed):
  addi r16,r4,4
  lwzx r17,r2,r16
  add r7,r17,r7
  stwx r7,r2,r16
![Page 4: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/4.jpg)
Overview : Profile
- Profile
  - Statistics regarding a program's behavior
  - A guide for making optimization decisions
  - A common optimization strategy is to use profiling to determine the paths that are predominantly followed by control flow
- Types of profile information
  - Instructions (or basic blocks) that are more heavily executed
  - Sequences in which basic blocks (BBs) are most commonly executed
  - Behavior of particular data variables and addresses
![Page 5: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/5.jpg)
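Overview : Profile
- Advantage of profile information
  - Provides information that may not have been available when a program was originally compiled

Figure: profile-guided code motion on three basic blocks, shown in three steps
Step 1 (original code):
  BB A: ... ; R3 ← ... ; R7 ← ... ; R1 ← R2 + R3 ; Br L1 if R3==0
  BB B: ... ; R6 ← R1 + R6 ; ...
  BB C: L1: R1 ← 0 ; ...
Step 2 (R1 ← R2 + R3 removed from BB A):
  BB A: ... ; R3 ← ... ; R7 ← ... ; Br L1 if R3==0
  BB B: ... ; R6 ← R1 + R6 ; ...
  BB C: L1: R1 ← 0 ; ...
Step 3 (compensation code added for BB B, which still reads R1):
  BB A: ... ; R3 ← ... ; R7 ← ... ; Br L1 if R3==0
  BB B: ... ; R6 ← R1 + R6 ; ...
  BB C: L1: R1 ← 0 ; ...
  Compensation code: R1 ← R2 + R3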
![Page 6: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/6.jpg)
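Overview : BB rearrangement
- Definition
  - A method of arranging basic blocks so that the predominant path has its instructions in consecutive memory locations
- Advantages
  - Good locality
  - Efficient instruction fetching
- Types
  - Trace
  - Superblock
  - Tree group

Figure: forming a superblock from three basic blocks
Original code:
  BB A: ... ; R3 ← ... ; R7 ← ... ; R1 ← R2 + R3 ; Br L1 if R3==0
  BB B: ... ; R6 ← R1 + R6 ; ...
  BB C: L1: R1 ← 0 ; ...
After rearrangement:
  Superblock: ... ; R3 ← ... ; R7 ← ... ; Br L1 if R3!=0 ; L1: R1 ← 0 ; ...
  BB B: ... ; R6 ← R1 + R6 ; ...
  Compensation code: R1 ← R2 + R3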
![Page 7: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/7.jpg)
Overview : Staged emulation
- Relation between emulation and optimization
  - Optimization is tightly integrated with emulation
  - Optimization is part of an emulation framework that supports staged emulation
- Staged emulation
  - Based on the tradeoff between start-up time and steady-state performance
  - Interpretation
  - Binary translation
  - Dynamic binary optimization
![Page 8: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/8.jpg)
Overview : Staged emulation
- Stages of staged emulation
  - Interpretation
  - BB translation (e.g. chaining)
  - Optimized translation (e.g. superblocks)
  - Highly optimized translation

Figure: components of a staged emulation system: emulation manager, interpreter, translator/optimizer, binary memory image, BB cache, code cache, profile data
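The staging above can be sketched as a small decision function of the kind an emulation manager might use. This is only an illustration under stated assumptions: the function name `select_stage` and both threshold values are hypothetical and do not come from the slides or from any particular system.

```python
# Hypothetical thresholds, chosen only for illustration.
TRANSLATE_THRESHOLD = 10   # executions before a block is translated
OPTIMIZE_THRESHOLD = 100   # executions before a block is optimized


def select_stage(exec_count):
    """Pick an emulation stage for a block from its execution count."""
    if exec_count >= OPTIMIZE_THRESHOLD:
        return "optimized translation"
    if exec_count >= TRANSLATE_THRESHOLD:
        return "BB translation"
    return "interpretation"
```

Rarely executed code stays in the cheap stage, and only code that proves to be hot pays the higher cost of translation and optimization.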
![Page 9: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/9.jpg)
Overview : Spectrum of emulation
Interpret → Basic translation → Optimized blocks → Highly optimized blocks
- Interpretation end of the spectrum
  - Fast startup
  - Slow steady state
  - Simple profiling
  - Low overhead
- Highly optimized end of the spectrum
  - Very slow startup
  - Fast steady state
  - Extensive profiling
  - High overhead
![Page 10: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/10.jpg)
Overview : Staged emulation strategy
- Strategy decision factors
  - Source and target ISA
  - Type of VM being implemented
  - Design objectives
  - Tradeoff between the performance gained from optimization and the overhead of optimization and profiling
- Examples
  - Original HP Dynamo system, Digital FX!32: interpretation, then optimized translated code
  - DynamoRIO: simple binary translation, then optimization
  - Shade: interpretation, then simple binary translation
![Page 11: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/11.jpg)
Contents
- Overview
- Dynamic program behavior
- Profiling
- Optimizing translation blocks
![Page 12: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/12.jpg)
Dynamic program behavior
- Goal
  - Optimization depends on a program's structure and dynamic behavior
  - By profiling, the optimization system can learn about the program's structure and dynamic behavior
- Important characteristics of programs
  - High predictability of dynamic control flow
  - Correlation between the direction of a branch's current execution and that of its most recent previous execution
Figure: fraction of static conditional branches (y-axis, 0 to 50) versus percent taken (x-axis, bins from 0-10% through >90%)
Figure: percent of dynamic branches decided the same as the previous time (y-axis, 0 to 100) for the benchmarks 176.gcc, 181.mcf, 197.parser, 252.eon, 256.bzip2, 171.swim, 173.applu, 177.mesa, 187.facerec, 189.lucas
![Page 13: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/13.jpg)
Dynamic program behavior
- Important characteristics of programs
  - Backward branches are typically taken
  - Predictability of indirect jumps
    - Switch statement
    - Return from procedure call
  - Predictability of data values
Figure: percent of indirect jumps (y-axis, 0 to 25) versus number of different destinations (x-axis, 1 through >9)
Figure: fraction of instructions with a constant value (y-axis, 0 to 0.7) by instruction type (All, Add/Sub, Load, Logic, Shift, Set), with static and dynamic series
![Page 14: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/14.jpg)
Contents
- Overview
- Dynamic program behavior
- Profiling
  - Overview
  - Role
  - Type
  - Collecting the profile data
  - Profile during interpretation
  - Profiling translated code
  - Overhead
- Optimizing translation blocks
![Page 15: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/15.jpg)
Profiling : Role
- Definition
  - The process of collecting instruction and data statistics for an executing program
- Usage
  - Input to the code-optimization process
- Principle of profiling
  - Predictability of programs
  - Past behavior will often hold for future behavior
![Page 16: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/16.jpg)
Profiling : Role
- Traditional profiling and optimization procedure
  1. Decompose the source program into a control flow graph
  2. Analyze the graph and insert probes to collect profile information
  3. Run the program with typical input data
  4. Generate profile data
  5. Analyze the profile log statically
  6. Generate optimized code
- Properties
  - The program is fully analyzed
  - Probes are optimally placed
  - The entire program is run, giving a complete profile

Figure: HLL program, compiler frontend, control flow graph (nodes A to F), compiler backend, instrumented code, program execution with test data, program statistics, optimizing compiler, optimized binary
![Page 17: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/17.jpg)
Profiling : Role
- Difficulties, requirements and limitations in dynamic optimization
  - Program structure is not known when a program begins
  - Program structure must be discovered incrementally
  - Inserting profiling probes in a globally optimal manner is difficult
  - Optimization decisions must be made as early as possible
  - Statistics come from a partial execution of the program

Figure: program binary and program data, interpreter, partial program statistics, translator/optimizer, and a partially discovered control flow graph (nodes A, B, D, E)
![Page 18: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/18.jpg)
Profiling : Role
- Tradeoff between overhead and benefit
  - Overhead: initial analysis + actual collection of profile data
  - Benefit: execution time reduction due to optimization
- Static optimization
  - Overheads are paid once
- Dynamic optimization
  - Overheads are paid every time a guest program runs
  - Benefits must outweigh the overhead
![Page 19: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/19.jpg)
Profiling : Type of profile data
- Frequency of execution of different code regions
  - Hotspots
  - Interpretation vs. binary translation
- Profile data based on control flow (branch and jump) predictability
  - Can be used to determine aspects of a program's dynamic execution behavior
  - Used as a basis for gathering and rearranging BBs into larger units
- Profile data used to guide specific optimizations
  - Addresses
  - Data
![Page 20: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/20.jpg)
Profiling : Type of profile data
- Basics
  - Nodes: BBs
  - Edges: flow of control
- BB profile
  - Numbers are counts of the corresponding BB's executions
- Edge profile
  - The BB profile can be derived from the edge profile
- Path profile
  - The path profile can be approximated with a heuristic based on the edge profile

Figure: the same control flow graph (basic blocks A to F) shown twice, once with a BB profile, A(65), B(50), C(15), D(25), E(48), F(17), and once with an edge profile (edge counts 50, 12, 13, 2, 10, 15, 38, 48, 17, 15)
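The statement that a BB profile can be derived from an edge profile can be checked with a short sketch: a block's count is the sum of the counts on its incoming edges. The function name is hypothetical, and the assignment of counts to specific edges below is an assumption (one assignment that is arithmetically consistent with the BB counts on the slide), as is the `"entry"` edge into block A.

```python
def bb_profile_from_edges(edge_counts):
    """Derive basic-block counts by summing the counts of incoming edges."""
    bb_counts = {}
    for (src, dst), count in edge_counts.items():
        bb_counts[dst] = bb_counts.get(dst, 0) + count
    return bb_counts


# Assumed edge assignment; "entry" is a hypothetical edge into the first block.
edges = {
    ("entry", "A"): 65,
    ("A", "B"): 50, ("A", "C"): 15,
    ("B", "D"): 12, ("C", "D"): 13,
    ("B", "E"): 38, ("D", "E"): 10,
    ("C", "F"): 2, ("D", "F"): 15,
}
bb_counts = bb_profile_from_edges(edges)
```

With this assignment the derived counts agree with the BB profile values listed for the slide. The reverse derivation is not possible in general, which is why an edge profile carries more information than a BB profile.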
![Page 21: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/21.jpg)
Profile : collecting the profile
- Instrumentation-based profiling
  - Targets program-related events
  - Counts all instances of the event being profiled
  - Many different events can be monitored simultaneously
  - Monitoring method: HW or SW
- Sampling-based profiling
  - The program runs in its unmodified form
  - The program is interrupted at intervals and an instance of a program-related event is captured
- Tradeoff
  - Instrumentation-based: slow, but can collect a given amount of profile data over a much shorter period of time
  - Sampling-based: fast, but requires a longer time to collect the same amount of profile information
![Page 22: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/22.jpg)
Profile : collecting the profile
- Strategy
  - The collection technique depends on the position in the emulation spectrum
  - Interpretation: SW instrumentation is about the only choice
  - Optimizing binary translation, dynamic optimization system: instrumentation
  - Already well-optimized, longer-running programs: sampling
![Page 23: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/23.jpg)
Profile : profiling during interpretation
- Key points
  - Source instructions are actually accessed as data
  - Profiling code must be added to the interpreter routines
  - Profiling is applied to certain classes of instructions rather than to specific instructions
    - E.g. backward branches
- Method
  - BB profile: profiling code should be added to all control transfer instructions, after the PC has been updated
  - Edge profile: both the PC of the control transfer instruction and the target PC are used to define a specific edge
![Page 24: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/24.jpg)
Profile : profiling during interpretation
- Profile table
  - Access method
    - BB profile: via the PC value of the control transfer destination
    - Edge profile: via the PC values that define an edge
    - Hash function
  - Contents of an entry
    - Basic block or edge count
    - For a conditional branch, taken count and not-taken count
![Page 25: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/25.jpg)
Profile : profiling during interpretation
Instruction function list
    ..
    Branch_conditional(inst) {
        BO = extract(inst, 25, 5);
        BI = extract(inst, 20, 5);
        displacement = extract(inst, 15, 14) * 4;
        ..
        // code to compute whether branch should be taken
        ..
        profile_addr = lookup(PC);    // find the profile table entry for this branch
        if (branch_taken) {
            profile_cnt(profile_addr, taken);
            PC = PC + displacement;
        } else {
            profile_cnt(profile_addr, nontaken);
            PC = PC + 4;
        }
    }

Figure: the branch PC is hashed to index a profile table whose entries hold a PC, a taken count and a not-taken count
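The `lookup` and `profile_cnt` steps in the interpreter routine can be sketched as a small hash table keyed by branch PC. Everything here is an assumption made for illustration: the class and method names, the table size, the hash function (dropping the two low bits of a 4-byte-aligned PC) and the use of linear probing on collisions are not specified by the slides.

```python
TABLE_SIZE = 1024  # assumed number of entries


class ProfileTable:
    """Hash table keyed by branch PC holding taken / not-taken counts."""

    def __init__(self, size=TABLE_SIZE):
        self.size = size
        self.entries = [None] * size  # each entry: [pc, taken, not_taken]

    def _hash(self, pc):
        # Hypothetical hash: drop the two low bits (4-byte instructions).
        return (pc >> 2) % self.size

    def lookup(self, pc):
        """Return the entry for pc, claiming a free slot by linear probing."""
        start = self._hash(pc)
        for i in range(self.size):
            index = (start + i) % self.size
            entry = self.entries[index]
            if entry is None:
                entry = [pc, 0, 0]
                self.entries[index] = entry
                return entry
            if entry[0] == pc:
                return entry
        raise RuntimeError("profile table is full")

    def count(self, pc, taken):
        """Increment the taken or not-taken count for the branch at pc."""
        entry = self.lookup(pc)
        if taken:
            entry[1] += 1
        else:
            entry[2] += 1
```

The stored PC acts as a tag, so two branches that hash to the same index do not share counters.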
![Page 26: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/26.jpg)
Profile : profiling during interpretation
- Profile count decaying
  - Problem of the profile table
    - A count field may overflow
  - Solution
    - Key points
      - Optimization methods focus on relative frequency, not absolute counts
      - Recent program event history is more valuable than older history
    - Decay process
      - Periodically divide all the profile counts by 2
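The decay process can be sketched in a few lines. The slide only says that counts are halved periodically; triggering the decay when a counter is about to exceed its field width, as `increment` does here, is one assumed policy, and the 8-bit limit of 255 and both function names are hypothetical.

```python
def decay(counts):
    """Halve every profile count in place (integer division)."""
    for key in counts:
        counts[key] //= 2


def increment(counts, key, limit=255):
    """Bump a counter; decay the whole table first if the field would overflow."""
    if counts.get(key, 0) >= limit:
        decay(counts)
    counts[key] = counts.get(key, 0) + 1
```

Because every count is halved together, the relative frequencies that the optimizer cares about are roughly preserved, while older history gradually loses weight against recent events.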
![Page 27: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/27.jpg)
Profile : profiling during interpretation
- Profiling jump instructions
  - Difficulties of indirect jumps compared with conditional branches
    - Switch statement: the target changes frequently
    - Return from procedure call: many target addresses
  - Solution
    - Key point: profile-driven optimization of indirect jumps tends to focus on jumps that very frequently have the same target
    - Maintain a profile table with a small number of target addresses and track only the more recently used targets
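A per-jump table that tracks only a few recently used targets might look like the sketch below. The class name, the default of four tracked targets, and the least-recently-used eviction policy are assumptions for illustration; the slide does not specify how old targets are dropped.

```python
from collections import OrderedDict


class JumpTargetProfile:
    """Per-jump table that keeps counts for only a few recent targets."""

    def __init__(self, max_targets=4):  # assumed table size
        self.max_targets = max_targets
        self.targets = OrderedDict()  # target address -> count, oldest first

    def record(self, target):
        """Count one execution of the jump to the given target address."""
        if target in self.targets:
            self.targets[target] += 1
            self.targets.move_to_end(target)  # mark as most recently used
            return
        if len(self.targets) >= self.max_targets:
            self.targets.popitem(last=False)  # evict least recently used
        self.targets[target] = 1

    def dominant_target(self):
        """Return the most frequent tracked target, or None if empty."""
        if not self.targets:
            return None
        return max(self.targets, key=self.targets.get)
```

A jump whose `dominant_target` accounts for nearly all recorded executions is the kind of jump the key point above singles out for optimization.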
![Page 28: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/28.jpg)
Profile : profiling translated code
- Instrumenting individual instructions
  - Each individual instruction can have its own custom profiling code
    - Profiling can be selectively applied
    - Profile counters can be assigned to each static instruction
  - Profile counters can be directly addressed without hashing
  - Profiling code can be easily inserted and removed as needed

Figure: a translated basic block followed by two stubs
Fall-through stub:
    increment edge counter(i)
    if (counter(i) > trigger) invoke optimizer
    else branch to fall-through BB
Branch target stub:
    increment edge counter(j)
    if (counter(j) > trigger) invoke optimizer
    else branch to target BB
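The behavior of one stub can be modeled as follows. This is a sketch, not generated stub code: the function name `run_stub`, the trigger value of 50 and the callback used to stand in for the optimizer are all assumptions. The counters live in a plain list indexed by edge number to mirror the point that they are directly addressed without hashing.

```python
TRIGGER = 50  # assumed trigger value


def run_stub(counters, edge_index, next_block, invoke_optimizer):
    """Model one execution of the profiling stub for an edge.

    Increments the edge counter, then either hands control to the
    optimizer (counter above the trigger) or returns the block to branch to.
    """
    counters[edge_index] += 1
    if counters[edge_index] > TRIGGER:
        return invoke_optimizer(edge_index)
    return next_block
```

For example, with `counters = [0, 0]`, the first 50 calls for edge 0 return the fall-through block and the 51st call invokes the optimizer.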
![Page 29: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/29.jpg)
Profiling : Overhead
- Performance overhead
  - Example
    - To access the hash table: hash function + 1 load + 1 compare
    - To increment the proper count: 1 load + 1 add + 1 store
  - Profiling during interpretation vs. profiling translated code
    - Absolute overhead vs. relative overhead
- Memory overhead
  - Profile table
- Overhead reduction methods
  - Reducing the number of instrumentation points
    - Heuristics + using collected data
  - Code duplication
    - Attractive for same-ISA optimization (4.7)
![Page 30: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/30.jpg)
Contents
- Overview
- Dynamic program behavior
- Profiling
- Optimizing translation blocks
  - Overview
  - Improving locality
  - Traces
  - Superblocks
  - Dynamic superblock formation
  - Tree group
![Page 31: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/31.jpg)
Optimizing translation blocks : Overview
- Two strategies
  - Improving locality
  - Optimization on enlarged translation blocks
![Page 32: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/32.jpg)
Optimizing translation blocks : Improving locality
- Locality
  - Temporal
  - Spatial
- Problem
  - Cache space
  - Performance: low instruction fetch bandwidth

Figure: control flow graph of basic blocks A to G annotated with edge counts, the original code layout in memory (blocks A, B, C, D, E, F, G with their conditional and unconditional branches), and a fetched cache line that holds the last instruction of E (Br uncond) followed by three instructions of F
![Page 33: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/33.jpg)
Optimizing translation blocks : Improving locality
- Rearrange the layout of the blocks in memory
  - Conditional branch tests are reversed
  - Unconditional branches are removed or added
  - Instruction fetch efficiency is improved

Figure: the original code layout next to the rearranged layout; in the rearranged layout the test on cond1 is reversed (cond1 == true becomes cond1 == false) and an unconditional branch is removed
![Page 34: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/34.jpg)
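Optimizing translation blocks : Improving locality
- Procedure inlining

Figure: original code in which block A and block K each call procedure xyz (blocks X, Y, Z, ending in a return), next to the code after inlining, where the blocks of xyz (X, Z, Y) follow the call site blocks directly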
![Page 35: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/35.jpg)
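Optimizing translation blocks : Improving locality
- Partial procedure inlining
  - Used in dynamic optimization systems

Figure: the same original code (blocks A and K call procedure xyz, which consists of blocks X, Y, Z) next to the code after partial inlining, where one sequence contains only X and Y and the other contains only X and Z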
![Page 36: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/36.jpg)
Optimizing translation blocks : Improving locality
- Pros and cons of procedure inlining
  - Pros
    - Increases spatial locality
    - Removes overhead
      - Call and return instructions are removed
      - Save/restore instructions are removed
  - Cons
    - Increases code size
    - Increases register "pressure"
      - Inlined code needs more registers than a procedure call
  - Consequently, procedure inlining is typically used only for procedures that are very frequently called and are very small
![Page 37: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/37.jpg)
Optimizing translation blocks
- Three ways of rearranging basic blocks according to control flow
  - Trace formation
  - Superblock formation
    - Most widely used in VM implementations
  - Tree group
    - Useful when control flow is difficult to predict
    - Provides a wider scope for optimization
![Page 38: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/38.jpg)
Optimizing translation blocks : Traces
- Traces
  - Chunks of contiguous instructions containing multiple BBs
  - Traces are the more general form (a superblock is a trace with a single entry)
- Steps for forming static traces
  1. Collect a profile using test data
  2. Begin with a start point
     - The most frequently executed BB that is not already part of a trace
  3. Collect BBs along the most common control path until a stopping condition is met
     - A block already belonging to another trace is reached
     - A procedure call/return boundary is reached
  4. Collect the BBs into a trace
     - Reverse branch tests and remove or add unconditional branches as needed
  5. Stop if every BB belongs to a trace; otherwise go to step 2
- In a dynamic environment, traces are not commonly used as translation blocks
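The trace formation steps can be sketched as a greedy walk over a profiled control flow graph. The function name and the example graph are hypothetical (the counts are illustrative, not read from the slides), ties are broken alphabetically as an arbitrary choice, and the procedure call/return stopping condition and the branch rewriting of step 4 are left out.

```python
def form_traces(block_counts, edge_counts):
    """Greedily group basic blocks into traces along the hottest edges."""
    in_trace = set()
    traces = []
    # Step 2: visit blocks from hottest to coldest, skipping placed ones.
    for start in sorted(block_counts, key=lambda b: (-block_counts[b], b)):
        if start in in_trace:
            continue
        trace = [start]
        in_trace.add(start)
        current = start
        # Step 3: follow the most common successor until a stop condition.
        while True:
            successors = edge_counts.get(current, {})
            if not successors:
                break
            nxt = max(sorted(successors), key=lambda s: successors[s])
            if nxt in in_trace:  # block already belongs to a trace
                break
            trace.append(nxt)
            in_trace.add(nxt)
            current = nxt
        traces.append(trace)
    return traces


# Hypothetical profile for a seven-block loop.
block_counts = {"A": 100, "B": 30, "C": 30, "D": 70, "E": 68, "F": 2, "G": 100}
edge_counts = {
    "A": {"B": 30, "D": 70}, "B": {"C": 30}, "C": {"G": 30},
    "D": {"E": 68, "F": 2}, "E": {"G": 68}, "F": {"G": 2}, "G": {"A": 97},
}
traces = form_traces(block_counts, edge_counts)
```

On this example the hot path A, D, E, G becomes the first trace, the colder path B, C the second, and the rarely executed block F is left in a trace of its own.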
![Page 39: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/39.jpg)
Optimizing translation blocks : Traces

Figure: the control flow graph of basic blocks A to G annotated with edge counts and divided into three traces (Trace1, Trace2, Trace3), next to the resulting code layout, in which the test on cond1 is reversed and an unconditional branch is removed
![Page 40: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/40.jpg)
Optimizing translation blocks : Superblocks
- Superblocks vs. traces
  - Side entrances: a trace may have them, a superblock has a single entry
- Problems in forming superblocks
  - Many small superblocks
  - Too small to provide many opportunities for optimization
- Tail duplication
  - The process of replicating code that appears at the end of a superblock in order to form other superblocks
![Page 41: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/41.jpg)
Optimizing translation blocks : Superblocks

Figure: the control flow graph of basic blocks A to G annotated with edge counts, before and after superblock formation; after formation, block G appears three times (tail duplication)
![Page 42: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/42.jpg)
Optimizing translation blocks : Dynamic superblock formation : Overview
- Dynamic
  - Superblocks are formed incrementally as the source code is being emulated
- Complication
  - BB replication leads to more choices
- Key questions
  - Starting point
  - Continuation
  - Stopping point
![Page 43: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/43.jpg)
Optimizing translation blocks : Dynamic superblock formation : starting point
- Heavily used blocks
  - Identified by using profile information
- Methods for determining profile points
  - Profile all basic blocks
  - Heuristics
    - Targets of backward branches are candidate starting points
    - Exit arcs from an existing superblock
- Start threshold
  - When a profiled BB's execution frequency reaches this value, a new superblock is started
  - Depends on the emulation tradeoff
  - A few tens to hundreds of executions is typical
![Page 44: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/44.jpg)
Optimizing translation blocks : Dynamic superblock formation : Continuation
- Continuation
  - Which subsequent blocks should be collected and added as the superblock is grown
- Most frequently used approach
  - Node profile information is used to identify the most likely successor BB
  - Continuation threshold
    - A relatively complete set of profile data must be collected for all BBs
    - Typically half of the start threshold
  - Continuation set
    - At the time superblock formation is to begin, the set of all BBs that have reached the continuation threshold is collected
![Page 45: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/45.jpg)
Optimizing translation blocks : Dynamic superblock formation : Continuation
- Most frequently used procedure
  1. The start threshold is reached
  2. Collect the continuation set
  3. Build a superblock from the hottest BB, following control flow edges and including only BBs in the continuation set
  4. When the superblock is completed, take the hottest remaining BB as a new start point
  5. Repeat until all blocks in the continuation set are exhausted
  6. The emulation process resumes with profiling until another BB reaches the start threshold
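A minimal sketch of this procedure follows. It is a simplification under stated assumptions: the function name and the example counts are hypothetical, the hottest successor is chosen by node count, ties are broken alphabetically, and each block is placed in at most one superblock, so the block replication that real superblock formation allows is not modeled.

```python
def form_superblocks_mfu(counts, successors, continuation_threshold):
    """Build superblocks from the blocks that reached the continuation threshold."""
    remaining = {b for b, c in counts.items() if c >= continuation_threshold}
    superblocks = []
    while remaining:
        # Start from the hottest block that has not been placed yet.
        current = max(sorted(remaining), key=lambda b: counts[b])
        block = [current]
        remaining.discard(current)
        while True:
            candidates = [s for s in successors.get(current, ()) if s in remaining]
            if not candidates:
                break  # no successor left in the continuation set
            current = max(sorted(candidates), key=lambda b: counts[b])
            block.append(current)
            remaining.discard(current)
        superblocks.append(block)
    return superblocks


# Hypothetical node profile and control flow edges.
counts = {"A": 100, "B": 30, "C": 29, "D": 70, "E": 68, "F": 2, "G": 97}
successors = {"A": ["B", "D"], "B": ["C"], "C": ["G"],
              "D": ["E", "F"], "E": ["G"], "F": ["G"], "G": ["A"]}
```

With a continuation threshold of 50, only A, D, E and G qualify and they form a single superblock; lowering the threshold to 25 also admits B and C, which then form a second one.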
![Page 46: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/46.jpg)
Optimizing translation blocks : Dynamic superblock formation : Continuation
- Most recently used approach
  - Edge profile information
  - Algorithm
    - Assumption: the very next sequence of blocks following a start point is also likely to be a common path
    - Simply follows the actual dynamic control flow path one edge at a time
  - Advantage
    - Only candidate start points need to be profiled
      - No need to use profiling for continuation blocks
      - Profile overhead is substantially reduced
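The most recently used approach can be sketched as a recorder that counts only candidate start points and, once one of them reaches the start threshold, simply copies down the blocks that execute next. The function name, the `max_length` cap and the example execution stream are assumptions for illustration; only two stopping conditions (returning to the start point, reaching the maximum length) are modeled.

```python
def form_superblock_mru(executed_blocks, start_candidates, start_threshold,
                        max_length=8):
    """Record the path that follows a start point once it becomes hot.

    Returns the recorded block list, a partial list if the stream ends
    while recording, or None if no candidate reached the threshold.
    """
    counts = {}
    recording = None
    for block in executed_blocks:
        if recording is not None:
            if block == recording[0] or len(recording) >= max_length:
                return recording
            recording.append(block)
            continue
        if block in start_candidates:
            counts[block] = counts.get(block, 0) + 1
            if counts[block] >= start_threshold:
                recording = [block]
    return recording


# Hypothetical execution stream of a loop whose head is A.
stream = ["A", "B", "C", "G", "A", "D", "E", "G",
          "A", "D", "E", "G", "A", "D", "F", "G"]
```

With a start threshold of 3 the recorder captures the path A, D, E, G. With a threshold of 1 it captures A, B, C, G instead, because that is the path that happened to run right after the threshold was reached, which is the weakness of this approach.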
![Page 47: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/47.jpg)
Optimizing translation blocks : Dynamic superblock formation : stopping point
- Types of heuristics to determine the stop condition
  - The start point of the same superblock is reached
  - A start point of some other superblock is reached
  - The superblock has reached some maximum length
    - A BB can be used in more than one superblock, so there may be multiple copies of a given BB
    - Without a limit, code size can explode
  - When using the most frequently used heuristic, there are no more candidate BBs that have reached the continuation threshold
  - An indirect jump is reached, or there is a procedure call
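Most of these heuristics can be combined into a single predicate that is checked before each block is added. The function and parameter names are hypothetical, and the condition specific to the most frequently used heuristic (no candidate BBs left) is omitted because it depends on how the candidate set is tracked.

```python
def should_stop(next_block, superblock, other_start_points, max_length,
                at_indirect_jump_or_call=False):
    """Return True if superblock growth should stop before adding next_block."""
    if next_block == superblock[0]:
        return True  # start point of the same superblock is reached
    if next_block in other_start_points:
        return True  # start point of some other superblock is reached
    if len(superblock) >= max_length:
        return True  # maximum length reached, limits code growth
    if at_indirect_jump_or_call:
        return True  # indirect jump or procedure call
    return False
```

For instance, `should_stop("A", ["A", "D", "E", "G"], set(), 8)` is true because the path has looped back to its own start point.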
![Page 48: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/48.jpg)
Optimizing translation blocks : Dynamic superblock formation : Example
- Most frequently used
  - Start point threshold: 100
  - Continuation threshold: 50

Figure: control flow graph of basic blocks A to G annotated with edge counts
![Page 49: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/49.jpg)
Optimizing translation blocks : Dynamic superblock formation : Example
- Most recently used
  - The only profile point is A, because A is the target of a backward branch
  - Most likely superblocks: ADEG, BCG, FG
  - However, there is about a 30% chance of: ABCG, DEG, FG
  - There are cases where the most recently executed method may not select superblocks quite as well as the most frequently executed method
  - Start point threshold: 100
  - Continuation threshold: 50

Figure: control flow graph of basic blocks A to G annotated with edge counts
![Page 50: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/50.jpg)
Optimizing translation blocks : Tree group
- Background
  - Problems when applying superblocks to branches that tend to split their decisions almost evenly
    - Side exits are frequently taken, which incurs compensation code overhead
    - Optimizations are typically not done along the side exits, which loses performance improvement opportunities
- Traces, superblocks vs. tree groups
  - Tree group
    - Conditional branch outcomes are more evenly balanced
    - Generalization of the superblock
    - Multiple flows of control
  - Superblocks
    - Conditional branches are predominantly decided one way
    - Single flow of control
![Page 51: Dynamic Binary Optimization – Part 1](https://reader035.fdocuments.net/reader035/viewer/2022062314/5681440b550346895db0a350/html5/thumbnails/51.jpg)
Optimizing translation blocks : Tree group

Figure: a tree group formed from the control flow graph of basic blocks A to G, annotated with edge counts; block G appears three times