Optimizing Compilers for Modern Architectures More Interprocedural Analysis Chapter 11, Sections...
-
Upload
joel-griffith -
Category
Documents
-
view
231 -
download
2
Transcript of Optimizing Compilers for Modern Architectures More Interprocedural Analysis Chapter 11, Sections...
![Page 1: Optimizing Compilers for Modern Architectures More Interprocedural Analysis Chapter 11, Sections 11.2.5 to end.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649cf75503460f949c69a6/html5/thumbnails/1.jpg)
Optimizing Compilers for Modern Architectures
More Interprocedural Analysis
Chapter 11, Sections 11.2.5 to end
![Page 2: Optimizing Compilers for Modern Architectures More Interprocedural Analysis Chapter 11, Sections 11.2.5 to end.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649cf75503460f949c69a6/html5/thumbnails/2.jpg)
Optimizing Compilers for Modern Architectures
Review
• We analyzed several interprocedural problems.—MOD -- set of variables modified in procedure—ALIAS -- set of aliased variables in a procedure.
• The problem of aliasing make interprocedural problems harder than global problems.
![Page 3: Optimizing Compilers for Modern Architectures More Interprocedural Analysis Chapter 11, Sections 11.2.5 to end.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649cf75503460f949c69a6/html5/thumbnails/3.jpg)
Optimizing Compilers for Modern Architectures
SUBROUTINE FOO(N) INTEGER N,M CALL INIT(M,N) DO I = 1,P B(M*I + 1) = 2*B(1) ENDDOEND
SUBROUTINE INIT(M,N) M = NEND
If N = 0 on entry to FOO, the loop is a reduction.Otherwise, we can vectorize the loop.
Constant Propagation
• Propagating constants between procedures can cause significant improvements.
• Dependence testing can be made more precise.
![Page 4: Optimizing Compilers for Modern Architectures More Interprocedural Analysis Chapter 11, Sections 11.2.5 to end.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649cf75503460f949c69a6/html5/thumbnails/4.jpg)
Optimizing Compilers for Modern Architectures
• Definition: Let s = (p,q) be a call site in procedure p, and let x be a parameter of q. Then , the jump function for x at s, gives the value of x in terms of parameters of p.
• The support of is the set of p-parameters that is dependent on.
J sx
J sxJ s
x
Constant Propagation
![Page 5: Optimizing Compilers for Modern Architectures More Interprocedural Analysis Chapter 11, Sections 11.2.5 to end.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649cf75503460f949c69a6/html5/thumbnails/5.jpg)
Optimizing Compilers for Modern Architectures
• Instead of a Def-Use graph, we construct an interprocedural value graph:—Add a node to the graph for each jump function
—If x belongs to the support of , where t lies in the procedure q, then add an edge between and for every call site s = (p,q) for some p.
• We can now apply the constant propagation algorithm to this graph. —Might want to iterate with global propagation
J sxJ t
yJ s
x
J ty
Constant Propagation
![Page 6: Optimizing Compilers for Modern Architectures More Interprocedural Analysis Chapter 11, Sections 11.2.5 to end.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649cf75503460f949c69a6/html5/thumbnails/6.jpg)
Optimizing Compilers for Modern Architectures
JαX
JβVJβ
U
JαY
The constant-propagation algorithm willEventually converge to above values.(might need to iterate with global)
1 2
-13
ExamplePROGRAM MAIN
INTEGER A,B
A = 1
B = 2
α CALL S(A,B)
END
SUBROUTINE S(X,Y)
INTEGER X,Y,Z,W
Z = X + Y
W = X - Y
β CALL T(Z,W)
END
SUBROUTINE T(U,V)
PRINT U,V
END
![Page 7: Optimizing Compilers for Modern Architectures More Interprocedural Analysis Chapter 11, Sections 11.2.5 to end.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649cf75503460f949c69a6/html5/thumbnails/7.jpg)
Optimizing Compilers for Modern Architectures
PROGRAM MAIN INTEGER Aα CALL PROCESS(15,A) PRINT AENDSUBROUTINE PROCESS(N,B) INTEGER N,B,Iβ CALL INIT(I,N) CALL SOLVE(B,I)ENDSUBROUTINE INIT(X,Y) INTEGER X,Y X = 2*YENDSUBROUTINE SOLVE(C,T) INTEGER C,T C = T*10END
•Need a way of building•For x output of p, define to be
the value of x on return from p in terms of p-parameters.
•The support of is defined as above.
J γI
Rpx
Rpx
RINITX ={2* Y} RSOLVE
C ={T *10}
J γT =
RINITX (N) I ∈MOD(β)
undefinedotherwise
⎧ ⎨ ⎪
⎩ ⎪
J αN =15 J β
Y =N
RPROCESSB =
RSOLVEC (J γ
T (N)) C∈MOD(γ)
undefinedotherwise
⎧ ⎨ ⎪
⎩ ⎪
Jump Functions
![Page 8: Optimizing Compilers for Modern Architectures More Interprocedural Analysis Chapter 11, Sections 11.2.5 to end.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649cf75503460f949c69a6/html5/thumbnails/8.jpg)
Optimizing Compilers for Modern Architectures
• The NotKilled problem is easily solved globally:
• To extend NotKilled to the procedure level, construct the reduced-control-flow graph for the procedure:—Vertices consist of procedure entry, procedure exit, and
call sites.—Every edge (x,y) in the graph is annotated with the set
THRU(x,y) of variables not killed on that edge.
NotKilled(b)= ∪c∈succ(b)
(THRU(b,c)∩ NotKilled(c))
Kill Analysis
![Page 9: Optimizing Compilers for Modern Architectures More Interprocedural Analysis Chapter 11, Sections 11.2.5 to end.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649cf75503460f949c69a6/html5/thumbnails/9.jpg)
Optimizing Compilers for Modern Architectures
Reduced Control GraphComputeReducedCFG(G)
remove back edges from G
for each successsor of entry node s, add (entry, s) to worklist
while worklist isn’t empty
remove a ready element (b,s) from the worklist
if s is a call site
add (s,t) to worlist for each sucessor t of s
otherwise if s isn’t the exit node
for each successor t of s
if THRU[b,t] undefined then THRU[b,t] {}
THRU[b,t] THRU[b,t] (THRU[b,s] THRU[s,t])
end
![Page 10: Optimizing Compilers for Modern Architectures More Interprocedural Analysis Chapter 11, Sections 11.2.5 to end.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649cf75503460f949c69a6/html5/thumbnails/10.jpg)
Optimizing Compilers for Modern Architectures
Kill Analysis
ComputeNKILL(p)
for each b in reduced graph in reverse top order
if b is exit node then NKILL[b] {all variables}
else
NKILL[b] {}
for each successor s of b
NKILL[b] NKILL[b] (NKILL[s] THRU[b,s]
if b is a call site (p,q) then
NKILL[b] NKILL[b] NKILL[q]
NKILL[p] NKILL[entry node]
end
![Page 11: Optimizing Compilers for Modern Architectures More Interprocedural Analysis Chapter 11, Sections 11.2.5 to end.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649cf75503460f949c69a6/html5/thumbnails/11.jpg)
Optimizing Compilers for Modern Architectures
Kill Analysis
• ComputeNKill only works in the absence of formal parameters.
• A more complicated algorithm takes parameter binding into account using a binding graph.
• This algorithm ignores aliasing for efficiency.
![Page 12: Optimizing Compilers for Modern Architectures More Interprocedural Analysis Chapter 11, Sections 11.2.5 to end.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649cf75503460f949c69a6/html5/thumbnails/12.jpg)
Optimizing Compilers for Modern Architectures
Symbolic Analysis
• Prove facts about variables other than constancy:—Find a symbolic expression for a variable in terms of other
variables.—Establish a relationship between pairs of variables at some
point in program.—Establish a range of values for a variable at a given point.
![Page 13: Optimizing Compilers for Modern Architectures More Interprocedural Analysis Chapter 11, Sections 11.2.5 to end.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649cf75503460f949c69a6/html5/thumbnails/13.jpg)
Optimizing Compilers for Modern Architectures
[-∞60 [50:∞][1:100]
[-∞:100] [1:∞]
[-∞,∞]
• Jump functions and return jump functions return ranges.
• Meet operation is now more complicated.
• If we can bound number of times upper bound increases and lower bound decreases, the finite-descending-chain property is satisfied.
Range Analysis:
Symbolic Analysis
• Range analysis and symbolic evaluation can be solved using a lattice framework.
![Page 14: Optimizing Compilers for Modern Architectures More Interprocedural Analysis Chapter 11, Sections 11.2.5 to end.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649cf75503460f949c69a6/html5/thumbnails/14.jpg)
Optimizing Compilers for Modern Architectures
DO I = 1,N CALL SOURCE(A,I) CALL SINK(A,I)ENDDO
We want to know if this loop carries a dependence.MOD and USE are some use, but notmuch.
Let be the set of locations in array modified on iteration I and set of locations used on iteration I. Then has a carried true dependence iff
MA(I )UA(I )
MA(I1)∩ UA(I 2) ≠∅ 1≤I1 <I2 ≤N
Array Section Analysis
• Consider the following code:
![Page 15: Optimizing Compilers for Modern Architectures More Interprocedural Analysis Chapter 11, Sections 11.2.5 to end.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649cf75503460f949c69a6/html5/thumbnails/15.jpg)
Optimizing Compilers for Modern Architectures
Array Section Analysis
• One possible lattice representation are sections of the form:
– A(I,L), A(I,*), A(*,L), A(*,*)
• The depth of the lattice is now on the order of the number of array subscripts., and the meet operation is efficient.
• A better representation is one in which upper and lower bounds for each subscript are allowed.
![Page 16: Optimizing Compilers for Modern Architectures More Interprocedural Analysis Chapter 11, Sections 11.2.5 to end.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649cf75503460f949c69a6/html5/thumbnails/16.jpg)
Optimizing Compilers for Modern Architectures
SUBROUTINE SUB1(X,Y,P) INTEGER X,Y CALL P(X,Y)END
What procedure names can bepassed into P?
To avoid loss of precision, we need to record sets of procedure-name tuples when a subroutine has multiple procedure parameters.
Call Graph Construction
• This problem is complicated by procedure parameters:
![Page 17: Optimizing Compilers for Modern Architectures More Interprocedural Analysis Chapter 11, Sections 11.2.5 to end.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649cf75503460f949c69a6/html5/thumbnails/17.jpg)
Optimizing Compilers for Modern Architectures
Call Graph ConstructionProcedure ComputeProcParms ProcParms(p) {} for each procedure p for each call site s = (p,q) passing in procedure names Let t = <N1,N2, …, Nk> be procedure names passed in. worklist worklist {<t,q>} while worklist isn’t empty remove <t= <N1,N2, …, Nk>,p> from worklist ProcParms[p] ProcParms[p] {t} <P1,P2,…,Pk> are parameters bound to <N1,N2, …, Nk> for each call site (p,q) passing in some Pi Let u=<M2,M2,…,Ml> set of procedure names and instances of Pi passed into q if u is not in ProcParms[q] then worklist worklist {<u,q>}end
![Page 18: Optimizing Compilers for Modern Architectures More Interprocedural Analysis Chapter 11, Sections 11.2.5 to end.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649cf75503460f949c69a6/html5/thumbnails/18.jpg)
Optimizing Compilers for Modern Architectures
Inlining
• Inlining procedure calls has several advantages:– Eliminates procedure call overhead.– Allows more optimizations to take place
• However, a study by Mary Hall showed that overuse of inlining can cause slowdowns:
– Breaks compiler procedure assumptions.– Function calls add needed register spills.– Changing function forces global recompilation.
![Page 19: Optimizing Compilers for Modern Architectures More Interprocedural Analysis Chapter 11, Sections 11.2.5 to end.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649cf75503460f949c69a6/html5/thumbnails/19.jpg)
Optimizing Compilers for Modern Architectures
PROCEDURE UPDATE(A,N,IS) REAL A(N) INTEGER I = 1,N A(I*IS-IS+1)=A(I*IS-IS+1)+PI ENDDOEND
If we knew that IS != 0 ata call, then loop can be vectorized.
If we know that IS != 0 at specific call sites, clone a vectorized version of the procedure and use it at those sites.
Procedure Cloning
• Often specific values of function parameters result in better optimizations.
![Page 20: Optimizing Compilers for Modern Architectures More Interprocedural Analysis Chapter 11, Sections 11.2.5 to end.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649cf75503460f949c69a6/html5/thumbnails/20.jpg)
Optimizing Compilers for Modern Architectures
DO I = 1,N CALL FOO()ENDDO
PROCEDURE FOO() …END
CALL FOO()
PROCEDURE FOO() DO I = 1,N … ENDDOEND
Hybrid optimizations
• Combinations of procedures can have benefit.
• One example is loop embedding:
![Page 21: Optimizing Compilers for Modern Architectures More Interprocedural Analysis Chapter 11, Sections 11.2.5 to end.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649cf75503460f949c69a6/html5/thumbnails/21.jpg)
Optimizing Compilers for Modern Architectures
Whole Program
• Want to efficiently do interprocedural analysis without constant recompilation.
• This is a software engineering problem.
• Basic idea: every time the optimizer passes over a component, calculate information telling what other procedures must be rescanned.
• Include a feedback loop that optimizes until no procedures are out of date.
![Page 22: Optimizing Compilers for Modern Architectures More Interprocedural Analysis Chapter 11, Sections 11.2.5 to end.](https://reader030.fdocuments.net/reader030/viewer/2022032702/56649cf75503460f949c69a6/html5/thumbnails/22.jpg)
Optimizing Compilers for Modern Architectures
Summary
• Solution of flow sensitive probems:—Constant propagation—Kill analysis
• Solutions to related problems such as symbolic analysis and array section analysis.
• Other optimizations and ways to integrate these into whole-program compilation.