Performance Models for Application Optimization
Walid Abu-Sufah
abusufah@illinois.edu
Visiting Scholar, University of Illinois
Associate Professor, University of Jordan
Outline
1. Objective
2. Overview
   2.1 Roofline model
   2.2 Capacity model
3. Relate roofline/capacity
4. Open Issues
5. Discussion: How could PMUs help
www.upcrc.illinois.edu
1. Objective
Explore how a model of a target architecture could be used for application tuning (maybe in a compiler?).
2.1 Roofline Model
• For applications where off-chip memory bandwidth is the constraining resource (limit) in system performance.
• Relates processor performance to off-chip memory traffic.
• Bound-and-bottleneck model
  – Good enough to understand which optimizations to try in order to reach the next level of performance
• So far, demonstrated for several HPC dwarfs and multicore systems.
Bounds
• $B_p$ = Peak processing bandwidth; MFLOP/sec
• $B_m$ = Peak DRAM bandwidth; MBytes/sec
• "Operational Intensity":
  – Average number of floating-point operations per byte of DRAM traffic, FLOPs/Byte
  – Varies by multicore design (cache organization) and dwarf
  – Characterizes a dwarf for a particular multicore design
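As a worked illustration of Operational Intensity, the sketch below counts FLOPs and DRAM bytes for a vector update `y[i] = a * x[i] + y[i]`. The kernel, the 8-byte element size, and the assumption that every operand travels to or from DRAM exactly once (no cache reuse, no write-allocate traffic) are illustrative choices, not taken from the slides.

```python
def operational_intensity(flops: float, dram_bytes: float) -> float:
    """Return FLOPs per byte of DRAM traffic."""
    if dram_bytes <= 0:
        raise ValueError("dram_bytes must be positive")
    return flops / dram_bytes


# Hypothetical kernel: y[i] = a * x[i] + y[i] on 8-byte doubles.
n = 1_000_000
flops = 2 * n            # one multiply and one add per element
dram_bytes = 3 * 8 * n   # read x[i], read y[i], write y[i]

oi = operational_intensity(flops, dram_bytes)
print(f"Operational Intensity = {oi:.4f} FLOPs/Byte")
```

Under these assumptions the kernel performs 2 FLOPs for every 24 bytes moved, so its Operational Intensity is 1/12 FLOPs/Byte regardless of `n`. A different cache organization would change the byte count, which is why the value characterizes a dwarf only for a particular multicore design.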
Performance Model Graph
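• Y-axis is GFLOPs/sec
• X-axis is FLOPs/Byte (i.e., Operational Intensity)
• Peak DRAM BW can be plotted on the same graph, since (GFLOPs/sec) / (FLOPs/Byte) = GBytes/sec
[Figure: the "Roofline", bounded by the peak processing BW $B_p$ and the peak DRAM BW $B_m$]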
Roofline Visual Performance Model
• "Ridge Point": minimum Operational Intensity needed to reach peak performance
• Compute Bound: the region to the right of the ridge point
• Memory Bound: the region to the left of the ridge point
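The roofline bound and the ridge point can be sketched in a few lines. The machine numbers below (16 GFLOPs/sec peak, 8 GBytes/sec peak DRAM BW) are hypothetical values chosen for round results; they do not describe any system in these slides.

```python
def attainable_gflops(oi: float, peak_gflops: float, peak_dram_gbs: float) -> float:
    """Roofline bound: the lower of peak compute and (DRAM BW x Operational Intensity)."""
    return min(peak_gflops, peak_dram_gbs * oi)


def ridge_point(peak_gflops: float, peak_dram_gbs: float) -> float:
    """Minimum Operational Intensity (FLOPs/Byte) at which peak compute is reachable."""
    return peak_gflops / peak_dram_gbs


# Hypothetical machine parameters (assumptions for illustration only).
PEAK_GFLOPS = 16.0
PEAK_DRAM_GBS = 8.0

ridge = ridge_point(PEAK_GFLOPS, PEAK_DRAM_GBS)
for oi in (0.25, 1.0, 2.0, 8.0):
    bound = attainable_gflops(oi, PEAK_GFLOPS, PEAK_DRAM_GBS)
    regime = "compute bound" if oi >= ridge else "memory bound"
    print(f"OI={oi:5.2f} FLOPs/Byte -> at most {bound:5.1f} GFLOPs/sec ({regime})")
```

With these numbers the ridge point sits at 2 FLOPs/Byte: a kernel below it is limited by the sloped DRAM line, and a kernel at or above it is limited by the flat compute line.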
Roofline model for AMD Opteron X2
Roofline model for Opteron X2 vs. Opteron X4
Roofline model with ceilings for Opteron X2
What is next for Roofline
• Non-floating-point kernels would be interesting
  – e.g., Sort (potential exchanges/sec vs. GB/s), Graph Traversal (nodes traversed/sec vs. GB/s)
• Opportunities for others to help investigate: many kernels, multicores, metrics, …
2.2 Capacity Model
• HW is represented as nodes with "peak" BW
  – In this talk, for illustration purposes, we assume only two nodes, a memory node and a processing node, with BWs $B_m$ and $B_p$
• System is represented as a graph of HW nodes
Performance Depends on:
A. System Characteristics
   1. Peak BWs of nodes
   2. Memory hierarchy (cache) organization/size
   3. Operational overlap
B. Application Characteristics
   1. Relative demands on BWs
   2. Overheads
Definitions
• Ratio of peak BWs of the two nodes, $B_p$ and $B_m$
• BW used per node: $B_p^u$, $B_m^u$
• Ratio of BWs used: $B_m^u / B_p^u$, and its reciprocal $B_p^u / B_m^u$
• Ratio of BW used per node to system BW used: $\frac{B_p^u}{B_p^u + B_m^u} = \frac{1}{1 + B_m^u / B_p^u}$
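A small numeric sketch of these ratios, writing `used_p` and `used_m` for the BW used by the processing and memory nodes. The function name, the dictionary keys, and the sample values are hypothetical and chosen only for illustration; the slides' own symbols for the ratios are not assumed here.

```python
def bw_ratios(used_p: float, used_m: float) -> dict:
    """Ratios built from the BW used by the processing node and the memory node."""
    if used_p <= 0 or used_m <= 0:
        raise ValueError("used bandwidths must be positive")
    system_used = used_p + used_m  # system BW used, summed as on the slide
    return {
        "m_over_p": used_m / used_p,      # ratio of BWs used
        "p_over_m": used_p / used_m,      # its reciprocal
        "p_share": used_p / system_used,  # processing BW used / system BW used
        "m_share": used_m / system_used,  # memory BW used / system BW used
    }


# Hypothetical used bandwidths, in the slide's units (MFLOP/sec and MBytes/sec).
r = bw_ratios(used_p=4.0, used_m=12.0)
print(r)

# The per-node share can be rewritten in terms of the ratio of BWs used.
assert abs(r["p_share"] - 1.0 / (1.0 + r["m_over_p"])) < 1e-12
```

The final assertion checks the identity numerically: dividing numerator and denominator of the share by `used_p` gives the `1 / (1 + ratio)` form.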
Capacity of A Node
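Average node BW utilized by an application
A function of
• Application characteristics
• Node BW

$$C_p = \begin{cases} B_p^u, & \text{if } B_p^u \le B_p \\ B_p, & \text{if } B_p^u > B_p \end{cases}$$

$$C_m = \begin{cases} B_m^u, & \text{if } B_m^u \le B_m \\ B_m, & \text{if } B_m^u > B_m \end{cases}$$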
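Reading the piecewise definition on this slide as "the BW the application would use, capped at the node's peak BW", node capacity reduces to a minimum. The function name and the sample numbers are hypothetical.

```python
def node_capacity(used_bw: float, peak_bw: float) -> float:
    """Capacity of a node: the BW the application would use, capped at the node's peak BW."""
    if used_bw < 0 or peak_bw <= 0:
        raise ValueError("used_bw must be non-negative and peak_bw positive")
    return min(used_bw, peak_bw)


# Hypothetical node with a peak BW of 10 units.
print(node_capacity(used_bw=6.0, peak_bw=10.0))   # application needs less than the peak
print(node_capacity(used_bw=15.0, peak_bw=10.0))  # application needs more: node saturates
```

The same function serves for both the processing node and the memory node, since the two definitions differ only in their subscripts.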
Saturated Node Capacity
• Assume that at least one of the nodes is saturated; then processor capacity, $C_p$, is given by
  [expression shown on the slide]
• A similar expression applies for memory capacity, $C_m$
• System capacity, $C_s = C_p + C_m$
• A similar argument holds for an unsaturated node pair
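The sketch below is one possible way to compute saturated capacities, not the expression from the slide. It assumes the application's processing and memory demands keep a fixed ratio and that the load scales up until the first node reaches its peak BW. The function name, its parameters, and the machine numbers are all hypothetical.

```python
def saturated_capacities(demand_p: float, demand_m: float,
                         peak_p: float, peak_m: float):
    """Scale a fixed demand mix until the first node saturates.

    Assumption (not from the slides): demand_p and demand_m are relative
    demands (for example FLOPs and bytes per unit of work) whose ratio
    stays constant as the load grows.
    Returns (C_p, C_m, C_s) with C_s = C_p + C_m.
    """
    if min(demand_p, demand_m, peak_p, peak_m) <= 0:
        raise ValueError("all arguments must be positive")
    # The node that runs out of headroom first limits the whole system.
    scale = min(peak_p / demand_p, peak_m / demand_m)
    c_p = scale * demand_p
    c_m = scale * demand_m
    return c_p, c_m, c_p + c_m


# Hypothetical case: 1 FLOP per 2 bytes on a machine with
# peak processing BW 16 and peak memory BW 8.
c_p, c_m, c_s = saturated_capacities(demand_p=1.0, demand_m=2.0,
                                     peak_p=16.0, peak_m=8.0)
print(c_p, c_m, c_s)
```

In this example the memory node saturates first, so the processing node runs at a quarter of its peak. Under the stated assumption, `c_p` equals `min(peak_p, peak_m * demand_p / demand_m)`, which has the same form as the roofline bound with `demand_p / demand_m` playing the role of Operational Intensity.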
Saturated Node Capacity Expression – Example
• For $\alpha_{p,m} = \tfrac{1}{2}$
Processor, Memory, and System Capacity Curves ($\alpha_{p,m} = \tfrac{1}{2}$)
3. Relating Roofline/Capacity
• A processing optimization ceiling, x, in Roofline corresponds to a used processing BW, $B_p^x$
• A memory optimization ceiling, y, in Roofline corresponds to a used memory BW, $B_m^y$
• If an application is optimized using optimizations x and y, then
  – Ratio of BWs used: $B_m^y / B_p^x$, and its reciprocal $B_p^x / B_m^y$
  – Ratio of processing BW used to system BW used: $\frac{B_p^x}{B_p^x + B_m^y} = \frac{1}{1 + B_m^y / B_p^x}$
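For reference, the share of system BW used by each node under ceilings x and y can be derived in one step by dividing numerator and denominator by that node's own used BW. The derivation below writes $B_p^x$ for the used processing BW under ceiling x and $B_m^y$ for the used memory BW under ceiling y; it introduces no other symbols.

```latex
\[
\frac{B_p^{x}}{B_p^{x} + B_m^{y}}
  = \frac{1}{1 + \dfrac{B_m^{y}}{B_p^{x}}},
\qquad
\frac{B_m^{y}}{B_p^{x} + B_m^{y}}
  = \frac{1}{1 + \dfrac{B_p^{x}}{B_m^{y}}},
\qquad
\frac{B_p^{x}}{B_p^{x} + B_m^{y}} + \frac{B_m^{y}}{B_p^{x} + B_m^{y}} = 1 .
\]
```

Since $B_p^x$ is measured in FLOPs/sec and $B_m^y$ in Bytes/sec, the ratio $B_p^x / B_m^y$ has units of FLOPs/Byte, the same units as the Operational Intensity axis of the Roofline graph.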
Roofline model with ceilings for Opteron X2
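[Figure: roofline with ceilings for Opteron X2, annotated with the peak bounds $B_p$ and $B_m$, the processing ceiling $B_p^{1}$ (ILP or SIMD), the memory ceiling $B_m^{4,5}$, and the ratio $\frac{B_p^{1}}{B_p^{1} + B_m^{4,5}}$]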
4. Open Issues
• Modeling with different performance-limiting factors
  – Cache-resident client applications (i.e., memory BW is not the limit)
• Introduce additional bounds: Network BW and IO BW
• Development of tools based on models for use in application optimization
5. Discussion: How could PMUs help
References: Roofline Model
• S. Williams, A. Waterman, D. Patterson, "Roofline: An Insightful Visual Performance Model for Multicore Architectures," Communications of the ACM, Volume 52, Issue 4 (April 2009), Pages 65-76.
• D. Patterson, "The Parallel Revolution Has Started: Are You Part of the Solution or Part of the Problem?", April 8, 2009 lecture in the Parallel@Illinois Distinguished Lecture Series (http://www.parallel.illinois.edu/dls_archive.html).
References: Capacity Model
• D. J. Kuck, "Computer System Capacity Fundamentals," National Bureau of Standards, Technical Note 851, Oct. 1974.
• D. J. Kuck, B. Kumar, "A System Model for Computer Performance Evaluation," SIGMETRICS '76: Proceedings of the 1976 ACM SIGMETRICS Conference on Computer Performance Modeling, Measurement and Evaluation, March 1976.
• D. J. Kuck, The Structure of Computers and Computations, Vol. I, John Wiley & Sons, Inc., 1978.
• D. J. Kuck, "Capacity-based Codesign of Computer HW and SW", January 26, 2009 lecture in the Parallel@Illinois Distinguished Lecture Series (http://www.parallel.illinois.edu/dls_archive.html).