Blue Gene Performance Data Repository -...
© 2013 IBM Corporation
Introduction
• Goal
  – To characterize the applications on existing systems
  – To understand the system resource usage
    • To provide inputs for next-generation system design
• Objective
  – To collect performance data and store it in a relational database
    • With minimal overhead to the user and to job execution
  – Uniform storage format
    • To support queries and presentations
    • To make comparisons across applications or platforms
Application Performance data/trace
[Diagram: Application -> (instrumentation) -> Job execution -> (collection) -> Performance data/trace]
Design consideration
• Low overhead
  – User may simply use the modified MPI compiler wrapper
  – Statically linked lightweight library inserted
  – By default the performance tool generates limited data
    • Performance data from a fixed number of ranks
    • No trace data
• Usability
  – Performance data stored in plain text files in SQL/CSV format
    • Policy compliance
    • Paired "undo" commands in separate files
  – To help users identify jobs later
    • Metadata is stored (e.g., user comments, tag)
  – To further assist users, user interface prototypes are developed to provide
    • Quick overview: web interface
    • Easy customization: spreadsheet
Performance Data Repository
[Diagram, system grotius: an instrumented binary is submitted to the Blue Gene compute nodes. Labeled components: DB2 on bgqsn2 (mgmt) and MySQL on bgqfen6 (perf. data)]
How to use it?
• Link the application with the performance tool
  – Use the modified version of the MPI compiler wrapper
  – Link the profiler libraries (e.g., -lmpihpmperf or -lpomprofperf) manually
  – Supports MPI/hardware counter and OpenMP profilers
• Run the application
• Query the database after the job is done
  – SQL statements
Run the application
• (Optional) Set the performance database information
  – Example on grotius
    • setenv DBHOST 172.16.3.1 (IP address seen by the compute nodes)
    • setenv DBPORT 0 (default is 3306 for MySQL)
    • setenv DBUSER root
    • setenv DBNAME bgresult
• (Optional) User comment/tag
  – DBNOTE (string), DBTAG (integer)
• (Optional) Control the performance data input to the database
• To keep the performance data from being input into the database
  – Set the environment variable DBINPUT to No:
    • setenv DBINPUT No (csh/tcsh)
    • export DBINPUT=No (sh/bash)
• To undo the performance data input after the job has run
  – Identify the job ID (it should be printed when the job runs)
  – At the mysql command prompt:
    • mysql> source perfdb_undo_job_xxx_y.sql
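The environment variables above can be gathered in one place before a job is launched. The following is a minimal sketch, not the tool's own code: the helper name `read_perfdb_settings` is hypothetical, and two behaviors are assumptions drawn only from the slide's wording, namely that a `DBPORT` of 0 means "use the MySQL default, 3306" and that only the value `No` switches database input off.

```python
import os

MYSQL_DEFAULT_PORT = 3306  # default MySQL port, as noted on the slide


def read_perfdb_settings(env=None):
    """Collect the DB* variables from an environment mapping (hypothetical helper)."""
    env = os.environ if env is None else env
    port = int(env.get("DBPORT", "0"))
    tag = env.get("DBTAG")
    return {
        "host": env.get("DBHOST"),
        # Assumption: 0 (or unset) means "fall back to the MySQL default".
        "port": port if port != 0 else MYSQL_DEFAULT_PORT,
        "user": env.get("DBUSER"),
        "name": env.get("DBNAME"),
        "note": env.get("DBNOTE"),                      # free-form user comment
        "tag": int(tag) if tag is not None else None,   # integer tag
        # Assumption: only the value "No" (any letter case) disables input.
        "input_enabled": env.get("DBINPUT", "Yes").lower() != "no",
    }


if __name__ == "__main__":
    example = {"DBHOST": "172.16.3.1", "DBPORT": "0", "DBUSER": "root",
               "DBNAME": "bgresult", "DBINPUT": "No", "DBTAG": "7"}
    print(read_perfdb_settings(example))
```

Passing a plain dictionary instead of `os.environ` keeps the sketch easy to try without changing the shell environment.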
MySQL Relational Database
[Schema diagram: hpc_run, hpc_task, hpc_string, hpc_region, hpc_mpi_ tables, hpc_hpm_ tables, hpc_omp_ table]
• Arrows represent the query order
• Starting with JobID
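A query that starts from a job ID and follows the tables in order might look like the sketch below. It is illustrative only: an in-memory SQLite database stands in for the MySQL server, and every column name, as well as the table name `hpc_mpi_calls`, is hypothetical, since the slide gives only table names and prefixes. The run -> task -> MPI order is likewise an assumption about what the arrows show.

```python
import sqlite3

# In-memory SQLite stands in for MySQL. All column names and the table name
# hpc_mpi_calls are hypothetical; the slide lists only table names/prefixes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hpc_run  (run_id INTEGER PRIMARY KEY, job_id INTEGER);
    CREATE TABLE hpc_task (task_id INTEGER PRIMARY KEY, run_id INTEGER,
                           mpi_rank INTEGER);
    CREATE TABLE hpc_mpi_calls (task_id INTEGER, call_name TEXT,
                                call_count INTEGER);
    INSERT INTO hpc_run  VALUES (1, 1001), (2, 1002);
    INSERT INTO hpc_task VALUES (10, 1, 0), (11, 1, 1), (20, 2, 0);
    INSERT INTO hpc_mpi_calls VALUES
        (10, 'MPI_Send', 5), (11, 'MPI_Recv', 5), (20, 'MPI_Bcast', 2);
""")


def mpi_calls_for_job(connection, job_id):
    """Walk run -> task -> MPI data, starting from a job ID."""
    query = """
        SELECT t.mpi_rank, m.call_name, m.call_count
        FROM hpc_run AS r
        JOIN hpc_task AS t ON t.run_id = r.run_id
        JOIN hpc_mpi_calls AS m ON m.task_id = t.task_id
        WHERE r.job_id = ?
        ORDER BY t.mpi_rank, m.call_name
    """
    # The job ID is passed as a bound parameter, never pasted into the SQL text.
    return connection.execute(query, (job_id,)).fetchall()


for row in mpi_calls_for_job(conn, 1001):
    print(row)
```

Against the real repository the same join pattern would be issued through a MySQL client, with the actual column names substituted.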
Design – Excel Version
Client
• Platform:
  – Windows (Excel 2003)
  – Mac (Excel 2010)
  * Different version for each platform
• Environment setup:
  – ODBC driver (for MySQL)
• Developed with:
  – Visual Basic for Applications (VBA)
• Event-driven
  – User clicks in the GUI trigger the database connection, data retrieval, chart presentation, etc.
[Diagram: Database -> ODBC driver -> Excel (normal spreadsheet). Mac: format converter; Mac: code partially rewritten. Example chart: derived_GFlops, derived_ld_bytes_per_cycle, and derived_st_bytes_per_cycle for cm1, xhpl, bt, cg, ep, ft; y-axis 0 to 8]
Design – Webpage Version
Client
• Platform:
  – IE / Firefox / Chrome / Safari (Mac)
• Developed with:
  – HTML (skeletal structure)
  – CSS (outer shell)
  – JavaScript + Dojo (action and communication)
  – CGI with Perl (database connection with SQL, data management, and sending results back to the browser)
• Event-driven
• Security
  – Isolate the database from the client side
[Diagram: Browser (JavaScript, HTML), Firewall, Server (CGI), Database. Filters are sent over HTTP; filtered data is returned. Example chart: derived_GFlops, derived_ld_bytes_per_cycle, and derived_st_bytes_per_cycle for cm1, xhpl, bt, cg, ep, ft; y-axis 0 to 8]
Tool (Step 1 – Jobs Selection)
[Screenshot callouts]
– User-defined DSN
– Save or load constraint set
– Select job from job list
– Selected job list or manual input
– Find jobs according to some constraints (none: no constraint), e.g., user, time, environment variables
– DBTAG, DBNOTE
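Finding jobs by constraints amounts to building a filtered query from whichever constraints the user has set. The sketch below shows one way this could be done; it is not the tool's VBA or Perl code. The function name `build_job_query` and the column names (`job_id`, `user_name`, `tag`, `note`) are hypothetical, and the `?` placeholder style is the one used by SQLite and ODBC (MySQL drivers for Python typically use `%s`).

```python
def build_job_query(constraints):
    """Return (sql, params) selecting job IDs from hpc_run.

    Column names are hypothetical. An empty mapping means "no constraint",
    matching the "none" choice in the job-selection step.
    """
    allowed = {"user_name", "tag", "note"}  # whitelist of filterable columns
    clauses, params = [], []
    for column, value in constraints.items():
        if column not in allowed:
            raise ValueError(f"unsupported constraint: {column}")
        # Only whitelisted column names reach the SQL text; values are bound.
        clauses.append(f"{column} = ?")
        params.append(value)
    sql = "SELECT job_id FROM hpc_run"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


print(build_job_query({}))
print(build_job_query({"user_name": "alice", "tag": 7}))
```

Keeping values as bound parameters and restricting column names to a whitelist avoids assembling SQL from raw user input.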
Tool (Step 2 – Component/Issue Selection)
[Screenshot callouts]
– Presentation scope
– Component
– Issue
– Selected component/issue list (Add, Del)
– Histogram
Tool (Step 3 – Chart)
[Screenshot callouts]
– Change the y-axis to log/linear scale
– Check raw data
– Change the type of chart (e.g., bar, column…)
– Change the order of the chart series
– Show error range (for avg)
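An error range for an averaged metric can be computed in several ways, and the slides do not say which one the tool uses. As a minimal sketch, the code below assumes the range is the sample standard deviation around the mean; the function name `mean_with_error` is hypothetical, and the two input values are the Gflops figures reported for Rank 0 and Rank n on the Nektar slide.

```python
import statistics


def mean_with_error(values):
    """Return (mean, sample standard deviation) for one metric across ranks."""
    mean = statistics.mean(values)
    # A single value has no spread; report 0.0 instead of raising an error.
    spread = statistics.stdev(values) if len(values) > 1 else 0.0
    return mean, spread


# Gflops for Rank 0 and Rank n, taken from the Nektar slide in this deck.
gflops_per_rank = [2.046, 1.813]
mean, err = mean_with_error(gflops_per_rank)
print(f"avg = {mean:.4f}, error bar = +/- {err:.4f}")
```

Other definitions (standard error, min/max range) would plug into the same place without changing the calling code.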
Rosetta - CPU
[Stacked bar chart, 0% to 100%, for Rank 0 and Rank n. Series: Committed Instructions; Committed AXU uCode sub-ops; Committed XU uCode sub-ops; Flushed Instructions and Operation Cycles; FXU Dep Stalls; AXU Dep Stalls; Thread Arbitration Stalls; IU empty]
Rosetta – instruction & memory
                 Rank 0       Rank n
DDR bandwidth    0.001        0.005
Heap Usage       330100736    666894336
Stack Usage      20351        20351
Gflops           0            1.813
IPC              0.3869       0.2072
[Stacked bar charts, 0% to 100%, for Rank 0 and Rank n. Series: integer ratio, float ratio; SIMD, non-SIMD; L1, L1P, L2, DDR]
Rosetta – MPI comm.
Rank 0
MPI Comm      Call count   Data size   Time
P2P           217          868         700.658
Collective    7            84          0.034
total comm                             700.692
total time                             711.03
Rank n
              Call count   Data size   Time
P2P           7            28          0.022
Collective    7            84          0.004
total comm                             0.026
total time                             711.03
Nektar - CPU
[Stacked bar chart, 0% to 100%, for Rank 0 and Rank n. Series: Committed Instructions; Committed AXU uCode sub-ops; Committed XU uCode sub-ops; Flushed Instructions and Operation Cycles; FXU Dep Stalls; AXU Dep Stalls; Thread Arbitration Stalls; IU empty]
Nektar – instruction & memory
                 Rank 0       Rank n
Gflops           2.046        1.813
IPC              0.2029       0.2072
DDR bandwidth    2.152        2.055
Heap Usage       888291328    286769152
Stack Usage      20511        21983
[Stacked bar charts, 0% to 100%, for Rank 0 and Rank n. Series: integer ratio, float ratio; SIMD, non-SIMD; L1, L1P, L2, DDR]
Nektar – MPI comm.
Rank 0
MPI Comm      Call count   Data size      Time
P2P           157605       34012073.2     0.419
Collective    29680        47930094       3.773
total comm                                5.571
total time                                45.621
Rank n
              Call count   Data size      Time
P2P           49833        15615488.70    0.247
Collective    29680        40885632.30    10.372
total comm                                10.908
total time                                45.552
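One way to compare the two applications is the share of run time spent in MPI communication, that is, total comm divided by total time. The sketch below applies that ratio to the totals reported on the Rosetta and Nektar MPI slides. The function name `comm_fraction` is hypothetical, and the ratio is offered as an illustration rather than a metric defined by the tool.

```python
# (total comm, total time) as reported on the Rosetta and Nektar MPI slides.
runs = {
    ("Rosetta", "rank 0"): (700.692, 711.03),
    ("Rosetta", "rank n"): (0.026, 711.03),
    ("Nektar", "rank 0"): (5.571, 45.621),
    ("Nektar", "rank n"): (10.908, 45.552),
}


def comm_fraction(total_comm, total_time):
    """Fraction of total run time spent in MPI communication."""
    if total_time <= 0:
        raise ValueError("total_time must be positive")
    return total_comm / total_time


fractions = {key: comm_fraction(*pair) for key, pair in runs.items()}
for (app, rank), frac in fractions.items():
    print(f"{app:8s} {rank}: {frac:.1%} of run time in MPI")
```

On these figures Rosetta's rank 0 spends nearly all of its run time in communication while its rank n spends almost none, whereas both Nektar ranks spend a smaller share of theirs.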
Simulation
[Diagram labels: Application Performance data/trace; Hardware Specification; Simulation method; Performance Projection]
Simulation NPB D 64
[Bar chart, 0% to 400%, for the NPB benchmarks bt, cg, ep, ft, is, lu, mg, sp. Series: sim, bw/4, bw*4, comp/3.5, comp*3.5]
Conclusion
• Blue Gene Performance Data Repository
  – Automatically collects performance information from different aspects of application executions
  – Stores it in a relational database for subsequent analysis
  – Imposes minimal overhead on users
• The performance data collected
  – Helps users understand application performance behavior and system hardware usage
  – Helps software/hardware co-design for next-generation systems
• The Blue Gene Performance Data Repository is available to Blue Gene/Q users under an open source license