# PBS Analytics 12.2 Chart Calculations...


PBS Works is a division of

PBS Analytics™ 12.2

Chart Calculations

Altair PBS Analytics 12.2 Chart Calculations updated 12/9/13

Copyright© 2003-2013 Altair Engineering, Inc. All Rights Reserved

Altair PBS Works, Compute Manager™, Display Manager™, PBS™, PBS Works™, PBS Professional®, PBS Application Services™, PBS Analytics™, PBS Desktop™, PBS Portal™, e-BioChem™, e-Compute™ and e-Render™ are trademarks of Altair Engineering, Inc. and are protected under U.S. and international laws and treaties.

All other marks are the property of their respective owners.

Copyright notice does not imply publication. Contains trade secrets of Altair Engineering, Inc. Decompilation or disassembly of this software is strictly prohibited. This software is protected under patent #6,859,792 and other patents pending. Usage of Altair Engineering, Inc. software is only as explicitly permitted as stated in the end user software license agreement.

Third Party Agreements

Table of Contents

1 PBS Analytics Chart Calculations
1.1 Cluster Utilization
1.1.1 Jobs Count by Day
1.1.2 Jobs by Software by Group
1.1.3 Jobs by Software by Node
1.1.4 Requested Vs Used Memory by Software
1.2 Cost Analysis
1.2.1 Node Walltime by Group - Non Prime Time
1.2.2 Node Walltime by Group - Prime Time
1.3 Green Computing
1.3.1 Unused Node Capacity by Node
1.3.2 Unused Node Capacity by Quarter
1.3.3 Unused Vs Used Node Cycle
1.4 Productivity
1.4.1 Average Wait Time Vs Average Overall Time by Software
1.4.2 Average Wait Time by Software by User
1.4.3 Jobs by Exit Status
1.4.4 Successful Vs Unsuccessful Jobs by Node
1.4.5 Successful Vs Unsuccessful Jobs by Software
1.4.6 User Job Efficiency and Productivity

PBS Analytics 12.2 Chart Calculations i

PBS Analytics Chart Calculations

This document provides the calculation details for the default charts available in PBS Analytics™ (PBSA) 12.2.

The default charts are divided into categories:

• Cluster Utilization

• Cost Analysis

• Green Computing

• Productivity

Sample data referenced in the chart calculation examples is provided for clarity, and does not necessarily correlate with the data displayed in the chart images.

1.1 Cluster Utilization

This group of charts provides historical data about the cluster, such as cluster utilization, software usage, and job statistics. IT managers can use these charts to help understand the usage, throughput, and availability of a site’s cluster over time.

The following charts are available within this category:

• Jobs count by day - shows the number of jobs present on the cluster

• Jobs by software by group - shows the software usage by custom group

• Jobs by software by node - shows the software usage by node

• Requested vs used memory by software - compares the average requested memory to the average used memory by software

When calculating node walltime, placement options other than “exclusive” (free:shared, none, and scatter) are considered “non-exclusive”.


1.1.1 Jobs Count by Day

This chart shows the jobs that were present on the cluster. The following categories of jobs are counted:

• jobs that started and ended on that day

• jobs that started but did not end on that day

• jobs that ended on that day but did not start on that day

• jobs that were running but were deleted on that day

Calculation

Figure 1-1: Jobs Count by Day

1. Jobs count = No. of S+E paired records + No. of Only S records + No. of Only E records + No. of D+E paired records

where:

S record = job started

E record = job ended

D record = job deleted
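The sum above can be sketched in Python. This is an illustration only: the `(job_id, record_type)` input shape and the function name are assumptions, not the PBSA implementation, and the sketch assumes that a job contributes to the count when the set of record types seen for it on the day is exactly one of the four combinations in the formula.

```python
from collections import defaultdict

# Record-type combinations named in the formula: S+E, only S, only E, D+E.
COUNTED_COMBINATIONS = {
    frozenset("SE"),
    frozenset("S"),
    frozenset("E"),
    frozenset("DE"),
}


def jobs_count_for_day(records):
    """Count jobs for one day from (job_id, record_type) pairs (hypothetical input)."""
    types_per_job = defaultdict(set)
    for job_id, record_type in records:
        types_per_job[job_id].add(record_type)
    # A job is counted once if its record types match one of the combinations.
    return sum(
        1 for types in types_per_job.values()
        if frozenset(types) in COUNTED_COMBINATIONS
    )


sample = [
    ("1", "S"), ("1", "E"),   # started and ended on the day
    ("2", "S"),               # started only
    ("3", "E"),               # ended only
    ("4", "D"), ("4", "E"),   # deleted while running
]
count = jobs_count_for_day(sample)
```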


1.1.2 Jobs by Software by Group

This chart shows the software usage by custom group.

Calculation

Figure 1-2: Jobs by Software by Group

1. For each custom group, count the total number of jobs for each software (resource_list.software).


1.1.3 Jobs by Software by Node

This chart shows the software usage by node.

Calculation

Figure 1-3: Jobs by Software by Node

1. Find all jobs for a software (resource_list.software).

2. Group the jobs by node by extracting the node value of each job from exec_vnode.

3. Count the number of jobs for each node.
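A minimal sketch of the node extraction and counting steps follows. It assumes that exec_vnode values take the form `(node1:ncpus=4)+(node2:ncpus=4)`, with the node name before the first colon of each chunk; the function names are hypothetical and this is not the PBSA implementation.

```python
from collections import Counter


def nodes_from_exec_vnode(exec_vnode):
    """Return the distinct node names in an exec_vnode string (assumed format)."""
    nodes = []
    for chunk in exec_vnode.split("+"):
        # Drop the surrounding parentheses, then keep the text before the first colon.
        name = chunk.strip("()").split(":", 1)[0]
        if name and name not in nodes:
            nodes.append(name)
    return nodes


def jobs_per_node(exec_vnodes):
    """Count jobs per node, given one exec_vnode string per job of a software."""
    counts = Counter()
    for exec_vnode in exec_vnodes:
        counts.update(nodes_from_exec_vnode(exec_vnode))
    return counts


counts = jobs_per_node([
    "(node1:ncpus=4)+(node2:ncpus=4)",
    "(node1:ncpus=2:mem=1gb)",
])
```

A job that spans several nodes is counted once for each node it ran on, which is one possible reading of step 2.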


1.1.4 Requested Vs Used Memory by Software

This chart compares the average requested memory to the average used memory for each software.

Calculation

Figure 1-4: Requested Vs Used Memory by Software

1. For each software (resource_list.software):

a. Extract the resources_used.mem value for all the finished (E records) jobs.

b. Calculate the average used memory.

c. Extract the Resource_List.mem value for all the finished (E) jobs.

d. Calculate the average requested memory.
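The averaging steps can be illustrated as follows. The sketch assumes memory values are strings with a `b`, `kb`, `mb`, `gb` or `tb` suffix and that each finished job is a dictionary with hypothetical keys `software`, `requested` and `used`; none of these names come from PBSA itself.

```python
from collections import defaultdict
from statistics import mean

# Binary multipliers for the assumed size suffixes.
UNITS = {"kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4, "b": 1}


def to_bytes(size):
    """Convert a size string such as '2gb' to bytes (assumed format)."""
    size = size.strip().lower()
    # Two-letter suffixes are checked before the bare 'b' suffix.
    for suffix in ("kb", "mb", "gb", "tb", "b"):
        if size.endswith(suffix):
            return int(size[: -len(suffix)]) * UNITS[suffix]
    return int(size)


def average_memory(finished_jobs):
    """Return {software: (average requested bytes, average used bytes)}."""
    requested, used = defaultdict(list), defaultdict(list)
    for job in finished_jobs:
        requested[job["software"]].append(to_bytes(job["requested"]))
        used[job["software"]].append(to_bytes(job["used"]))
    return {sw: (mean(requested[sw]), mean(used[sw])) for sw in requested}


averages = average_memory([
    {"software": "solverA", "requested": "2gb", "used": "1gb"},
    {"software": "solverA", "requested": "4gb", "used": "1gb"},
])
```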


1.2 Cost Analysis

The Cost Analysis category is intended for the finance department, in support of billing. The charts showcase resource usage during prime and non-prime hours for each node class.

While analyzing prime/non-prime charts, note that a job may start in one time period and end in another. In this case, CPU/node utilization will be calculated on the basis of resources used by the job during the specific time period. The day is divided into the following three time periods:

• Non Prime Morning: 12:00 AM to 9:00 AM

• Prime Time: 9:00 AM to 5:30 PM

• Non Prime Evening: 5:30 PM to 12:00 AM

For example, for the jobs described in Table 1-1:

• Job 1 walltime will be calculated for Non Prime Morning.

• Job 2 walltime will be calculated for Non Prime Morning and Prime Time.

• Job 3 walltime will be calculated for Prime Time and Non Prime Evening.

• Job 4 walltime will be calculated for Non Prime Morning, Prime Time, and Non Prime Evening.

The following charts are available within this category:

• Node Walltime by Group - Non Prime Time - walltime by group during non prime time

• Node Walltime by Group - Prime Time - walltime by group during prime time

Table 1-1: Jobs running in multiple time periods

Time periods: Non Prime Morning (12:00 AM to 9:00 AM), Prime Time (9:00 AM to 5:30 PM), Non Prime Evening (5:30 PM to 12:00 AM)

| Job | Starts in | Ends in |
|---|---|---|
| Job 1 | Non Prime Morning | Non Prime Morning |
| Job 2 | Non Prime Morning | Prime Time |
| Job 3 | Prime Time | Non Prime Evening |
| Job 4 | Non Prime Morning | Non Prime Evening |
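Splitting a job's walltime across the three time periods can be sketched as below. The window boundaries come from the list above; the function name and the use of naive `datetime` values are assumptions, and holidays and weekends are ignored in this sketch.

```python
from datetime import datetime, time, timedelta

# (period name, start of period, end of period); None means midnight at end of day.
WINDOWS = [
    ("Non Prime Morning", time(0, 0), time(9, 0)),
    ("Prime Time", time(9, 0), time(17, 30)),
    ("Non Prime Evening", time(17, 30), None),
]


def split_walltime(start, end):
    """Return the hours a job spent in each time period between start and end."""
    totals = {name: 0.0 for name, _, _ in WINDOWS}
    day = datetime.combine(start.date(), time(0, 0))
    while day < end:
        next_day = day + timedelta(days=1)
        for name, w_start, w_end in WINDOWS:
            lo = datetime.combine(day.date(), w_start)
            hi = next_day if w_end is None else datetime.combine(day.date(), w_end)
            # Overlap between the job's run interval and this window, if any.
            overlap = (min(end, hi) - max(start, lo)).total_seconds()
            if overlap > 0:
                totals[name] += overlap / 3600.0
        day = next_day
    return totals


# A job like Job 4: starts in Non Prime Morning and ends in Non Prime Evening.
hours = split_walltime(datetime(2011, 4, 3, 8, 0), datetime(2011, 4, 3, 18, 0))
```

For this sample job the split is 1 hour of Non Prime Morning, 8.5 hours of Prime Time and 0.5 hours of Non Prime Evening.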


The percentages shown in the following charts are calculated as follows:

Assumptions:

Number of nodes in a cluster = 3 (node1, node2 and node3)

Number of CPUs on each node = 12

Cluster availability = 5 days

Note: If there are more than 150 groups, then because of how pie charts are rendered, the percentages shown on the pie charts are pie percentages, not the actual utilization. Actual usage percentages can be viewed by switching off decimation (refer to the section “Configuring plot point decimation” in the PBS Analytics Administrator’s Guide for more information on how to change the decimation value).

Table 1-2: Node usage percentage calculation

| Group Name | Node walltime used | Percentage calculation |
|---|---|---|
| G1 | 1000 | [1000 / (3*12*5*24)] * 100 = [1000 / 4320] * 100 = 23.14% |
| G2 | 120 | [120 / (3*12*5*24)] * 100 = [120 / 4320] * 100 = 2.77% |
| G3 | 400 | [400 / (3*12*5*24)] * 100 = [400 / 4320] * 100 = 9.25% |
| Unused [This is not a group but categorized as one] | 4320 - (1000 + 120 + 400) = 2800 | [2800 / (3*12*5*24)] * 100 = [2800 / 4320] * 100 = 64.81% |
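The percentage arithmetic in Table 1-2 can be reproduced with a short sketch. The function and parameter names are hypothetical. The function returns unrounded values, whereas the table's figures appear to be cut to two decimals.

```python
def usage_percentages(node_walltime_by_group, nodes, cpus_per_node, days):
    """Return each group's share of total node walltime, plus the unused share."""
    capacity = nodes * cpus_per_node * days * 24  # total node walltime in hours
    shares = dict(node_walltime_by_group)
    # "Unused" is not a group but is treated as one, as in Table 1-2.
    shares["Unused"] = capacity - sum(node_walltime_by_group.values())
    return {group: hours / capacity * 100 for group, hours in shares.items()}


percentages = usage_percentages(
    {"G1": 1000, "G2": 120, "G3": 400}, nodes=3, cpus_per_node=12, days=5
)
```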


1.2.1 Node Walltime by Group - Non Prime Time

This chart displays the node usage percentage by group during non-prime time hours.

Calculation

Figure 1-5: Node Walltime by Group - Non Prime Time

1. Find all jobs for a specific group.

2. Group the jobs by node by extracting the node value for each job from exec_vnode.

3. Extract the job start time.

4. Segregate the jobs that fall into the non-prime time and holiday window:

a. Convert the date and time to EPOCH date and time format.

b. Reference the job start time and end time. If the job started during Prime Time and finished during Non-Prime Time, then node utilization will be calculated for Non-Prime Time only.

c. Calculate the wall time for each job [end - start].

d. Check the placement of the job:

If placement is exclusive, calculate the node wall time as:

[wall time] * [total number of CPUs on the nodes on which the job ran]

If placement is free, calculate the node wall time as:

[wall time] * [total number of CPUs the job consumed]
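The placement rule in step d can be sketched as a small function. The names are hypothetical, and every placement other than "exclusive" is treated as non-exclusive, following the note in section 1.1.

```python
def node_walltime(walltime_hours, placement, cpus_on_nodes, cpus_consumed):
    """Return node walltime in CPU-hours for one job.

    cpus_on_nodes: total CPUs on the nodes on which the job ran.
    cpus_consumed: total CPUs the job actually consumed.
    """
    if placement == "exclusive":
        # An exclusive job is charged for every CPU on its nodes.
        return walltime_hours * cpus_on_nodes
    # Any other placement is charged only for the CPUs it consumed.
    return walltime_hours * cpus_consumed


exclusive_hours = node_walltime(2, "exclusive", cpus_on_nodes=12, cpus_consumed=4)
free_hours = node_walltime(2, "free", cpus_on_nodes=12, cpus_consumed=4)
```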


1.2.2 Node Walltime by Group - Prime Time

This chart displays the node usage percentage by group during prime time hours.

Calculation

Figure 1-6: Node Walltime by Group - Prime Time

1. Find all jobs for a specific group.

2. Group the jobs by node by extracting the node value for each job from exec_vnode.

3. Extract the job start time.

4. Segregate the jobs that fall into the prime time window:

a. Convert the date and time to EPOCH date and time format.

b. Reference the job start time and end time. If the job started during Prime Time and finished during Non-Prime Time, then node utilization will be calculated for Prime Time only.

c. Calculate the wall time for each job [end - start].

d. Check the placement of the job:

If placement is exclusive, calculate the node wall time as:

[wall time] * [total number of CPUs on the nodes on which the job ran]

If placement is free, calculate the node wall time as:

[wall time] * [total number of CPUs the job consumed]


1.3 Green Computing

The charts in this category are intended for finance managers and IT managers to analyze compute node idle time.

This information can be used for cost savings by taking corrective actions such as bringing an unused node down using the PBS Professional Green Provisioning Toolkit.

The following charts are available within this category:

• Unused Node Capacity by Node - unused walltime by node by time period

• Unused Node Capacity by Quarter - unused walltime by node class by quarter

• Unused Vs Used Node Cycle - used vs unused node walltime by node

Green CPU Walltime calculation example:

Total number of CPUs in a node [ node1 ] = 12

The available node walltime will be calculated based on the following time windows:

Table 1-3: Available node walltime calculation

| Time Window | Available node walltime |
|---|---|
| Non Prime Morning (12:00 AM to 9:00 AM) | 12 (CPU) * 9 (hrs) = 108 |
| Prime Time (9:00 AM to 5:30 PM) | 12 (CPU) * 8.5 (hrs) = 102 |
| Non Prime Evening (5:30 PM to 12:00 Midnight) | 12 (CPU) * 6.5 (hrs) = 78 |


Assume the following jobs ran on node1:

Table 1-4: Jobs running on node1

| Job ID | Start time (EPOCH) | End time (EPOCH) | Start time (human readable) | End time (human readable) | Time Window |
|---|---|---|---|---|---|
| 1 | 1301805525 | 1301805600 | 03 Apr 2011 04:38:45 | 03 Apr 2011 04:40:00 | Non-Prime Morning |
| 2 | 1301837925 | 1301838000 | 03 Apr 2011 13:38:45 | 03 Apr 2011 13:40:00 | Prime |
| 3 | 1301913525 | 1365096288 | 04 Apr 2011 10:38:45 | 04 Apr 2013 17:24:48 | Prime |
| 4 | 1365157848 | 1365204288 | 05 Apr 2013 10:30:48 | 05 Apr 2013 23:24:48 | Prime + Non-Prime Evening |

To calculate the Green CPU walltime:

1. Check the start time of these jobs and segregate them into time windows.

2. When there are no running jobs in a specific time window, calculate the node walltime available in that window as Green CPU walltime.

Based on the information presented in the previous tables, the following conclusions can be made:

Table 1-5: Green CPU Walltime per time period

| Date | Green CPU Walltime Non-Prime Morning | Green CPU Walltime Prime | Green CPU Walltime Non-Prime Evening |
|---|---|---|---|
| 03 Apr 2011 | 0 | 0 | 78 |
| 04 Apr 2011 | 108 | 0 | 78 |
| 05 Apr 2011 | 108 | 0 | 0 |

A Green CPU walltime of 0 indicates that at least one job was present during that time window.
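The Green CPU walltime rule can be sketched in Python for a single day. The sketch makes several assumptions: EPOCH values are interpreted as UTC (which reproduces the human-readable times of jobs 1 and 2 in Table 1-4), a window counts as busy when any job's run interval overlaps it, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# (window name, start hour, end hour) measured from midnight.
WINDOWS = [
    ("Non Prime Morning", 0.0, 9.0),
    ("Prime Time", 9.0, 17.5),
    ("Non Prime Evening", 17.5, 24.0),
]


def green_cpu_walltime(day, jobs, ncpus):
    """Return Green CPU walltime per window for one day.

    day: timezone-aware datetime at midnight; jobs: (start_epoch, end_epoch) pairs.
    """
    result = {}
    for name, begin_h, end_h in WINDOWS:
        lo = (day + timedelta(hours=begin_h)).timestamp()
        hi = (day + timedelta(hours=end_h)).timestamp()
        busy = any(start < hi and end > lo for start, end in jobs)
        # An idle window contributes its full available node walltime.
        result[name] = 0.0 if busy else ncpus * (end_h - begin_h)
    return result


# Jobs 1 and 2 from Table 1-4, evaluated for 03 Apr 2011.
day = datetime(2011, 4, 3, tzinfo=timezone.utc)
green = green_cpu_walltime(
    day, [(1301805525, 1301805600), (1301837925, 1301838000)], ncpus=12
)
```

For this day the result is 0 for Non Prime Morning and Prime Time and 78 for Non Prime Evening, which agrees with the first row of Table 1-5.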


1.3.1 Unused Node Capacity by Node

This chart displays unused walltime by node for the following time periods:

• Non Prime Morning

• Prime

• Non Prime Evening

• Holiday and Weekend

Calculation

Figure 1-7: Unused Node Capacity by Node

1. Find all jobs for a specific node by referring to exec_vnode.

2. For each quarter, extract the time windows [prime, nonprime-morning, nonprime-evening, or holiday & weekend].

3. Determine whether there were any windows in which there were no running jobs.


1.3.2 Unused Node Capacity by Quarter

This chart displays unused node capacity by quarter.

Calculation

Figure 1-8: Unused Node Capacity by Quarter

1. Find all jobs for a specific node by referring to exec_vnode.

2. Group these jobs by yearly quarters (Q1, Q2, Q3, Q4).

3. For each quarter, extract the time windows [prime, nonprime-morning, nonprime-evening, or holiday & weekend].

4. Calculate the Green CPU walltime in each of these time windows (use EPOCH Time).
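Grouping jobs into yearly quarters from EPOCH times might look like the sketch below. It assumes calendar quarters, UTC timestamps, and grouping by job start time; the source does not state which timestamp or time zone PBSA uses, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def quarter_of(epoch_seconds):
    """Return (year, quarter) for an EPOCH timestamp, with quarter in 1..4."""
    dt = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return dt.year, (dt.month - 1) // 3 + 1


def group_by_quarter(jobs):
    """Group (job_id, start_epoch) pairs by the quarter in which each job started."""
    groups = defaultdict(list)
    for job_id, start_epoch in jobs:
        groups[quarter_of(start_epoch)].append(job_id)
    return dict(groups)


quarters = group_by_quarter([("1", 1301805525), ("2", 1301837925), ("4", 1365157848)])
```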


1.3.3 Unused Vs Used Node Cycle

This chart displays used versus unused node walltime by node.

Calculation

Figure 1-9: Unused Vs Used Node Cycle

For this chart, assume the following jobs have run on node1:

Table 1-6: Jobs running on node1

| Job ID | CPU used | Run Duration [hrs] | Node walltime [hrs] |
|---|---|---|---|
| 1 | 12 | 10 | 120 |
| 2 | 4 | 3 | 12 |
| 3 | 6 | 5 | 30 |

Assumptions:

• Total number of CPUs in a node [node1] = 12

• Total number of days the node was available on cluster = 5 days = 120 hrs

• Total Node Walltime available on node1 = 120 * 12 = 1440 hrs

1. Calculate the Used Node Walltime by summing the node walltime for all jobs that ran on the node = 120 + 12 + 30 = 162 hrs.

2. Calculate the Unused Node Walltime = Total Node Walltime - Used Node Walltime = 1440 - 162 = 1278 hrs.
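The used and unused figures for node1 can be checked with a few lines of Python. The variable and function names are illustrative only; the numbers are the ones from Table 1-6 and the assumptions listed with it.

```python
def used_node_walltime(jobs):
    """Sum node walltime over (cpus_used, run_duration_hours) pairs."""
    return sum(cpus * hours for cpus, hours in jobs)


def unused_node_walltime(jobs, ncpus, available_hours):
    """Total node walltime available minus the walltime used by jobs."""
    return ncpus * available_hours - used_node_walltime(jobs)


node1_jobs = [(12, 10), (4, 3), (6, 5)]  # rows of Table 1-6
used = used_node_walltime(node1_jobs)
unused = unused_node_walltime(node1_jobs, ncpus=12, available_hours=5 * 24)
```

With these inputs, `used` is 162 hrs and `unused` is 1278 hrs.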


1.4 Productivity

The charts in this category are designed to be used by engineering managers to maximize productivity and utilization.

The following charts are available within this category:

• Average Wait Time Vs Average Overall Time by Software - average wait time vs average overall time per software

• Average Wait Time by Software by User - average wait time by software/user

• Jobs by Exit Status - total number of jobs by exit status

• Successful Vs Unsuccessful Jobs by Node - node utilization based on successful and unsuccessful jobs

• Successful Vs Unsuccessful Jobs by Software - successful jobs vs unsuccessful jobs by software

• User Job Efficiency and Productivity - walltime per user by exit status


1.4.1 Average Wait Time Vs Average Overall Time by Software

This chart compares the average wait time to the average overall time for each software.

Calculation

Figure 1-10: Average Wait Time Vs Average Overall Time by Software

1. Find all jobs that are finished (E) for each software using resource_list.software.

a. Extract the wait time as [start - qtime] for each job.

b. Calculate the average wait time.

2. Find all jobs for each software using resource_list.software.

a. Extract the overall time as [end - ctime] for each job.

b. Calculate the average overall time.
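A minimal sketch of both averages follows. It assumes each finished job is a dictionary with hypothetical keys `software`, `ctime`, `qtime`, `start` and `end`, all times in EPOCH seconds; it is an illustration, not the PBSA implementation.

```python
from collections import defaultdict
from statistics import mean


def average_times(finished_jobs):
    """Return {software: (average wait seconds, average overall seconds)}."""
    waits, overalls = defaultdict(list), defaultdict(list)
    for job in finished_jobs:
        software = job["software"]
        waits[software].append(job["start"] - job["qtime"])   # wait time
        overalls[software].append(job["end"] - job["ctime"])  # overall time
    return {sw: (mean(waits[sw]), mean(overalls[sw])) for sw in waits}


averages = average_times([
    {"software": "solverA", "ctime": 0, "qtime": 10, "start": 40, "end": 100},
    {"software": "solverA", "ctime": 0, "qtime": 0, "start": 10, "end": 50},
])
```

Grouping on a `(software, user)` key instead of `software` alone would give the per-user breakdown in the same way.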


1.4.2 Average Wait Time by Software by User

This chart displays the average wait time by software for each user.

Calculation

Figure 1-11: Average Wait Time by Software by User

1. Find all finished (E) jobs for each software using the value of resource_list.software.

2. Group software jobs by user.

3. For each software/user grouping:

a. Extract the wait time as [start - qtime] for each job.

b. Calculate the average wait time of all jobs within the software/user grouping.


1.4.3 Jobs by Exit Status

This chart displays the total number of jobs for each exit status. This chart can be used to highlight the number of jobs that failed or succeeded.

Calculation

Figure 1-12: Jobs by Exit Status

1. Find all jobs for each exit status.

2. Count the jobs in each exit status category.


1.4.4 Successful Vs Unsuccessful Jobs by Node

This chart displays a summary of node utilization based on successful and unsuccessful jobs.

Calculation

Figure 1-13: Successful Vs Unsuccessful Jobs by Node

Successful and unsuccessful are configurable parameters and can be configured using the pbsa-config-exits command. By default, PBSA configures an exit code of “0” as Successful, and non-zero exit codes as Unsuccessful.

1. Find all jobs for each node using the value of exec_vnode.

2. Segregate the jobs that have an exit status of “0”:

a. Count the number of these job records.

3. Segregate the jobs that have exit status other than “0”:

a. Count the number of these job records.
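The successful/unsuccessful split can be sketched as follows. The set of successful exit codes is modelled as a plain Python set defaulting to `{0}`; how pbsa-config-exits stores its configuration is not covered by this document, so that representation, the input shape and the function name are all assumptions.

```python
from collections import Counter

# Default per the text: exit code 0 is Successful, anything else Unsuccessful.
DEFAULT_SUCCESSFUL = frozenset({0})


def count_by_node(jobs, successful=DEFAULT_SUCCESSFUL):
    """Count Successful and Unsuccessful jobs per node.

    jobs: iterable of (node, exit_status) pairs (hypothetical input shape).
    """
    counts = {}
    for node, exit_status in jobs:
        bucket = "Successful" if exit_status in successful else "Unsuccessful"
        counts.setdefault(node, Counter())[bucket] += 1
    return counts


counts = count_by_node([("node1", 0), ("node1", 1), ("node1", 0), ("node2", 271)])
```

Replacing the node with the job's software as the first element of each pair would give the by-software variant of the same count.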


1.4.5 Successful Vs Unsuccessful Jobs by Software

This chart compares successful jobs to unsuccessful jobs by software.

Calculation

Figure 1-14: Successful Vs Unsuccessful Jobs by Software

Successful and unsuccessful are configurable parameters and can be configured using the pbsa-config-exits command. By default, PBSA configures an exit code of “0” as Successful, and non-zero exit codes as Unsuccessful.

1. Find all jobs for each software using the value of resource_list.software.

2. Segregate the jobs that have an exit status of “0”.

a. Count the number of these job records.

3. Segregate the jobs that have exit status other than “0”.

a. Count the number of these job records.


1.4.6 User Job Efficiency and Productivity

This chart compares job failure rates to successful jobs for each user in relation to usage (walltime).

Calculation

Figure 1-15: User Job Efficiency and Productivity

1. Find all jobs for a specific user.

2. Group these jobs by node using the value exec_vnode.

3. For each job within the user/node grouping:

a. Group the jobs that have exit codes configured as Still Running, Successful, and Unsuccessful.

b. For each grouping:

i. Calculate the wall time for each of these jobs [end - start].

ii. Check the placement of the job:

If placement is exclusive, calculate the node wall time as:

[wall time] * [total number of CPUs on the nodes on which the job ran]

If placement is free, calculate the node wall time as:

[wall time] * [total number of CPUs the job consumed]
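Summing node walltime per user and status category might be sketched as below. The sketch assumes each job is a dictionary with hypothetical keys `user`, `exit_status` and `node_walltime` (already computed with the placement rule above), and that a missing exit status (`None`) marks a job as Still Running; the source does not say how PBSA detects running jobs.

```python
from collections import defaultdict


def walltime_by_user_and_status(jobs, successful=frozenset({0})):
    """Return {(user, status): total node walltime} over the given jobs."""
    totals = defaultdict(float)
    for job in jobs:
        if job["exit_status"] is None:
            status = "Still Running"  # assumption: no exit status recorded yet
        elif job["exit_status"] in successful:
            status = "Successful"
        else:
            status = "Unsuccessful"
        totals[(job["user"], status)] += job["node_walltime"]
    return dict(totals)


totals = walltime_by_user_and_status([
    {"user": "alice", "exit_status": 0, "node_walltime": 120.0},
    {"user": "alice", "exit_status": 1, "node_walltime": 12.0},
    {"user": "alice", "exit_status": 0, "node_walltime": 30.0},
    {"user": "bob", "exit_status": None, "node_walltime": 8.0},
])
```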
