4. Assessing and Understanding Performance


Page 1: 4. Assessing and Understanding Performance

4. Assessing and Understanding Performance

Page 2: 4. Assessing and Understanding Performance

Computer Architecture 4-2

4. Performance

4.1 Introduction
4.2 CPU Performance and Its Factors
4.3 Evaluating Performance
4.4 Real Stuff: Two SPEC Benchmarks and the Performance of Recent Intel Processors
4.5 Fallacies and Pitfalls
4.6 Concluding Remarks
4.7 Historical Perspective and Further Reading
4.8 Exercises

Page 3: 4. Assessing and Understanding Performance


How to measure, report, and summarize performance

Defining Performance: An Analogy

Airplane           Passenger   Cruising range   Cruising speed   Passenger throughput
                   capacity    (miles)          (m.p.h.)         (passengers x m.p.h.)
Boeing 777         375         4630             610              228,750
Boeing 747         470         4150             610              286,700
BAC/Sud Concorde   132         4000             1350             178,200
Douglas DC-8-50    146         8720             544              79,424


Figure 4.1
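The throughput column in the table is consistent with passenger capacity multiplied by cruising speed. The sketch below recomputes it and shows that the "best" airplane depends on which metric is chosen. The dictionary layout and the function name are illustrative choices, not part of the slides.

```python
# Figures copied from the airplane table; the data layout is an assumption
# made for this illustration.
airplanes = {
    "Boeing 777": {"capacity": 375, "range": 4630, "speed": 610},
    "Boeing 747": {"capacity": 470, "range": 4150, "speed": 610},
    "BAC/Sud Concorde": {"capacity": 132, "range": 4000, "speed": 1350},
    "Douglas DC-8-50": {"capacity": 146, "range": 8720, "speed": 544},
}


def passenger_throughput(plane):
    """Passenger throughput = passenger capacity x cruising speed."""
    return plane["capacity"] * plane["speed"]


# Each metric picks a different winner.
fastest = max(airplanes, key=lambda name: airplanes[name]["speed"])
longest_range = max(airplanes, key=lambda name: airplanes[name]["range"])
highest_throughput = max(
    airplanes, key=lambda name: passenger_throughput(airplanes[name])
)

for name, plane in airplanes.items():
    print(f"{name}: {passenger_throughput(plane):,}")
print("Fastest:", fastest)
print("Longest range:", longest_range)
print("Highest throughput:", highest_throughput)
```

The same ambiguity carries over to computers: response time and throughput can rank two machines differently.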

4.1 Introduction

Page 4: 4. Assessing and Understanding Performance


Performance of a Computer

Response time (= execution time)
  The time between the start and completion of a task

Throughput
  The total amount of work done in a given time

Performance and execution time
  $\text{Performance}_X = 1 / \text{Execution time}_X$

X is n times faster than Y:

  $\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$
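A minimal sketch of the relative-performance definition follows. The function names and the two execution times (10 s and 15 s) are hypothetical values chosen only to illustrate the ratio.

```python
def performance(execution_time):
    """Performance is the reciprocal of execution time."""
    return 1.0 / execution_time


def times_faster(time_x, time_y):
    """Return n such that X is n times faster than Y."""
    return time_y / time_x


# Hypothetical measurements: X takes 10 s, Y takes 15 s on the same task.
time_x, time_y = 10.0, 15.0
n = times_faster(time_x, time_y)
print(f"X is {n:.2f} times faster than Y")

# The ratio of performances gives the same n as the ratio of times.
print(performance(time_x) / performance(time_y))
```

Note that the execution times appear in the opposite order from the performances: the slower machine's time goes in the numerator.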

Page 5: 4. Assessing and Understanding Performance


Measuring Performance

Definitions of time
  Wall-clock time = Response time = Elapsed time
    Total time to complete a task, including disk accesses, memory accesses, I/O activities, OS overhead, etc.
  CPU execution time = CPU time
    The time the CPU spends computing for this task
    CPU time = User CPU time + System CPU time
  UNIX time command
    90.7u 12.9s 2:39 65%
    (user CPU time 90.7 s, system CPU time 12.9 s, elapsed time 2 min 39 s, CPU time as a percentage of elapsed time 65%)

Definitions of performance
  System performance: based on elapsed time
  CPU performance: based on user CPU time
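The sample output of the time command can be checked against the definitions above. The sketch below assumes the four fields appear in exactly the order shown on the slide (user, system, elapsed as minutes:seconds, percentage). Real shells format this report differently, so the parser is an illustration rather than a general tool.

```python
def parse_time_report(report):
    """Parse a report of the form '90.7u 12.9s 2:39 65%'.

    Assumes the field order shown on the slide; other shells print
    different layouts.
    """
    user_field, system_field, elapsed_field, percent_field = report.split()
    user = float(user_field.rstrip("u"))
    system = float(system_field.rstrip("s"))
    minutes, seconds = elapsed_field.split(":")
    elapsed = int(minutes) * 60 + int(seconds)
    percent = int(percent_field.rstrip("%"))
    return user, system, elapsed, percent


user, system, elapsed, reported_percent = parse_time_report("90.7u 12.9s 2:39 65%")

cpu_time = user + system        # CPU time = user CPU time + system CPU time
cpu_share = cpu_time / elapsed  # fraction of elapsed time spent on the CPU

print(f"CPU time: {cpu_time:.1f} s")
print(f"Elapsed time: {elapsed} s")
print(f"CPU share of elapsed time: {cpu_share:.1%}")
print(f"Time not spent on the CPU for this task: {elapsed - cpu_time:.1f} s")
```

The remaining elapsed time went to I/O waits or to other programs, which is why system performance and CPU performance are measured separately.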

Page 6: 4. Assessing and Understanding Performance


CPU execution time

= CPU clock cycles x clock cycle time

= CPU clock cycles / clock rate

Example: Improving Performance
  Same instruction set
  Computer A: 4 GHz, 10 seconds
  Computer B: ? GHz, 6 seconds
  B requires 1.2 times as many clock cycles as A.


4.2 CPU Performance and Its Factors

Page 7: 4. Assessing and Understanding Performance


[Answer]

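$\text{CPU time}_A = \text{CPU clock cycles}_A / \text{clock rate}_A$

$10\ \text{s} = \text{CPU clock cycles}_A / (4 \times 10^9\ \text{cycles/s})$

$\text{CPU clock cycles}_A = 10\ \text{s} \times 4 \times 10^9\ \text{cycles/s} = 40 \times 10^9\ \text{cycles}$

$\text{CPU time}_B = \text{CPU clock cycles}_B / \text{clock rate}_B = 1.2 \times \text{CPU clock cycles}_A / \text{clock rate}_B$

$6\ \text{s} = 1.2 \times 40 \times 10^9\ \text{cycles} / \text{clock rate}_B$

$\text{clock rate}_B = 1.2 \times 40 \times 10^9\ \text{cycles} / 6\ \text{s} = 8 \times 10^9\ \text{cycles/s} = 8\ \text{GHz}$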
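The same derivation can be replayed numerically. This is a sketch using only the figures given in the example; the variable names are illustrative.

```python
# Computer A: 4 GHz clock, program runs in 10 seconds.
clock_rate_a = 4e9   # cycles per second
cpu_time_a = 10.0    # seconds

# CPU clock cycles = CPU time x clock rate
cycles_a = cpu_time_a * clock_rate_a

# Computer B needs 1.2 times as many cycles and must finish in 6 seconds.
cycles_b = 1.2 * cycles_a
cpu_time_b = 6.0     # seconds

# clock rate = CPU clock cycles / CPU time
clock_rate_b = cycles_b / cpu_time_b

print(f"Cycles on A: {cycles_a:.3g}")
print(f"Required clock rate for B: {clock_rate_b / 1e9:.1f} GHz")
```

B has to run at twice A's clock rate to cut the time from 10 s to 6 s, because part of the gain is consumed by the extra cycles.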

Page 8: 4. Assessing and Understanding Performance


Hardware Software Interface

CPU clock cycles = IC x CPI

IC (Instruction Count)
  Dependent on compilers and architectures

CPI (Cycles Per Instruction)
  Dependent on implementations

Performance equation

Execution Time = IC x CPI x clock cycle time

= (IC x CPI) / clock rate
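The two forms of the performance equation are equivalent because clock cycle time is the reciprocal of clock rate. A minimal sketch, with hypothetical inputs (2 billion instructions, CPI of 1.5, 3 GHz clock) chosen only for illustration:

```python
def cpu_time_from_cycle_time(instruction_count, cpi, clock_cycle_time):
    """Execution time = IC x CPI x clock cycle time (seconds)."""
    return instruction_count * cpi * clock_cycle_time


def cpu_time_from_clock_rate(instruction_count, cpi, clock_rate):
    """Execution time = (IC x CPI) / clock rate (seconds)."""
    return instruction_count * cpi / clock_rate


# Hypothetical program and machine, not taken from the slides.
instruction_count = 2_000_000_000
cpi = 1.5
clock_rate = 3e9                     # Hz
clock_cycle_time = 1.0 / clock_rate  # seconds per cycle

t1 = cpu_time_from_cycle_time(instruction_count, cpi, clock_cycle_time)
t2 = cpu_time_from_clock_rate(instruction_count, cpi, clock_rate)
print(f"{t1:.3f} s, {t2:.3f} s")
```

Both calls give about one second, up to floating-point rounding.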

Page 9: 4. Assessing and Understanding Performance


Example: Using the Performance Equation

Same instruction set architecture, same program
  Computer A: clock cycle time = 250 ps, CPI = 2.0
  Computer B: clock cycle time = 500 ps, CPI = 1.2
Which is faster, and by how much?

[Answer]

Let I = instruction count for the program.

$\text{CPU time}_A = \text{IC}_A \times \text{CPI}_A \times \text{clock cycle time}_A = I \times 2.0 \times 250\ \text{ps} = 500 \times I\ \text{ps}$

$\text{CPU time}_B = I \times 1.2 \times 500\ \text{ps} = 600 \times I\ \text{ps}$

Then

$\frac{\text{CPU performance}_A}{\text{CPU performance}_B} = \frac{\text{Execution time}_B}{\text{Execution time}_A} = \frac{600 \times I\ \text{ps}}{500 \times I\ \text{ps}} = 1.2$

Thus, A is 1.2 times faster than B for this program.
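The comparison of computers A and B can be checked with a few lines of code. The instruction count is an arbitrary placeholder, since it cancels in the ratio; the function name is illustrative.

```python
PICOSECOND = 1e-12  # seconds


def cpu_time(instruction_count, cpi, clock_cycle_time):
    """CPU time = IC x CPI x clock cycle time (seconds)."""
    return instruction_count * cpi * clock_cycle_time


# Arbitrary placeholder: the same program runs on both machines,
# so this value cancels when the two times are divided.
instruction_count = 1_000_000

time_a = cpu_time(instruction_count, cpi=2.0, clock_cycle_time=250 * PICOSECOND)
time_b = cpu_time(instruction_count, cpi=1.2, clock_cycle_time=500 * PICOSECOND)

speedup = time_b / time_a
print(f"A is {speedup:.1f} times faster than B")
```

A wins despite its higher CPI because its clock cycle is half as long.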

Page 10: 4. Assessing and Understanding Performance


The Big Picture

$\text{Time} = \text{Instructions} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$

Components of performance            Units of measure
CPU execution time for a program     Seconds for the program
Instruction count (IC)               Instructions executed for the program
Clock cycles per instruction (CPI)   Average clock cycles / instruction
Clock cycle time                     Seconds / clock cycle

Page 11: 4. Assessing and Understanding Performance


Example: Comparing Code Segments

Which will be faster? What is the CPI for each sequence?

Instruction class   CPI for the class
A                   1
B                   2
C                   3

                Instruction counts for instruction class
Code sequence   A   B   C
1               2   1   2
2               4   1   1

Page 12: 4. Assessing and Understanding Performance


[Answer]

$\text{Instruction count}_1 = 2 + 1 + 2 = 5$ and $\text{Instruction count}_2 = 4 + 1 + 1 = 6$

Thus (1) executes fewer instructions.

$\text{CPU clock cycles}_1 = 2 \times 1 + 1 \times 2 + 2 \times 3 = 10$ and $\text{CPU clock cycles}_2 = 4 \times 1 + 1 \times 2 + 1 \times 3 = 9$

Thus (2) is faster.

$\text{CPI}_1 = \text{CPU clock cycles}_1 / \text{Instruction count}_1 = 10 / 5 = 2$ and $\text{CPI}_2 = 9 / 6 = 1.5$

(2) has lower CPI.
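The answer can be reproduced from the two tables in the example. The names below are illustrative; the numbers are those given in the example.

```python
# Cycles per instruction for each instruction class.
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}

# Instruction counts per class for each code sequence.
SEQUENCES = {
    1: {"A": 2, "B": 1, "C": 2},
    2: {"A": 4, "B": 1, "C": 1},
}


def summarize(counts):
    """Return (instruction count, clock cycles, average CPI)."""
    instructions = sum(counts.values())
    cycles = sum(n * CPI_BY_CLASS[cls] for cls, n in counts.items())
    return instructions, cycles, cycles / instructions


for seq, counts in SEQUENCES.items():
    instructions, cycles, cpi = summarize(counts)
    print(f"Sequence {seq}: {instructions} instructions, "
          f"{cycles} cycles, CPI = {cpi}")
```

Sequence 2 executes more instructions yet needs fewer cycles, which is why instruction count alone is a poor predictor of execution time.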

Page 13: 4. Assessing and Understanding Performance


Benchmarking
  The process of comparing the performance of two or more systems by measurement

Benchmark
  Programs specifically chosen to measure performance
  A workload that the user hopes will predict the performance of the actual workload

Compiler tricks
  Optimizations in either the architecture or the compiler that target specific benchmarks


4.3 Evaluating Performance

Page 14: 4. Assessing and Understanding Performance


Compiler Tricks by IBM

Page 15: 4. Assessing and Understanding Performance


Difficulties with summarizing performance

  A is 10 times faster than B for program 1.
  B is 10 times faster than A for program 2.

Total execution time: a consistent summary measure

  AM (arithmetic mean) = $\frac{1}{n} \sum_{i=1}^{n} \text{Time}_i$

  Weighted arithmetic mean = $\sum_{i=1}^{n} (\text{Time}_i \times w_i)$, where $\sum_{i=1}^{n} w_i = 1.0$

                       Computer A   Computer B
Program 1 (seconds)    1            10
Program 2 (seconds)    1000         100
Total time (seconds)   1001         110

Figure 4.4

Comparing and Summarizing Performance
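The summary measures can be computed directly from the execution times in Figure 4.4. In this sketch the function names are illustrative, and the equal weights are an assumption used only to show that the weighted mean reduces to the plain arithmetic mean in that case.

```python
# Execution times in seconds for programs 1 and 2, from Figure 4.4.
times = {
    "A": [1.0, 1000.0],
    "B": [10.0, 100.0],
}


def arithmetic_mean(values):
    """AM = (1/n) * sum of Time_i."""
    return sum(values) / len(values)


def weighted_arithmetic_mean(values, weights):
    """Sum of Time_i * w_i, where the weights must sum to 1.0."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    return sum(t * w for t, w in zip(values, weights))


total_a, total_b = sum(times["A"]), sum(times["B"])
print(f"Total time: A = {total_a} s, B = {total_b} s")
print(f"B is {total_a / total_b:.1f} times faster than A on total time")

print("AM:", arithmetic_mean(times["A"]), arithmetic_mean(times["B"]))

# Assumed equal weights: gives the same result as the arithmetic mean.
equal = [0.5, 0.5]
print("Weighted AM:", weighted_arithmetic_mean(times["A"], equal),
      weighted_arithmetic_mean(times["B"], equal))
```

Unequal weights would model a workload in which one program runs more often than the other, and they can change which computer looks better.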

Page 16: 4. Assessing and Understanding Performance


4.6 Concluding Remarks

Three design criteria
  1. High-performance design: supercomputers and high-end servers
  2. Low-cost design: embedded systems
  3. Cost/performance design: desktop computers

Execution time of real programs as the metric


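$\frac{\text{seconds}}{\text{program}} = \frac{\text{instructions}}{\text{program}} \times \frac{\text{clock cycles}}{\text{instruction}} \times \frac{\text{seconds}}{\text{clock cycle}}$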