CS1104 2001/02 Semester II Help Session IIA Performance Measures

CS1104 2001/02 Semester II Help Session IIA Performance Measures Colin Tan S15-04-05 [email protected]



CS1104 2001/02 Semester II Help Session IIA

Performance Measures

Colin Tan

S15-04-05

[email protected]


Basic Concepts: Instruction Execution Cycles

• Processors execute instructions in several steps:
  – Instruction fetch (IF), instruction decode (ID), execute (EX), memory read (MEM), write result (WB).
  – The previous step must complete before the next step can proceed correctly!
  – Coordination between steps relies on a series of “ticks” called “clock cycles” (CC). Clock cycle n is denoted by CCn.
  – So in our processor:
    – CC1: IF
    – CC2: ID
    – CC3: EX
    – CC4: MEM
    – CC5: WB


Basic Concepts: Instruction Execution Cycles

• So each instruction takes a certain number of cycles to execute.

• If the processor is NOT pipelined, then an instruction may skip some stages and hence may take fewer cycles.

• The average number of cycles required for a particular instruction is called the instruction CPI.
  – E.g. ADD may require 2 cycles and SUB may require 3 cycles. The instruction CPI of ADD is therefore 2, and that of SUB is 3.


Basic Concepts: Instruction Frequency

• A program (e.g. Microsoft Word) is made up of many instructions, drawn from each of the different types (classes) of instructions.
  – The number of instructions in each class is called the “instruction frequency” of that class.
  – E.g. there may be 1017 ADDs, 763 MULs, 27839 SUBs, etc.
  – This is often expressed as a percentage or as a fraction of the total number of instructions.
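Turning raw counts into fractions can be sketched as below. The three counts are the ones quoted in the example; treating them as the whole program (ignoring the "etc.") is an assumption made purely for illustration.

```python
# Instruction counts quoted in the example above. Assumption: these three
# classes make up the entire program, which the slide does not claim.
counts = {"ADD": 1017, "MUL": 763, "SUB": 27839}

total = sum(counts.values())

# Relative frequency of a class = count of that class / total count.
frequencies = {op: n / total for op, n in counts.items()}

for op, f in frequencies.items():
    print(f"{op}: {f:.4f} ({f:.2%})")
```

Because every fraction is divided by the same total, the fractions always add up to 1.0.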


Basic Concepts: Average Cycles Per Instruction

• The instruction frequency and the number of cycles each instruction requires (the instruction CPI) can be used to compute the average Cycles Per Instruction, or simply the CPI, of a particular program.
  – Each type of instruction can take a different number of clock cycles.
  – A program consists of several different types of instructions.
  – The average CPI is the average number of cycles required to execute each instruction, across all types of instructions.
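Written as a formula, the average CPI is a frequency-weighted sum. The symbols here are notation introduced for this note, not taken from the slides: f_i is the relative frequency of instruction class i (as a fraction), and CPI_i is the instruction CPI of that class.

```latex
\[
  \mathrm{CPI}_{\text{avg}} \;=\; \sum_{i} f_i \times \mathrm{CPI}_i ,
  \qquad \text{where } \sum_{i} f_i = 1 .
\]
```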


Calculating Average CPI

• Find the overall CPI of a program running on a processor with the class CPIs and instruction frequencies shown here:

Type   CPI   Instruction Frequency
Add    3     0.40
Sub    2     0.25
Mul    4     0.15
Div    5     0.20


Calculating Average CPI

• Let’s assume that the total number of instructions is IC. Then there are 0.4IC ADD instructions, 0.25IC SUB instructions, 0.15IC MUL instructions and 0.2IC DIV instructions.

• The total number of clock cycles used by ADD instructions is 0.4IC x 3, by SUB 0.25IC x 2, by MUL 0.15IC x 4, and by DIV 0.2IC x 5.

• Hence the total number of clock cycles used by this program is 0.4IC x 3 + 0.25IC x 2 + 0.15IC x 4 + 0.2IC x 5.

• The number of instructions is IC. Hence the average number of cycles per instruction (average CPI) is (0.4IC x 3 + 0.25IC x 2 + 0.15IC x 4 + 0.2IC x 5)/1.0IC.

• IC cancels out, leaving 0.4 x 3 + 0.25 x 2 + 0.15 x 4 + 0.2 x 5 = 1.2 + 0.5 + 0.6 + 1.0 = 3.3.

• Hence for this program, each instruction requires, on average, 3.3 cycles.
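The same calculation can be checked with a few lines of Python. This is a minimal sketch: the function name average_cpi and the dictionary layout are choices made here, while the CPI and frequency values come from the table in this example.

```python
def average_cpi(classes):
    """Return the frequency-weighted average CPI.

    classes maps an instruction type to a (class CPI, relative frequency)
    pair, with frequencies given as fractions that should sum to 1.0.
    """
    total_freq = sum(freq for _, freq in classes.values())
    if abs(total_freq - 1.0) > 1e-9:
        raise ValueError(f"frequencies sum to {total_freq}, expected 1.0")
    # IC cancels out, so only CPI x frequency is needed for each class.
    return sum(cpi * freq for cpi, freq in classes.values())


# Values from the table: type -> (CPI, instruction frequency).
program = {
    "Add": (3, 0.40),
    "Sub": (2, 0.25),
    "Mul": (4, 0.15),
    "Div": (5, 0.20),
}

cpi = average_cpi(program)
print(f"average CPI = {cpi:.2f}")
```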


Exercise

• Find the average CPI of the following program:


Exercise

• The ratio of instructions is shown below:

• This gives us the following relative frequencies:


Exercise

• Hence our average CPI is:
  – 0.36 x 2 + 0.32 x 2 + 0.28 x 6 + 0.04 x 12 = 0.72 + 0.64 + 1.68 + 0.48 = 3.52

• Thus, on average, each instruction will take 3.52 clock cycles.


Why is this useful?

• Each cycle that an instruction takes consumes time.

• If the clock rate of a CPU is 500 MHz, then each second there will be 500,000,000 cycles (note: 1 MHz is 10^6 cycles per second, NOT 2^20!)

• Therefore each cycle requires 1/(500 x 10^6) seconds.
  – This works out to 2 ns per cycle.
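The conversion from clock rate to cycle time is a one-line reciprocal. In this small sketch the function name cycle_time_ns is made up for the example; the 500 MHz figure is the one used above.

```python
def cycle_time_ns(clock_rate_hz):
    """Return the duration of one clock cycle in nanoseconds."""
    # One second is 1e9 ns, shared among clock_rate_hz cycles.
    return 1e9 / clock_rate_hz


# 1 MHz is 10**6 cycles per second (decimal), not 2**20.
clock_rate_hz = 500 * 10**6

print(cycle_time_ns(clock_rate_hz))  # 2.0
```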


But still... Why is this useful?

• If there are IC instructions in a program (called the instruction count of the program), and if the average CPI is C, then the total number of cycles used by this program is IC x C.

• Each cycle requires 2 ns at 500 MHz. Therefore the program will require (IC x C x 2) ns to execute.

• This is called the execution time of the program, and forms the basis for performance comparison.
  – We take a program and run it on machine M1, recording the execution time TM1, then run the same program on machine M2, recording the execution time TM2. If TM1 > TM2, then machine M2 is faster than M1, by a factor of TM1 / TM2.
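To make the comparison concrete, here is a sketch with made-up numbers. The instruction count, both average CPIs and M2's clock rate are hypothetical values chosen only to illustrate the formula; they do not come from the slides.

```python
def execution_time_s(instruction_count, avg_cpi, clock_rate_hz):
    """Execution time = IC x CPI x cycle time, with cycle time = 1 / clock rate."""
    return instruction_count * avg_cpi / clock_rate_hz


IC = 1_000_000  # hypothetical instruction count

# Hypothetical machines: M1 at 500 MHz with CPI 3.0, M2 at 1 GHz with CPI 4.0.
t_m1 = execution_time_s(IC, avg_cpi=3.0, clock_rate_hz=500e6)
t_m2 = execution_time_s(IC, avg_cpi=4.0, clock_rate_hz=1e9)

# Speedup of M2 over M1 is TM1 / TM2; a value above 1 means M2 is faster.
speedup_m2_over_m1 = t_m1 / t_m2

print(f"TM1 = {t_m1 * 1e3:.1f} ms, TM2 = {t_m2 * 1e3:.1f} ms")
print(f"speedup of M2 over M1 = {speedup_m2_over_m1:.2f}")
```

Note that in this made-up case M2 has the worse CPI but still wins, because its shorter cycle time more than makes up for the extra cycles.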


Exercise

• Find i) average CPI, ii) Execution time of the program below for machines M1 and M2, then find the speedup of M2 over M1.


How Caches Affect Performance

• Sometimes the instruction/data required is not present in the cache.
  – This is a cache miss!
  – The cache system needs to go to main memory to remedy the miss.
    • This will take many, many cycles!

• If execution proceeds, the results will be meaningless.
  – Either the required instruction is not loaded yet because of the cache miss, or the data is not loaded.

• The CPU responds by freezing the instruction for many cycles.
  – This is to give memory time to produce the instruction/data for the cache.

• When the cache miss is remedied, the CPU re-reads the cache.

• Hence cache misses add cycles to the instruction, and thus affect the instruction CPI.


How Caches Affect Performance

• The equation given in the lecture notes is:

  CPI_memory = Instruction Frequency * L1 miss rate * (L1 miss penalty + L2 miss rate * L2 miss penalty)
             + Data Access Frequency * L1 miss rate * (L1 miss penalty + L2 miss rate * L2 miss penalty)

• Note that we do not use the cache hit figures because the basic instruction CPI already factors them in.
  – The basic instruction CPI includes reading from the instruction cache assuming a cache hit, or reading from the data cache assuming a cache hit.
  – Hence here we are only concerned with cycles added because of a cache miss.
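Since both terms of the equation share the same miss cost, it can be written more compactly. The symbols below are shorthand introduced for this note: f_I and f_D are the instruction and data access frequencies, m_1 and m_2 the L1 and L2 miss rates, and p_1 and p_2 the L1 and L2 miss penalties. The second line reflects the reading above that miss cycles are added on top of the basic CPI.

```latex
\[
  \mathrm{CPI}_{\text{memory}}
    = (f_I + f_D) \times m_1 \times \bigl(p_1 + m_2 \times p_2\bigr)
\]
\[
  \mathrm{CPI}_{\text{total}}
    = \mathrm{CPI}_{\text{basic}} + \mathrm{CPI}_{\text{memory}}
\]
```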


Exercise

• Given the following program and machine, assume that L1 miss rate is 0.05, L1 miss penalty is 12 cycles, L2 miss rate is 0.03, L2 miss penalty is 40 cycles, find the average CPI.
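The miss-rate part of this exercise can be worked through in code. The miss rates and penalties below are the ones given in the exercise; the program table is not reproduced in this transcript, so the two access frequencies and the basic CPI are hypothetical placeholders, and the function name cpi_memory is a label chosen here.

```python
def cpi_memory(instr_freq, data_freq,
               l1_miss_rate, l1_penalty, l2_miss_rate, l2_penalty):
    """Extra cycles per instruction caused by cache misses."""
    # Average stall per memory access: every L1 miss pays the L1 penalty,
    # and the fraction that also misses in L2 pays the L2 penalty on top.
    stall_per_access = l1_miss_rate * (l1_penalty + l2_miss_rate * l2_penalty)
    return instr_freq * stall_per_access + data_freq * stall_per_access


# Figures given in the exercise.
L1_MISS_RATE, L1_PENALTY = 0.05, 12
L2_MISS_RATE, L2_PENALTY = 0.03, 40

# Hypothetical placeholders (NOT from the exercise): one instruction fetch
# per instruction, 0.3 data accesses per instruction, basic CPI of 3.0.
INSTR_FREQ, DATA_FREQ, BASIC_CPI = 1.0, 0.3, 3.0

extra = cpi_memory(INSTR_FREQ, DATA_FREQ,
                   L1_MISS_RATE, L1_PENALTY, L2_MISS_RATE, L2_PENALTY)
total_cpi = BASIC_CPI + extra

print(f"CPI_memory = {extra:.3f}, total CPI = {total_cpi:.3f}")
```

With the exercise's rates, each memory access costs 0.05 x (12 + 0.03 x 40) = 0.66 stall cycles on average, whatever the access frequencies turn out to be.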


One Last Exercise


One Last Exercise

• Moral: Always ensure that the frequencies add up to 1.0 (100%), otherwise you need to normalize the answer by dividing by the total frequency.
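The normalization step can be sketched as follows. The two instruction classes and their figures are invented for illustration (their frequencies deliberately add up to 1.25), and the function name normalized_cpi is chosen here.

```python
def normalized_cpi(classes):
    """Average CPI that stays correct when frequencies do not sum to 1.0.

    classes maps an instruction type to a (class CPI, frequency) pair.
    """
    total_freq = sum(freq for _, freq in classes.values())
    weighted_cycles = sum(cpi * freq for cpi, freq in classes.values())
    # Dividing by the total frequency rescales the weights to sum to 1.0.
    return weighted_cycles / total_freq


# Hypothetical figures whose frequencies add up to 1.25 rather than 1.0.
program = {"A": (2, 0.50), "B": (4, 0.75)}

print(f"{normalized_cpi(program):.2f}")
```

When the frequencies already sum to 1.0 the division changes nothing, so the same function covers both cases.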


Summary

• Instructions are timed using a central clock. Each tick of the clock is called a clock cycle, or simply a cycle.

• Each instruction requires a certain number of cycles on average to execute. This is the instruction CPI.

• Different instructions within a program will have different CPIs; however, we can compute the average CPI across all instructions in a given program.

• Performance can be measured by running the same program on different machines. If the execution time on M1 is TM1 and on M2 is TM2, then the speedup of M1 over M2 is TM2/TM1, and vice versa.


Summary

• Cache misses cause the CPI of an instruction, and the overall CPI of a program, to go up.
  – The processor needs to freeze the instruction to allow memory to deliver the missing instruction/data to the cache.

• Remember to normalize your CPI if the total frequency does not add up to 1.0!


Further Reading

• Please read Dr. Ankush’s notes as well!