Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf ·...

53
1 Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 1 Fundamentals of Quantitative Design and Analysis Computer Architecture A Quantitative Approach, Fifth Edition

Transcript of Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf ·...

Page 1: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

1 Copyright © 2012, Elsevier Inc. All rights reserved.

Chapter 1

Fundamentals of Quantitative Design and Analysis

Computer Architecture A Quantitative Approach, Fifth Edition

Page 2: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

2 Copyright © 2013, Daniel A. Menasce. All rights reserved.

1970: The IBM 7044 Computer Introduction

Page 3: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

3 Copyright © 2013, Daniel A. Menasce. All rights reserved.

1970: Magnetic Core Memory Introduction

1core = 1 bit

Page 4: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

4

How Does a Computer Work?

  What is the Von Neumann computer architecture?   What are the elements of a Von Neumann

computer?   How do these elements interact to execute

programs?

Copyright © 2013, Daniel A. Menasce. All rights reserved.

Introduction

Page 5: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

5

Von Neumann Architecture

  What is the Von Neumann computer architecture?   What are the elements of a Von Neumann

computer?   How do these elements interact to execute

programs?

Copyright © 2013, Daniel A. Menasce. All rights reserved.

Introduction

Page 6: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

6

Von Neumann Architecture

Copyright © 2013, Daniel A. Menasce. All rights reserved.

Introduction

Stored-program computer

Data

Program

ALU

Registers

system bus

Page 7: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

7

Von Neumann Architecture

Copyright © 2013, Daniel A. Menasce. All rights reserved.

Introduction

Stored-program computer

Data

Program

ALU

Registers

Sequence of instructions: (1) Load/Store from memory to/from registers, (2) Arithmetic/Logical Ops., (3) Conditional and unconditional branches.

General registers, FP registers, Program counter, etc.

Page 8: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

8

Example of Instruction Formats

Copyright © 2013, Daniel A. Menasce. All rights reserved.

Introduction

Registers

opcode RX RY RZ RZ RX op RY

opcode RX Address RX Mem(address) Mem(address) RX

opcode Address Unconditional or Conditional Jump to Address

Page 9: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

9 Copyright © 2012, Elsevier Inc. All rights reserved.

Computer Technology   Performance improvements:

  Improvements in semiconductor technology   Feature size, clock speed

  Improvements in computer architectures   Enabled by HLL compilers, UNIX   Lead to RISC architectures

  Together have enabled:   Lightweight computers   Productivity-based managed/interpreted

programming languages

Introduction

Page 10: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

10 Copyright © 2012, Elsevier Inc. All rights reserved.

Single Processor Performance Introduction

RISC

Move to multi-processor. From ILP to DLP and TLP

Page 11: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

11

Moore’s Law

Copyright © 2013, Daniel A. Menasce. All rights reserved.

Introduction

Source: Wikimedia Commons

The number of transistors on integrated circuits doubles approximately every two years. (Gordon Moore, 1965)

Page 12: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

12 Copyright © 2012, Elsevier Inc. All rights reserved.

Current Trends in Architecture   Cannot continue to leverage Instruction-Level

parallelism (ILP)   Single processor performance improvement ended in

2003

  New models for performance:   Data-level parallelism (DLP)   Thread-level parallelism (TLP)   Request-level parallelism (RLP)

  These require explicit restructuring of the application

Introduction

Page 13: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

13

Instruction Level Parallelism (ILP)

Copyright © 2013, Daniel A. Menasce. All rights reserved.

Introduction In high-level language: e= a + b; f = c + d; g = e * f;

a b c d + + e f * g

In assembly language: * e = a+b 001 LOAD A,R1 002 LOAD B,R2 003 ADD R1,R2,R3 004 STO R3,E * f = c + d 005 LOAD C,R4 006 LOAD D,R5 007 ADD R4,R5,R6 008 STO R6,F * g = e * f 009 MULT R3,R6,R7 010 STO R7,G

Page 14: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

14

Instruction Level Parallelism

Copyright © 2013, Daniel A. Menasce. All rights reserved.

Introduction In assembly language: * e = a+b 001 LOAD A,R1 002 LOAD B,R2 003 ADD R1,R2,R3 004 STO R3,E * f = c + d 005 LOAD C,R4 006 LOAD D,R5 007 ADD R4,R5,R6 008 STO R6,F * g = e * f 009 MULT R3,R6,R7 010 STO R7,G

Instructions that can potentially be executed in parallel:

001, 002, 005, 006 003, 007 004, 008, 009 010

Page 15: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

15

Data Level Parallelism (DLP)

Copyright © 2013, Daniel A. Menasce. All rights reserved.

Introduction

Task Data Task Data

Page 16: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

16

Task Level Parallelism (TLP)

Copyright © 2013, Daniel A. Menasce. All rights reserved.

Introduction

Tasks Task

Page 17: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

17 Copyright © 2012, Elsevier Inc. All rights reserved.

Classes of Computers   Personal Mobile Device (PMD)

  e.g. smart phones, tablet computers   Emphasis on energy efficiency and real-time

  Desktop Computing   Emphasis on price-performance

  Servers   Emphasis on availability, scalability, throughput

  Clusters / Warehouse Scale Computers   Used for “Software as a Service (SaaS)”   Emphasis on availability and price-performance   Sub-class: Supercomputers, emphasis: floating-point

performance and fast internal networks   Embedded Computers

  Emphasis: price

Classes of C

omputers

Page 18: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

18 Copyright © 2012, Elsevier Inc. All rights reserved.

Parallelism   Classes of parallelism in applications:

  Data-Level Parallelism (DLP)   Task-Level Parallelism (TLP)

  Classes of architectural parallelism:   Instruction-Level Parallelism (ILP)   Vector architectures/Graphic Processor Units (GPUs)   Thread-Level Parallelism   Request-Level Parallelism

Classes of C

omputers

Page 19: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

19 Copyright © 2012, Elsevier Inc. All rights reserved.

Flynn’s Taxonomy   Single instruction stream, single data stream (SISD)

  Single instruction stream, multiple data streams (SIMD)   Vector architectures   Multimedia extensions   Graphics processor units

  Multiple instruction streams, single data stream (MISD)   No commercial implementation

  Multiple instruction streams, multiple data streams (MIMD)   Tightly-coupled MIMD (thread-level parallelism)   Loosely-coupled MIMD (cluster or warehouse-scale computing)

Classes of C

omputers

Page 20: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

20 Copyright © 2012, Elsevier Inc. All rights reserved.

Defining Computer Architecture   “Old” view of computer architecture:

  Instruction Set Architecture (ISA) design   i.e. decisions regarding:

  Registers (register-to-memory or load-store), memory addressing (e.g., byte addressing, alignment), addressing modes e.g., register, immediate, displacement), instruction operands (type and size), available operations (CISC or RISC), control flow instructions, instruction encoding (fixed vs. variable length)

  “Real” computer architecture:   Specific requirements of the target machine   Design to maximize performance within constraints:

cost, power, and availability   Includes ISA, microarchitecture, hardware

Defining C

omputer A

rchitecture

Page 21: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

21 Copyright © 2012, Elsevier Inc. All rights reserved.

Trends in Technology   Integrated circuit technology

  Transistor density: 35%/year   Die size: 10-20%/year   Integration overall (Moore’s Law): 40-55%/year

  DRAM capacity: 25-40%/year (slowing)

  Flash capacity (non-volatile semiconductor memory): 50-60%/year   15-20X cheaper/bit than DRAM

  Magnetic disk technology: density increase: 40%/year   15-25X cheaper/bit then Flash   300-500X cheaper/bit than DRAM

Trends in Technology

Page 22: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

22 Copyright © 2012, Elsevier Inc. All rights reserved.

Bandwidth and Latency   Bandwidth or throughput

  Total work done in a given time (e.g., MB/sec, MIPS)   10,000-25,000X improvement for processors   300-1200X improvement for memory and disks

  Latency or response time   Time between start and completion of an event   30-80X improvement for processors   6-8X improvement for memory and disks

Trends in Technology

Page 23: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

23 Copyright © 2012, Elsevier Inc. All rights reserved.

Bandwidth and Latency

Log-log plot of bandwidth and latency milestones

Trends in Technology

Page 24: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

24 Copyright © 2012, Elsevier Inc. All rights reserved.

Transistors and Wires   Feature size

  Minimum size of transistor or wire in x or y dimension

  10 microns in 1971 to .032 microns in 2011   Transistor performance scales linearly

  Wire delay does not improve with feature size!   Integration density scales quadratically

Trends in Technology

Page 25: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

25 Copyright © 2012, Elsevier Inc. All rights reserved.

Power (1 Watt = 1 Joule/sec) and Energy (Joules)   Problem: Get power in, get power out

  Thermal Design Power (TDP)   Characterizes sustained power consumption   Used as target for power supply and cooling system   Lower than peak power, higher than average power

consumption

  Clock rate can be reduced dynamically to limit power consumption

  Energy per task is often a better measurement

Trends in Pow

er and Energy

Page 26: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

26

Energy and Power Example

Copyright © 2013, Daniel A. Menasce. All rights reserved.

Introduction

Processor A Processor B 20% higher average power consumption: 1.2 P

P

Executes a task in 70% of the time needed by B: 0.7 * T

T

Energy consumption: 1.2 * 0.7 * T = 0.84 P * T

Energy consumption = P * T

More energy efficient!

It is better to use energy instead of power for comparing a fixed workload.

Page 27: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

27 Copyright © 2012, Elsevier Inc. All rights reserved.

Dynamic Energy and Power   Dynamic energy

  Transistor switch from 0 -> 1 or 1 -> 0   ½ x Capacitive load x Voltage2

  Dynamic power   ½ x Capacitive load x Voltage2 x Frequency switched

  Reducing clock rate reduces power, not energy

Trends in Pow

er and Energy

Page 28: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

28 Copyright © 2012, Elsevier Inc. All rights reserved.

Dynamic Energy and Power: Example

  Assume a processor with Dynamic Voltage Scaling (DVS) and that a 15% reduction in voltage results in 15% reduction in frequency. What is the impact on dynamic energy and dynamic power? Assume that the capacitance is unchanged.

Trends in Pow

er and Energy

EnergynewEnergyold

=(Voltage × 0.85)2

Voltage2= 0.72

PowernewPowerold

=Voltage × 0.85( )2 × (Frequency× 0.85)

Voltage2 ×Frequency= 0.61

Page 29: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

29 Copyright © 2012, Elsevier Inc. All rights reserved.

Power   Intel 80386

consumed ~ 2 W   3.3 GHz Intel

Core i7 consumes 130 W

  Heat must be dissipated from 1.5 x 1.5 cm chip

  This is the limit of what can be cooled by air

Trends in Pow

er and Energy

Page 30: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

30 Copyright © 2012, Elsevier Inc. All rights reserved.

Reducing Power   Techniques for reducing power:

  Do nothing well: turn-off clock of inactive modules.

  Dynamic Voltage-Frequency Scaling   Low power state for DRAM, disks (spin-down)   Overclocking some cores and turning off other

cores

Trends in Pow

er and Energy

Page 31: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

31 Copyright © 2012, Elsevier Inc. All rights reserved.

Static Power   Static power consumption

  Leakage current flows even when a transistor is off

  Leakage can be as high as 50% due to large SRAM caches that need power to maintain values.

  Powerstatic = Currentstatic x Voltage   Scales with number of transistors   To reduce: power gating (i.e., turn off the

power supply).

Trends in Pow

er and Energy

Page 32: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

32 Copyright © 2012, Elsevier Inc. All rights reserved.

Trends in Cost   Cost driven down by learning curve

  Yield (% of manufactured devices that survive testing).

  DRAM: price closely tracks cost

  Microprocessors: price depends on volume (less time to get down the learning curve and increase in manufacturing efficiency)   10% less for each doubling of volume

Trends in Cost

Page 33: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

33 Copyright © 2012, Elsevier Inc. All rights reserved.

Integrated Circuit Cost   Integrated circuit

  Bose-Einstein formula:

  Defects per unit area = 0.016-0.057 defects per square cm (2010)   N = process-complexity factor = 11.5-15.5 (40 nm, 2010)

Trends in Cost

# dies along the wafer’s perimeter

empirical formula

Page 34: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

34 Copyright © 2012, Elsevier Inc. All rights reserved.

Integrated Circuit Cost: Example   What is the number of dies in a 30 cm-

diameter wafer for a die that is 1.5 cm on a side and for a die that is 1.0 cm on a side?

Trends in Cost

Dies per wafer = π × (30/2)2

1.5 ×1.5−

π × 302 ×1.5 ×1.5

= 270

Dies per wafer = π × (30/2)2

1.0 ×1.0−

π × 302 ×1.0 ×1.0

= 640

Page 35: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

35 Copyright © 2012, Elsevier Inc. All rights reserved.

Integrated Circuit Cost: Example   What is the die yield for dies that are 1.5

cm on a side and 1.0 cm on a side, assuming a defect density of 0.031/cm2, N = 13.5, a wafer yield of 100%?

Trends in Cost

Die yield = 11 +0.031×1.5 ×1.5( )13.5 = 0.40

Die yield = 11 +0.031×1.0 ×1.0( )13.5 = 0.66

Page 36: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

36 Copyright © 2012, Elsevier Inc. All rights reserved.

Dependability   Service Level Agreement (SLA) or Service Level

Objectives (SLO)

  Module reliability   Mean time to failure (MTTF) – reliability measure   Mean time to repair (MTTR) – service interruption   Mean time between failures (MTBF) = MTTF + MTTR   Availability = MTTF / MTBF

Dependability

Page 37: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

37 © 2004 D. A. Menascé. All Rights Reserved.

MTTR, MTTF, and MTBF

time

failure failure machine is fixed

MTTR MTTF

MTBF

Page 38: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

38 Copyright © 2012, Elsevier Inc. All rights reserved.

Dependability Example   A disk subsystem has the following components

and MTTF values:

  Assume that lifetimes are exponentially distributed and independent, what is the system’s MTTF?

Dependability

Component MTTF (hours) 10 disks Each at 1,000,000

hours 1 ATA controller 500,000 1 power supply 200,000 1 fan 200,00 1 ATA cable 1,000,000

Page 39: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

39 Copyright © 2012, Elsevier Inc. All rights reserved.

Dependability Example Cont’d D

ependability Component MTTF (hours) 10 disks Each at 1,000,000

hours 1 ATA controller 500,000 1 power supply 200,000 1 fan 200,00 1 ATA cable 1,000,000

Failure Ratesyst =10 × 11,000,000

×1

500,000×

1200,000

×1

200,000×

11,000,000

= 23,000 /billion hours

MTTFsyst =1

Failure Ratesyst

= 43,500 hours ≈ 4.96 yrs

Page 40: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

40 Copyright © 2012, Elsevier Inc. All rights reserved.

Measuring Performance   Typical performance metrics:

  Response time (of interest to users)   Throughput (of interest to operators)

  Speedup of X relative to Y   Execution timeY / Execution timeX

  Execution time   Wall clock time: includes all system overheads   CPU time: only computation time

  Benchmarks   Kernels (e.g., matrix multiply)   Toy programs (e.g., sorting)   Synthetic benchmarks (e.g., Dhrystone)   Benchmark suites (e.g., SPEC, TPC)

Measuring P

erformance

Page 41: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

41 Copyright © 2013, Daniel A. Menasce

SPECRate and SPECRatio   SPECrate: a throughput metrics. Measures the number jobs of a

given type that can be processed in a given time interval.

  SPECratio = ratio between elapsed time for a given job at a reference machines and the elapsed time of the same job at a given machine.

  Comparing machines A and B:

Measuring P

erformance

Execution Timereference =10.5secExecution Timemachine A = 5.25secSPECratiomachine A =10.5 /5.25 = 2

Execution Timemachine B = 21secExecution Timemachine A = 5.25sec

SPECratiomachine A

SPECratiomachine B

=

Execution Timereference

Execution Timemachine AExecution Timereference

Execution Timemachine B

=Execution Timemachine B

Execution Timemachine A

= 21/5.25 = 4

Page 42: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

42 Copyright © 2013, Daniel A. Menasce

Geometric Mean of SPECratios   When computing the average of SPECratios one should use the

geometric mean and not the arithmetic mean:

Measuring P

erformance

Geometric mean = xii=1

n

∏n

Program SPECRatio A 10.2 B 21.5 C 15.2

Geometric mean 14.94

Geometric mean = 10.2 × 21.5 ×15.53 = 3,333.363 =14.94

Page 43: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

43 Copyright © 2012, Elsevier Inc. All rights reserved.

Principles of Computer Design   Take Advantage of Parallelism

  e.g., multiple processors, disks, memory banks, pipelining, multiple functional units

  Principle of Locality   Reuse of data and instructions

  Focus on the Common Case   Amdahl’s Law

Principles

Page 44: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

44 Copyright © 2013, Daniel Menasce. All rights reserved.

Amdahl’s Law: Example   A program takes 60 sec to execute. But, 30% of

its execution time can have its execution time improved by a factor of 5. What is the overall speedup?

Principles

Execution Timenew = 0.7 × 60 + 0.3× 60 /5 = 42 +18 /5 = 45.6 secSpeedupoverall = 60 /45.6 =1.316

Page 45: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

45 Copyright © 2013, Daniel Menasce. All rights reserved.

Amdahl’s Law: Example   There are two options to enhance the

performance of a graphics application: enhance by a factor of 10 FP SQRT or enhance by a factor of 1.6 all FP operations. Which is best?

Principles

Type of Enhancement

% Occurrence Speedup of Enhancement

Overall Speedup

FP SQRT 20% 10 =1/(0.2/10+0.8)=1.22 All FP Ops. 50% 1.6 =1/(0.5+0.5/1.6)=1.23

Page 46: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

46 Copyright © 2012, Elsevier Inc. All rights reserved.

Principles of Computer Design   The Processor Performance Equation

Principles

(average) Clock cycles per instruction

Page 47: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

47 Copyright © 2012, Elsevier Inc. All rights reserved.

Processor Equations: Example P

rinciples

  We have the following measurements:

  Compare the following alternatives: (a) decrease CPI of FP SQRT to 2 or (b) decrease avg. CPI of all FP ops to 2.5. Which is best?

Measurement Value % of FP operations 25% Avg. CPI of FP operations 4.0 Avg. CPI of other instructions 1.33 % of FP SQRT 2% CPI of FP SQRT 20

Page 48: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

48 Copyright © 2012, Elsevier Inc. All rights reserved.

Processor Equations: Example P

rinciples

  Original CPI:   CPI for enhanced FP SQRT:

  CPI for enhanced FP:

Measurement Value % of FP operations 25% Avg. CPI of FP operations 4.0 Avg. CPI of other instructions 1.33 % of FP SQRT 2% CPI of FP SQRT 20

CPIorig = 0.25 × 4 + 0.75 ×1.33 = 2.0

CPInew FPSQRT = 2.0 − 0.02 × 20 + 0.02 × 2 =1.64

CPInew FP = 2.0 − 0.25 × 4 + 0.25 × 2.5 =1.625

Page 49: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

49 Copyright © 2012, Elsevier Inc. All rights reserved.

Processor Equations: Example P

rinciples

  Original CPI:   CPI for enhanced FP:

  Speedup:

  Most modern processors have counters for instructions executed and for clock cycles.

CPIorig = 0.25 × 4 + 0.75 ×1.33 = 2.0

CPInew FP = 2.0 − 0.25 × 4 + 0.25 × 2.5 =1.625

Speedup =IC ×CPIorig

IC ×CPInew FP

=2.0

1.625=1.23

Page 50: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

50 Copyright © 2012, Elsevier Inc. All rights reserved.

Falacies and Pitfalls P

rinciples

  Fallacy: Multiprocessors are a silver bullet.   Improve performance by replacing high-clock

and inefficient core with several lower-clock-rate efficient cores.

  Real performance improvement burden is now shifted to programmers.

  Pitfall: Falling prey to Amdahl’s Law.   Should check overall improvement using

Amdahl’s law before spending effort on enhancement.

Page 51: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

51 Copyright © 2012, Elsevier Inc. All rights reserved.

Falacies and Pitfalls (cont’d) P

rinciples

  Pitfall: A single point of failure.   Dependability is as strong as the weakest link.

  Fallacy: Hardware enhancements that increase performance improve energy efficiency or are at worst energy neutral   Running SPEC2006 on an Intel Core i7 n

Turbo mode showed a 7% performance increase at a cost of 37% more energy and 47% more power.

Page 52: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

52 Copyright © 2012, Elsevier Inc. All rights reserved.

Falacies and Pitfalls (cont’d) P

rinciples   Fallacy: Benchmarks remain valid indefinitely.   There is a tendency to optimize performance

for standard benchmarks.   Fallacy: The rated MTTF of disks is

1,200,000 hours (almost 140 years). So, disks practically never fail.   These numbers are exaggerated due to

manufacturer misleading testing processes. Real-world MTTF is about 2 to 10 times worse than manufacturer’s MTTF.

Page 53: Chapter 1 Fundamentals of Quantitative Design and Analysismenasce/cs465/slides/CAQA5e_ch1.pdf · Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers

53 Copyright © 2012, Elsevier Inc. All rights reserved.

Falacies and Pitfalls (cont’d) P

rinciples   Fallacy: Peak performance tracks observed performance.   Observed performance as a fraction of peak

performance varies significantly (from 5% to 60%) by benchmark and computer.

  Pitfall: Fault detection can lower availability.   A relatively high percentage of a system’s

components can fail without affecting its correct operation. If the system’s operation is interrupted when these faults are detected, availability is reduced.