
Computer Structure: "Architecture"??

“The art or science of building...the art or practice of designing and building structures...”

» Webster 9th New College Dictionary

“including plan, design, construction and decorative treatment...”

» American College Dictionary

“Computer Architecture”

- the term coined by Fred Brooks

“Computer architecture is the computer as seen by the user” - Amdahl et al. (1964)

“...by architecture, we mean the structure of the modules as they are organized in a computer system...”

- Stone, H. (1987)

“The architecture of a computer is the interface between the machine and the software”

- Andris Padegs, IBM 360/370 architect

“Computer Architecture”

° Structure: static arrangement of the parts (plan)

° Organization: dynamic interaction of these parts and their management (design)

° Implementation: the design of specific building blocks (construction)

“Computer Architecture” – cont’d


Architecture (from architect’s point of view)

° Instruction set architecture

° Implementation
  • Organization: high-level aspects
    - memory system
    - bus structure
    - internal CPU design
  • Hardware:
    - logic design
    - packaging technology

Levels in Computer Organization

° Concepts of multi-level machine

° Concepts of virtual machine

Architecture Disciplines

° Hardware/software structure

° Algorithms and their implementation

° Language Issues

Both hardware and software consist of hierarchical layers, with each lower layer hiding details from the level above. This principle of abstraction is the way both hardware designers and software designers cope with the complexity of computer systems. One key interface between the levels of abstraction is the instruction set architecture: the interface between the hardware and low-level software. This abstract interface enables many implementations of varying cost and performance to run identical software.

John L. Hennessy

David A. Patterson

The Big Picture

Early Calculating Machines

° 1623: Wilhelm Schickard’s mechanical counter.

° 1642: Blaise Pascal’s mechanical adder with carry.


Early Computing Machines

1823-42: Charles Babbage built the Difference Engine – to tabulate polynomial functions for math tables, with plans for a more general “Analytical Engine”

(assisted by Augusta Ada King, Countess of Lovelace)

First Electronic Computers

° Konrad Zuse (1938) – Z1 mechanical computer with binary arithmetic (program-controlled Z3 in 1941)

° John Atanasoff (1942) – “ABC” electronic digital computer to solve linear equations

° John Mauchly/J. Presper Eckert – ENIAC (1943-46) – first operational large-scale computing machine

° Maurice Wilkes – EDSAC (1949) – 1st operational stored-program computer

° Howard Aiken – Harvard Mark I (1939-44) – built by IBM

° John von Neumann/Eckert/Mauchly – EDVAC (1945) – 1st “published” stored-program computer

° Von Neumann (1945-51) – IAS Computer

° Mauchly/Eckert (1946-51) - UNIVAC

First General-Purpose Computer

° Electronic Numerical Integrator and Calculator (ENIAC), built during World War II, was the first general-purpose computer
  • For computing artillery firing tables
  • 80 feet long by 8.5 feet high and several feet wide
  • Twenty 10-digit accumulators, each 2 feet long
  • 18,000 vacuum tubes + 1,500 relays
  • 5,000 additions/second
  • 2,800 µs multiply
  • Weight: 30 tons
  • Power consumption: 140 kW
  • Data from card reader (800 cards/min)

© 2004 Morgan Kaufmann Publishers

The Atanasoff Story

° John Vincent Atanasoff, a professor of physics at Iowa State College (now University), and his technical assistant, Clifford Berry, built a working electronic computer in 1942.

° The First Electronic Computer: The Atanasoff Story, by Alice R. Burks and Arthur W. Burks, Ann Arbor, Michigan: The University of Michigan Press, 1991.

History Continues

° 1946-52: Von Neumann built the IAS computer at the Institute for Advanced Study, Princeton – a prototype for most future computers.

° 1947-50: Eckert-Mauchly Computer Corp. built UNIVAC I, used in the 1950 census.

° 1949: Maurice Wilkes built EDSAC, the first stored-program computer

EDVAC – Electronic Discrete Variable Computer (1945)

° John von Neumann

° First published stored-program computer (program & data in same memory - “von Neumann architecture”)

° 1024 words mercury delay-line memory, 20K words magnetic wire secondary memory

° 44-bit binary numbers & serial arithmetic

° Instruction format: A1 A2 A3 A4 OP

• A1 OP A2 -> A3, next instruction at A4

• Cond. Jump: if A1 <= A2 goto A3 else goto A4

° I/O between main & secondary memory
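To make the four-address format concrete, here is a minimal sketch in C of how one such instruction could be interpreted. The struct fields, opcode values, and memory size are illustrative assumptions, not details taken from the EDVAC design.

#include <stdio.h>

/* Hypothetical four-address instruction, loosely modeled on the format above:
 * A1 OP A2 -> A3, with the next instruction taken from A4. */
struct Instr {
    int op;              /* operation code: 0 = add, 1 = conditional jump (assumed encoding) */
    int a1, a2, a3, a4;  /* the four address fields */
};

/* One interpretation step; returns the address of the next instruction. */
int step(struct Instr in, long mem[])
{
    switch (in.op) {
    case 0:  /* A1 + A2 -> A3, continue at A4 */
        mem[in.a3] = mem[in.a1] + mem[in.a2];
        return in.a4;
    case 1:  /* if A1 <= A2 goto A3 else goto A4 */
        return (mem[in.a1] <= mem[in.a2]) ? in.a3 : in.a4;
    default:
        return in.a4;
    }
}

int main(void)
{
    long mem[16] = { 0, 5, 7 };            /* mem[1] = 5, mem[2] = 7 */
    struct Instr add = { 0, 1, 2, 3, 4 };  /* mem[3] = mem[1] + mem[2], next at 4 */
    int next = step(add, mem);
    printf("mem[3] = %ld, next instruction at %d\n", mem[3], next);
    return 0;
}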


1. “Big Iron” Computers:

Used vacuum tubes, electric relays, and bulk magnetic storage devices. No microprocessors; no semiconductor memory.

Examples: ENIAC (1945), Harvard/IBM Mark I (1944)

First-Generation Computers

° Late 1940s and 1950s

° Stored-program computers

° Programmed in assembly language

° Used magnetic devices and early forms of memory

° Examples: IAS, ENIAC, EDVAC, UNIVAC, Mark I, IBM 701

A Puzzle

What does the following mean?
• 00000000001000100100000000100000
• 00000000011001000100100000100000
• 00000001000010010010100000100010

OK, then, this?
• 000000 00001 00010 01000 00000 100000
• 000000 00011 00100 01001 00000 100000
• 000000 01000 01001 00101 00000 100010

Translation

And this?
• 0 1 2 8 0 32
• 0 3 4 9 0 32
• 0 8 9 5 0 34

How about this?
• add $8,$1,$2
• add $9,$3,$4
• sub $5,$8,$9

More Translation

Becoming clear?
• $8 = $1 + $2
• $9 = $3 + $4
• $5 = $8 - $9

Surely OK now?
• u = a + b
• v = c + d
• x = u - v

Or, obviously: x = (a+b) - (c+d)
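A few lines of C make the first step of the puzzle mechanical. This sketch splits a 32-bit word along the standard MIPS R-format field boundaries; the helper name decode_rtype and the printed layout are ours, not part of any MIPS toolchain.

#include <stdio.h>
#include <stdint.h>

/* Decode a MIPS R-format word: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6). */
static void decode_rtype(uint32_t w)
{
    unsigned op    = (w >> 26) & 0x3F;
    unsigned rs    = (w >> 21) & 0x1F;
    unsigned rt    = (w >> 16) & 0x1F;
    unsigned rd    = (w >> 11) & 0x1F;
    unsigned shamt = (w >>  6) & 0x1F;
    unsigned funct =  w        & 0x3F;

    const char *name = (funct == 32) ? "add" :
                       (funct == 34) ? "sub" : "?";
    printf("op=%u rs=%u rt=%u rd=%u shamt=%u funct=%u  ->  %s $%u,$%u,$%u\n",
           op, rs, rt, rd, shamt, funct, name, rd, rs, rt);
}

int main(void)
{
    /* The three instruction words from the puzzle above. */
    decode_rtype(0x00224020);  /* 000000 00001 00010 01000 00000 100000 -> add $8,$1,$2 */
    decode_rtype(0x00644820);  /* 000000 00011 00100 01001 00000 100000 -> add $9,$3,$4 */
    decode_rtype(0x01092822);  /* 000000 01000 01001 00101 00000 100010 -> sub $5,$8,$9 */
    return 0;
}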


Levels of Representation

High Level Language Program (e.g., C):

  temp = v[k];
  v[k] = v[k+1];
  v[k+1] = temp;

    ↓ Compiler

Assembly Language Program (e.g., MIPS):

  lw  $t0, 0($2)
  lw  $t1, 4($2)
  sw  $t1, 0($2)
  sw  $t0, 4($2)

    ↓ Assembler

Machine Language Program (MIPS):

  0000 1001 1100 0110 1010 1111 0101 1000
  1010 1111 0101 1000 0000 1001 1100 0110
  1100 0110 1010 1111 0101 1000 0000 1001
  0101 1000 0000 1001 1100 0110 1010 1111

    ↓ Machine Interpretation

Hardware Architecture Description (e.g., Verilog Language):

  wire [31:0] dataBus;
  regFile registers (databus);
  ALU ALUBlock (inA, inB, databus);

    ↓ Architecture Implementation

Logic Circuit Description (Verilog Language):

  wire w0;
  XOR (w0, a, b);
  AND (s, w0, a);

Computer Architecture

What are "Machine Structures"?

° Coordination of many levels (layers) of abstraction:

  Application (ex: browser)
  Operating System (Mac OS X)
  Software: Compiler, Assembler
  Instruction Set Architecture
  Hardware: Processor, Memory, I/O system
  Datapath & Control
  Digital Design
  Circuit Design
  Transistors

Anatomy: 5 components of any Computer (e.g., a personal computer)

° Processor
  • Control (“brain”)
  • Datapath (“brawn”)

° Memory (where programs and data live when running)

° Devices
  • Input: keyboard, mouse
  • Output: display, printer
  • Disk (where programs and data live when not running)

Overview of Physical Implementations

° Integrated Circuits (ICs)
  • combinational logic circuits, memory elements, analog interfaces

° Printed Circuit (PC) boards
  • substrate for ICs and interconnection; distribution of CLK, Vdd, and GND signals; heat dissipation

° Power Supplies
  • convert line AC voltage to regulated DC low-voltage levels

° Chassis (rack, card case, ...)
  • holds boards and power supply; provides physical interface to the user or other systems

° Connectors and Cables

The hardware out of which we make systems.

Integrated Circuits

° Primarily Crystalline Silicon

° 1mm - 25mm on a side

° 2003 - feature size ~ 0.13 µm = 0.13 x 10^-6 m

° 100 - 400M transistors

° (25 - 100M “logic gates”)

° 3 - 10 conductive layers

° “CMOS” (complementary metal oxide semiconductor) - most common.

° Package provides:
  • spreading of chip-level signal paths to board level
  • heat dissipation

° Ceramic or plastic with gold wires.

Chip in Package

Bare Die


Printed Circuit Boards

° fiberglass or ceramic

° 1-20 conductive layers

° 1-20in on a side

° IC packages are soldered down.

Technology Trends: Memory Capacity (Single-Chip DRAM)

[Plot: single-chip DRAM capacity (bits) vs. year, 1970-2000]

  Year    Size (Mbit)
  1980    0.0625
  1983    0.25
  1986    1
  1989    4
  1992    16
  1996    64
  1998    128
  2000    256
  2002    512

• Now 1.4X/yr, or 2X every 2 years

• 8000X since 1980!

Technology Trends: Microprocessor Complexity

[Plot: transistors per chip vs. year, 1970-2000, showing i4004, i8080, i8086, i80286, i80386, i80486, Pentium]

° 2X transistors/chip every 1.5 years - called “Moore’s Law”

° Transistor counts:
  • Sparc Ultra: 5.2 million
  • Pentium Pro: 5.5 million
  • PowerPC 620: 6.9 million
  • Alpha 21164: 9.3 million
  • Alpha 21264: 15 million
  • Athlon (K7): 22 million
  • Itanium 2: 410 million

Trends: Processor Performance

[Plot: performance relative to the VAX-11/780, 1987-1997, improving ~1.54x/year; machines shown include MIPS M/120, MIPS M2000, Sun-4/260, IBM RS/6000, HP 9000/750, DEC AXP/500, IBM POWER 100, DEC Alpha 4/266, DEC Alpha 5/300, DEC Alpha 5/500, Intel Pentium IV 3.0 GHz]

° Performance measured with respect to the performance of the VAX-11/780

Processor Performance (SPEC)

[Plot: SPEC performance vs. year, 1982-1994; Intel x86 improving ~35%/yr, with a steeper RISC curve after the RISC introduction]

° Did RISC win the technology battle and lose the market war?

° Performance now improves ~60% per year (2x every 1.5 years)

° OLD PICTURE - BUT THE STORY IS THE SAME


Processor Performance - Capacities: Technology --> Dramatic Changes

° Processor
  • logic capacity: 2x in performance every 1.5 years
  • clock rate: about 30% per year
  • overall performance: 1000x in the last decade

° Main Memory
  • DRAM capacity: 2x every 2 years; 1000x size in the last decade
  • memory speed: about 10% per year
  • cost/bit: improves about 25% per year

° Disk
  • capacity: > 2x in capacity every 1.5 years
  • cost/bit: improves about 60% per year
  • 120x capacity in the last decade

° Network Bandwidth
  • bandwidth: increasing more than 100% per year!

Your PC in 2006

° State-of-the-art PC (on your desk)
  • Processor clock speed: 8000 MegaHertz (8.0 GigaHertz)
  • Memory capacity: 2048 MegaBytes (2.0 GigaBytes)
  • Disk capacity: 800 GigaBytes (0.8 TeraBytes)

• Will need new units! Mega ⇒ Giga ⇒ Tera

Technology in the News

° BIG
  • LaCie is the first to offer a consumer-level 1.6 Terabyte disk!
  • ~$2,000
  • Weighs 11 pounds!
  • 5 1/4” form factor
  • www.lacie.com/products/product.htm?id=10129

° SMALL
  • Pretec is soon offering a 12GB CompactFlash card
  • Size of a silver dollar
  • Cost? > New Honda!
  • www.engadget.com/entry/4463693158281236/


° Learn some of the big ideas in CS & engineering:

• 5 classic components of a Computer

• Data can be anything (integers, floating point, characters): a program determines what it is

• Stored program concept: instructions just data

• Principle of Locality, exploited via a memory hierarchy (cache)

• Greater performance by exploiting parallelism

• Principle of abstraction, used to build systems as layers

• Compilation vs. interpretation through system layers

• Principles/Pitfalls of Performance Measurement

Text

° Computer Organization and Design: The Hardware/Software Interface, Third Edition, Patterson and Hennessy (COD). The second edition is far inferior and is not suggested.

Your final grade

° Grading (could change)
  • 25% Homework
  • 75% Test

Course Problems…Cheating

° What is cheating?
  • Studying together in groups is encouraged.

• Turned-in work must be completely your own.

• Both “giver” and “receiver” are equally culpable

° Every offense will be referred to the Office of Student Judicial Affairs.

° Continued rapid improvement in computing
  • 2X every 2.0 years in memory size; every 1.5 years in processor speed; every 1.0 year in disk capacity
  • Moore’s Law enables processor improvement (2X transistors/chip every ~1.5 yrs)

° 5 classic components of all computers: Processor (Control + Datapath), Memory, Input, Output

What is "Computer Architecture"?

Computer Architecture =

Instruction Set Architecture (ISA) +

Machine Organization (MO)

• ISA: definition of what the machine does (the logical view)

• MO: how the machine implements the ISA (the physical implementation)


The Instruction Set: a (the?) Critical Interface

  software
  ------------------  instruction set
  hardware

Example ISAs (Instruction Set Architectures)

° Digital Alpha     (v1, v3)                              1992-97

° HP PA-RISC        (v1.1, v2.0)                          1986-01

° Sun Sparc         (v8, v9, v10, v11)                    1987-01

° SGI MIPS          (MIPS I, II, III, IV, V)              1986-01

° Intel             (8086, 80286, ..., 80486, Pentium, Pentium ...)  1978-01

° Intel + HP EPIC                                         1998-01

Impact of changing an ISA

° Early 1990s Apple switched the instruction set architecture of the Macintosh
  • From Motorola 68000-based machines
  • To the PowerPC architecture
  • Upside? Downside?

° Intel 80x86 family: many implementations of the same architecture
  • Upside: a program written in 1978 for the 8086 can run on the latest Pentium chip
  • Downside?

The Big Picture

° Since 1946 all computers have had 5 components:
  • Processor (“CPU”): Datapath + Control Unit
  • Memory
  • Input
  • Output
  connected by Interconnection Structures (buses)

What is ``Computer Architecture''?

  Application (Netscape)
  Operating System (Unix; Windows 2000)
  Software: Compiler, Assembler
  Instruction Set Architecture
  Hardware: Processor, Memory, I/O system
  Datapath & Control
  Digital Design
  Circuit Design
  Transistors, IC layout

° Coordination of many levels of abstraction
  • hide unnecessary implementation details
  • helps us cope with the enormous complexity of real systems

° Under a rapidly changing set of forces

° Design, Measurement, and Evaluation

Forces Acting on Computer Architecture

° R-a-p-i-d Improvement in Implementation Technology:

• IC: integrated circuit; invented 1959

• SSI → MSI → LSI → VLSI: dramatic growth in the number of transistors/chip ⇒ ability to create more (and bigger) functional units per processor

• bigger memory ⇒ more sophisticated applications, larger databases

• Ubiquitous computing


Execution Cycle

  Instruction Fetch   - Obtain instruction from program storage
  Instruction Decode  - Determine required actions and instruction size
  Operand Fetch       - Locate and obtain operand data
  Execute             - Compute result value or status
  Result Store        - Deposit results in storage for later use
  Next Instruction    - Determine successor instruction
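The same cycle can be written as a loop. The fragment below uses a made-up two-instruction toy machine (not any real ISA) purely to show how fetch, decode, operand fetch, execute, result store, and next-instruction selection repeat.

#include <stdio.h>

/* A toy machine used only to illustrate the execution cycle:
 * each instruction is {opcode, src1, src2, dst}. */
enum { OP_ADD, OP_HALT };
struct Instr { int op, src1, src2, dst; };

int main(void)
{
    struct Instr program[] = {                   /* program storage */
        { OP_ADD, 0, 1, 2 },                     /* reg2 = reg0 + reg1 */
        { OP_ADD, 2, 2, 3 },                     /* reg3 = reg2 + reg2 */
        { OP_HALT, 0, 0, 0 },
    };
    int reg[4] = { 3, 4, 0, 0 };
    int pc = 0;

    for (;;) {
        struct Instr in = program[pc];           /* Instruction Fetch  */
        int op = in.op;                          /* Instruction Decode */
        int a = reg[in.src1], b = reg[in.src2];  /* Operand Fetch      */
        if (op == OP_HALT) break;
        int result = a + b;                      /* Execute            */
        reg[in.dst] = result;                    /* Result Store       */
        pc = pc + 1;                             /* Next Instruction   */
    }
    printf("reg2=%d reg3=%d\n", reg[2], reg[3]); /* prints reg2=7 reg3=14 */
    return 0;
}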

Overview: Processor

Front Side Bus

These pieces implement the instruction cycle

Overview: PCI Bus and Devices

Bus Controller

° All computers consist of five components

• Processor: (1) datapath and (2) control

• (3) Memory

• (4) Input devices and (5) Output devices

° Not all "memories" are created equally

• Cache: fast (expensive) memory placed closer to the processor

• Main memory: less expensive memory, so we can have more of it

° Interfaces are where the problems are - between functional units and between the computer and the outside world

° Need to design against constraints of performance, power, area and cost


Integrated Circuits Costs

$$\text{IC cost} = \frac{\text{Die cost} + \text{Testing cost} + \text{Packaging cost}}{\text{Final test yield}}$$

$$\text{Die cost} = \frac{\text{Wafer cost}}{\text{Dies per wafer} \times \text{Die yield}}$$

$$\text{Dies per wafer} = \frac{\pi \times (\text{Wafer diameter}/2)^2}{\text{Die area}} - \frac{\pi \times \text{Wafer diameter}}{\sqrt{2 \times \text{Die area}}} - \text{Test dies}$$

$$\text{Die yield} = \text{Wafer yield} \times \left(1 + \frac{\text{Defects per unit area} \times \text{Die area}}{\alpha}\right)^{-\alpha}$$

Die cost goes roughly with (die area)^4
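As a sanity check, the formulas can be evaluated directly. The C sketch below plugs in made-up wafer, die, and defect parameters; they are assumptions for illustration, not figures from the lecture.

#include <stdio.h>
#include <math.h>

int main(void)
{
    const double PI = 3.14159265358979;

    /* All parameter values below are illustrative assumptions. */
    double wafer_cost  = 5000.0;  /* $ per wafer                  */
    double wafer_diam  = 30.0;    /* wafer diameter, cm           */
    double die_area    = 1.0;     /* die area, cm^2               */
    double defects     = 0.6;     /* defects per cm^2             */
    double alpha       = 4.0;     /* process-complexity parameter */
    double wafer_yield = 1.0;
    double test_dies   = 0.0;

    double dies_per_wafer = PI * pow(wafer_diam / 2.0, 2) / die_area
                          - PI * wafer_diam / sqrt(2.0 * die_area)
                          - test_dies;

    double die_yield = wafer_yield * pow(1.0 + defects * die_area / alpha, -alpha);

    double die_cost = wafer_cost / (dies_per_wafer * die_yield);

    printf("dies/wafer = %.0f, die yield = %.2f, die cost = $%.2f\n",
           dies_per_wafer, die_yield, die_cost);
    return 0;
}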

How to Quantify Performance?

• Time to run the task (ExTime)
  - Execution time, response time, latency

• Tasks per day, hour, week, sec, ns, ... (Performance)
  - Throughput, bandwidth

  Plane              Speed       DC to Paris   Passengers   Throughput (pmph)
  Boeing 747         610 mph     6.5 hours     470          286,700
  BAC/Sud Concorde   1350 mph    3 hours       132          178,200


The Bottom Line: Performance and Cost or Cost and Performance?

"X is n times faster than Y" means

ExTime(Y) Performance(X)

--------- = ---------------

ExTime(X) Performance(Y)

• Speed of Concorde vs. Boeing 747
• Throughput of Boeing 747 vs. Concorde
• Cost is also an important parameter in the equation, which is why Concordes are being put to pasture!
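Applying the definition to the airplane table above, a quick check in C; the figures are copied from the table.

#include <stdio.h>

int main(void)
{
    /* Figures from the plane comparison table above. */
    double concorde_speed = 1350.0, b747_speed = 610.0;      /* mph  */
    double concorde_tput  = 178200.0, b747_tput = 286700.0;  /* pmph */

    /* "X is n times faster than Y" = Performance(X) / Performance(Y). */
    printf("Concorde is %.1fx faster than the 747 in speed (latency)\n",
           concorde_speed / b747_speed);
    printf("The 747 is %.1fx faster than the Concorde in throughput\n",
           b747_tput / concorde_tput);
    return 0;
}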

Measurement Tools

° Benchmarks, Traces, Mixes

° Hardware: Cost, delay, area, power estimation

° Simulation (many levels)
  • ISA, RT, Gate, Circuit

° Queuing Theory

° Rules of Thumb

° Fundamental “Laws”/Principles

° Understanding the limitations of any measurement tool is crucial.

Metrics of Performance

  Application               - Answers per month; Operations per second
  Programming Language
  Compiler
  ISA                       - (millions of) Instructions per second: MIPS; (millions of) (FP) operations per second: MFLOP/s
  Datapath, Control         - Megabytes per second
  Function Units            - Cycles per second (clock rate)
  Transistors, Wires, Pins

Cases of Benchmark Engineering

° The motivation is to tune the system to the benchmark to achieve peak performance.

° At the architecture level

  • Specialized instructions

° At the compiler level (compiler flags)

  • Blocking in Spec89 → factor of 9 speedup

  • Incorrect compiler optimizations/reordering

  • Would work fine on the benchmark but not on other programs

° I/O level

  • Spec92 spreadsheet program (sp)

  • Companies noticed that the produced output was always written to a file (so they stored the results in a memory buffer) and then expunged it at the end (which was not measured).

  • One company eliminated the I/O altogether.

After putting in a blazing performance on the benchmark test, Sun issued a glowing press release claiming that it had outperformed Windows NT systems on the test. Pendragon president Ivan Phillips cried foul, saying the results weren't representative of real-world Java performance and that Sun had gone so far as to duplicate the test's code within Sun's Just-In-Time compiler. That's cheating, says Phillips, who claims that benchmark tests and real-world applications aren't the same thing.

Did Sun issue a denial or a mea culpa? Initially, Sun neither denied optimizing for the benchmark test nor apologized for it. "If the test results are not representative of real-world Java applications, then that's a problem with the benchmark," Sun's Brian Croll said.

After taking a beating in the press, though, Sun retreated and issued an apology for the optimization. [Excerpted from PC Online 1997]

Issues with Benchmark Engineering

° Motivated by the bottom dollar: good performance on classic suites → more customers, better sales.

° Benchmark engineering → limits the longevity of benchmark suites

° Technology and applications → limit the longevity of benchmark suites


SPEC: System Performance Evaluation Cooperative

° http://www.spec.org/

° First Round 1989

• 10 programs yielding a single number (“SPECmarks”)

° Second Round 1992

• SPECInt92 (6 integer programs) and SPECfp92 (14 floating point programs)

- Compiler Flags unlimited. March 93

- new set of programs: SPECint95 (8 integer programs) and SPECfp95 (10 floating point)

• “benchmarks useful for 3 years”

• Single flag setting for all programs: SPECint_base95, SPECfp_base95

• SPEC CPU2000 (11 integer benchmarks – CINT2000, and 14 floating-point benchmarks – CFP2000)

SPEC 2000 (CINT 2000) Results; SPEC 2000 (CFP 2000) Results

Reporting Performance Results

° Reproducibility
  → apply them to publicly available benchmarks

° Pecking/picking order
  • Real Programs
  • Real Kernels
  • Toy Benchmarks
  • Synthetic Benchmarks

How to Summarize Performance

° Arithmetic mean (weighted arithmetic mean) tracks execution time: sum(Ti)/n or sum(Wi*Ti)

° Harmonic mean (weighted harmonic mean) of rates (e.g., MFLOPS) tracks execution time: n/sum(1/Ri) or 1/sum(Wi/Ri)

° Normalized execution time is handy for scaling performance (e.g., X times faster than SPARCstation 10)

° But do not take the arithmetic mean of normalized execution times; use the geometric mean = (Product(Ri))^(1/n)
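A minimal sketch in C of the three summaries just described; the execution times, rates, and reference-machine times are invented example numbers.

#include <stdio.h>
#include <math.h>

int main(void)
{
    /* Example execution times (seconds) and rates (MFLOPS); made-up numbers. */
    double times[]     = { 2.0, 4.0, 8.0 };
    double rates[]     = { 100.0, 200.0, 400.0 };
    double ref_times[] = { 1.0, 2.0, 4.0 };   /* assumed reference machine */
    int n = 3;
    double sum_t = 0.0, sum_inv_r = 0.0, prod_norm = 1.0;

    for (int i = 0; i < n; i++) {
        sum_t     += times[i];                 /* arithmetic mean of times   */
        sum_inv_r += 1.0 / rates[i];           /* harmonic mean of rates     */
        prod_norm *= times[i] / ref_times[i];  /* normalized execution times */
    }
    printf("arithmetic mean of times : %.2f s\n", sum_t / n);
    printf("harmonic mean of rates   : %.1f MFLOPS\n", n / sum_inv_r);
    printf("geometric mean of ratios : %.2f\n", pow(prod_norm, 1.0 / n));
    return 0;
}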


Performance Evaluation

° “For better or worse, benchmarks shape a field”

° Good products created when have:

• Good benchmarks

• Good ways to summarize performance

° Given that sales are in part a function of performance relative to the competition, there is investment in improving the product as reported by the performance summary

° If the benchmarks/summary are inadequate, then choose between improving the product for real programs vs. improving the product to get more sales; sales almost always wins!

° Execution time is the measure of computer performance!

Simulations

° When are simulations useful?

° What are its limitations, i.e., what real-world phenomena does it not account for?

° The larger the simulation trace, the less tractable the post-processing analysis.

Queueing Theory

° What are the distributions of arrival rates and values for other parameters?

° Are they realistic?

° What happens when the parameters or distributions are changed?

Quantitative Principles of Computer Design

° Make the Common Case Fast
  • Amdahl’s Law

° CPU Performance Equation
  • Clock cycle time

• CPI

• Instruction Count

° Principles of Locality

° Take advantage of Parallelism

CPU Performance Equation

$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
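A direct transcription of the equation in C; the instruction count, CPI, and clock rate are invented example values.

#include <stdio.h>

int main(void)
{
    /* CPU time = Instructions x CPI x clock cycle time. Example values only. */
    double instructions = 2e9;   /* instructions executed (assumed)  */
    double cpi          = 1.5;   /* average cycles per instruction   */
    double clock_hz     = 1e9;   /* 1 GHz clock -> cycle time = 1 ns */

    double cpu_time = instructions * cpi * (1.0 / clock_hz);
    printf("CPU time = %.2f seconds\n", cpu_time);  /* 2e9 * 1.5 / 1e9 = 3 s */
    return 0;
}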


Amdahl's Law

Speedup due to enhancement E:

               ExTime w/o E      Performance w/ E
Speedup(E)  =  -------------  =  -----------------
               ExTime w/ E       Performance w/o E

Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected


Amdahl’s Law

$$\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times \left[(1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}\right]$$

$$\text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$

Amdahl’s Law

° Floating point instructions improved to run 2X; but only 10% of actual instructions are FP

Speedup_overall = ?

Amdahl’s Law (answer)

° Floating point instructions improved to run 2X; but only 10% of actual instructions are FP

ExTime_new = ExTime_old x (0.9 + 0.1/2) = 0.95 x ExTime_old

Speedup_overall = 1 / 0.95 = 1.053
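The same calculation written as a small reusable C function; the fraction is treated as a fraction of execution time, as the slide does.

#include <stdio.h>

/* Amdahl's Law: overall speedup when a fraction f of the execution time
 * is accelerated by a factor s. */
static double amdahl(double f, double s)
{
    return 1.0 / ((1.0 - f) + f / s);
}

int main(void)
{
    /* The FP example above: 10% of the time sped up 2x. */
    printf("speedup = %.3f\n", amdahl(0.10, 2.0));   /* prints 1.053 */
    /* Even an enormous FP speedup is capped by the other 90%. */
    printf("limit   = %.3f\n", amdahl(0.10, 1e12));  /* ~1.111 */
    return 0;
}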

Example: Calculating CPI

Base Machine (Reg/Reg), Typical Mix:

  Op       Freq   Cycles   CPI(i)   (% Time)
  ALU      50%    1        0.5      (33%)
  Load     20%    2        0.4      (27%)
  Store    10%    2        0.2      (13%)
  Branch   20%    2        0.4      (27%)

  Total CPI                1.5
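The weighted sum in the table can be checked with a few lines of C; the frequencies and cycle counts are copied from the table above.

#include <stdio.h>

int main(void)
{
    /* Instruction mix from the table above. */
    const char *op[] = { "ALU", "Load", "Store", "Branch" };
    double freq[]    = { 0.50, 0.20, 0.10, 0.20 };
    double cycles[]  = { 1, 2, 2, 2 };
    double cpi = 0.0;

    for (int i = 0; i < 4; i++)
        cpi += freq[i] * cycles[i];          /* weighted CPI contribution */

    for (int i = 0; i < 4; i++)
        printf("%-6s CPI(i)=%.1f  %%time=%.0f%%\n",
               op[i], freq[i] * cycles[i], 100.0 * freq[i] * cycles[i] / cpi);
    printf("Total CPI = %.1f\n", cpi);       /* prints 1.5 */
    return 0;
}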

Chapter Summary, #1

• Designing to last through trends:

            Capacity         Speed
  Logic     2x in 3 years    2x in 3 years
  DRAM      4x in 3 years    2x in 10 years
  Disk      4x in 3 years    2x in 10 years

• 6 yrs to graduate => 16X CPU speed, DRAM/Disk size

• Time to run the task
  - Execution time, response time, latency

• Tasks per day, hour, week, sec, ns, ...
  - Throughput, bandwidth

• “X is n times faster than Y” means

    ExTime(Y)     Performance(X)
    ---------  =  --------------
    ExTime(X)     Performance(Y)

° Amdahl’s Law:

° CPI Law:

° Execution time is the REAL measure of computer performance!

° Good products created when have:

• Good benchmarks, good ways to summarize performance

° Die cost goes roughly with (die area)^4

$$\text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$

$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$


Food for thought

° Two companies reports results on two benchmarks one on a Fortran benchmark suite and the other on a C++ benchmark suite.

° Company A’s product outperforms Company B’s on the Fortran suite, the reverse holds true for the C++ suite. Assume the performance differences are similar in both cases.

° Do you have enough information to compare the two products? What information would you need?

Food for Thought II

° In the CISC vs. RISC debate a key argument of the RISC movement was that because of its simplicity, RISC would always remain ahead.

° If there were enough transistors to implement a CISC on chip, then those same transistors could implement a pipelined RISC

° If there were enough to allow for a pipelined CISC, there would be enough to have an on-chip cache for RISC. And so on.

° After 20 years of this debate what do you think?

° Hint: Think of commercial PCs, Moore’s Law, and some of the data in the first chapter of the book (and on these slides)