CPE 626: Advanced VLSI Design L01


Page 1: CPE 626: Advanced VLSI Design L01

CPE 626: Advanced VLSI Design L01

Department of Electrical and Computer Engineering University of Alabama in Huntsville

Page 2: CPE 626: Advanced VLSI Design L01


Outline

Computer Engineering: Motivation, Present, Future

Computer Engineering Methodology

Power as a Design Constraint

Stored-program Computer: MU0 Example

Digital System Modeling: Motivation

Page 3: CPE 626: Advanced VLSI Design L01


Why Computer Engineering?

ENIAC, 1946 (first general-purpose electronic computer)

PC, 2002

PDA, 2002

CHANGE! It is exciting. It has never been more exciting! It impacts every aspect of human life.

Bionic, 2002

Page 4: CPE 626: Advanced VLSI Design L01


Why Such Change?

Continuous growth in performance due to advances in technology (CMOS VLSI) and innovations in computer design (RISC, RAID, ILP)

Lower cost due to simpler development and higher volumes

These resulted in a significant enhancement of the capability available to the computer user

Example: today's PC costing less than $1000 has more performance, main memory, and disk storage than a $1 million computer of the 1970s

Page 5: CPE 626: Advanced VLSI Design L01


Computer Engineering Methodology

[Figure: design cycle]

Stages: Evaluate Existing Systems for Bottlenecks; Simulate New Designs and Organizations; Implement Next Generation System

Inputs: Technology Trends; Benchmarks; Workloads; Implementation Complexity; Applications, Market

Page 6: CPE 626: Advanced VLSI Design L01


Technology Trends

         Capacity          Speed/Latency
Logic    4x in 3 years     1.54x per year
DRAM     4x in 3-4 years   2x in 10 years
Disk     4x in 3-4 years   2x in 10 years
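The trend figures mix "Nx in M years" and "x per year" forms, which makes them hard to compare directly. The short sketch below converts them to a common per-year factor and a doubling time. The function names are illustrative only, and the conversion assumes smooth compound growth, which real technology scaling only approximates.

```python
import math

def annual_rate(factor: float, years: float) -> float:
    """Per-year growth factor equivalent to `factor` improvement over `years`."""
    return factor ** (1.0 / years)

def doubling_time(rate_per_year: float) -> float:
    """Years needed to double at a constant per-year growth factor."""
    return math.log(2.0) / math.log(rate_per_year)

if __name__ == "__main__":
    rows = [
        ("Logic capacity, 4x in 3 years", annual_rate(4, 3)),
        ("DRAM/disk capacity, 4x in 4 years", annual_rate(4, 4)),
        ("Logic speed, 1.54x per year", 1.54),
        ("DRAM/disk speed, 2x in 10 years", annual_rate(2, 10)),
    ]
    for label, rate in rows:
        print(f"{label}: {rate:.2f}x per year, "
              f"doubles every {doubling_time(rate):.1f} years")
```

On these numbers, logic speed doubles in well under two years while DRAM and disk latency take a decade to do the same, which is the gap the later slides on CPU-memory integration respond to.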

Reuters, Monday 11 June 2001: Intel engineers have designed and manufactured the world’s smallest and fastest transistor of 0.02 microns in size. This will open the way for microprocessors of 1 billion transistors, running at 20 GHz by 2007.

State of the art: Intel Pentium 4, 2.2 GHz, 0.13 microns, 42 million transistors

Page 7: CPE 626: Advanced VLSI Design L01


Pentium III Die Photo

EBL/BBL - Bus logic, Front, Back
MOB - Memory Order Buffer
Packed FPU - MMX Fl. Pt. (SSE)
IEU - Integer Execution Unit
FAU - Fl. Pt. Arithmetic Unit
MIU - Memory Interface Unit
DCU - Data Cache Unit
PMH - Page Miss Handler
DTLB - Data TLB
BAC - Branch Address Calculator
RAT - Register Alias Table
SIMD - Packed Fl. Pt.
RS - Reservation Station
BTB - Branch Target Buffer
IFU - Instruction Fetch Unit (+I$)
ID - Instruction Decode
ROB - Reorder Buffer
MS - Micro-instruction Sequencer

1st Pentium III, Katmai: 9.5 M transistors, 12.3 * 10.4 mm in 0.25-micron technology with 5 layers of aluminum

Page 8: CPE 626: Advanced VLSI Design L01


Pentium 4 Die Photo

42M transistors (PIII: 26M)

217 mm2 (PIII: 106 mm2)

L1 Execution Cache: buffers 12,000 micro-ops

8 KB data cache

256 KB L2 cache

Page 9: CPE 626: Advanced VLSI Design L01


Future Applications

Desktop: 90% of cycles will be spent on media applications
  video encode/decode, polygon & image-based graphics
  audio processing, compression, music, speech recognition/synthesis
  modulation/demodulation at audio and video rates

Scientific desktops: high-performance FPs and graphics

Commercial servers: support for databases and transaction processing, enhancement for reliability, support for scalability

Embedded computing: special support for graphics or video, power limitations

Page 10: CPE 626: Advanced VLSI Design L01


Future Directions

Conditions
  new workloads are characterised by more exploitable parallelism
  dominant wire delays on a billion-transistor chip will force hardware to be more distributed

Novel architectural techniques
  Exploit parallelism
    o multiprocessor on chip
    o simultaneous multithreading
  CPU-memory integration
    o memory tolerating techniques
    o flexible hierarchy to adapt to application
  Reconfigurable computing

Develop architectural techniques that exploit semiconductor technology and workload characteristics in order to maximize performance at low cost

Page 11: CPE 626: Advanced VLSI Design L01


Power as a Design Constraint

Power becomes a critical issue
  Portable and mobile platforms: battery-operated devices
  Desktops, server farms
    Reliability?
    Power consumption: IT consumes 10% in the US
    Power density: 30 W/cm2 in Alpha 21364 (3x that of a typical hot plate)

Page 12: CPE 626: Advanced VLSI Design L01


Power as a Design Constraint (cont’d)

$P = A C V^2 f + \tau A V I_{short} f + V I_{leak}$

Dynamic power consumption
  A (activity of gates) => Turn off unused parts or use design techniques to minimize number of transitions

Power due to short-circuit current during transition

Power due to leakage current

$f_{max} \propto (V - V_t)^2 / V$

Reduce the supply voltage, V

$I_{leak} \propto \exp(-q V_t / kT)$

Reduce threshold Vt
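The trade-off on this slide can be explored numerically. The sketch below assumes the usual first-order CMOS model, P = A C V^2 f + tau A V I_short f + V I_leak, with f_max proportional to (V - Vt)^2 / V and I_leak proportional to exp(-q Vt / kT). Here tau stands for the duration of the short-circuit current, and every numeric value is an arbitrary illustration, not data from the lecture.

```python
import math

Q = 1.602176634e-19   # electron charge, C
K = 1.380649e-23      # Boltzmann constant, J/K

def power_terms(A, C, V, f, tau, I_short, I_leak):
    """Return (dynamic, short_circuit, leakage) power in watts."""
    dynamic = A * C * V**2 * f
    short_circuit = tau * A * V * I_short * f
    leakage = V * I_leak
    return dynamic, short_circuit, leakage

def f_max_rel(V, Vt):
    """Relative maximum frequency; only ratios between calls are meaningful."""
    return (V - Vt) ** 2 / V

def leak_rel(Vt, T=300.0):
    """Relative leakage current; only ratios between calls are meaningful."""
    return math.exp(-Q * Vt / (K * T))

if __name__ == "__main__":
    # Illustrative operating points only (hypothetical values).
    for V, Vt in [(1.0, 0.4), (0.5, 0.4), (0.5, 0.2)]:
        dyn, sc, leak = power_terms(A=0.1, C=1e-9, V=V, f=1e9,
                                    tau=1e-11, I_short=1e-3, I_leak=1e-3)
        print(f"V={V} Vt={Vt}: dynamic={dyn:.4f} W, "
              f"f_max(rel)={f_max_rel(V, Vt):.3f}, "
              f"I_leak(rel)={leak_rel(Vt):.2e}")
```

Halving V cuts the dynamic term to a quarter but collapses the relative f_max unless Vt is lowered too, and lowering Vt raises the relative leakage current exponentially.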

Page 13: CPE 626: Advanced VLSI Design L01


Recap: Computer Architecture

Computer Architecture describes the user’s view of the computer: visible registers, data types, instruction set, instruction formats, memory management table structures, exception handling

Computer Organization describes the implementation of the architecture, which is invisible to the user: pipeline structure, caches, TLB, ...

Page 14: CPE 626: Advanced VLSI Design L01


Stored-program computer

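[Figure: a processor (with registers) sends an address to memory and exchanges instructions and data with it; the memory holds both instructions and data at addresses 00..00 (hex) through FF..FF (hex)]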

Page 15: CPE 626: Advanced VLSI Design L01


Typical Hierarchy

Transistors
Logic gates, memory cells, special circuits
Single-bit adders, MUXs, flip-flops, decoders, coders
Word-wide adders, MUXs, registers, decoders, buses
ALUs, shifters, register files, memory blocks
Processor, peripheral cells, cache memories, MMUs
Integrated system chips
PCBs
Mobile phones, laptops, PCs, engine controllers

[Figure: transistor-level logic gate with supply rails Vdd and Vss, inputs A and B, and output labeled A.B]

Page 16: CPE 626: Advanced VLSI Design L01


MU0 – A Simple Processor

Instruction format

opcode (4 bits) | S (12 bits)

Instruction set

Instruction  Opcode  Effect
LDA S        0000    ACC := mem16[S]
STO S        0001    mem16[S] := ACC
ADD S        0010    ACC := ACC + mem16[S]
SUB S        0011    ACC := ACC - mem16[S]
JMP S        0100    PC := S
JGE S        0101    if ACC >= 0 PC := S
JNE S        0110    if ACC != 0 PC := S
STP          0111    stop
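The instruction set is small enough to capture in a few lines. The following is a minimal instruction-level sketch of MU0 (4-bit opcode, 12-bit address S, 16-bit words), not a model of the hardware. The names `encode` and `run`, the 4096-word memory, and starting execution at address 0 are assumptions made for illustration; ACC >= 0 is interpreted as two's-complement, i.e. bit 15 clear.

```python
MASK16 = 0xFFFF      # 16-bit data words
ADDR_MASK = 0x0FFF   # 12-bit address field S

OPCODES = {"LDA": 0x0, "STO": 0x1, "ADD": 0x2, "SUB": 0x3,
           "JMP": 0x4, "JGE": 0x5, "JNE": 0x6, "STP": 0x7}

def encode(mnemonic, s=0):
    """Pack a mnemonic and address S into one 16-bit instruction word."""
    return (OPCODES[mnemonic] << 12) | (s & ADDR_MASK)

def run(program, max_steps=10_000):
    """Run from address 0 until STP; return (ACC, memory)."""
    mem = list(program) + [0] * (4096 - len(program))
    acc, pc = 0, 0
    for _ in range(max_steps):
        instr = mem[pc]
        pc = (pc + 1) & ADDR_MASK
        op, s = instr >> 12, instr & ADDR_MASK
        if op == 0x0:                      # LDA S
            acc = mem[s]
        elif op == 0x1:                    # STO S
            mem[s] = acc
        elif op == 0x2:                    # ADD S
            acc = (acc + mem[s]) & MASK16
        elif op == 0x3:                    # SUB S
            acc = (acc - mem[s]) & MASK16
        elif op == 0x4:                    # JMP S
            pc = s
        elif op == 0x5:                    # JGE S: sign bit clear
            if (acc & 0x8000) == 0:
                pc = s
        elif op == 0x6:                    # JNE S
            if acc != 0:
                pc = s
        elif op == 0x7:                    # STP
            return acc, mem
        else:
            raise ValueError(f"undefined opcode {op:#x}")
    raise RuntimeError("step limit reached without STP")

if __name__ == "__main__":
    # mem[6] := mem[4] + mem[5]
    program = [encode("LDA", 4), encode("ADD", 5), encode("STO", 6),
               encode("STP"), 2, 3, 0]
    acc, mem = run(program)
    print(acc, mem[6])  # 5 5
```

Because instructions and data share one memory, the operands sit directly after the STP instruction in the same word array.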

Page 17: CPE 626: Advanced VLSI Design L01


MU0 Datapath Example

Program Counter – PC
Accumulator – ACC
Arithmetic-Logic Unit – ALU
Instruction Register – IR
Instruction Decode and Control Logic

[Figure: datapath with IR, PC, ACC, ALU, and control, connected to memory by the address bus and the data bus]

Follow the principle that the memory will be the limiting factor in the design: each instruction takes exactly the number of clock cycles defined by the number of memory accesses it must make.

Page 18: CPE 626: Advanced VLSI Design L01


MU0 Datapath Design

Assume that each instruction starts when it has arrived in the IR

Step 1: EX (execute)
  LDA S: ACC <- Mem[S]
  STO S: Mem[S] <- ACC
  ADD S: ACC <- ACC + Mem[S]
  SUB S: ACC <- ACC - Mem[S]
  JMP S: PC <- S
  JGE S: if (ACC >= 0) PC <- S
  JNE S: if (ACC != 0) PC <- S

Step 2: IF (fetch the next instruction)
  Either PC or the address in the IR is issued to fetch the next instruction
  address is incremented in the ALU and value saved into the PC

Initialization
  Reset input to start executing instructions from a known address; here it is 000hex
    o provide zero at the ALU output and then load it into the PC register
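Under the one-cycle-per-memory-access principle, the cycle count of a program follows directly from the two steps above. The sketch below assumes that LDA, STO, ADD, and SUB spend one cycle on their data access (EX) plus one on the fetch (IF), and that the jumps need only the fetch cycle because they make no data access. The table and function names are illustrative, and STP is left out because no fetch follows it.

```python
# Number of data-memory accesses made in the EX step (assumed from the
# instruction effects: only the four ACC/memory instructions touch Mem[S]).
DATA_ACCESSES = {
    "LDA": 1, "STO": 1, "ADD": 1, "SUB": 1,
    "JMP": 0, "JGE": 0, "JNE": 0,
}

def cycles(mnemonic):
    """Clock cycles for one instruction: data accesses plus one fetch."""
    return DATA_ACCESSES[mnemonic] + 1

def total_cycles(trace):
    """Total cycles for a sequence of executed instructions."""
    return sum(cycles(m) for m in trace)

if __name__ == "__main__":
    trace = ["LDA", "ADD", "STO", "JMP"]
    print(total_cycles(trace))  # 7
```

The count depends on the executed trace, not the static program, so a loop body contributes its cycles once per iteration.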

Page 19: CPE 626: Advanced VLSI Design L01


MU0 RTL Organization

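Control Logic
  Asel
  Bsel
  ACCce (ACC change enable)
  PCce (PC change enable)
  IRce (IR change enable)
  ACCoe (ACC output enable)
  ALUfs (ALU function select)
  MEMrq (memory request)
  RnW (read/write)
  Ex/ft (execute/fetch)

[Figure: MU0 RTL organization. PC, IR, ACC, the ALU (inputs A and B), and a mux (inputs 0 and 1, selected by Asel) connect to memory; control signals shown are IRce, PCce, ALUfs, Bsel, ACCce, ACCoe, MEMrq, and RnW; ACC[15], ACCz, and the IR opcode are taken from the datapath]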

Page 20: CPE 626: Advanced VLSI Design L01


MU0 control logic

Inputs: Opcode, Reset, Ex/ft, ACCz, ACC15
Outputs: Asel, Bsel, ACCce, PCce, IRce, ACCoe, ALUfs, MEMrq, RnW, Ex/ft

Instruction  Opcode  Reset  Ex/ft  ACCz  ACC15 | Asel  Bsel  ACCce  PCce  IRce  ACCoe  ALUfs  MEMrq  RnW  Ex/ft
Reset        xxxx    1      x      x     x     | 0     0     1      1     1     0      =0     1      1    0
LDA S        0000    0      0      x     x     | 1     1     1      0     0     0      =B     1      1    1
             0000    0      1      x     x     | 0     0     0      1     1     0      B+1    1      1    0
STO S        0001    0      0      x     x     | 1     x     0      0     0     1      x      1      0    1
             0001    0      1      x     x     | 0     0     0      1     1     0      B+1    1      1    0
ADD S        0010    0      0      x     x     | 1     1     1      0     0     0      A+B    1      1    1
             0010    0      1      x     x     | 0     0     0      1     1     0      B+1    1      1    0
SUB S        0011    0      0      x     x     | 1     1     1      0     0     0      A-B    1      1    1
             0011    0      1      x     x     | 0     0     0      1     1     0      B+1    1      1    0
JMP S        0100    0      x      x     x     | 1     0     0      1     1     0      B+1    1      1    0
JGE S        0101    0      x      x     0     | 1     0     0      1     1     0      B+1    1      1    0
             0101    0      x      x     1     | 0     0     0      1     1     0      B+1    1      1    0
JNE S        0110    0      x      0     x     | 1     0     0      1     1     0      B+1    1      1    0
             0110    0      x      1     x     | 0     0     0      1     1     0      B+1    1      1    0
STP          0111    0      x      x     x     | 1     x     0      0     0     0      x      0      1    0
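The control logic is a pure function of the opcode, Reset, Ex/ft, ACCz, and ACC15, so it can be written as a lookup. The sketch below is one possible encoding of the table, with `None` standing for a don't-care output; the function name `control`, the dictionary keys, and the key `ExFt` for the next-state Ex/ft output are illustrative choices, not part of the lecture.

```python
# Outputs of the common "fetch next instruction from PC" row.
FETCH = dict(Asel=0, Bsel=0, ACCce=0, PCce=1, IRce=1, ACCoe=0,
             ALUfs="B+1", MEMrq=1, RnW=1, ExFt=0)
# A taken jump fetches from the address held in the IR instead.
JUMP_TAKEN = dict(FETCH, Asel=1)

ALU_OP = {0b0000: "B", 0b0010: "A+B", 0b0011: "A-B"}  # LDA, ADD, SUB

def control(opcode, reset=0, exft=0, accz=0, acc15=0):
    """Return the control outputs for one clock cycle."""
    if reset:
        return dict(FETCH, ACCce=1, ALUfs="0")
    if opcode in ALU_OP:                       # LDA, ADD, SUB
        if exft == 0:                          # execute: read Mem[S] into ACC
            return dict(Asel=1, Bsel=1, ACCce=1, PCce=0, IRce=0, ACCoe=0,
                        ALUfs=ALU_OP[opcode], MEMrq=1, RnW=1, ExFt=1)
        return dict(FETCH)
    if opcode == 0b0001:                       # STO
        if exft == 0:                          # execute: write ACC to Mem[S]
            return dict(Asel=1, Bsel=None, ACCce=0, PCce=0, IRce=0, ACCoe=1,
                        ALUfs=None, MEMrq=1, RnW=0, ExFt=1)
        return dict(FETCH)
    if opcode == 0b0100:                       # JMP
        return dict(JUMP_TAKEN)
    if opcode == 0b0101:                       # JGE: taken when ACC15 == 0
        return dict(JUMP_TAKEN) if acc15 == 0 else dict(FETCH)
    if opcode == 0b0110:                       # JNE: taken when ACCz == 0
        return dict(JUMP_TAKEN) if accz == 0 else dict(FETCH)
    if opcode == 0b0111:                       # STP: no memory request
        return dict(Asel=1, Bsel=None, ACCce=0, PCce=0, IRce=0, ACCoe=0,
                    ALUfs=None, MEMrq=0, RnW=1, ExFt=0)
    raise ValueError(f"undefined opcode {opcode:04b}")

if __name__ == "__main__":
    print(control(0b0010, exft=0))   # ADD, execute cycle
    print(control(0b0010, exft=1))   # ADD, fetch cycle
```

Most rows collapse onto the same fetch pattern, which is why the jumps complete in a single cycle: a taken jump differs from an ordinary fetch only in Asel.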

Page 21: CPE 626: Advanced VLSI Design L01


MU0 ALU Design

ALU functions: A+B, A-B, B, B+1, 0 (used only when reset is active) => 4 functions

Aen (enable operand A)

Binv (invert operand B)

[Figure: ALU bit slice with inputs A, B, Aen, Binv, Cin, and reset, and outputs sum and Cout]
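One way to obtain the four ALU functions from a single adder is to gate operand A with Aen, optionally invert operand B with Binv, and use the carry-in for the "+1". The sketch below models that at word level for 16-bit operands; the mapping of each function to (Aen, Binv, Cin) is an assumption consistent with the signal names on the slide, not a transcription of the original schematic.

```python
MASK16 = 0xFFFF

# Assumed (Aen, Binv, Cin) settings for each ALU function.
FUNCTIONS = {
    "A+B": (1, 0, 0),
    "A-B": (1, 1, 1),   # A + (~B) + 1 is two's-complement subtraction
    "B":   (0, 0, 0),
    "B+1": (0, 0, 1),
}

def alu(a, b, aen, binv, cin, reset=0):
    """16-bit ALU built around one adder; reset forces the output to 0."""
    if reset:
        return 0
    x = a if aen else 0                      # Aen gates operand A
    y = (~b & MASK16) if binv else b         # Binv inverts operand B
    return (x + y + cin) & MASK16            # carry out of bit 15 is dropped

if __name__ == "__main__":
    for name, (aen, binv, cin) in FUNCTIONS.items():
        print(name, hex(alu(5, 3, aen, binv, cin)))
```

With this arrangement the PC increment in the fetch step reuses the same adder as ADD and SUB by disabling A and setting the carry-in.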

Page 22: CPE 626: Advanced VLSI Design L01


Digital System Modeling: Motivation

Requirements specification

Functional specification

Testing and verification of the design

Formal verification of the correctness of the design

Automatic synthesis

Page 23: CPE 626: Advanced VLSI Design L01


Gajski and Kuhn’s Y Chart

[Figure: Y chart with three axes (domains) crossed by concentric levels of abstraction]

Behavioral: Systems; Algorithms; Register Transfer; Logic; Transfer Functions

Structural: Processor; Hardware Modules; ALUs, Registers; Gates, FFs; Transistors

Physical/Geometry: Physical Partitions; Clusters; Floor Plans; Cell, Module Plans; Rectangles

Levels: Architectural; Algorithmic; Functional Block; Logic; Circuit

Domains

Functional – operations performed by the system

Structural – how the system is composed

Geometry – how the system is laid out in physical space