Single Cycle Datapath

Lecture notes from MKP, H. H. Lee and S. Yalamanchili

Reading

• Section 4.1-4.4

• Appendices B.7, B.8, B.11, D.2

• Practice Problems: 1, 4, 6, 9

Introduction

• We will examine two MIPS implementations
  – A simplified version (this module)
  – A more realistic pipelined version

• Simple subset, shows most aspects
  – Memory reference: lw, sw
  – Arithmetic/logical: add, sub, and, or, slt
  – Control transfer: beq, j

Instruction Execution

• PC → instruction memory, fetch instruction

• Register numbers → register file, read registers

• Depending on instruction class
  1. Use ALU to calculate
    o Arithmetic result
    o Memory address for load/store
    o Branch target address
  2. Access data memory for load/store
  3. PC ← an address or PC + 4

[Figure: an encoded program stored at consecutive addresses: 8d0b0000, 014b5020, 21080004, 2129ffff, 1520fffc, 000a082a, …]
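As a rough illustration of what the hardware does with each fetched word, the Python sketch below splits the encoded program words into MIPS fields. The bit positions are the standard MIPS ones (op 31:26, rs 25:21, rt 20:16, rd 15:11, shamt 10:6, funct 5:0). The function name `decode`, the dictionary layout, and the base address of 0 are assumptions made for this example, not part of the lecture.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into its fields."""
    imm = word & 0xFFFF
    return {
        "op": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
        # 16-bit constant, read as a signed two's-complement value
        "imm": imm - 0x10000 if imm & 0x8000 else imm,
    }

# The encoded program from the slide; a base address of 0 is assumed
program = [0x8D0B0000, 0x014B5020, 0x21080004, 0x2129FFFF, 0x1520FFFC, 0x000A082A]

for index, word in enumerate(program):
    f = decode(word)
    print(f"{4 * index:#06x}: {word:08x} op={f['op']} rs={f['rs']} rt={f['rt']} "
          f"rd={f['rd']} funct={f['funct']} imm={f['imm']}")
```

The first word, 8d0b0000, decodes to op = 35 (lw) with rs = 8 and rt = 11. The second, 014b5020, is an R-type add (op = 0, funct = 32).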

Basic Ingredients

• Include the functional units we need for each instruction – combinational and sequential

[Figure: datapath building blocks: instruction memory (Instruction address in, Instruction out), program counter, adder, sign-extension unit (16 → 32 bits), data memory unit (Address, Write data, Read data, MemRead, MemWrite), register file (5-bit Read register 1, Read register 2, and Write register numbers; Write data, Read data 1, Read data 2, RegWrite), and ALU (ALU control, Zero, ALU result)]

Sequential Elements (4.2, B.7, B.11)

• Register: stores data in a circuit
  – Uses a clock signal to determine when to update the stored value
  – Edge-triggered: update when Clk changes from 0 to 1

[Figure: edge-triggered D flip-flop built from two D latches (inputs D and C, outputs Q and _Q), with a timing diagram of Clk, D, and Q marking the falling and rising edges]

Sequential Elements

• Register with write control
  – Only updates on clock edge when write control input is 1
  – Used when stored value is required later
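The behavior described above can be sketched in a few lines of Python. This is a behavioral model only, not a gate-level one. The class name `Register` and the method `tick` are hypothetical names chosen for this example, and the rising edge is taken as the active edge.

```python
class Register:
    """Behavioral model of an edge-triggered register with a write control input."""

    def __init__(self, value=0):
        self.q = value        # stored value, visible on output Q
        self._prev_clk = 0    # last clock level seen, used to detect edges

    def tick(self, clk, d, write=1):
        """Present the clock level, data input D, and the write control input."""
        rising_edge = self._prev_clk == 0 and clk == 1
        if rising_edge and write == 1:
            self.q = d        # capture D only on a 0 -> 1 edge with write asserted
        self._prev_clk = clk
        return self.q

reg = Register()
reg.tick(clk=1, d=5, write=0)   # rising edge, but write is 0: value is held
held = reg.q
reg.tick(clk=0, d=7, write=1)   # falling edge: no update
reg.tick(clk=1, d=7, write=1)   # rising edge with write = 1: value is updated
updated = reg.q
```

Here `held` stays at the initial value because the write input was 0 on the first edge, while `updated` picks up the new data on the second rising edge.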

[Figure: D flip-flop with a Write control input, with a timing diagram of Clk, Write, D, and Q over one cycle time]

Clocking Methodology

• Combinational logic transforms data during clock cycles
  – Between clock edges
  – Input from state elements, output to state element
  – Longest delay determines clock period

• Synchronous vs. Asynchronous operation

Recall: Critical Path Delay

Register File (B.8)

• Built using D flip-flops (remember ECE 2030!)

[Figure: register file read ports. Two multiplexers, each selected by a read register number, choose among Register 0 … Register n to drive Read data 1 and Read data 2. The register file block also has Write register, Write data, and Write inputs.]

Register File

• Note: we still use the real clock to determine when to write
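A small behavioral sketch of the register file may help here: reads are combinational, and a write happens only at the clock edge when the write signal is asserted. The class and method names are hypothetical, and keeping register 0 at zero is an assumption that follows the MIPS convention rather than anything stated on the slide.

```python
class RegisterFile:
    """Behavioral model: two read ports, one write port, 32 registers."""

    def __init__(self, n=32):
        self.regs = [0] * n

    def read(self, read_reg1, read_reg2):
        # Combinational read: each register number acts as a mux select
        return self.regs[read_reg1], self.regs[read_reg2]

    def clock_edge(self, write_reg, write_data, reg_write):
        # The decoder selects one register; it updates only if the write signal is 1
        if reg_write and write_reg != 0:   # assumption: register 0 stays zero
            self.regs[write_reg] = write_data & 0xFFFFFFFF

rf = RegisterFile()
rf.clock_edge(write_reg=8, write_data=42, reg_write=1)
rf.clock_edge(write_reg=9, write_data=99, reg_write=0)   # ignored: write signal is 0
values = rf.read(8, 9)
```

After these two edges, register 8 holds the written value and register 9 is unchanged.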

[Figure: register file write port. A decoder on the Register number, gated by Write, drives the C input of one of Register 0 … Register n, and Register data feeds the D input of every register.]

Building a Datapath (4.3)

• Datapath
  – Elements that process data and addresses in the CPU
    o Registers, ALUs, muxes, memories, …

• We will build a MIPS datapath incrementally
  – Refining the overview design

High Level Description

• Single instruction single data stream model of execution (Remember Flynn’s Taxonomy)
  – Serial execution model

• Commonly known as the von Neumann execution model
  – Stored program model
  – Instructions and data share memory

[Figure: execution loop of fetch instructions, execute instructions, and memory operations under control, next to Flynn’s taxonomy grid of instruction streams versus data streams: SISD, SIMD, MISD, MIMD]

Instruction Fetch

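[Figure: instruction fetch. The PC, a 32-bit register clocked by clk, addresses the instruction memory, and an adder increments it by 4 for the next instruction. The timing diagram shows instruction fetch starting at one clock edge and completing within the cycle time.]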

R-Format Instructions

• Read two register operands

• Perform arithmetic/logical operation

• Write register result

Instruction format: op | rs | rt | rd | shamt | funct

Executing R-Format Instructions

[Figure: register file (5-bit Read register 1, Read register 2, and Write register; Write data, Read data 1, Read data 2, RegWrite) feeding the ALU (ALU control, Zero, ALU result). Instruction format: op | rs | rt | rd | shamt | funct]

Load/Store Instructions

• Read register operands

• Calculate address using 16-bit offset
  – Use ALU, but sign-extend offset

• Load: Read memory and update register

• Store: Write register value to memory

Instruction format: op | rs | rt | 16-bit constant
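The address calculation can be sketched as follows. The helper names `sign_extend16` and `effective_address` and the base address used in the examples are assumptions for illustration.

```python
def sign_extend16(value):
    """Model the 16 -> 32 sign-extension unit (result as a signed Python int)."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def effective_address(base, offset16):
    """ALU add of the base register and the sign-extended offset, wrapped to 32 bits."""
    return (base + sign_extend16(offset16)) & 0xFFFFFFFF

# Hypothetical base register value 0x10010000
addr_pos = effective_address(0x10010000, 0x0008)   # offset +8
addr_neg = effective_address(0x10010000, 0xFFFC)   # offset -4
```

A negative offset such as 0xfffc moves the address below the base, which is why the offset has to be sign-extended rather than zero-extended before the add.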

Executing I-Format Instructions

[Figure: datapath elements for I-format instructions: register file (Read register 1, Read register 2, Write register, RegWrite), sign-extension unit (16 → 32 bits), and data memory (Address, Write data, Read data, MemRead, MemWrite). Instruction format: op | rs | rt | 16-bit constant]

Branch Instructions

• Read register operands

• Compare operands
  – Use ALU, subtract and check Zero output

• Calculate target address
  – Sign-extend displacement
  – Shift left 2 places (word displacement)
  – Add to PC + 4
    o Already calculated by instruction fetch

Instruction format: op | rs | rt | 16-bit constant
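The target calculation and the PC selection can be sketched as below. The function name `next_pc` and its arguments are hypothetical. `branch` stands for the Branch control signal and `zero` for the ALU Zero output.

```python
def sign_extend16(value):
    """Model the 16 -> 32 sign-extension unit (result as a signed Python int)."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def next_pc(pc, offset16, branch, zero):
    """Select between PC + 4 and the branch target."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    # Shift the word displacement left by 2 to get bytes, then add it to PC + 4
    target = (pc_plus_4 + (sign_extend16(offset16) << 2)) & 0xFFFFFFFF
    return target if (branch and zero) else pc_plus_4

# A displacement of 0xfffc (-4 words) at an assumed address of 0x10
taken = next_pc(0x10, 0xFFFC, branch=1, zero=1)
not_taken = next_pc(0x10, 0xFFFC, branch=1, zero=0)
```

With a displacement of -4 words, the taken target lies 16 bytes before PC + 4, which is how a backward branch closes a loop.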

Branch Instructions

[Figure: branch datapath. Annotations: the shift-left-2 unit just re-routes wires; the sign-extension unit replicates the sign-bit wire. Instruction format: op | rs | rt | 16-bit constant]

Updating the Program Counter

[Figure: computation of the branch address. The PC addresses the instruction memory (Read address, Instruction [31–0]); an adder forms PC + 4; Instruction [15–0] is sign-extended from 16 to 32 bits, shifted left by 2, and added to PC + 4; a mux controlled by Branch selects the next PC. Instruction [25–21], [20–16], and [15–11] supply the register numbers.]

loop: beq  $t0, $0, exit
      addi $t0, $t0, -1
      lw   $a0, arg1($t1)
      lw   $a1, arg2($t2)
      jal  func
      add  $t3, $t3, $v0
      addi $t1, $t1, 4
      addi $t2, $t2, 4
      j    loop

Composing the Elements

• First-cut data path does an instruction in one clock cycle
  – Each datapath element can only do one function at a time
  – Hence, we need separate instruction and data memories

• Use multiplexers where alternate data sources are used for different instructions

[Figure: an encoded program (014b5020, 21080004, 2129ffff, 1520fffc, 000a082a, …) in instruction memory, addressed by the PC]

Full Single Cycle Datapath

Destination register is “instruction-specific”: lw $t0, 0($t4) vs. add $t0, $t1, $t2

The Main Control Unit

• Control signals derived from instruction

Format      31:26     25:21  20:16  15:11  10:6   5:0
R-type      0         rs     rt     rd     shamt  funct
Load/Store  35 or 43  rs     rt     address (15:0)
Branch      4         rs     rt     address (15:0)

  – 31:26: opcode
  – rs (25:21): always read
  – rt (20:16): read, except for load
  – rd (R-type) or rt (load): write for R-type and load
  – address (15:0): sign-extend and add

ALU Control (4.4, D.2)

• ALU used for
  – Load/Store: Function = add
  – Branch: Function = subtract
  – R-type: Function depends on funct field

ALU control Function

0000 AND

0001 OR

0010 add

0110 subtract

0111 set-on-less-than

1100 NOR
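To make the table concrete, here is a minimal behavioral sketch of an ALU driven by the 4-bit control codes listed above. The function names and the choice to return a `(result, zero)` pair are assumptions for this example. Operands are treated as 32-bit values, and set-on-less-than is modeled as a signed comparison.

```python
MASK = 0xFFFFFFFF

def to_signed(x):
    """Interpret a 32-bit pattern as a signed two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x

def alu(control, a, b):
    """Return (result, zero) for a 4-bit ALU control code."""
    if control == 0b0000:
        result = a & b                                       # AND
    elif control == 0b0001:
        result = a | b                                       # OR
    elif control == 0b0010:
        result = a + b                                       # add
    elif control == 0b0110:
        result = a - b                                       # subtract
    elif control == 0b0111:
        result = 1 if to_signed(a) < to_signed(b) else 0     # set-on-less-than
    elif control == 0b1100:
        result = ~(a | b)                                    # NOR
    else:
        raise ValueError(f"unsupported ALU control {control:04b}")
    result &= MASK                                           # wrap to 32 bits
    return result, int(result == 0)
```

For a branch, the ALU subtracts the two operands and the Zero output indicates equality: `alu(0b0110, 7, 7)` returns `(0, 1)`.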

ALU Control

• Assume 2-bit ALUOp derived from opcode
  – Combinational logic derives ALU control

opcode  ALUOp  Operation         funct   ALU function      ALU control
lw      00     load word         XXXXXX  add               0010
sw      00     store word        XXXXXX  add               0010
beq     01     branch equal      XXXXXX  subtract          0110
R-type  10     add               100000  add               0010
R-type  10     subtract          100010  subtract          0110
R-type  10     AND               100100  AND               0000
R-type  10     OR                100101  OR                0001
R-type  10     set-on-less-than  101010  set-on-less-than  0111

• How do we turn this description into gates?

ALU Controller

ALUOp1  ALUOp0  F5  F4  F3  F2  F1  F0  ALU control  Operation  Instruction
0       0       X   X   X   X   X   X   010          add        lw/sw
X       1       X   X   X   X   X   X   110          sub        beq
1       X       X   X   0   0   0   0   010          add        arith
1       X       X   X   0   0   1   0   110          sub        arith
1       X       X   X   0   1   0   0   000          and        arith
1       X       X   X   0   1   0   1   001          or         arith
1       X       X   X   1   0   1   0   111          slt        arith

  – ALUOp is generated from decoding inst[31:26]
  – funct = inst[5:0]

[Figure: ALU control block taking ALUOp and funct and driving the 3-bit ALU control input of the ALU (Zero, ALU result)]

ALU Control

• Simple combinational logic (truth tables)

[Figure: ALU control block. Gates combine ALUOp (ALUOp1, ALUOp0) and funct bits F3–F0 of F(5–0) to produce Operation2, Operation1, and Operation0.]
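One way to read the truth table as gates is sketched below. The three equations are one reduction that is consistent with the table's don't-care entries. The figure itself is not recoverable from this transcript, so treat the exact gate structure as an assumption. The function name `alu_control` is hypothetical.

```python
def alu_control(aluop1, aluop0, funct):
    """Gate-level sketch: derive the 3-bit Operation code from ALUOp and funct."""
    f0 = funct & 1
    f1 = (funct >> 1) & 1
    f2 = (funct >> 2) & 1
    f3 = (funct >> 3) & 1
    operation2 = aluop0 | (aluop1 & f1)        # subtract and slt
    operation1 = (1 - aluop1) | (1 - f2)       # everything except AND / OR
    operation0 = aluop1 & (f3 | f0)            # OR and slt
    return (operation2 << 2) | (operation1 << 1) | operation0

# Check every row of the truth table: (ALUOp1, ALUOp0, funct) -> Operation
rows = [
    (0, 0, 0b000000, 0b010),   # lw/sw: add
    (0, 1, 0b000000, 0b110),   # beq: subtract
    (1, 0, 0b100000, 0b010),   # add
    (1, 0, 0b100010, 0b110),   # subtract
    (1, 0, 0b100100, 0b000),   # AND
    (1, 0, 0b100101, 0b001),   # OR
    (1, 0, 0b101010, 0b111),   # set-on-less-than
]
all_rows_match = all(alu_control(a1, a0, f) == op for a1, a0, f, op in rows)
```

Because F5 and F4 never appear in the equations, they are true don't-cares, which is what keeps the logic this small.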

Datapath With Control

Use rt not rd

Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
R-format     1       0       0         1         0        0         0       1       0
lw           0       1       1         1         1        0         0       0       0
sw           X       1       X         0         0        1         0       0       0
beq          X       0       X         0         0        0         1       0       1
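The control table can be exercised with a short Python sketch. It resolves the don't-care entries (X) to 0, which is a choice made for this example. The function name `main_control` is hypothetical, and opcodes outside the four supported classes are rejected.

```python
def main_control(opcode):
    """Return the control signals for a 6-bit opcode (R-format, lw, sw, beq only)."""
    r_format = opcode == 0b000000
    lw = opcode == 0b100011
    sw = opcode == 0b101011
    beq = opcode == 0b000100
    if not (r_format or lw or sw or beq):
        raise ValueError(f"unsupported opcode {opcode:06b}")
    return {
        "RegDst": int(r_format),
        "ALUSrc": int(lw or sw),
        "MemtoReg": int(lw),
        "RegWrite": int(r_format or lw),
        "MemRead": int(lw),
        "MemWrite": int(sw),
        "Branch": int(beq),
        "ALUOp1": int(r_format),
        "ALUOp0": int(beq),
    }

lw_signals = main_control(0b100011)
```

Each output depends on only one or two of the decoded instruction classes, so every column of the table reduces to a single OR of class signals.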

Commodity Processors

[Figure: ARM 7 and the single cycle datapath]

Control Unit Signals

[Figure: control unit logic to harness the datapath. Inputs Op5–Op0 (Inst[31:26]) are decoded into R-format, lw, sw, and beq, which drive the outputs RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1, and ALUOp0, as in the control table on the Datapath With Control slide.]

Controller Implementation

LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;
USE IEEE.STD_LOGIC_ARITH.ALL;
USE IEEE.STD_LOGIC_SIGNED.ALL;

ENTITY control IS
   PORT(
      SIGNAL Opcode       : IN  STD_LOGIC_VECTOR( 5 DOWNTO 0 );
      SIGNAL RegDst       : OUT STD_LOGIC;
      SIGNAL ALUSrc       : OUT STD_LOGIC;
      SIGNAL MemtoReg     : OUT STD_LOGIC;
      SIGNAL RegWrite     : OUT STD_LOGIC;
      SIGNAL MemRead      : OUT STD_LOGIC;
      SIGNAL MemWrite     : OUT STD_LOGIC;
      SIGNAL Branch       : OUT STD_LOGIC;
      SIGNAL ALUop        : OUT STD_LOGIC_VECTOR( 1 DOWNTO 0 );
      SIGNAL clock, reset : IN  STD_LOGIC );
END control;

Controller Implementation (cont.)

ARCHITECTURE behavior OF control IS

   SIGNAL R_format, Lw, Sw, Beq : STD_LOGIC;

BEGIN
   -- Code to generate control signals using opcode bits
   R_format   <= '1' WHEN Opcode = "000000" ELSE '0';
   Lw         <= '1' WHEN Opcode = "100011" ELSE '0';
   Sw         <= '1' WHEN Opcode = "101011" ELSE '0';
   Beq        <= '1' WHEN Opcode = "000100" ELSE '0';
   RegDst     <= R_format;
   ALUSrc     <= Lw OR Sw;
   MemtoReg   <= Lw;
   RegWrite   <= R_format OR Lw;
   MemRead    <= Lw;
   MemWrite   <= Sw;
   Branch     <= Beq;
   ALUOp( 1 ) <= R_format;
   ALUOp( 0 ) <= Beq;

END behavior;

Each assignment is the implementation of one column of the control table (rows R-format, lw, sw, beq).

R-Type Instruction

Load Instruction

Branch-on-Equal Instruction

Implementing Jumps

• Jump uses word address

• Update PC with concatenation of
  – Top 4 bits of old PC
  – 26-bit jump address
  – 00

• Need an extra control signal decoded from opcode

Jump instruction format: 2 (31:26) | address (25:0)
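The concatenation can be sketched as below. The function name `jump_target` is hypothetical, and taking the top 4 bits from PC + 4 (rather than from the PC of the jump itself) is an assumption that follows the usual textbook convention.

```python
def jump_target(pc, instruction):
    """Build the jump target: top 4 bits of PC + 4, the 26-bit address, then 00."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    address26 = instruction & 0x03FFFFFF        # instruction bits 25:0
    return (pc_plus_4 & 0xF0000000) | (address26 << 2)

# Hypothetical example: opcode 2 with a 26-bit address field of 0x100008
target = jump_target(0x00400100, 0x08100008)
```

Shifting the 26-bit field left by 2 appends the two zero bits, so a jump can reach any word within the current 256 MB region selected by the top 4 bits.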

Datapath With Jumps Added

Energy Behavior

[Figure: datapath annotated with two kinds of energy use: combinational activity and storage read/write access]

Recall Hierarchy of Energy Models

[Figure: hierarchy of energy models, from a CMOS inverter (Vin, Vout, Vdd, PMOS, NMOS, Ground), to a logic gate, to a D flip-flop built from D latches, to an ALU]

• Switch level activity (dynamic) and leakage (static) energy costs

• Aggregate energy expenditure into gate level estimates

• Aggregate energy expenditure into higher level modules

A Simple Architecture Energy Model

• To a first order, we can use the per-access energy of each major component
  – Obtain this for a technology generation

• Use this per-access energy to compute the energy of each instruction

• Note: This is a high level approximation. The actual physics is more complicated. However, this is useful for several purposes

• What components does each instruction exercise?

Example: Updating the PC

[Figure: full single cycle datapath with control signals RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, and ALUOp]

What is the energy cost of this operation?

Example: Register Instructions

[Figure: full single cycle datapath with control signals RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, and ALUOp]

What is the energy cost of this operation?

Example: I-type Instructions

[Figure: full single cycle datapath with control signals RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, and ALUOp]

What is the energy cost of this operation?

Example: I-Type for Branches

[Figure: full single cycle datapath with control signals RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, and ALUOp]

What is the energy cost of this operation?

Converting Energy to Power

• For this data path, except for data memory, all components are active every cycle, and dissipating energy on every cycle
  – Later we will see how data paths can be made more energy efficient

• Computing power
  – Compute the total energy consumed over all cycles (instructions)
  – Divide energy by time to get power in watts

Example:

Example: A Simple Energy Model

• We can use a simple model of per-access energy for the architecture components

Common Components                 Access           Energy (10^-12 joules)
Inst. Decode                      Logic switching  16.78
Inst. Registers                   Read             2.74
                                  Write            4.38
FP. Registers                     Read             1.26
                                  Write            1.98
Other Buffers                     Read             9.74
                                  Write            11.18
ALU + Result Bus (interconnect)   Logic switching  123.2
FPU + Result Bus (interconnect)   Logic switching  241.02

• Each unit can be accessed multiple times depending on instruction type

• An Intel/AMD x86 instruction consumes 600 pJ ~ 4 nJ dynamic energy.

@16nm
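The sketch below applies the per-access energies to estimate per-instruction energy and average power. The access counts per instruction class are assumptions made for this example (two register reads, one ALU operation, and so on), as are the use of the "Inst. Registers" row for the register file and the 1 GHz clock. Memory energy is left out because the table gives no figure for it.

```python
# Per-access energies in picojoules, taken from the table
ENERGY_PJ = {
    "decode": 16.78,
    "reg_read": 2.74,
    "reg_write": 4.38,
    "alu": 123.2,
}

# Assumption: which components each instruction class exercises, and how often
ACCESSES = {
    "r_type": {"decode": 1, "reg_read": 2, "alu": 1, "reg_write": 1},
    "beq": {"decode": 1, "reg_read": 2, "alu": 1},
}

def instruction_energy_pj(kind):
    """Sum the per-access energy over every component the instruction uses."""
    return sum(ENERGY_PJ[unit] * count for unit, count in ACCESSES[kind].items())

def average_power_watts(instruction_kinds, clock_hz):
    """Total energy divided by total time, at one clock cycle per instruction."""
    total_joules = sum(instruction_energy_pj(k) for k in instruction_kinds) * 1e-12
    seconds = len(instruction_kinds) / clock_hz
    return total_joules / seconds

r_type_pj = instruction_energy_pj("r_type")
power = average_power_watts(["r_type", "r_type", "r_type", "beq"], clock_hz=1e9)
```

Under these assumptions an R-type instruction costs about 150 pJ, dominated by the ALU and its result bus, and a short loop of such instructions at 1 GHz dissipates roughly 0.15 W.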

ITRS Roadmap for Logic Devices

From: “ExaScale Computing Study: Technology Challenges in Achieving Exascale Systems,” P. Kogge, et al., 2008

Our Simple Control Structure

• All of the logic is combinational

• We wait for everything to settle down, and the right thing to be done
  – ALU might not produce “right answer” right away
  – We use write signals along with clock to determine when to write

• Cycle time determined by length of the longest path

We are ignoring some details like setup and hold times

[Figure: state element 1 → combinational logic → state element 2, all within one clock cycle]

Performance Issues

• Longest delay determines clock period
  – Critical path: load instruction
  – Instruction memory → register file → ALU → data memory → register file

• Not feasible to vary period for different instructions

• Violates design principle
  – Making the common case fast

• We will improve performance by pipelining
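The effect of the critical path on cycle time can be sketched as below. The component delays are hypothetical round numbers chosen for illustration, not values from these notes, and the path each instruction class takes is the usual one for this datapath.

```python
# Hypothetical component delays in picoseconds (illustrative values only)
DELAY_PS = {
    "inst_mem": 200,
    "reg_read": 100,
    "alu": 200,
    "data_mem": 200,
    "reg_write": 100,
}

# Components on the path of each instruction class
PATHS = {
    "lw": ["inst_mem", "reg_read", "alu", "data_mem", "reg_write"],
    "sw": ["inst_mem", "reg_read", "alu", "data_mem"],
    "r_type": ["inst_mem", "reg_read", "alu", "reg_write"],
    "beq": ["inst_mem", "reg_read", "alu"],
}

def path_delay_ps(kind):
    """Sum the delays of the components an instruction passes through."""
    return sum(DELAY_PS[stage] for stage in PATHS[kind])

# The single-cycle clock period must cover the slowest instruction
cycle_time_ps = max(path_delay_ps(kind) for kind in PATHS)
```

With these delays the load sets the cycle time, so an R-type instruction that needs less time still occupies a full load-length cycle.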

Summary

• Single cycle datapath
  – All instructions execute in one clock cycle
  – Not all instructions take the same amount of time
  – Software sees a simple interface
  – Can memory operations really take one cycle?

• Improve performance via pipelining, multi-cycle operation, parallelism or customization

• We will address these next

Study Guide

• Given an instruction, be able to specify the values of all control signals required to execute that instruction

• Add new instructions: modify the datapath and control to effect their execution
  – E.g., jal, jr, shift, etc.
  – Modify the VHDL controller

• Given delays of various components, determine the cycle time of the datapath

• Distinguish between those parts of the datapath that are unique to each instruction and those components that are shared across all instructions

Study Guide (cont.)

• Given a set of control signal values, determine what operation the datapath performs

• Given the per-access energies of each component:
  – Compute the energy required by any instruction
  – Given a program and clock rate, compute the power dissipation of the datapath

Glossary

• Asynchronous

• Clock

• Controller

• Critical path

• Flip Flop

• ITRS Roadmap

• Per-access energy

• Program counter

• Register

• Synchronous