EE414 Embedded Systems Ch 3. General-Purpose Processors: Software Part 1/4 Byung Kook Kim School of Electrical Engineering Korea Advanced Institute of Science and Technology


Page 1: EE414 Embedded Systems Ch 3. General-Purpose Processors: Software …rtcl.kaist.ac.kr/~bkkim/lecture/embedded/EmbSys3A_GPPA_D2.pdf · EE414 Embedded Systems Ch 3. General-Purpose

EE414 Embedded Systems

Ch 3. General-Purpose Processors: Software

Part 1/4

Byung Kook Kim

School of Electrical Engineering

Korea Advanced Institute of Science and Technology


Embedded Systems, KAIST 2

Overview

Part 1
  3.1 Introduction
  3.2 Basic Architecture
  3.3 Operation

Part 2
  3.4 Programmer's View
  3.5 Development Environment
  3.6 Application-Specific Instruction-Set Processors (ASIPs)
  3.7 Selecting a Microprocessor
  3.8 General-Purpose Processor Design


3.1 Introduction

General-purpose processor (GPP)
  Processor designed for a variety of computation tasks.

Advantages of a GPP
  Low unit cost, in part because the manufacturer spreads NRE over large numbers of units.
    Motorola sold half a billion 68HC05 microcontrollers in 1996 alone.
  Carefully designed, since higher NRE is acceptable.
    Can yield good performance, size, and power.
  Low NRE cost for the user, short time-to-market/prototype, high flexibility.
    User just writes software; no processor design.

Disadvantages
  Cost, power, and size (may be higher than for a processor tailored to a single task).

COTS (Commercial Off-The-Shelf) "microprocessor"
  Processor implemented on one chip.


3.2 Basic Architecture

General-purpose processor (CPU)
  Control unit and datapath, tightly linked with a memory.

Key differences from a single-purpose processor (SPP)
  Datapath is general.
  Control unit doesn't store the algorithm; the algorithm is "programmed" into the memory.
  Control unit contains a PC (program counter) and an IR (instruction register).

[Figure: processor block diagram. Control unit (controller, PC, IR) and datapath (ALU, registers), linked by control/status signals and connected to memory and I/O.]


A. Datapath - Operations

ALU (Arithmetic and Logic Unit)

Load
  Read memory location into register.
ALU operation
  Arithmetic & logic operation:
  input certain registers through the ALU, do the operation, store the result back in a register.
Store
  Write register to memory location.

Ex: Load, Inc, Store
Size N: N-bit-wide

[Figure: processor block diagram for the Load, Inc, Store example. The value 10 is loaded from memory into a register, the ALU adds 1, and the result 11 is stored back to memory.]
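
The Load, Inc, Store example above can be sketched in a few lines of Python. This is only an illustration, not the slides' notation: the 8-bit width, the addresses 500 and 501, and the names `memory`, `regs`, `load`, `alu_inc`, and `store` are assumptions made for the sketch.

```python
N = 8                       # assumed datapath width in bits (illustrative)
MASK = (1 << N) - 1         # results wrap around at N bits

memory = {500: 10, 501: 0}  # data memory: address -> value (assumed addresses)
regs = {"R0": 0, "R1": 0}   # datapath registers

def load(reg, addr):
    """Load: read a memory location into a register."""
    regs[reg] = memory[addr] & MASK

def alu_inc(dst, src):
    """ALU operation: pass a register through the ALU, add 1, store in a register."""
    regs[dst] = (regs[src] + 1) & MASK

def store(addr, reg):
    """Store: write a register to a memory location."""
    memory[addr] = regs[reg]

load("R0", 500)
alu_inc("R1", "R0")
store(501, "R1")
print(regs, memory)
```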


Datapath: Size of datapath operation (N-bit computer)

8 bits
  Character: 0 to 255 (0 to FFh/0xFF). 256 English characters.
16 bits
  Integer: -32768 to 32767 (-2^15 to 2^15 - 1). Hangul characters.
32 bits
  Integer: -2G to 2G (about -2 x 10^9 to 2 x 10^9; -2^31 to 2^31 - 1).
  Floating-point number (Sm x 2^(Se)).
64 bits
  Integer: -8E to 8E (exa; about -8 x 10^18 to 8 x 10^18).
  Double-precision floating-point number (SM x 2^(se)).
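
The integer ranges listed above follow directly from the bit width. A small Python sketch, assuming unsigned and two's-complement signed encodings (the function names are illustrative, not from the slides):

```python
def unsigned_range(n):
    """Range of an n-bit unsigned integer: 0 to 2^n - 1."""
    return 0, 2**n - 1

def signed_range(n):
    """Range of an n-bit two's-complement integer: -2^(n-1) to 2^(n-1) - 1."""
    return -(2**(n - 1)), 2**(n - 1) - 1

for n in (8, 16, 32, 64):
    print(n, unsigned_range(n), signed_range(n))
```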


B. Control Unit

Control unit: configures the datapath operations
  Sequence of desired operations ("instructions") stored in memory: the "program".
Program
  Sequence of instructions, including loops and branches.
Instruction
  Set of meaningful bits to do some operation.
Machine language
  Binary code
  Ex: 0011 | 00 | 1000 0001
Assembly language
  Symbolic expression
  Ex: LD R0, M(513)
  1:1 to machine language

[Figure: processor block diagram with an example program in memory.
  Program: 100: load R0, M[500]   101: inc R1, R0   102: store M[501], R1
  Data: M[500] = 10, M[501] = result
  Registers: R0, R1]


Control Unit: Instruction cycle

Broken into several sub-operations, each one clock cycle, e.g.:
  1. Fetch: Get next instruction into IR
  2. Decode: Determine what the instruction means
  3. Fetch operands: Move data from memory to datapath register
  4. Execute: Move data through the ALU to register
  5. Store results: Write data from register to memory
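
The five sub-operations above can be sketched as a tiny interpreter for the three-instruction example program (load, inc, store). This is a simplified model, not a description of any real processor: the tuple encoding of instructions and the names `program`, `data`, `regs`, and `run` are assumptions for illustration, and every instruction is charged five clocks even when a sub-operation does nothing.

```python
# Program and data memory modeled on the slides' running example.
program = {
    100: ("load", "R0", 500),    # load R0, M[500]
    101: ("inc", "R1", "R0"),    # inc R1, R0
    102: ("store", 501, "R1"),   # store M[501], R1
}
data = {500: 10, 501: 0}
regs = {"R0": 0, "R1": 0}

def run(pc, n_instructions):
    """Execute n_instructions; return the final PC and the clock count."""
    clocks = 0
    for _ in range(n_instructions):
        ir = program[pc]         # 1. Fetch: get next instruction into IR
        pc += 1                  #    PC now points to the next instruction
        op, a, b = ir            # 2. Decode: determine what the instruction means
        if op == "load":         # 3. Fetch operands: memory -> datapath register
            regs[a] = data[b]
        if op == "inc":          # 4. Execute: move data through the ALU
            regs[a] = regs[b] + 1
        if op == "store":        # 5. Store results: register -> memory
            data[a] = regs[b]
        clocks += 5              # one clock per sub-operation
    return pc, clocks

pc, clocks = run(100, 3)
print(pc, clocks, regs, data)
```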


Control Unit Sub-Operation 1

1. Fetch
  Get next instruction into IR.
  PC: Program Counter
    Always points to the next instruction.
    Register with increment.
  IR: Instruction Register
    Holds the fetched instruction.

[Figure: processor block diagram during fetch. PC = 100; IR holds "load R0, M[500]". Memory holds the program at addresses 100 to 102 and the data value 10 at M[500].]


Control Unit Sub-Operation 2

2. Decode
  Determine what the instruction means.
  Using combinational logic.

[Figure: processor block diagram during decode. PC = 100; IR holds "load R0, M[500]".]


Control Unit Sub-Operation 3

3. Fetch operands
  Move data from memory to datapath register.
  Read data memory.

[Figure: processor block diagram during operand fetch. IR holds "load R0, M[500]"; the value 10 is read from M[500] into the datapath.]


Control Unit Sub-Operation 4

4. Execute
  Move data through the ALU.
  This particular instruction (LOAD) does nothing during this sub-operation.
  INC: (R0 + 1) -> R1

[Figure: processor block diagram during execute. IR holds "load R0, M[500]"; the datapath holds the value 10.]


Control Unit Sub-Operation 5

5. Store results
  Write data from register to memory.
  This particular instruction (LOAD) does nothing during this sub-operation.
  STORE: R1 -> M[501]

[Figure: processor block diagram during store results. IR holds "load R0, M[500]"; the datapath holds the value 10.]


C. Instruction Cycles

[Figure: timing of the first instruction. PC = 100, "load R0, M[500]": Fetch, Decode, Fetch ops, Exec., Store results, one clock (clk) each. The value 10 moves from M[500] into R0.]


Instruction Cycles (II)

[Figure: timing of the second instruction, following the first (PC = 100). PC = 101, "inc R1, R0": Fetch, Decode, Fetch ops, Exec., Store results, one clock (clk) each. R0 = 10 passes through the ALU (+1), giving R1 = 11.]


Instruction Cycles (III)

[Figure: timing of the third instruction, following the first two (PC = 100, PC = 101). PC = 102, "store M[501], R1": Fetch, Decode, Fetch ops, Exec., Store results, one clock (clk) each. R0 = 10, R1 = 11; the value 11 is written to M[501].]


D. Architectural Considerations

N-bit processor
  N-bit ALU, registers, buses, memory data interface.
  Embedded: 8-bit, 16-bit, 32-bit common.
  Desktop/servers: 32-bit, even 64.
M-bit address
  PC size determines address space.
  M-bit PC -> 2^M memory space; N x 2^M bits total.



Architectural Considerations (II)

Sizes of data and address
  1000^m: m = 1 K (kilo), 2 M (mega), 3 G (giga), 4 T (tera), 5 P (peta), 6 E (exa), 7 Z (zetta), 8 Y (yotta)

Microprocessor   Data (bits)   Address (bits)
8080 (1974)      8             16 (64 KB)
8086             16            20 (1 MB)
i386             32            32 (4 GB)
Pentium          32            32 (4 GB)
Pentium D        32            64 (16 EB)
Core 2           64            64 (16 EB)
XScale           32            32 (4 GB)
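
The address-space sizes in the table follow from 2^M for an M-bit address. A short Python sketch, assuming byte-addressable memory and binary multiples (1 KB = 1024 B), which is what the table's figures imply; `address_space` and `total_bits` are illustrative names:

```python
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]

def address_space(m_bits):
    """Size addressable with an M-bit address, as (value, unit), one byte per address."""
    size = 2 ** m_bits
    unit = 0
    while size >= 1024 and unit < len(UNITS) - 1:
        size //= 1024
        unit += 1
    return size, UNITS[unit]

def total_bits(n_bits, m_bits):
    """Total storage for N-bit words at each of 2^M addresses: N x 2^M bits."""
    return n_bits * 2 ** m_bits

for m in (16, 20, 32, 64):
    print(m, address_space(m))
```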


Architectural Considerations (III)

Clock frequency
  Inverse of clock period: f (Hz) = 1 / p (sec)
  p must be longer than the longest register-to-register delay in the entire processor.
  Memory access is often the longest.
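
As a worked illustration of f = 1/p, the sketch below picks the clock period from the longest register-to-register delay. The delay values and path names are hypothetical, chosen only to make the arithmetic concrete:

```python
# Hypothetical register-to-register delays in nanoseconds (illustrative values only).
delays_ns = {
    "register -> ALU -> register": 4.0,
    "PC -> memory -> IR": 10.0,
    "register -> memory": 8.0,
}

def max_clock_hz(delays):
    """The period p must cover the longest delay, so f_max = 1 / p_min."""
    p_min_s = max(delays.values()) * 1e-9
    return 1.0 / p_min_s

f = max_clock_hz(delays_ns)
print(f"{f / 1e6:.0f} MHz")
```

With these assumed numbers the memory path sets the limit, matching the slide's remark that memory access is often the longest delay.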



3.3 Operation

Instruction execution
  1. Fetch instruction
  2. Decode instruction
  3. Fetch operands
  4. Execute operation
  5. Store results

Processor speed: 5 clock cycles / instruction
Processor throughput: 5N clock cycles / N instructions

How can we improve the processor speed?
  Fastest possible clock.
How can we improve the processor throughput maximally?


Pipelining: Increasing Instruction Throughput

[Figure: timing diagrams over time for items 1 to 8.
  Non-pipelined dish cleaning vs. pipelined dish cleaning (stages: Wash, Dry).
  Pipelined instruction execution (stages: Fetch-instr., Decode, Fetch ops., Execute, Store res.).]

Branch problem
  Sophisticated branch predictor
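
The throughput gain from pipelining can be estimated with a simple cycle count. This sketch assumes an ideal five-stage pipeline with no branches or stalls, so it gives an upper bound rather than real-world timing; the function names are illustrative:

```python
STAGES = 5  # fetch, decode, fetch operands, execute, store results

def cycles_non_pipelined(n):
    """Each instruction finishes all stages before the next one starts."""
    return STAGES * n

def cycles_pipelined(n):
    """Ideal pipeline: the first instruction takes STAGES clocks, then one completes per clock."""
    return STAGES + (n - 1) if n > 0 else 0

for n in (1, 8, 1000):
    print(n, cycles_non_pipelined(n), cycles_pipelined(n))
```

For large N the ideal pipelined count approaches N clocks, i.e. close to one instruction per clock instead of one per five clocks.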


Superscalar and VLIW Architectures

Performance can be improved by:
  Faster clock (but there's a limit)
  Pipelining: slice up instruction into stages, overlap stages
  Multiple ALUs to support more than one instruction stream
    Superscalar
      Scalar: non-vector operations
      Fetches instructions in batches, executes as many as possible
      May require extensive hardware to detect independent instructions
    VLIW (Very Long Instruction Word): each word in memory has multiple independent instructions
      Relies on the compiler to detect and schedule instructions
      Currently growing in popularity.
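
To make "the compiler detects and schedules independent instructions" concrete, here is a toy Python sketch of in-order packing into VLIW words. It is a deliberately conservative illustration, not how any real compiler works: the `(dest, sources)` instruction format, the `pack_vliw` name, and the word width of 2 are all assumptions.

```python
def pack_vliw(instructions, width=2):
    """Greedily pack instructions, in program order, into words holding up to
    `width` mutually independent instructions. Each instruction is (dest, sources)."""
    words, current = [], []
    for dest, srcs in instructions:
        written = {d for d, _ in current}
        read = {s for _, ss in current for s in ss}
        independent = (
            dest not in written            # no write-after-write
            and dest not in read           # no write-after-read
            and not (set(srcs) & written)  # no read-after-write
        )
        # Start a new word when the current one is full or a dependence exists.
        if current and (len(current) == width or not independent):
            words.append(current)
            current = []
        current.append((dest, srcs))
    if current:
        words.append(current)
    return words

program = [
    ("r1", ("a",)),
    ("r2", ("b",)),
    ("r3", ("r1", "r2")),  # depends on the two instructions above
    ("r4", ("c",)),
]
for word in pack_vliw(program):
    print(word)
```

A superscalar processor would have to discover the same independence in hardware at run time, which is the "extensive hardware" cost mentioned above.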


Two Memory Architectures

Princeton
  Fewer memory wires
Harvard
  Simultaneous program and data memory access

[Figure: Harvard: processor connected to separate program memory and data memory. Princeton: processor connected to a single memory (program and data).]


Cache Memory

Memory access may be slow.
Cache is small but fast memory close to processor.
  Holds copy of part of memory.
  Hits and misses.

[Figure: Processor, Cache, Memory. Cache: fast/expensive technology, usually on the same chip. Memory: slower/cheaper technology, usually on a different chip.]
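
Hits and misses can be illustrated with a small simulation. The slides do not specify a cache organization, so this sketch assumes a direct-mapped cache with hypothetical parameters (4 lines, 4 addresses per block); `simulate_cache` and the address trace are illustrative only.

```python
def simulate_cache(addresses, num_lines=4, block_size=4):
    """Direct-mapped cache: each memory block maps to exactly one cache line."""
    lines = [None] * num_lines    # each line remembers which block it holds
    hits = misses = 0
    for addr in addresses:
        block = addr // block_size
        index = block % num_lines
        if lines[index] == block:
            hits += 1             # the copy is already in the cache
        else:
            misses += 1
            lines[index] = block  # copy the block from memory into the cache
    return hits, misses

trace = [0, 1, 2, 3, 4, 0, 16, 0]
print(simulate_cache(trace))
```

In this trace, addresses 0 and 16 map to the same line, so they evict each other and cause misses even though the cache is mostly empty.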


Cache Memory (II)

Cache vs. Register & Memory
  PC: 8 regs., 1-2 MB cache, 1-4 GB memory.

Type       Impl.         Speed     Cost/bit
Register   D flip-flop   Fastest   High
Cache      SRAM          Faster    Medium
Memory     DRAM          Fast      Low


Part 1/4 Summary

General-purpose processors
  Good performance, low NRE, flexible
  Controller, datapath, and memory
Instruction execution
  Fetch, decode, fetch operand, execute, and store
Performance enhancement
  Pipelining
  Superscalar & VLIW


References

[1] Frank Vahid and Tony Givargis, "Embedded System Design: A Unified Hardware/Software Introduction", John Wiley & Sons, 2002.
