EE414 Embedded Systems Ch 3. General-Purpose Processors: Software...

EE414 Embedded Systems

Ch 3. General-Purpose Processors: Software

Part 1/4

Byung Kook KimSchool of Electrical Engineering

Korea Advanced Institute of Science and Technology

Embedded Systems, KAIST 2

OverviewPart 1

3.1 Introduction 3.2 Basic Architecture 3.3 Operation

Part 2 3.4 Programmer’s View 3.5 Development Environment 3.6 Application-Specific Instruction-Set Processors (ASIPs) 3.7 Selecting a Microprocessor 3.8 General-Purpose Processor Design

3.1 Introduction General-Purpose Processor

Processor designed for a variety of computation tasks Adv. of GPP

Low unit cost, in part because manufacturer spreads NRE over large numbers of units Motorola sold half a billion 68HC05 microcontrollers in 1996 alone

Carefully designed since higher NRE is acceptable Can yield good performance, size, and power

Low NRE cost, short time-to-market/prototype, high flexibility User just writes software; no processor design.

Disadv. Cost, power, size

COTS (Commercially Off-The-Shelf) “microprocessor” Processor implemented on one chip.

3.2 Basic Architecture General-purpose

processor (CPU): Control unit and datapath, tightly linked with a memory.

Key differences to SPP Datapath is general. Control unit doesn’t

store the algorithm –the algorithm is “programmed” into the memory.

PC & IR

ProcessorControl unit Datapath

Registers

Controller

MemoryI/O

Control/Status

A. Datapath - Operations ALU (Arithmetic and

Logic Unit) Load

Read memory location into register ALU operation

Arithmetic & logic operation Input certain registers

through ALU Do operation Store back in register

Store Write register to memory

location. Ex: Load, Inc, Store Size N: N-bit-wide

Registers

Controller

MemoryI/O

Control/Status

DatapathSize of datapath operation (N-bit computer)

8 bits Character 0 to 255 (0 to FFh/0xFF). 256 English characters.

16 bits Integer -32768 to 32767 (-215 to 215-1). Hangul characters.

32 bits Integer -2G to 2G (-2x109 to 2x109, -231 to 231-1). Floating point number (Smx2Sse).

64 bits Integer -8E to 8E (exa. -8x1018 to 8x1018). Double precision floating point number (SMx2se).

B. Control Unit Control unit: configures the

datapath operations Sequence of desired operations

(“instructions”) stored in memory – “program”

Program Sequence of instructions

including loop and branch. Instruction

Set of meaningful bits to do some operation.

Machine language Binary code

Ex: 0011 | 00 | 1000 0001 Assembly language

Symbolic expression Ex: LD R0, M(513)

1:1 to machine language

Registers

Controller

MemoryI/O

Control/Status

...load R0, M[500] 500

100inc R1, R0101

store M[501], R1102

Control UnitInstruction cycle

Broken into several sub-operations, each one clock cycle, e.g.: 1. Fetch: Get next instruction into IR 2. Decode: Determine what the instruction means 3. Fetch operands: Move data from memory to datapath

register 4. Execute: Move data through the ALU to reg 5. Store results: Write data from register to memory

Control Unit Sub-Operation 1 1. Fetch

Get next instruction into IR

PC: Program Counter Always points to

next instruction Register with

increment IR: Instruction

Register Holds the fetched

instruction.

Registers

Controller

MemoryI/O

Control/Status

load R0, M[500] 500501

100inc R1, R0101

store M[501], R1102

R0 R1100 load R0, M[500]

Control Unit Sub-Operation 2 2. Decode

Determine what the instruction means

Using combinational logic

Registers

Controller

MemoryI/O

Control/Status

load R0, M[500] 500501

100inc R1, R0101

store M[501], R1102

R0 R1100 load R0, M[500]

Control Unit Sub-Operation 3 3. Fetch operands

Move data from memory to datapath register

Read data memory

Registers

Controller

MemoryI/O

Control/Status

load R0, M[500] 500501

100inc R1, R0101

store M[501], R1102

R0 R1100 load R0, M[500]

Control Unit Sub-Operation 4 4. Execute

Move data through the ALU

This particular instruction LOAD does nothing during this sub-operation

INC (R0 +1) ->R1

Registers

Controller

MemoryI/O

Control/Status

load R0, M[500] 500501

100inc R1, R0101

store M[501], R1102

R0 R1100 load R0, M[500]

Control Unit Sub-Operation 5 5. Store results

Write data from register to memory

This particular instruction LOAD does nothing during this sub-operation

STORE R1 ->m[501]

Registers

Controller

MemoryI/O

Control/Status

load R0, M[500] 500501

100inc R1, R0101

store M[501], R1102

R0 R1100 load R0, M[500]

C. Instruction Cycles

Registers

Controller

MemoryI/O

Control/Status

load R0, M[500] 500501

100inc R1, R0101

store M[501], R1102

PC=100

Fetch ops

Exec. Store results

load R0, M[500]

Decode

Instruction Cycles (II)

Registers

Controller

MemoryI/O

Control/Status

load R0, M[500] 500501

100inc R1, R0101

store M[501], R1102

R0 R110

PC=100Fetch Decode Fetch

opsExec. Store

resultsclk

PC=101

inc R1, R0

Fetch Fetch ops

Exec. Store results

Decode

Instruction Cycles (III)

Registers

Controller

MemoryI/O

Control/Status

load R0, M[500] 500501

100inc R1, R0101

store M[501], R1102

R0 R11110

opsExec. Store

resultsclk

opsExec. Store

resultsclk

PC=102store M[501], R1

Fetch Fetch ops

Store results

Decode

D. Architectural Considerations N-bit processor

N-bit ALU, registers, buses, memory data interface

Embedded: 8-bit, 16-bit, 32-bit common

Desktop/servers: 32-bit, even 64

M-bit address PC size determines

address space M-bit PC 2M memory space N x 2M bits total

Registers

Controller

MemoryI/O

Control/Status

Sizes of data and address

1000m: 1 K kilo, 2 M mega, 3 G giga, 4 T tera, 5 P peta, 6 E exa,

7 Z zetta, 8 Y yotta

Architectural Considerations (II)

Microprocessor Data Address8080 (1973) 8 16 (64 KB)

8086 16 20 (1 MB)i386 32 32 (4 GB)

Pentium 32 32 (4 GB)Pentium D 32 64 (16 EB)

Core 2 64 64 (16 EB)Xscale 32 32 (4 GB)

Architectural Considerations (III) Clock frequency

Inverse of clock period

f (Hz)= 1/p(sec)

p Must be longer than longest register to register delay in entire processor

Memory access is often the longest.

Registers

Controller

MemoryI/O

Control/Status

3.3 Operation Instruction execution

1. Fetch instruction 2. Decode instruction 3. Fetch operands 4. Execute operation 5. Store results

Processor speed: 5 clock cycles / instruction Processor throughput: 5N clock cycles / N instructions

How can we improve the processor speed? Fastest possible clock.

How can we improve the processor throughput maximally?

Pipelining:Increasing Instruction Throughput

1 2 3 4 5 6 7 8

Fetch-instr.

Decode

Fetch ops.

Execute

Store res.

1 2 3 4 5 6 7 8

Non-pipelined Pipelined

Pipelined

pipelined instruction execution

non-pipelined dish cleaning pipelined dish cleaning

Instruction 1

Branch problem Sophisticated branch predictor

Superscalar and VLIW Architectures Performance can be improved by:

Faster clock (but there’s a limit) Pipelining: slice up instruction into stages, overlap stages Multiple ALUs to support more than one instruction stream

Superscalar Scalar: non-vector operations Fetches instructions in batches, executes as many as possible

May require extensive hardware to detect independent instructions VLIW (Very Long Instruction Word): each word in memory has

multiple independent instructions Relies on the compiler to detect and schedule instructions Currently growing in popularity.

Two Memory Architectures

Processor

Program memory

Data memory

Processor

Memory(program and data)

Harvard Princeton

Princeton Fewer memory

Harvard Simultaneous

program and data memory access

Cache Memory Memory access may be slow Cache is small but fast

memory close to processor Holds copy of part of memory Hits and misses

Processor

Memory

Fast/expensive technology, usually on the same chip

Slower/cheaper technology, usually on a different chip

Cache Memory (II) Cache vs. Register & Memory

PC 8 regs. 1-2 MB cache. 1 – 4 GB memory.

Type Impl. Speed Cost/bitRegister D f/f Fastest HighCache SRAM Faster MediumMemory DRAM Fast Low

Part 1/4 Summary General-purpose processors

Good performance, low NRE, flexible

Controller, datapath, and memory

Instruction execution Fetch, decode, fetch operand, execute, and store

Performance enhancement Pipelining Superscalar & VLIW

References [1] Frank Vahid, “Embedded system design: A unified

hardware/software introduction”, John Wiley & Sons, 2002.

EE414 Embedded Systems Ch 3. General-Purpose Processors: Software...

Documents

Transcript of EE414 Embedded Systems Ch 3. General-Purpose Processors: Software...

Embedded Linux - Introduction to Embedded OS

Embedded System:Introduction 1 Introduction What is an embedded system? Embedded system market trends Characteristics of embedded systems Embedded system.

Interactive Distributed Embedded Systems Embedded system ...ziyang.eecs.umich.edu/~dickrp/ides/lectures/print-ides-l1.pdf · Interactive Distributed Embedded Systems ... Embedded

Embedded Graphics based on NVIDIA Quadro Embedded · 2019. 11. 21. · Embedded Graphics based on NVIDIA Quadro Embedded ADLINK’s Embedded MXM GPU modules and PCIe graphics cards

Embedded Android Workshop at Embedded World 2014

Embedded Systems Microcontrollers & Embedded Processors An Overview

Embedded Databases for Embedded Real-Time … · Embedded Databases for Embedded ... real-timesystem must fulﬁll requirements both from an embedded system ... In this report we

4 ee414 - adv electroncs - lab 3 - loren schwappach

LOGO Embedded Systems and Embedded Linux MAD. LOGO Contents INTRODUCTION 1 EMBEDDED LINUX 2 CONCLUSION 3 Embedded Systems & Embedded Linux INTRODUCTION.

Embedded System - Embedded Linux (Full-final)

ENABLING TECHNOLOGIES FOR EMBEDDED SIMULATION & EMBEDDED …isl.ucf.edu/publication/papers/hbahr/06_HBahr_I_ITSEC_98.pdf · EMBEDDED SIMULATION & EMBEDDED TRAINING Hubert A. Bahr

Embedded System Design ---- Embedded OS

Embedded Systems,Embedded Systems Project,Winter training,

Embedded Systems SS2018 - lsw.ee.hm.edulsw.ee.hm.edu/~irber/EmbeddedSystems/powerpoint_deutsch/Embedded... · Was ist ein Embedded System? Unter Embedded Systems (ES) versteht man

Introduction to Networked Embedded Systems · • Introduction to Networked Embedded Systems - Embedded systems Networked embedded systems Embedded Internet - Network properties •

Embedded System Introductiontwins.ee.nctu.edu.tw/courses/embedlab_10/lecture/Embedded Syste… · Embedded System (1/2) Embedded System (1/2) Definition An Embedded System is one

Zsolt Szanya Elbacom. Embedded Systems + Embedded Applications = Specialized Servers Embedded Server Defined “Embedded Systems” means COMPANY’S computing.

EE414 Embedded Systems Ch 3. General-Purpose …rtcl.kaist.ac.kr/~bkkim/lecture/embedded/EmbSys3C_Bone_D2.pdf · Ch 3. General-Purpose Processors: Software Part 3/4: Beaglebone Byung

Ch 1. Introduction to Embedded Systems - KAISTrtcl.kaist.ac.kr/~bkkim/lecture/embedded/EmbSys1B_IntroB_D2.pdf · Ch 1. Introduction to Embedded Systems ... Boolean expr. -> Connection

Taking Back Embedded: The Erlang Embedded Framework