1 MSc - Microprocessors Dr. Konstantinos Tatas com.tk@fit.ac.cy.

Post on 25-Dec-2015


MSc - Microprocessors

Dr. Konstantinos Tatas

com.tk@fit.ac.cy

Useful Information

Instructor: Lecturer K. Tatas
– Office hours: TBA
– E-mail: com.tk@fit.ac.cy
– http://staff.fit.ac.cy/com.tk

Lecture periods/week: 3
Duration: 10 weeks
ECTS: 7 (175 hours)

Course Objectives

By the end of the course students should be able to:
– Evaluate the complex trade-offs involved in embedded system design
– Write detailed embedded system requirements and specification documents
– Write executable specifications using UML/SystemC
– Develop applications using ARM Developer Suite
– Write efficient ARM assembly and C programs in ARM and Thumb mode
– Analyze program performance using traces
– Use code transformations to improve performance/code size/power consumption.

Course Outline (1/2)

Week 1: Introduction to embedded systems – Embedded microprocessor evolution – Design metrics and constraints (performance, power, cost, time-to-market) and design optimization challenges – Distributed and Real-time systems

Week 2: Key embedded system technologies – Integrated Circuit technology – Microprocessor technology – CAD tool technology – Sensor technology

Week 3: Embedded system specification and modeling – Object-oriented specification (UML/C++/SystemC) – Assignment 1

Week 4: Computer Architecture – Instruction sets – RISC vs. CISC – pipelining – The ARM microprocessor architecture – ARM assembly – ARM mode – Thumb mode – ARM and Thumb instruction set – ARM conditional execution

Week 5: Processor I/O – Serial I/O – Busy/wait I/O – Interrupts – Exceptions – Traps – ARM memory mapped I/O – Caches – Memory Management Units – Protection Units – ARM cache and MMU – Assignment 2

Course Outline (2/2)

Week 6: Assignment 1

Week 7: Programme design and analysis – DFGs – CDFGs – Compilers – Assemblers – Linkers – Basic compiler optimizations/code transformations – Measuring programme speed – Trace-driven performance analysis – Energy optimization – programme size optimization

Week 8: Code transformations – Loop unrolling – loop merging – loop tiling – performance optimizing transformations

Week 9: Test

Week 10: Assignment 2

Course Assessment

Final exam: 40%
Coursework: 60%
– Assignment 1: 15%
– Assignment 2: 15%
– Quizzes: 10%
– Test: 10%
– Lab exercises: 10%

References

Books
– W. Wolf, “Computers as Components”
– W. Wolf, “High-Performance Embedded Computing”
– H. Kopetz, “Real-Time Systems: Design Principles for Distributed Embedded Applications”
– S. Furber, “ARM System-on-Chip Architecture”
– P. Panda, “Memory Issues in Embedded Systems-on-Chip”
– F. Vahid and T. Givargis, “Embedded System Design: A Unified Hardware/Software Introduction”
– F. Catthoor, “Data Access and Storage Management for Embedded Programmable Processors”

Microprocessors for Embedded systems

Computing systems are everywhere
Most of us think of “desktop” computers
– PC’s
– Laptops
– Mainframes
– Servers

But there’s another type of computing system
– Far more common...

Embedded systems overview

Embedded computing systems
– Computing systems embedded within electronic devices
– Hard to define. Nearly any computing system other than a desktop computer
– Billions of units produced yearly, versus millions of desktop units
– Perhaps 50 per household and per automobile

[Figure: everyday devices, captioned “Computers are in here... and here... and even here...” and “Lots more of these, though they cost a lot less each.”]

A “short list” of embedded systems

Anti-lock brakes
Auto-focus cameras
Automatic teller machines
Automatic toll systems
Automatic transmission
Avionic systems
Battery chargers
Camcorders
Cell phones
Cell-phone base stations
Cordless phones
Cruise control
Curbside check-in systems
Digital cameras
Disk drives
Electronic card readers
Electronic instruments
Electronic toys/games
Factory control
Fax machines
Fingerprint identifiers
Home security systems
Life-support systems
Medical testing systems
Modems
MPEG decoders
Network cards
Network switches/routers
On-board navigation
Pagers
Photocopiers
Point-of-sale systems
Portable video games
Printers
Satellite phones
Scanners
Smart ovens/dishwashers
Speech recognizers
Stereo systems
Teleconferencing systems
Televisions
Temperature controllers
Theft tracking systems
TV set-top boxes
VCR’s, DVD players
Video game consoles
Video phones
Washers and dryers

And the list goes on and on

Some common characteristics of embedded systems

Single-functioned
– Executes a single program, repeatedly
Tightly-constrained
– Low cost, low power, small, fast, etc.
Reactive and real-time
– Continually reacts to changes in the system’s environment
– Must compute certain results in real-time without delay

An embedded system example – Digital camera

Single-functioned -- always a digital camera
Tightly-constrained -- Low cost, low power, small, fast
Reactive and real-time -- only to a small extent

[Figure: digital camera chip block diagram. A lens and CCD feed the chip, which contains: A2D, CCD preprocessor, pixel coprocessor, D2A, JPEG codec, microcontroller, multiplier/accum, DMA controller, memory controller, ISA bus interface, UART, LCD ctrl, display ctrl]

Embedded Software Development Requires as Much/More Design Effort Than Hardware

A System-on-a-Chip: Example

Courtesy: Philips

Design at a crossroad

System-on-a-Chip

[Figure: system-on-a-chip block diagram with blocks labelled: multi-spectral imager; analog; RAM; 500 k gates FPGA + 1 Gbit DRAM (preprocessing); µC system + 2 Gbit DRAM (recognition); 64 SIMD processor array + SRAM (image conditioning, 100 GOPS)]

Embedded applications where cost, performance, and energy are the real issues!
DSP and control intensive
Mixed-mode
Combines programmable and application-specific modules
Software plays crucial role

Disciplines involved in Embedded System Design

Digital System Design
Software Design
Analog/Mixed-Signal/RF System Design
Operating Systems
Microprocessors/Computer Architecture
Verification
Testing
etc.

Languages traditionally used in Embedded System Design

Specification/modeling
– UML
– SDL
– C/C++
Hardware design
– VHDL
– Verilog
Software design
– C/C++
– Java
– Assembly
Verification
– VHDL/Verilog
– SystemVerilog
– Tcl/tk
– Vera

Design challenge – optimizing design metrics

Obvious design goal:
– Construct an implementation with desired functionality
Key design challenge:
– Simultaneously optimize numerous design metrics
Design metric
– A measurable feature of a system’s implementation
– Optimizing design metrics is a key challenge

Design challenge – optimizing design metrics

Common metrics
– Unit cost: the monetary cost of manufacturing each copy of the system, excluding NRE cost
– NRE cost (Non-Recurring Engineering cost): the one-time monetary cost of designing the system
– Size: the physical space required by the system
– Performance: the execution time or throughput of the system
– Power: the amount of power consumed by the system
– Flexibility: the ability to change the functionality of the system without incurring heavy NRE cost

Design challenge – optimizing design metrics

Common metrics (continued)
– Time-to-prototype: the time needed to build a working version of the system
– Time-to-market: the time required to develop a system to the point that it can be released and sold to customers
– Maintainability: the ability to modify the system after its initial release
– Correctness, safety, many more

Design metric competition -- improving one may worsen others

Expertise with both software and hardware is needed to optimize design metrics
– Not just a hardware or software expert, as is common
– A designer must be comfortable with various technologies in order to choose the best for a given application and constraints

[Figure: competing design metrics (size, performance, power, NRE cost) shown next to the digital camera chip block diagram: lens, CCD, A2D, CCD preprocessor, pixel coprocessor, D2A, JPEG codec, microcontroller, multiplier/accum, DMA controller, memory controller, ISA bus interface, UART, LCD ctrl, display ctrl]

Time-to-market: a demanding design metric

Time required to develop a product to the point it can be sold to customers
Market window
– Period during which the product would have highest sales
Average time-to-market constraint is about 8 months
Delays can be costly

[Figure: revenues ($) versus time (months)]

Losses due to delayed market entry

Simplified revenue model
– Product life = 2W, peak at W
– Time of market entry defines a triangle, representing market penetration
– Triangle area equals revenue
Loss
– The difference between the on-time and delayed triangle areas

[Figure: revenues ($) versus time. The on-time entry triangle rises (market rise) to peak revenue at W and falls (market fall) to zero at 2W; the delayed entry triangle starts at D and reaches a lower peak revenue at W]

Losses due to delayed market entry (cont.)

Area = 1/2 * base * height
– On-time = 1/2 * 2W * W
– Delayed = 1/2 * (W-D+W) * (W-D)
Percentage revenue loss = (D(3W-D) / (2W^2)) * 100%
Try some examples
– Lifetime 2W = 52 wks, delay D = 4 wks
– (4*(3*26 - 4)) / (2*26^2) = 22%
– Lifetime 2W = 52 wks, delay D = 10 wks
– (10*(3*26 - 10)) / (2*26^2) = 50%
– Delays are costly!

[Figure: revenues ($) versus time, showing the on-time and delayed entry triangles with W, 2W and D marked]
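The triangle arithmetic above can be checked with a few lines of Python. This is a minimal sketch of the simplified revenue model only; the function name `revenue_loss_percent` and its argument names are illustrative and not part of the slides.

```python
def revenue_loss_percent(w: float, d: float) -> float:
    """Percentage of revenue lost when market entry is delayed by d.

    w is half the product lifetime (revenue peaks at time w) and d is the
    delay, both in the same time unit. The model assumes 0 <= d <= w.
    """
    if w <= 0 or not 0 <= d <= w:
        raise ValueError("expected w > 0 and 0 <= d <= w")
    on_time_area = 0.5 * (2 * w) * w            # base 2W, height W
    delayed_area = 0.5 * (w - d + w) * (w - d)  # base 2W-D, height W-D
    return (on_time_area - delayed_area) / on_time_area * 100.0


if __name__ == "__main__":
    # The two examples from the slide: lifetime 2W = 52 weeks.
    for delay in (4, 10):
        loss = revenue_loss_percent(26, delay)
        print(f"2W = 52 wks, D = {delay} wks: loss = {loss:.0f}%")
```

Computing the two areas directly, rather than using the closed form D(3W-D)/(2W^2), makes it easy to confirm that both give the same result.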

The performance design metric

Widely-used measure of system, widely-abused
– Clock frequency, instructions per second – not good measures
– Digital camera example – a user cares about how fast it processes images, not clock speed or instructions per second
Latency (response time)
– Time between task start and end
– e.g., Cameras A and B process images in 0.25 seconds
Throughput
– Tasks per second, e.g. Camera A processes 4 images per second
– Throughput can be more than latency seems to imply due to concurrency, e.g. Camera B may process 8 images per second (by capturing a new image while previous image is being stored).
Speedup of B over A = B’s performance / A’s performance
– Throughput speedup = 8/4 = 2
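The camera numbers can be reproduced with a small sketch. It assumes a simple model in which throughput equals the number of overlapping tasks divided by the latency; the slide only states the resulting figures, so the `tasks_in_flight` parameter and both function names are illustrative assumptions.

```python
def throughput_from_latency(latency_s: float, tasks_in_flight: int = 1) -> float:
    """Tasks per second when tasks_in_flight tasks overlap in time.

    Assumed model: each task takes latency_s seconds end to end, and
    tasks_in_flight of them are being processed concurrently.
    """
    return tasks_in_flight / latency_s


def speedup(performance_b: float, performance_a: float) -> float:
    """Speedup of B over A = B's performance / A's performance."""
    return performance_b / performance_a


# Both cameras have a latency of 0.25 s per image.
camera_a = throughput_from_latency(0.25)                     # no overlap
camera_b = throughput_from_latency(0.25, tasks_in_flight=2)  # capture overlaps store

print(camera_a, camera_b, speedup(camera_b, camera_a))
```

Both cameras have the same latency, so the speedup comes entirely from the overlap in Camera B.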

Three key embedded system technologies

Technology
– A manner of accomplishing a task, especially using technical processes, methods, or knowledge
Three key technologies for embedded systems
– Processor technology
– IC technology
– Design technology

Processor technology

The architecture of the computation engine used to implement a system’s desired functionality
Processor does not have to be programmable
– “Processor” not equal to general-purpose processor

[Figure: three processor types, each with a controller, a datapath and a data memory.
General-purpose (“software”): controller with control logic and state register, IR and PC; program memory holding assembly code for “total = 0, for i = 1 to …”; datapath with register file and general ALU.
Application-specific: controller with control logic and state register, IR and PC; program memory holding assembly code for “total = 0, for i = 1 to …”; datapath with registers and custom ALU.
Single-purpose (“hardware”): controller with control logic and state register; datapath with index and total registers and an adder.]

Processor technology

Processors vary in their customization for the problem at hand

[Figure: the same desired functionality mapped onto a general-purpose processor, an application-specific processor and a single-purpose processor]

total = 0
for i = 1 to N loop
    total += M[i]
end loop

Desired functionality
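For reference, the desired functionality can be written as runnable code. The sketch below is a direct Python rendering of the pseudocode; treating `M` as a Python list and using 0-based indexing instead of the slide's 1..N range are assumptions made for illustration.

```python
from typing import Sequence


def sum_memory(m: Sequence[int]) -> int:
    """Accumulate every element of m, mirroring the slide's loop."""
    total = 0
    for i in range(len(m)):  # slide: for i = 1 to N
        total += m[i]        # slide: total += M[i]
    return total


print(sum_memory([3, 1, 4]))
```

On a general-purpose processor this loop becomes instructions in program memory. A single-purpose processor would instead hard-wire the index register, the total register and the adder.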

General-purpose processors

Programmable device used in a variety of applications
– Also known as “microprocessor”
Features
– Program memory
– General datapath with large register file and general ALU
User benefits
– Low time-to-market and NRE costs
– High flexibility
“Pentium” the most well-known, but there are hundreds of others

[Figure: general-purpose processor. Controller with control logic and state register, IR and PC; program memory holding assembly code for “total = 0, for i = 1 to …”; datapath with register file and general ALU; data memory]

Single-purpose processors

Digital circuit designed to execute exactly one program
– a.k.a. coprocessor, accelerator or peripheral
Features
– Contains only the components needed to execute a single program
– No program memory
Benefits
– Fast
– Low power
– Small size

[Figure: single-purpose processor. Controller with control logic and state register; datapath with index and total registers and an adder; data memory]

Application-specific processors

Programmable processor optimized for a particular class of applications having common characteristics
– Compromise between general-purpose and single-purpose processors
Features
– Program memory
– Optimized datapath
– Special functional units
Benefits
– Some flexibility, good performance, size and power

[Figure: application-specific processor. Controller with control logic and state register, IR and PC; program memory holding assembly code for “total = 0, for i = 1 to …”; datapath with registers and custom ALU; data memory]

IC technology

The manner in which a digital (gate-level) implementation is mapped onto an IC
– IC: Integrated circuit, or “chip”
– IC technologies differ in their customization to a design
– IC’s consist of numerous layers (perhaps 10 or more)
IC technologies differ with respect to who builds each layer and when

[Figure: an IC package containing an IC, with a transistor cross-section showing source, drain, channel, oxide, gate and silicon substrate]

IC technology Design Approaches

IC Technology Implementation Approaches
– Custom
– Semicustom
  – Cell-based: Standard Cells, Compiled Cells, Macro Cells
  – Array-based: Pre-diffused (Gate Arrays), Pre-wired (FPGA's)

Full-custom design

All layers are optimized for an embedded system’s particular digital implementation
– Placing transistors
– Sizing transistors
– Routing wires
Benefits
– Excellent performance, small size, low power
Drawbacks
– High NRE cost (e.g., $300k), long time-to-market

The Custom Approach

Intel 4004

Courtesy Intel

Transition to Automation and Regular Structures

Intel 4004 (‘71)
Intel 8080
Intel 8085
Intel 8286
Intel 8486

Courtesy Intel


Semi-custom

Lower layers are fully or partially built
– Designers are left with routing of wires and maybe placing some blocks
Benefits
– Good performance, good size, less NRE cost than a full-custom implementation (perhaps $10k to $100k)
Drawbacks
– Still require weeks to months to develop

Cell-based Design (or standard cells)

Routing channel requirements are reduced by presence of more interconnect layers

[Figure: cell-based layout showing rows of cells, logic cells, feedthrough cells, routing channels and a functional module (RAM, multiplier, …)]

Standard Cell - Example

[Brodersen92]

Standard Cell - Example

3-input NAND cell (from ST Microelectronics):
C = Load capacitance
T = input rise/fall time


Programmable Logic Devices

All layers (diffusion, polysilicon, [multi-] metal) may exist
– Designers can purchase an IC
– Connections on the IC are either created or destroyed to implement desired functionality
– Field-Programmable Gate Array (FPGA) and recently Gate Arrays are very popular
Benefits
– Low NRE costs, almost instant IC availability
Drawbacks
– Bigger, expensive (perhaps $30 per unit), power hungry, slower

Gate Array - Sea-of-gates

[Figure: sea-of-gates layout with rows of uncommitted cells separated by routing channels; VDD and GND rails, polysilicon, metal and possible contacts; an uncommitted cell shown next to a committed cell (4-input NOR) with inputs In1, In2, In3, In4 and output Out]

Sea-of-gate Primitive Cells

[Figure: primitive cells built from NMOS and PMOS transistor rows, one using oxide-isolation and one using gate-isolation]

Sea-of-gates

[Figure: LSI Logic LEA300K (0.6 µm CMOS) die, showing random logic and a memory subsystem]

Prewired Arrays

Classification of prewired arrays (or field-programmable devices):
Based on Programming Technique
– Fuse-based (program-once)
– Non-volatile EPROM based
– RAM based
Programmable Logic Style
– Array-Based
– Look-up Table
Programmable Interconnect Style
– Channel-routing
– Mesh networks

Altera MAX

From Smith97

Altera MAX Interconnect Architecture

[Figure: two interconnect styles. Array-based (MAX 3000-7000): LABs (LAB1, LAB2, LAB6) connected through a PIA, with delay tPIA. Mesh-based (MAX 9000): LABs placed between row channels and column channels]

LUT-Based Logic Cell

[Figure: Xilinx 4000 Series logic cell. Inputs F1 to F4, D1 to D4 and C1....C4 feed blocks that each compute a logic function of their inputs; the outputs pass through multiplexers controlled by the configuration program (control bits)]

Xilinx 4000 Series
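A look-up table implements logic by storing one output bit for every input combination and using the inputs as an address. The sketch below illustrates that general idea for a 4-input LUT. It is not a model of the Xilinx 4000 cell, and the names `program_lut` and `lut_read` are hypothetical.

```python
from typing import Callable, List


def program_lut(func: Callable[..., int], n_inputs: int = 4) -> List[int]:
    """Build the 2**n_inputs configuration bits that realise func."""
    table = []
    for address in range(2 ** n_inputs):
        # Bit k of the address is the value of input k.
        bits = [(address >> k) & 1 for k in range(n_inputs)]
        table.append(func(*bits) & 1)
    return table


def lut_read(table: List[int], *inputs: int) -> int:
    """Evaluate the LUT: the inputs form the address of the stored bit."""
    address = sum(bit << k for k, bit in enumerate(inputs))
    return table[address]


# Example configuration: f(a, b, c, d) = (a AND b) OR (c AND d)
and_or = program_lut(lambda a, b, c, d: (a & b) | (c & d))
print(lut_read(and_or, 1, 1, 0, 0), lut_read(and_or, 1, 0, 0, 1))
```

Reprogramming the cell means loading a different 16-bit table. The lookup hardware itself does not change.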

Array-Based Programmable Wiring

[Figure: cells connected by horizontal tracks and vertical tracks; labels: input/output pin, programmed interconnection, interconnect point]

Transistor Implementation of Mesh

Courtesy DeHon and Wawrzynek

RAM-based FPGA

Xilinx XC4000ex

Design Technology

The manner in which we convert our concept of desired system functionality into an implementation

Compilation/Synthesis: Automates exploration and insertion of implementation details for lower level.
Libraries/IP: Incorporates pre-designed implementation from lower abstraction level into higher level.
Test/Verification: Ensures correct functionality at each level, thus reducing costly iterations between levels.

Abstraction levels, from top to final implementation (Compilation/Synthesis; Libraries/IP; Test/Verification):
– System specification: System synthesis; Hw/Sw/OS; Model simulat./checkers
– Behavioral specification: Behavior synthesis; Cores; Hw-Sw cosimulators
– RT specification: RT synthesis; RT components; HDL simulators
– Logic specification: Logic synthesis; Gates/Cells; Gate simulators
– To final implementation

The co-design ladder

In the past:
– Hardware and software design technologies were very different
– Recent maturation of synthesis enables a unified view of hardware and software
Hardware/software “codesign”

[Figure: the co-design ladder, starting from sequential program code (e.g., C, VHDL). Software side: compilers (1960's, 1970's) produce assembly instructions; assemblers, linkers (1950's, 1960's) produce machine instructions; implementation is a microprocessor plus program bits: “software”. Hardware side: behavioral synthesis (1990's) produces register transfers; RT synthesis (1980's, 1990's) produces logic equations / FSM's; logic synthesis (1970's, 1980's) produces logic gates; implementation is a VLSI, ASIC, or PLD implementation: “hardware”]

The choice of hardware versus software for a particular function is simply a tradeoff among various design metrics, like performance, power, size, NRE cost, and especially flexibility; there is no fundamental difference between what hardware or software can implement.

Independence of processor and IC technologies

Basic tradeoff
– General vs. custom
– With respect to processor technology or IC technology
– The two technologies are independent

[Figure: two parallel scales running from general to customized. Processor technology: general-purpose processor, ASIP, single-purpose processor. IC technology: PLD, semi-custom, full-custom.
General, providing improved: flexibility, maintainability, NRE cost, time-to-prototype, time-to-market, cost (low volume).
Customized, providing improved: power efficiency, performance, size, cost (high volume).]


Design Decision Trade-offs


Generalised Design Flow

Architecture ReUse

Silicon System Platform
– Flexible architecture for hardware and software
– Specific (programmable) components
– Network architecture
– Software modules
– Rules and guidelines for design of HW and SW
Has been successful in PC’s
– Dominance of a few players who specify and control architecture
Application-domain specific (difference in constraints)
– Speed (compute power)
– Dissipation
– Costs
– Real / non-real time data

Platform-Based Design

A platform is a restriction on the space of possible implementation choices, providing a well-defined abstraction of the underlying technology for the application developer

New platforms will be defined at the architecture-micro-architecture boundary

They will be component-based, and will provide a range of choices from structured-custom to fully programmable implementations

Key to such approaches is the representation of communication in the platform model

“Only the consumer gets freedom of choice; designers need freedom from choice”
(Orfali, et al, 1996, p.522)

Source: R. Newton


Platform-based Design – System-on-Chip

Use of predefined Intellectual Property (IP)

A platform-based system consists of a RISC processor, memories, busses and a common language

Platform-based design poses the problem of partitioning a solution between hardware (HDL) and software (programming processors)

Platforms Enable Simplified SoC Design

Customer demands
– Fast turn-around time
– Easy access to pre-qualified building blocks
– Web enabled
Design technology
– Core platforms
– ‘Big’ IP
– Emerging SoC bus standards
– Embedded software
– HW/SW co-verification

[Figure: platform structure with a core, near peripherals and far peripherals]


And Automation of IP Selection & Integration

Heterogeneous Programmable Platforms

[Figure: Xilinx Virtex-II Pro, showing FPGA fabric, high-speed I/O, embedded PowerPC, embedded memories and hardwired multipliers]

Xilinx’s products

6868

Comparison of CMOS design methods

| Design Method | NRE | Unit Cost | Power Dissipation | Complexity of Implementation | Time-to-Market | Performance | Flexibility |
|---|---|---|---|---|---|---|---|
| μProcessor/DSP | low | medium | high | low | low | low | high |
| PLA | low | medium | medium | low | low | medium | low |
| FPGA | low | high | medium | medium | medium | medium | medium |
| Gate Array | medium | medium | low | medium | medium | medium | medium |
| Cell Based | high | low | low | high | high | high | low |
| Custom Design | high | low | low | high | high | very high | low |
| Platform Based | high | low/medium | low | high | medium/low | high | medium |


Impact of Implementation Choices

[Figure: energy efficiency (in MOPS/mW, bands 0.1-1, 1-10, 10-100, 100-1000) versus flexibility or application scope (none, somewhat flexible, fully flexible) for four implementation styles: hardwired custom, configurable/parameterizable, domain-specific processor (e.g. DSP), and embedded microprocessor]


Design Economics (1)

The selling price of an IC is S_total = C_total / (1 - m), where C_total is the manufacturing cost of a single IC and m is the desired profit margin

Costs to produce an IC
– Non-recurring engineering costs (NREs)
– Recurring engineering costs
– Fixed costs
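
The selling-price relation can be tried out with a few lines of Python. This is a minimal sketch: the function name `selling_price` and the sample figures ($30 cost, 40% margin) are illustrative assumptions, not values from the course.

```python
def selling_price(unit_cost: float, margin: float) -> float:
    """Return S_total = C_total / (1 - m) for a profit margin m with 0 <= m < 1."""
    if not 0 <= margin < 1:
        raise ValueError("margin must satisfy 0 <= m < 1")
    return unit_cost / (1 - margin)


# Hypothetical numbers: a $30 manufacturing cost and a 40% profit margin.
price = selling_price(30.0, 0.40)
print(f"selling price: ${price:.2f}")
```

Note that the margin is expressed as a fraction of the selling price, so a 40% margin on a $30 cost gives a price above $30 * 1.4.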


Design Economics (2)

Non-recurring engineering costs (NREs)
– Engineering design cost
– Prototype manufacturing cost

Recurring costs
– Process
– Package
– Test


NRE and unit cost metrics

Costs:
– Unit cost: the monetary cost of manufacturing each copy of the system, excluding NRE cost
– NRE cost (Non-Recurring Engineering cost): the one-time monetary cost of designing the system
– total cost = NRE cost + unit cost * # of units
– per-product cost = total cost / # of units = (NRE cost / # of units) + unit cost

• Example
– NRE=$2000, unit=$100
– For 10 units
– total cost = $2000 + 10*$100 = $3000
– per-product cost = $2000/10 + $100 = $300

Amortizing NRE cost over the units results in an additional $200 per unit
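
The two cost formulas translate directly into code. The sketch below reproduces the slide's 10-unit example; the function names are illustrative choices.

```python
def total_cost(nre: float, unit_cost: float, units: int) -> float:
    """total cost = NRE cost + unit cost * number of units."""
    return nre + unit_cost * units


def per_product_cost(nre: float, unit_cost: float, units: int) -> float:
    """per-product cost = total cost / number of units."""
    return total_cost(nre, unit_cost, units) / units


# Slide example: NRE = $2000, unit cost = $100, 10 units.
print(total_cost(2000, 100, 10))        # total cost in dollars
print(per_product_cost(2000, 100, 10))  # per-product cost in dollars
```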


NRE and unit cost metrics

[Figure: total cost (x1000) and per-product cost versus number of units (volume, 0 to 2400) for technologies A, B, and C]

Compare technologies by costs: the best choice depends on quantity
– Technology A: NRE=$2,000, unit=$100
– Technology B: NRE=$30,000, unit=$30
– Technology C: NRE=$100,000, unit=$2

• But, must also consider time-to-market
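
To see how the best technology depends on quantity, the sketch below computes the total cost of A, B, and C and the break-even volume between two of them. It uses only the NRE and unit-cost figures listed above; the helper names are illustrative, and time-to-market is deliberately left out.

```python
# name: (NRE cost in $, unit cost in $), taken from the slide
TECHNOLOGIES = {"A": (2_000, 100), "B": (30_000, 30), "C": (100_000, 2)}


def total_cost(nre: float, unit_cost: float, units: int) -> float:
    return nre + unit_cost * units


def cheapest(units: int) -> str:
    """Name of the technology with the lowest total cost at this volume."""
    return min(TECHNOLOGIES, key=lambda name: total_cost(*TECHNOLOGIES[name], units))


def break_even(first: str, second: str) -> float:
    """Volume at which the two technologies have equal total cost."""
    nre1, unit1 = TECHNOLOGIES[first]
    nre2, unit2 = TECHNOLOGIES[second]
    return (nre2 - nre1) / (unit1 - unit2)


for volume in (100, 1_000, 5_000):
    print(volume, cheapest(volume))
print("A/B break-even:", break_even("A", "B"))
print("B/C break-even:", break_even("B", "C"))
```

With these figures A is cheapest at low volume, B takes over at 400 units, and C only wins above 2,500 units.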


Wafer and die cost

Die yield: number of good dies/total number of dies


Example

Assuming:
– 20 engineers are employed full-time for a year with a $50,000/year average salary
– Additional $200,000 overhead costs, of which $100,000 is the total testing cost
– A wafer cost of $200 per wafer
– A $2 packaging cost per chip
– 10 dies/wafer
– 70% die yield
– 98% final test yield
– A market for 100,000 items

Calculate the minimum shelf price of the chip
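
One possible way to work this exercise is sketched below. It rests on assumptions the slide does not spell out: the minimum price means zero profit margin, the $100,000 testing cost is already inside the $200,000 overhead, only good dies are packaged, and chips that fail the final test are discarded after packaging. Different assumptions give a different answer.

```python
def min_shelf_price(
    engineers: int = 20,
    salary: float = 50_000,
    overhead: float = 200_000,   # assumed to already include the testing cost
    wafer_cost: float = 200,
    dies_per_wafer: int = 10,
    die_yield: float = 0.70,
    packaging_cost: float = 2,
    final_test_yield: float = 0.98,
    volume: int = 100_000,
) -> float:
    # One-time (NRE) cost: one year of salaries plus overhead.
    nre = engineers * salary + overhead
    # Wafer cost spread over the good dies only.
    cost_per_good_die = wafer_cost / (dies_per_wafer * die_yield)
    # Packaged chips that fail the final test still had to be paid for.
    cost_per_sold_chip = (cost_per_good_die + packaging_cost) / final_test_yield
    # Amortize the NRE cost over the whole market.
    return cost_per_sold_chip + nre / volume


print(f"minimum shelf price: ${min_shelf_price():.2f}")
```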


Design productivity exponential increase

Exponential increase over the past few decades

[Figure: productivity in K transistors per staff-month (log scale, 0.01 to 100,000) versus year, 1981 to 2009]


The growing design-productivity gap

Design Productivity Crisis (SRC 1997): Potential Design Complexity and Designer Productivity

[Figure: logic transistors per chip (M) and productivity (K transistors per staff-month) versus year, 1981 to 2009, with the region between the curves labelled Equivalent Added Complexity; complexity growth rate 58% / yr compounded, productivity growth rate 21% / yr compounded]

[Figure: Moore’s Law, standard cell density (K gates / mm2) and ASIC clock (MHz) versus year, 2001 to 2015]

ROI = Return / Investment = volume * (chip ASP - chip unit cost) / chip development cost


Design productivity gap

1981 leading edge chip required 100 designer months
– 10,000 transistors / 100 transistors/month
2002 leading edge chip requires 30,000 designer months
– 150,000,000 / 5000 transistors/month
Designer cost increase from $1M to $300M

While designer productivity has grown at an impressive rate over the past decades, the rate of improvement has not kept pace with chip capacity
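
The designer-month figures follow from dividing chip size by productivity. In the sketch below, the $10,000 cost per designer-month is not stated on the slide; it is the rate implied by $1M for 100 designer months and is labelled as an assumption.

```python
def designer_months(transistors: int, transistors_per_month: int) -> float:
    """Effort needed to design a chip at a given per-designer productivity."""
    return transistors / transistors_per_month


# Assumed rate, implied by the slide's $1M cost for 100 designer months.
COST_PER_DESIGNER_MONTH = 10_000

effort_1981 = designer_months(10_000, 100)
effort_2002 = designer_months(150_000_000, 5_000)

for year, effort in ((1981, effort_1981), (2002, effort_2002)):
    cost = effort * COST_PER_DESIGNER_MONTH
    print(f"{year}: {effort:,.0f} designer months, about ${cost / 1e6:,.0f}M")
```

Productivity rose 50-fold over the period, but chip size rose 15,000-fold, which is why the effort grew by a factor of 300.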


The mythical man-month

The situation is even worse than the productivity gap indicates
In theory, adding designers to a team reduces project completion time
In reality, productivity per designer decreases due to the complexities of team management and communication
In the software community, this is known as “the mythical man-month” (Brooks 1975)
At some point, adding designers can actually lengthen project completion time! (“Too many cooks”)

Example: a 1M-transistor design, where 1 designer produces 5000 transistors/month
Each additional designer reduces individual productivity by 100 transistors/month
So 2 designers produce 4900 transistors/month each

[Figure: team and individual productivity (transistors/month, 10,000 to 60,000) and months until completion versus number of designers (0 to 40); months until completion read 43, 24, 19, 16, 15, 16, 18, 23]
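
The productivity model in this example is simple enough to simulate. The sketch below assumes the chart samples team sizes of 5, 10, ..., 40 designers; that sampling is an inference, chosen because it reproduces the completion times 43, 24, 19, 16, 15, 16, 18, 23 that appear on the slide.

```python
DESIGN_SIZE = 1_000_000             # transistors in the design
BASE_PRODUCTIVITY = 5_000           # transistors/month for a lone designer
PENALTY_PER_EXTRA_DESIGNER = 100    # loss per designer for each added colleague


def individual_productivity(designers: int) -> int:
    return BASE_PRODUCTIVITY - PENALTY_PER_EXTRA_DESIGNER * (designers - 1)


def team_productivity(designers: int) -> int:
    return designers * individual_productivity(designers)


def months_to_complete(designers: int) -> float:
    return DESIGN_SIZE / team_productivity(designers)


for n in range(5, 45, 5):
    print(f"{n:2d} designers: {individual_productivity(n):5d} each, "
          f"{team_productivity(n):6d} team, {months_to_complete(n):5.1f} months")
```

Under this model the schedule is shortest at about 25 designers; beyond that, every added designer lengthens the project.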


Summary

Embedded systems are everywhere
Key challenge: optimization of design metrics
– Design metrics compete with one another
A unified view of hardware and software is necessary to improve productivity
Three key technologies
– Processor: general-purpose, application-specific, single-purpose
– IC: Full-custom, semi-custom, PLD
– Design: Compilation/synthesis, libraries/IP, test/verification


Real-time and distributed systems

Dr. Konstantinos Tatas


What is real-time? Is there any other kind?

A real-time computer system is a computer system where the correctness of the system behavior depends not only on the logical results of the computations, but also on the physical time when these results are produced.

By system behavior we mean the sequence of outputs in time of a system.


Real-time means reactive

A real-time computer system must react to stimuli from its environment
The instant when a result must be produced is called a deadline.
If a result has utility even after the deadline has passed, the deadline is classified as soft, otherwise it is firm.
If severe consequences could result if a firm deadline is missed, the deadline is called hard.
Example: Consider a traffic signal at a road before a railway crossing. If the traffic signal does not change to red before the train arrives, an accident could result.
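
The three deadline classes can be written as a small decision rule. This is only an illustrative sketch: the `Deadline` enum, the `classify_deadline` function, and its two boolean parameters are made-up names that encode the definitions above.

```python
from enum import Enum


class Deadline(Enum):
    SOFT = "soft"
    FIRM = "firm"
    HARD = "hard"


def classify_deadline(useful_after_deadline: bool, severe_if_missed: bool) -> Deadline:
    """Apply the slide's definitions of soft, firm, and hard deadlines."""
    if useful_after_deadline:
        return Deadline.SOFT
    # The result is worthless once late, so the deadline is at least firm.
    return Deadline.HARD if severe_if_missed else Deadline.FIRM


# Railway-crossing signal: a late result is useless and may cause an accident.
print(classify_deadline(useful_after_deadline=False, severe_if_missed=True))
```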


Reliability

The reliability R(t) of a system is the probability that the system will provide the specified service until time t, given that the system was operational at the beginning (t = t0)

The probability that a system will fail in a given interval of time is expressed by the failure rate, measured in FITs (Failure In Time).

A failure rate of 1 FIT means that the mean time to failure (MTTF) of a device is 10^9 h, i.e., one failure occurs in about 115,000 years.

If a system has a constant failure rate of λ failures/h, then the reliability at time t is given by

R(t) = exp(-λ(t - t0))

MTTF = 1/λ
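
The reliability and MTTF formulas can be checked numerically. The sketch below assumes a 365-day year (8,760 hours) when converting hours to years; the function names are illustrative.

```python
import math

HOURS_PER_YEAR = 8_760  # assumption: 365-day year


def fit_to_rate(fit: float) -> float:
    """Convert a failure rate in FIT (failures per 10^9 h) to failures per hour."""
    return fit * 1e-9


def reliability(rate_per_hour: float, t: float, t0: float = 0.0) -> float:
    """R(t) = exp(-lambda * (t - t0)) for a constant failure rate."""
    return math.exp(-rate_per_hour * (t - t0))


def mttf_hours(rate_per_hour: float) -> float:
    """MTTF = 1 / lambda."""
    return 1.0 / rate_per_hour


rate = fit_to_rate(1)
print(f"MTTF at 1 FIT: {mttf_hours(rate) / HOURS_PER_YEAR:,.0f} years")
print(f"R(MTTF) = {reliability(rate, mttf_hours(rate)):.3f}")
```

The second line of output shows a property of the exponential model: only about 37% of systems are still working at t = MTTF.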


Example

What must be the system failure rate so that 99% of the systems in the field work reliably for the first 100,000 hours?
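
One way to approach this question is to invert R(t) = exp(-λt) for λ. The sketch below assumes a constant failure rate and t0 = 0; the function name is illustrative.

```python
import math


def required_failure_rate(target_reliability: float, hours: float) -> float:
    """Solve R(t) = exp(-lambda * t) for lambda, in failures per hour."""
    return -math.log(target_reliability) / hours


rate = required_failure_rate(0.99, 100_000)
print(f"failure rate: {rate:.3e} failures/h")
print(f"             = {rate * 1e9:.1f} FIT")
print(f"MTTF:         {1 / rate:,.0f} h")
```

Under these assumptions the answer is roughly 100 FIT, that is, an MTTF of about 10 million hours.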


Safety

Maintainability

Name some hard, firm and soft deadline embedded systems


Example

An automotive company produces 2,000,000 electronic engine controllers of a special type.

The following design alternatives are discussed:
(a) Construct the engine control unit as a single SRU with the application software in Read Only Memory (ROM). The production cost of such a unit is $250. In case of an error, the complete unit has to be replaced.
(b) Construct the engine control unit such that the software is contained in a ROM that is placed on a socket and can be replaced in case of a software error. The production cost of the unit without the ROM is $248. The cost of the ROM is $5.
(c) Construct the engine control unit as a single SRU where the software is loaded in a Flash EPROM that can be reloaded. The production cost of such a unit is $255.

The labor cost of repair is assumed to be $50 for each vehicle. (It is assumed to be the same for each one of the three alternatives.)

Calculate the cost of a software error for each one of the three alternative designs if 300,000 cars have to be recalled because of the software error (example in Sect. 1.6.1).

Which one is the lowest cost alternative if only 1,000 cars are affected by a recall?
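
A possible way to set up this comparison is sketched below. It assumes that a repair costs a new unit plus labor for (a), a new ROM plus labor for (b), and labor only for (c), and that the "lowest cost alternative" is judged on production cost for all 2,000,000 units plus recall cost. These readings are assumptions, not statements from the slide.

```python
UNITS_PRODUCED = 2_000_000
LABOR_PER_REPAIR = 50

# name: (production cost per unit in $, replacement parts per repair in $)
ALTERNATIVES = {
    "a": (250, 250),     # fixed ROM: the whole unit is replaced
    "b": (248 + 5, 5),   # socketed ROM: only the ROM is replaced
    "c": (255, 0),       # Flash EPROM: reloaded, no parts needed
}


def recall_cost(name: str, recalled: int) -> int:
    """Cost of the software error itself: parts plus labor per recalled car."""
    _, parts = ALTERNATIVES[name]
    return recalled * (parts + LABOR_PER_REPAIR)


def lifetime_cost(name: str, recalled: int) -> int:
    """Production cost of the whole series plus the recall cost."""
    production, _ = ALTERNATIVES[name]
    return UNITS_PRODUCED * production + recall_cost(name, recalled)


def cheapest(recalled: int) -> str:
    return min(ALTERNATIVES, key=lambda name: lifetime_cost(name, recalled))


for recalled in (300_000, 1_000):
    for name in ALTERNATIVES:
        print(f"{recalled:7,d} recalled, ({name}): error cost "
              f"${recall_cost(name, recalled):,}, lifetime ${lifetime_cost(name, recalled):,}")
    print("lowest lifetime cost:", cheapest(recalled))
```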


Distributed RT system model

From the POV of an outside observer, a real-time (RT) system can be decomposed into three communicating subsystems:
– a controlled object (the physical subsystem, the behavior of which is governed by the laws of physics),
– a “distributed” computer subsystem (the cyber system, the behavior of which is governed by the programs that are executed on digital computers)
– a human user or operator
The distributed computer system consists of computational nodes that interact by the exchange of messages.
A computational node can host one or more computational components.


Event-Triggered Control Versus Time-Triggered Control
